### Abduction and the Dualization Problem* (invited lecture for DS 2003)

Author: Thomas Eiter

Affiliation: Knowledge-Based Systems Group, Institute of Information Systems, Vienna University of Technology, Vienna, Austria

Abstract. Abduction is a fundamental mode of reasoning which was extensively studied by C.S. Peirce, who also introduced the term for inference of explanations for observed phenomena. Abduction has taken on increasing importance in Artificial Intelligence (AI) and related disciplines, where it has been recognized as an important principle of common-sense reasoning. It has applications in many areas of AI and Computer Science including diagnosis, database updates, planning, natural language understanding, learning, to number some of them.

In a logic-based setting, abduction can be viewed as the task to find, given a set of formulas T (the background theory) and a formula Q (the query), a minimal set of formulas E (an explanation) from a set of abducibles A such that T union E is satisfiable and logically entails Q. In many application scenarios T is a propositional Horn theory, Q is a literal or a conjunction of literals, and the abducibles A are certain literals of interest. For computing abductive explanations, well-known early systems such as Poole's Theorist or assumption-based Truth Maintenance Systems have been devised in the 1980s.

Besides computing some explanation, the problem of generating several or all explanations has received more attention in the last years. This is important since often one would like to select one out of a set of alternative explanations according to a preference or plausibility relation. In general, a query may have exponentially many explanations, and thus efficient generation of all explanations may be understood in terms of polynomial total-time (or output-polynomial time), i.e., in time polynomial in the combined size of the input and the output. Furthermore, if exponential resources are prohibitive, one would like to know whether a few explanations (e.g., polynomially many) can be generated in polynomial time.

In recent and ongoing work, we have investigated the computational complexity of these issues, and compiled several results for charting the tractability / intractability frontier of these problems. In this talk, we shall recall some of the results and then focus on abduction from Horn theories represented by their characteristic models. In this setting, the background theory T is represented by a set of so called characteristic models, charset(T), rather than by formulas. The benefit is that for certain formulas, logical consequence from T efficiently reduces to deciding consequence from charset(T) (which is easy) and thus admits tractable inference. In fact, finding some abductive explanation for a query literal is polynomial in this setting, while this is well-known to be NP-hard under formula-based representation.

Computing all abductive explanations for a query literal is polynomial-time equivalent (in a precise sense) to the problem of dualizing a Boolean function given by a monotone CNF. The latter problem, Monotone Dualization, is with respect to complexity a somewhat mysterious problem which since more than 20 years resists to a precise classification in terms of well-established complexity classes. Currently, no polynomial total-time algorithm solving this problem is known; on the other hand, there is also no stringent evidence that such an algorithm is unlikely to exist (such as, eg., coNP-hardness of the associated decision problem whether, given two monotone CNFs F and G, they represent dual functions). On the contrary, results in the 1990's provided some hints that the problem is closer to polynomial total-time, since as shown by Fredman and Khachyian, the decisional variant can be solved in quasi-polynomial time, i.e., in time O(n^{log n}). This was recently refined to solvability in polynomial time with limited nondeterminism, i.e., using a poly-logarithmic number of bit guesses. We will consider some possible extensions of the results for abductive explanations which are polynomial-time equivalent to Monotone Dualization.

Apart from this peculiarity, Monotone Dualization has been recognized as an important problem since there are a large number of other problems in Computer Science which are known to be polynomial-time equivalent to this problem. It has a role similar to the one of SAT for the class NP: A polynomial total-time algorithm for Monotone Dualization implies polynomial total-time algorithms for all the polynomial-time equivalent problems.

We consider some possible extensions og the results for abductive explanations which are polynomial-time equivalent to Monotone Dualization. Besides generating all abductive explanations for a literal, there are many other problems in Knowledge Discovery and Data Mining which are polynomial-time equivalent or closely related to Monotone Dualization, including learning with oracles, computation of infrequent and frequent sets, and key generation. We shall give a brief account of such problems, and finally will conclude with some open problems and issues for future research.

The results presented are joint work with Kazuhisa Makino, Osaka University.

*The full version of this paper is published in the Proceedings of the 6th International Conference on Discovery Science, Lecture Notes in Artificial Intelligence Vol. 2843