Robust Inference of Relevant Attributes

Authors: Jan Arpe and Rüdiger Reischuk.

Source: Lecture Notes in Artificial Intelligence Vol. 2842, 2003, 99 - 113.

Abstract. Given n Boolean input variables representing a set of attritubes, we consider Boolean functions f (i.e., binary classifications of tuples) that actually depend only on a small but unknown subset of these variables/attributes, in the following called relevant. The goal is to determine the relevant attributes given a sequence of examples - input vectors X and corresponding classifications f(X). We analyze two simple greedy strategies and prove that they are able to achieve this goal for various kinds of Boolean functions and various input distributions according to which the examples are drawn at random.

This generalizes results obtained by Akutsu, Miyano, and Kuhara for the uniform distribution. The analysis also provides explicit upper bounds on the number of necessary examples. They depend on the distribution and combinatorial properties of the function to be inferred.

Our second contribution is an extension of these results to the situation where attribute noise is present, i.e., a certain number of input bits x i may be wrong. This is a typical situation, e.g., in medical research or computational biology, where not all attributes can be measured reliably. We show that even in such an error-prone situation, reliable inference of the relevant attributes can be performed, because our greedy strategies are robust even against a linear number of errors

©Copyright 2003 Springer