Average-Case Analysis of Classification Algorithms for Boolean Functions and Decision Trees

Author: Tobias Scheffer.

Source: Lecture Notes in Artificial Intelligence Vol. 1968, 2000, 194 - 208.

Abstract. We conduct an average-case analysis of the generalization error rate of classifcation algorithms with finite model classes. Unlike worst-case approaches, we do not rely on bounds that hold for all possible learning problems. Instead, we study the behavior of a learning algorithm for a given problem, taking properties of the problem and the learner into account. The solution depends only on known quantities (e.g., the sample size), and the histogram of error rates in the model class which we determine for the case that the sought target is a randomly drawn Boolean function. We then discuss how the error histogram can be estimated from a given sample and thus show how the analysis be applied approximately in the more realistic scenario that the target is unknown. Experiments show that our analysis can predict the behavior of decision tree algorithms fairly accurately even if the error histogram is estimated from a sample.

©Copyright 2000 Springer