Editors' Introduction
Learning theory is a research area concerned with the study, design, and analysis
of computer programs that are able to learn from past experience.
Over the last few years, the idea of a learning system has independently emerged
in a variety of fields: pattern recognition, the theory of
repeated games, machine learning, universal prediction and data compression,
inductive inference, adaptive optimal control, computational psychology, and
others.
This is not surprising: on the one hand, the notion of adaptivity is an extremely
attractive and powerful tool; on the other hand, the abstract phenomenon of
learning is so complex that it is unlikely that a single theory will ever
be able to explain it in a fully satisfactory way. Despite their apparent
diversity, these learning models exhibit deep connections whose study is an
active research area in learning theory.
Another important aspect is the increasingly tight interaction between the theoretical analysis and the practical application of learning. The sharpening of the theoretical tools for designing and analyzing learning techniques, the large amounts of real-world data freely available in digital form, and the growing demand for adaptive techniques in e-commerce, computational biology, and other applications have stimulated the development of new learning algorithms, some of which are successfully used in commercial and scientific applications. The papers collected in this volume offer a broad view of current research in the field, including studies on the most important learning models (inductive inference, statistical learning, large-margin learning techniques, learning with queries, reinforcement learning). The scope of this research is not limited to theory: it also includes algorithm design and analysis, experimental studies of learning heuristics, and the application of learning to engineering problems. Moreover, there is an increasing need to learn from the semi-structured data typically available on the World Wide Web.
In his invited lecture, Susumu Hayashi develops a mathematics that is based on learning. Traditionally, learning theory is mainly concerned with the learnability of mathematical and logical objects from a finite amount of data; the main goal is thus to achieve a better understanding of learning. In contrast, Susumu Hayashi proposes a mathematics that is founded on learning-theoretic notions. This research originated in, and is ultimately aimed at, the testing and debugging of formal proofs. Interestingly, when the investigations started, learning theory was not involved at all, whereas at the current stage it has become indispensable.

The next invited lecture, given by John Shawe-Taylor and co-authored by Chris Williams, Nello Cristianini, and Jaz Kandola, studies the eigenspectrum of the Gram matrix and its relationship to the operator eigenspectrum. It is shown that the difference between the two spectra can be bounded. Moreover, a performance bound on kernel principal component analysis is provided. These results are fundamental for a number of learning algorithms that project the data into a low-dimensional subspace intended to capture most of the variance in the data. Experiments on a real-world data set confirm the theoretical predictions for small projection dimensions.

Data mining with graphical models is presented in the invited lecture for DS 2002 by Rudolf Kruse (co-authored by Christian Borgelt). Graphical models are an important technique for classifier construction and dependency analysis. The paper discusses the main ideas of learning graphical models from datasets of sample cases, with an emphasis on probabilistic networks.

The invited lecture for DS 2002 by Gerhard Widmer deals with the complex phenomenon of expressive music performance. Capturing this phenomenon with machine learning techniques and automated discovery methods is a major challenge. The lecture reports the results achieved so far and identifies a number of important open research problems.
The paper by Grieser develops the notion of reflective inductive inference. Reflective learners are able to detect that the sequence of examples contains inconsistent information, but occasionally they may also make detection mistakes. The results determine how learners obeying different notions of reflection behave in different learning models. It turns out that for some learning models changes in the notion of reflection do not change the learning power, whereas in others the learning power is greatly affected by these changes. The work by Martin, Sharma, and Stephan moves towards a unified view of logic, learning, and topology. The outcome is a notion of parameterized logic, based on topology, that makes use of learning-theoretic notions for its interpretation and application. The framework is also seen to provide a nice interpolation between applications based on induction and those based on deduction. Finally, the paper by Harizanov and Stephan establishes a number of fundamental results relating the learnability properties and the algebraic properties of recursive vector spaces. The main result states that for identification in the limit from positive examples only, the learnability properties can be characterized by algebraic properties, whereas for learning from both positive and negative examples this is no longer the case.
The paper by Servedio studies the PAC learnability of an important class of Boolean functions: those representable by constant-depth, polynomial-size circuits of unbounded fan-in with modular counting gates (the class ACC). A first result relates the efficient learnability of ACC to that of a different class: the embedded midbit functions. Several positive and negative results on the efficient learnability of this latter class are then shown for the PAC model and some of its restrictions, providing important insights into the learnability of ACC. The last paper in this group is by Bshouty and Burroughs. Its main theme is the proof of impossibility results for the efficient learning of several subclasses of Boolean functions in an extension of PAC learning called co-agnostic learning. The key proof technique is the construction of reductions between learning problems and well-studied approximability problems with known hardness results.
The paper by Grieser, Jantke, and Lange applies theoretical results from inductive inference to wrapper induction, i.e., the selection of interesting fragments of web pages (originating from a given source) on the basis of a few learning examples. The paper describes the theoretical framework together with a functioning system called LExIKON. Suzuki, Shoudai, Uchida, and Miyahara consider algorithms for the efficient learning of ordered tree structures. This rich class of structures is well suited for modeling the parse trees of semi-structured data such as HTML or XML documents.
Two papers by Köbler and Lindner present results characterizing the computational complexity of query learning for concept classes satisfying certain structural requirements based on the notion of general dimension. The main focus of the first paper is to understand when learning with few queries (of a given type) implies learning in a reasonable amount of time. The second paper uses the general dimension to characterize the complexity of learning with queries whose responses are random variables, and of learning when the goal is relaxed to the approximate identification of the target concept. The last paper in this group, by Kalnishkan and Vyugin, investigates the notion of predictive complexity. This quantity is an extension of the Kolmogorov complexity of individual sequences and may be viewed as a measure of the learnability of a given data sequence, abstracted from any specific learning strategy. Predictive complexity is parameterized by the loss function used to measure the success of learning. The main result of the paper characterizes the set of loss functions for which predictive complexity exists.
The work by Schmitt applies ideas based on Descartes' rule of signs to derive upper and lower bounds on the Vapnik-Chervonenkis dimension and pseudodimension of radial basis function networks (a popular family of neural networks) with one input variable. In the next paper, Vovk studies region predictors. These are on-line learning algorithms for pattern classification, parameterized by a confidence value. Whereas usual pattern classification algorithms output a single label as their prediction, region predictors output a subset of the possible labels, and the size of this subset may increase as the confidence parameter approaches 1. The paper by Dasgupta, Pavlov, and Singer investigates provably correct and efficient algorithms for reconstructing images. They define a noisy sampling model for images consisting only of lines in the plane and present an algorithm that reconstructs these lines with high probability. A method for transforming any multi-class, multi-label, or ranking learning problem into a binary classification problem with linear hypotheses is proposed in the paper by Har-Peled, Roth, and Zimak. The last paper in this group, by Braess, Forster, Sauer, and Simon, has an information-theoretic flavor: it analyzes the problem of learning a probability distribution over a finite domain of fixed size using the Kullback-Leibler divergence as the cost function.
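As a small, purely illustrative aside on the region predictors mentioned above (this sketch is not the algorithm of Vovk's paper; the per-label p-values and their numerical values are our own assumptions), a conformal-style predictor can be thought of as returning every label whose p-value exceeds the significance level 1 minus the confidence, so the prediction region can only grow as the confidence parameter approaches 1:

```python
def prediction_region(p_values, confidence):
    """Return every label whose p-value exceeds the significance level 1 - confidence.

    Raising the confidence can only enlarge the predicted set of labels.
    """
    return {label for label, p in p_values.items() if p > 1 - confidence}

# Hypothetical per-label p-values produced by some nonconformity measure.
p_values = {"cat": 0.40, "dog": 0.08, "bird": 0.02}
print(prediction_region(p_values, confidence=0.90))  # only {'cat'}
print(prediction_region(p_values, confidence=0.99))  # all three labels
```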
In the first paper, Fürnkranz considers the problem of rule learning using a covering approach, i.e., finding a small set of simple rules that explains (covers) all the positive examples and none of the negative examples. In particular, his work points out a problem in covering strategies that proceed bottom-up (i.e., by generalizing a rule that initially covers a single example) and proposes some techniques to alleviate it. In the second and last paper of this group, Fronhöfer and Yamamoto study the notion of relevant logic. A relevant logic is a logic in which a formula may be derived as a conclusion only if it is deductively related to all the premises. The main idea of the paper is to use relevance as a notion of appropriateness for hypotheses in inductive logic.
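To give a concrete picture of the covering strategy with bottom-up generalization discussed above, here is a minimal toy sketch (not taken from Fürnkranz's paper; the attribute-value representation and all names are our own assumptions):

```python
# A toy separate-and-conquer (covering) learner over attribute-value examples.
# Each example is a dict; a rule is a set of (attribute, value) tests, and it
# covers an example if every test holds.

def covers(rule, example):
    return all(example.get(a) == v for a, v in rule)

def learn_one_rule(seed, negatives):
    """Bottom-up step: start from the most specific rule for a seed positive
    example and greedily drop tests as long as no negative example is covered."""
    rule = set(seed.items())
    for test in list(rule):
        candidate = rule - {test}
        if not any(covers(candidate, n) for n in negatives):
            rule = candidate  # the generalization is still consistent
    return frozenset(rule)

def covering(positives, negatives):
    """Learn rules until every positive example is covered by some rule."""
    rules, remaining = [], list(positives)
    while remaining:
        rule = learn_one_rule(remaining[0], negatives)
        rules.append(rule)
        remaining = [p for p in remaining if not covers(rule, p)]
    return rules

pos = [{"shape": "round", "color": "red"}, {"shape": "round", "color": "green"}]
neg = [{"shape": "square", "color": "red"}]
print(covering(pos, neg))  # e.g. a single rule testing shape == round
```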
The first paper is by Gavinsky, who proposes a new boosting method. Boosting [2] is a general technique for obtaining a highly accurate hypothesis by combining hypotheses generated on different reweightings of the training set. Good boosting algorithms are adaptive, in that they work without needing to know an a priori lower bound on the accuracy of the individual hypotheses being combined. The price of adaptivity is a lack of robustness: traditional boosting tends not to work well on noisy data. The new boosting method is the first able to provide adaptivity together with a certain degree of noise robustness. The second paper in this group, by Kivinen, Smola, and Williamson, investigates two incremental variants of the support vector machine algorithm. The main result shows that these variants are robust with respect to drift of the target. The performance of both algorithms is shown to degrade in proportion to the amount of drift in the considered time interval, and also in proportion to the amount of nonlinearity in the data sequence. The third paper, by Forster and Simon, is concerned with some fundamental questions about the use of linear-threshold hypotheses. Statistical learning theory ensures that a good generalization error is achieved whenever the sample can be embedded in a space where some linear hypothesis separates the positive examples from the negative ones with a large margin. The paper proves bounds relating the spectrum of a matrix derived from the sample to the margin that can be obtained via the embedding technique.
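To illustrate the reweighting mechanism mentioned at the beginning of this group (this is a minimal AdaBoost-style sketch, not the adaptive, noise-robust method of Gavinsky's paper; the stump weak learner and the toy data are our own assumptions):

```python
import math

def adaboost(examples, labels, weak_learner, rounds=10):
    """AdaBoost-style reweighting: examples the current hypothesis misclassifies
    receive larger weight in the next round; the final prediction is a weighted
    vote of all hypotheses. Labels are +1/-1."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []  # list of (vote_weight, hypothesis)
    for _ in range(rounds):
        h = weak_learner(examples, labels, weights)
        err = sum(w for w, x, y in zip(weights, examples, labels) if h(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this hypothesis
        ensemble.append((alpha, h))
        # Reweight: increase the weight of misclassified examples, then normalize.
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

def stump_learner(xs, ys, ws):
    """A trivial weak learner over 1-D inputs: the best single-threshold stump."""
    def make(t, s):
        return lambda x: s if x >= t else -s
    best = min(((t, s) for t in xs for s in (1, -1)),
               key=lambda ts: sum(w for w, x, y in zip(ws, xs, ys)
                                  if make(*ts)(x) != y))
    return make(*best)

xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [-1, -1, 1, 1, 1]
predict = adaboost(xs, ys, stump_learner, rounds=5)
print([predict(x) for x in xs])  # matches ys on this toy set
```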
The last paper in this group is by Coulom and applies neural networks to the solution of a motor control problem. The control problem is modeled using reinforcement learning, a very general framework for formalizing the learning of plans and complex strategies. The proposed approach is based on the temporal difference learning algorithm coupled with feedforward neural networks. Empirical convergence of the resulting algorithm is shown via simulation of a mechanical system consisting of five articulated segments.
September 2002

Nicolò Cesa-Bianchi
Masayuki Numao
Rüdiger Reischuk