
# Learning in the Limit

Learning in the limit is a very general paradigm. To define it, we assume an arbitrary recursively enumerable set X and refer to it as the learning domain. Any subset c of X is called a concept, and any collection C of concepts is a concept class, i.e., C is a subset of 2^X.

The information given to the learner consists of successively growing initial segments of a data sequence d = d_0, d_1, d_2, ... for the target concept. By data(c) we denote the set of all data sequences for concept c. Furthermore, we use
d[j] =df d_0, d_1, ..., d_j
to denote the initial segment of the data sequence d of length j + 1.
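The initial-segment notation can be illustrated with a short sketch (the concrete data sequence below is an assumed example, not part of the definition):

```python
# Illustration of the initial-segment notation d[j] (example data assumed):
d = [5, 0, 5, 2]        # some data sequence d = d_0, d_1, d_2, d_3
segment = d[:1 + 1]     # d[1] = d_0, d_1 — an initial segment of length j + 1 = 2
print(segment)   # → [5, 0]
```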

Next, we specify what the d_j stand for. Here we distinguish between learning from positive data (synonymously, from text) and learning from both positive and negative data (synonymously, from informant). In the first case, the elements of a data sequence are elements of X that belong to the target concept c. Usually, a text is additionally required to exhaust the target concept, i.e., range(d) = c. That is, a positive presentation must contain every element belonging to the target at least once, and none of the elements that do not belong to c. We use text(c) to denote the set of all texts for c.

In the case of informants, each d_j is a labeled example, i.e., d_j = (x_j, b_j), where b_j = 0 if x_j does not belong to c, and b_j = 1 otherwise. An informant is usually required to contain every element of the learning domain X at least once. That is, the set of all examples labeled 1 exhausts the target c, and the set of all examples labeled 0 is precisely X \ c. By info(c) we denote the set of all informants for c.
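The two kinds of data presentation can be sketched as generators. The sketch below is illustrative: it assumes the learning domain X = N and the target concept c = the set of even numbers; all function names are made up for this example.

```python
# Sketch (assumed names): one text and one informant for the concept
# c = { even numbers }, over the learning domain X = N.
from itertools import islice

def text_for_evens():
    """A text for c: positive data only, enumerating every element of c."""
    n = 0
    while True:
        yield 2 * n          # each d_j is an element of c
        n += 1

def informant_for_evens():
    """An informant for c: every x in X appears, labeled by membership in c."""
    x = 0
    while True:
        yield (x, 1 if x % 2 == 0 else 0)   # labeled example (x_j, b_j)
        x += 1

print(list(islice(text_for_evens(), 5)))       # → [0, 2, 4, 6, 8]
print(list(islice(informant_for_evens(), 4)))  # → [(0, 1), (1, 0), (2, 1), (3, 0)]
```

Note that a concept has many texts and many informants; these generators produce just one of each, in a fixed order.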

Note, however, that we are going to relax the requirements that a text exhaust the target concept and that an informant exhaust the learning domain, by requiring both only to contain "enough information" to learn. What is meant by enough will be explained later.

A limit learner is an inductive inference machine (abbr. IIM). Such a machine M works as follows. As input it receives incrementally growing initial segments of a positive presentation (resp. an informant) d. After each new input, it outputs a hypothesis M(d[j]) from a predefined hypothesis space H. Each hypothesis refers to a unique element of the target concept class. Note that the hypothesis space and the concept class may coincide.
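A minimal sketch of an IIM at work, under assumptions chosen for illustration: the concept class C consists of all finite subsets of N, and the hypothesis space H is the class of finite sets itself, so concept class and hypothesis space coincide here.

```python
# Illustrative IIM M for C = { finite subsets of N }, with H = C.

def M(segment):
    """M(d[j]): map an initial segment of a text to a hypothesis in H."""
    return frozenset(segment)    # conjecture exactly the elements seen so far

d = [3, 1, 3, 2, 1, 2]                 # a text for the finite concept {1, 2, 3}
for j in range(len(d)):
    print(sorted(M(d[:j + 1])))        # hypothesis M(d[j]) after each new input
```

On every text for a finite concept c, this M eventually outputs c and never changes again, since all of c's finitely many elements appear after finitely many steps.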

Definition: Let C be a concept class and let H be a hypothesis space for it. C is called learnable in the limit from text (informant) if there is an IIM M such that for every c in C and every d in text(c) (resp. every d in info(c)):

[1] M(d[j]) is defined for all j,

[2] M(d[j]) = h for all but finitely many j, where h in H is a hypothesis referring to c.
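The definition can be illustrated with a small sketch. The concept class, the coding of hypotheses, and the particular data sequence below are all assumed for the example: C = { {0, ..., n} : n >= 0 }, and the hypothesis n in H stands for the concept {0, ..., n}.

```python
# Sketch: C = { {0,...,n} : n >= 0 } is learnable in the limit from text.
# The hypothesis n in H denotes the concept {0,...,n} (coding assumed).

def M(segment):
    """M(d[j]): conjecture n = the largest element seen so far."""
    return max(segment)

d = [0, 2, 1, 4, 3, 0, 4, 2]                   # one text for c = {0,...,4}
hyps = [M(d[:j + 1]) for j in range(len(d))]
print(hyps)   # → [0, 2, 2, 4, 4, 4, 4, 4]
# Condition [1] holds: M is defined on every initial segment.
# Condition [2] holds: from the first occurrence of 4 on, M outputs 4 forever,
# and the hypothesis 4 refers to the target concept {0,...,4}.
```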

By LIM we denote the collection of all concept classes C that are learnable in the limit. Note that sometimes EX is used instead of LIM.

It can be shown that many interesting concept classes are learnable in the limit from informant. As far as learning from text is concerned, the situation is more subtle. But before developing these topics further, it should be noted that there are some major points of concern.

## Points of Concern

• In the definition given above, the limit learner always has access to the whole initial segment of the data sequence provided so far. Clearly, in practice such a learner would soon run out of space.

• Since a limit learner is only required to converge, at no particular learning stage can one know whether or not it has already been successful. Such uncertainty may not be acceptable in many applications.

• If one modifies the definition of learning in the limit by adding the requirement that convergence should be decidable, one arrives at finite learning. However, the power of finite learning is much smaller than that of learning in the limit.

• It seems difficult to define an appropriate measure for the complexity of learning.

We shall outline and/or discuss possibilities to overcome these points of concern on the following pages.
