**Author: John Case**

**Source:** *SIAM Journal on Computing*, Vol. **28**, No. 6,
1999, pp. 1941–1969.

**Abstract.**
Several extensions of Gold's influential model of
language learning by machine from positive data are considered.
The criteria of successful learning studied feature convergence in
the limit to vacillation between several alternative correct grammars.
The main theorem of this paper is that there are classes of languages
that can be learned if convergence in the limit to up to
*n* + 1 exactly correct grammars is allowed but that cannot be
learned if convergence in the limit is to no more than *n* grammars,
even when each of those *n* grammars is allowed to make finitely
many mistakes. This contrasts sharply with results of Barzdin and Podnieks
and, later, Case and Smith for learnability from both positive and negative data.
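In the notation standard in this literature, the main theorem admits a compact statement (a hedged sketch: here $\mathrm{TxtFex}_b^a$ is assumed to denote the classes learnable from positive data with convergence in the limit to at most $b$ grammars, each making at most $a$ mistakes, and $*$ to mean "finitely many mistakes"):

```latex
% Hedged sketch of the main theorem in TxtFex notation:
% even unbounded finite anomaly tolerance in each of n final
% grammars does not compensate for one fewer grammar.
\mathrm{TxtFex}_{n+1} \setminus \mathrm{TxtFex}_{n}^{*} \neq \emptyset
\qquad (n \geq 1)
```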

A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that one can optimally eliminate half of the mistakes from final programs for vacillatory criteria if one is willing to converge in the limit to infinitely many different programs instead.

Child language learning may be sensitive to the order or timing of data
presentation. It is shown, though, that for the vacillatory success criteria
of this paper, there is no loss of learning power for
machines which are *in*sensitive to order in several ways
simultaneously. For example, *partly set-driven* machines attend
only to the *set* and *length of sequence* of positive data,
not the actual sequence itself. A machine **M** is *weakly
n-ary order independent* if, for each language *L* on which,
for some ordering of the positive data, **M** converges in the limit to a finite set of
grammars, there is a finite set of grammars *D* (of cardinality
at most *n*)
such that **M** converges to a subset of this same *D* for *each*
ordering of the positive data for *L*. The theorem most difficult
to prove in the paper implies that machines which
are simultaneously partly set-driven and weakly *n*-ary order
independent do not lose learning power for converging in the limit to up to
*n* grammars. Several variants of this theorem are
obtained by modifying its proof, and some of these variants have application
in this and other papers. Along the way it is also shown, for the
vacillatory criteria, that learning power is not
increased if one restricts the sequence of positive data presentation
to be computable. Some of these results are nontrivial lifts of prior work
for the *n* = 1 case done by the Blums; Wiehagen;
Osherson, Stob, and Weinstein; Schäfer; and Fulk.
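The two order-insensitivity notions above can also be rendered formally (an illustrative sketch, assuming texts *T* for *L*, finite initial sequences $\sigma, \tau$, and $\mathrm{content}(\cdot)$ for the set of data occurring in a sequence; the paper's precise definitions may differ in detail):

```latex
% Partly set-driven: M's conjecture depends only on the set of
% data seen so far and the length of the sequence, not its order.
\mathrm{content}(\sigma) = \mathrm{content}(\tau) \;\wedge\; |\sigma| = |\tau|
  \;\Longrightarrow\; \mathbf{M}(\sigma) = \mathbf{M}(\tau).

% Weakly n-ary order independent: if M converges on some text for L
% to a finite set of grammars, then one set D with |D| <= n bounds
% M's limiting conjectures on every text for L.
(\exists T \text{ for } L)\,[\mathbf{M} \text{ on } T \text{ converges to a finite set}]
  \;\Longrightarrow\; (\exists D,\ |D| \leq n)\,(\forall T' \text{ for } L)\,
  [\mathbf{M} \text{ on } T' \text{ converges to a subset of } D].
```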

©Copyright 1999, Society for Industrial and Applied Mathematics.