Author: John Case
Source: SIAM Journal on Computing, Vol. 28, No. 6, 1999, pp. 1941-1969.
©Copyright 1999, Society for Industrial and Applied Mathematics.
Abstract. Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n + 1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith for learnability from both positive and negative data.
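In the TxtFex notation standard for vacillatory identification (TxtFex^a_b: on every presentation of positive data for a language in the class, the learner converges in the limit to a finite set of at most b grammars, each with at most a anomalies, where a = * means finitely many anomalies; this symbolism is supplied here as a gloss and is not quoted from the abstract), the main theorem amounts to the separation
\[
(\forall n \ge 1)\;\bigl[\, \mathrm{TxtFex}^{0}_{\,n+1} \setminus \mathrm{TxtFex}^{*}_{\,n} \neq \emptyset \,\bigr].
\]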
A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that one can optimally eliminate half of the mistakes from final programs for vacillatory criteria if one is willing to converge in the limit to infinitely many different programs instead.
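For orientation, Angluin's original subset principle (the single-grammar, no-anomaly case, recalled here from her 1980 paper rather than quoted from this abstract) says that if a class \(\mathcal{L}\) is learnable from positive data, then every language in it has a finite tell-tale subset:
\[
(\forall L \in \mathcal{L})\,(\exists \text{ finite } D \subseteq L)\,(\forall L' \in \mathcal{L})\;\bigl[\, D \subseteq L' \subseteq L \;\Rightarrow\; L' = L \,\bigr].
\]
Intuitively, once the tell-tale D has appeared in the data, conjecturing L cannot overgeneralize past a smaller target, since no language of the class lies strictly between D and L.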
Child language learning may be sensitive to the order or timing of data presentation. It is shown, though, that for the vacillatory success criteria of this paper, there is no loss of learning power for machines which are insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and the length of the sequence of positive data, not to the actual sequence itself. A machine M is weakly n-ary order independent iff, for each language L, there is a single finite set D of no more than n grammars such that M converges to a subset of this same D for each ordering of the positive data for L. The theorem most difficult to prove in the paper implies that machines which are simultaneously partly set-driven and weakly n-ary order independent lose no learning power for the criteria of convergence in the limit to up to n grammars. Several variants of this theorem are obtained by modifying its proof, and some of these variants have application in this and other papers. Along the way it is also shown, for the vacillatory criteria, that learning power is not increased if one restricts the sequence of positive data presentation to be computable. Some of these results are nontrivial lifts of prior work for the n = 1 case done by the Blums; Wiehagen; Osherson, Stob, and Weinstein; Schäfer; and Fulk.
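As a reading aid, the preservation result described above can be written schematically (the TxtFex_n symbol and the quantifier form are supplied here as a gloss, not quoted from the abstract):
\[
\mathcal{L} \in \mathrm{TxtFex}_{\,n} \;\Longrightarrow\; \bigl(\exists M \text{ partly set-driven and weakly } n\text{-ary order independent}\bigr)\,\bigl[\, M\ \mathrm{TxtFex}_{\,n}\text{-identifies each } L \in \mathcal{L} \,\bigr].
\]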