The Power of Vacillation in Language Learning

Author: John Case

Source: SIAM Journal on Computing, Vol. 28, No. 6, 1999, pp. 1941-1969.

Abstract. Some extensions of Gold's influential model of language learning by machine from positive data are considered. The criteria of successful learning studied feature convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n + 1) exactly correct grammars is allowed, but that cannot be learned if convergence in the limit is to no more than n grammars, even when each of those n grammars is permitted finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, of Case and Smith for learnability from both positive and negative data.
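
In the TxtFex notation standard for vacillatory identification (an editorial symbolization; the abstract states the theorem only in words), writing $\mathbf{TxtFex}^{a}_{b}$ for the collection of language classes learnable with convergence in the limit to at most $b$ grammars, each making at most $a$ mistakes ($a = *$ meaning finitely many), the main theorem can be sketched as

$$(\forall n \ge 1)\ \bigl[\,\mathbf{TxtFex}^{0}_{n+1} \setminus \mathbf{TxtFex}^{*}_{n} \neq \emptyset\,\bigr].$$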

A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for avoiding overgeneralization in learning from positive data. It is applied to prove another theorem to the effect that, for the vacillatory criteria, one can optimally eliminate half of the mistakes from the final programs if one is willing to converge in the limit to infinitely many different programs instead.

Child language learning may be sensitive to the order or timing of data presentation. It is shown, though, that for the vacillatory success criteria of this paper, there is no loss of learning power for machines which are insensitive to order in several ways simultaneously. For example, partly set-driven machines attend only to the set and length of the sequence of positive data, not the actual sequence itself. A machine M is weakly n-ary order independent iff, for each language L on which, for some ordering of the positive data about L, M converges in the limit to a finite set of grammars, there is a finite set of grammars D (of cardinality ≤ n) such that M converges to a subset of this same D for each ordering of the positive data for L. The theorem most difficult to prove in the paper implies that machines which are simultaneously partly set-driven and weakly n-ary order independent lose no learning power with respect to the criteria of converging in the limit to up to n grammars. Several variants of this theorem are obtained by modifying its proof, and some of these variants have application in this and other papers. Along the way it is also shown, for the vacillatory criteria, that learning power is not increased if the sequence of positive data presentation is restricted to be computable. Some of these results are nontrivial lifts of prior work for the n = 1 case done by the Blums; Wiehagen; Osherson, Stob, and Weinstein; Schäfer; and Fulk.
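
As a symbolic restatement of the definition just given (with $\mathrm{Txt}(L)$ introduced here for the set of all orderings, i.e., texts, of the positive data for L, and "M converges on T to F" meaning that M converges in the limit on T to the finite set of grammars F), M is weakly n-ary order independent iff, for each language L,

$$(\exists T \in \mathrm{Txt}(L))\,[\,M \text{ converges on } T \text{ to some finite } F\,] \;\Longrightarrow\; (\exists D,\ |D| \le n)\,(\forall T' \in \mathrm{Txt}(L))\,[\,M \text{ converges on } T' \text{ to some } F' \subseteq D\,].$$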

©Copyright 1999, Society for Industrial and Applied Mathematics.