1-Variable Pattern Learner with Empty Substitutions II

Faster Version with Complete Compatibility Tests, Animation

We present an animation of an average case optimal learning algorithm for 1-variable pattern languages first described in the technical report [1] and in extended and more polished form in the article [3].
This animation is an extension to the case that also examples may occur which are generated by substituting the pattern variable with the empty string.
This implementation uses the version of our 1-variable pattern language learner with faster convergence by performing complete compatibility tests.

You can also use the simpler version ; (if you still use netscape 4.x, please click here).
If you want to test the algorithm actively by providing your own inputs click here.

Outline of the Algorithm
Receiving a sequence of sample strings from an unknown 1-variable pattern language the algorithm computes a sequence of hypotheses that explain the samples seen. At any stage of the learning procedure the algorithm remembers the common prefix and suffix of all strings received so far plus a single appropriate example among them (that means it requires only constant space).

Starting the Animation
Before starting the algorithm one has to choose
the size of the alphabet {a,b,c,...} of the pattern language, which has to be at least 2,
a 1-variable pattern p, for example abxbcaxxdad, where x denotes the pattern variable, and
a probability distribution for substituting appearances of the pattern variable x in p by a string w
over the pattern alphabet. To visualize the probability distribution this animation has to limit the maximal size of the alphabet as specifed below.

Distributions
For the length uniform distribution the length of w is allowed to vary between 1 and 24. This value is probabilistically chosen according to weights. To each possible length assign as weight a natural number between 0 and 10 (the default is 0, but at least one weight should be positive). The weights define the frequency by which each specific length is chosen (the fraction between an individual weight and the sum of all weights gives its probability).
The maximal alphabet size is 10. For each new sample string after having selected a length the animation then will probabilistically select a string of this length over the chosen alphabet with uniform probability.

In the Markov chain distribution the substitution string w is generated by a random walk in the alphabet. You may choose weights for the transitions between the letters of the alphabet. (NOTE: the transition graph derived from the positive weights should have the property that the random walk will eventually reach the final state with probability 1. This animation will not check this property and thus will be trapped if you choose unsuitable weights).
The maximal alphabet size is 4.

In case of a symmetric distribution you may also choose weights for the k-symmetries of a substitution w (that means w = y^k z y^k for substrings y, z - for details see the paper).
k runs from 0 to 6. (NOTE: If you give weight 0 to the case k=0 it will imply that only symmetric substitutions occur. In this case any algorithm (including this one) may not be able to deduce the correct hypothesis because it does not get enough information about the pattern language.)
y and z will be chosen length uniform with a maximal length 6.
The maximal alphabet size is 5 in this case.

The Animation

Bibliography

Rüdiger Reischuk and Thomas Zeugmann
Learning One-Variable Pattern Languages in Linear Average Time.
Technical Report DOI-TR-140, Department of Informatics, Kyushu University, Fukuoka, Japan, September 1997.
[Abstract] [Full Paper].
R. Reischuk and T. Zeugmann, Learning One-Variable Pattern Languages in Linear Average Time, in ``Proc. 11th Annual Conference on Computational Learning Theory - COLT'98,'' July 24th - 26th, Madison, pp. 198 - 208, ACM Press 1998.
[Abstract] [Full Paper].
R. Reischuk and T. Zeugmann, An Average-Case Optimal One-Variable Pattern Language Learner, Journal of Computer and System Sciences Vol. 60, No. 2, 2000, 302-335. (Special Issue for COLT'98). [Abstract] [Full Paper].

Back to my Homepage

Implemented by Semen Gehman and Rüdiger Reischuk; modified by Thomas Zeugmann.

Last change: November 1, 2004