Efficient Learning of One-Variable Pattern Languages from Positive Data

Authors: Thomas Erlebach, Peter Rossmanith*, Hans Stadtherr, Angelika Steger, and Thomas Zeugmann**

Source: DOI-Technical Report DOI-TR 128, Department of Informatics, Kyushu University, Fukuoka, Japan, December 12, 1996.

Abstract. A pattern is a finite string of constant and variable symbols. The language generated by a pattern is the set of all strings of constant symbols which can be obtained from the pattern by substituting non-empty strings for variables. Descriptive patterns are a key concept for inductive inference of pattern languages. A pattern $\pi$ is descriptive for a given sample if the sample is contained in the language $L(\pi)$ generated by $\pi$ and no other pattern having this property generates a proper subset of the language $L(\pi)$. The best previously known algorithm for computing descriptive one-variable patterns requires time $O(n^4\log n)$, where $n$ is the size of the sample. We present a simpler and more efficient algorithm solving the same problem in time $O(n^2\log n)$. In addition, we give a parallel version of this algorithm that requires time $O(\log n)$ and $O(n^3/\log n)$ processors on an EREW-PRAM. Previously, no parallel algorithm was known for this problem.

Using a modified version of the algorithm computing descriptive one-variable patterns as a subroutine, we devise a learning algorithm for one-variable patterns whose expected total learning time is $O(\ell^2\log\ell)$ provided the sample strings are drawn from the target language according to a probability distribution with expected string length~$\ell$. The probability distribution must be such that strings of equal length have equal probability, but can be arbitrary otherwise. Furthermore, we show how the algorithm for descriptive one-variable patterns can be used for learning one-variable patterns with a polynomial number of superset queries.


Note that we have implemented the learning algorithms presented in this paper, and you may try them out using our one-variable pattern language learning page which is also mirrored in Germany.


* A substantial part of this work has been done while the second author was visiting the Research Institute of Fundamental Information Science (RIFIS) (now Department of Informatics) of Kyushu University at Fukuoka, Japan. This visit has been supported by the Japanese Society for the Promotion of Science under Grant No. 106011. He is greatfully indebted to Setsuo Arikawa for providing excellent working conditions during his stay at RIFIS.

** The fifth author kindly acknowledges the support by the Grant-in-Aid for Scientific Research (C) from the Japan Ministry of Education, Science, Sports, and Culture under Grant No. 07680403.


©Copyright 1996 Authors