Learning regular languages using RFSAs

Authors: François Denis, Aurélien Lemay and Alain Terlutte
Email: lemay@univ-lille3.fr

Source: Theoretical Computer Science Vol. 313, Issue 2, 17 February 2004, pp. 267-294.

Abstract. Residual languages are important and natural components of regular languages and several grammatical inference algorithms naturally rely on this notion. In order to identify a given target language L, classical inference algorithms try to identify words which define identical residual languages of L. Here, we study whether it could be interesting to perform a tighter analysis by identifying inclusion relations between the residual languages of L. We consider the class of Residual Finite State Automata (RFSAs). An RFSA A is a NonDeterministic Automaton whose states corresponds to residual languages of the language LA it recognizes. The inclusion relations between residual languages of LA can be naturally materialized on A. We prove that the class of RFSAs is not polynomially characterizable. We lead some experiments which show that when a regular language is randomly drawn by using a nondeterministic representation, the number of inclusion relations between its residual languages is very important. Moreover, its minimal RFSA representation is much smaller than its minimal DFA representation. Finally, we design a new learning algorithm, DeLeTe2, based on the search for the inclusion relations between the residual languages of the target language. We give sufficient conditions for the identifiability of the target language. We experimentally compare the performance of DeLeTe2 to those of classical inference algorithms.

©Copyright 2004 Elsevier Science B.V.