Ordered term tree languages which are polynomial time inductively inferable from positive data

Authors: Yusuke Suzuki, Takayoshi Shoudai, Tomoyuki Uchida and Tetsuhiro Miyahara

Source: Theoretical Computer Science Vol. 350, Issue 1, January 2006, pp. 63-90
(Special Issue Algorithmic Learning Theory (ALT 2002)).

Abstract. In the fields of data mining and knowledge discovery, many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t are ordered and t has edge labels. In order to represent structural features common to such semistructured data, we propose a linear ordered term tree, which is a rooted tree pattern consisting of ordered tree structures and internal structured variables with distinct variable labels. For a set of edge labels Λ let OTTΛ be the set of all linear ordered term trees. For a linear ordered term tree t in OTTΛ, the term tree language of t, denoted by LΛ(t), is the set of all ordered trees obtained from t by substituting arbitrary ordered trees for all variables in t. Given a set of ordered trees S, the minimal language problem for OTTΛ = {LΛ(t) | tOTTΛ} is to find a linear ordered term tree t in OTTΛ such that LΛ(t) is minimal among all term tree languages which contain all ordered trees in S. We show that the class OTTΛ is polynomial time inductively inferable from positive data, by giving a polynomial time algorithm for solving the minimal language problem for OTTΛ.


Keywords: Machine learning; Inductive inference; Tree structured pattern


©Copyright 2005 Elsevier Science B.V.