N-gram Analysis Based on Zero-Suppressed BDDs

Authors: Ryutaro Kurai, Shin-ichi Minato, and Thomas Zeugmann

Source: New Frontiers in Artificial Intelligence, JSAI 2006 Conference and Workshops, Tokyo, Japan, June 2006, Revised Selected Papers, (Takashi Washio, Ken Satoh, Hideaki Takeda, and Akihiro Inokuchi, Eds.), Lecture Notes in Artificial Intelligence 4384, pp. 289-300, Springer 2007.

Abstract. In the present paper, we propose a new method of n-gram analysis using ZBDDs (Zero-suppressed BDDs). ZBDDs are known as a compact representation of combinatorial item sets. Here, we newly apply the ZBDD-based techniques for efficiently handling sets of sequences. Using the algebraic operations defined over ZBDDs, such as union, intersection, difference, etc., we can execute various processings and/or analyses for large-scale sequence data. We conducted experiments for generating n-gram statistical data for given real document files. The obtained results show the potentiality of the ZBDD-based method for the sequence database analysis.

©Copyright 2007, Springer

Valid HTML 4.1