Title: N-gram Analysis Based on Zero-suppressed BDDs

Authors: Ryutaro Kurai, Shin-ichi Minato, and Thomas Zeugmann


Abstract. In this paper, we propose a new method of n-gram analysis using ZBDDs (Zero-suppressed BDDs). ZBDDs are known as a compact representation of combinatorial item sets. Here, we apply ZBDD-based techniques to the efficient handling of sets of sequences. Using the algebraic operations defined over ZBDDs, such as union, intersection, and difference, we can carry out various kinds of processing and analysis on large-scale sequence data. We conducted experiments on generating n-gram statistical data from real document files, and the results show the potential of the ZBDD-based method for sequence database analysis.
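
To make the idea of treating n-grams as item sets more concrete, the following is a minimal sketch in Python. It does not implement ZBDDs: ordinary Python sets of frozensets stand in for the families of item sets that a ZBDD would represent compactly. The encoding of a sequence as (position, symbol) items, the function names, and the sample strings are all assumptions made for illustration only, not details taken from the abstract.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous length-n substrings of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode(gram):
    """Encode a sequence as a set of (position, symbol) items.

    Assumed encoding: pairing each symbol with its position lets an
    unordered item set still distinguish "ab" from "ba".
    """
    return frozenset(enumerate(gram))


def decode(itemset):
    """Recover the sequence from its (position, symbol) items."""
    return "".join(symbol for _, symbol in sorted(itemset))


def ngram_family(text, n):
    """Family of item sets for all distinct n-grams of text.

    A plain Python set is used here as a stand-in for a ZBDD.
    """
    return {encode(g) for g in ngrams(text, n)}


# Two hypothetical sample documents.
doc_a = "abcabd"
doc_b = "xbcabc"

family_a = ngram_family(doc_a, 3)
family_b = ngram_family(doc_b, 3)

# The set operations named in the abstract, applied to the families.
common = family_a & family_b   # intersection: 3-grams in both documents
only_a = family_a - family_b   # difference: 3-grams only in doc_a
either = family_a | family_b   # union: 3-grams in at least one document

# Simple n-gram statistics: occurrence counts of 2-grams in doc_a.
bigram_counts = Counter(ngrams(doc_a, 2))

print(sorted(decode(s) for s in common))
print(sorted(decode(s) for s in only_a))
print(len(either))
print(bigram_counts.most_common(1))
```

In this sketch every n-gram is stored explicitly, so memory grows with the number of distinct n-grams. The point of a ZBDD-based representation is that the same families can share structure, while the union, intersection, and difference operations keep the same meaning.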

