TCS-TR-A-06-16

Date: Sat Jun 17 07:22:56 2006

Title: N-gram Analysis Based on Zero-suppressed BDDs

Authors: Ryutaro Kurai, Shin-ichi Minato, and Thomas Zeugmann

Contact:

  • First name: Shin-ichi
  • Last name: Minato
  • Address: Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo 060-0814, Japan
  • Email: minato@ist.hokudai.ac.jp

Abstract. In present paper, we propose a new method of n-gram analysis using ZBDDs (Zero-suppressed BDDs). ZBDDs are known as a compact representation of combinatorial item sets. Here, we newly apply the ZBDD-based techniques for efficiently handling sets of sequences. Using the algebraic operations defined over ZBDDs, such as union, intersection, difference, etc., we can execute various processings and/or analyses for large-scale sequence data. We conducted experiments for generating n-gram statistical data for given real document files, and the obtained results show the potentiality of the ZBDD-based method for the sequence database analysis.


©Copyright 2006 Authors