TCS-TR-A-10-42

Date: Wed Apr 14 19:50:57 2010

Title: Substring Indices Based on Sequence BDDs

Authors: Shuhei Denzumi, Hiroki Arimura, and Shin-ichi Minato

Contact:

Abstract. There is a demand for efficient indexed-substring data structures, which can store all substrings of a given text. Suffix trees and Directed Acyclic Word Graphs (DAWGs) are examples of substring indices, but they lack operations for manipulating sets of strings. The Sequence Binary Decision Diagram (SeqBDD) data structure proposed by E. Loekito, J. Bailey, and J. Pei (KAIS, 2009) is a new type of Binary Decision Diagram (BDD), and represents sets of sequences. This study focuses mainly on two issues: (a) compact substring indices based on SeqBDD, called Suffix Decision Diagrams (SuffixDDs), which make it possible to represent the set of all substrings efficiently via various operations inherited from Zerosuppressed Binary Decision Diagrams (ZDDs), and (b) methods for building SuffixDD, beginning with an empty string and updating iteratively whenever a new letter is read. This paper presents an efficient algorithm for constructing a SuffixDD for a given text, together with a proof of correctness and some notes about BDD families, and discusses why the new data structure appears to have advantages over existing substring indices. An upper bound on the running time is also obtained. It is hoped that presenting this data structure to a wider audience at this time will help to promote useful discussion of the important issues.


©Copyright 2010 Authors