TCS-TR-A-08-36

Date: Tue Nov 18 12:07:35 2008

Title: Suffix Tree Based VF-Coding for Compressed Pattern Matching

Authors: Takuya Kida

Contact:

  • First name: Takuya
  • Last name: Kida
  • Address: Hokkaido University, Kita 14-jo Nishi 9-chome, 060-0814 Sapporo, Japan.
  • Email: kida@ist.hokudai.ac.jp

Abstract. In this paper, we study the coding efficiency of VF codes and the computational complexity of compressed pattern matching on them. VF codes as typified by Tunstall code are paid less attention to so far as compared to FV code such as Huffman code due to its mild compression ratio. From a viewpoint of compressed pattern matching, however, VF codes have some desirable features. We also propose a new VF code called ST-VF code which uses a frequency-base-pruned suffix tree as a parse tree. We also discuss an efficient compressed pattern matching algorithm on ST-VF codes, and show it runs in $O(n+R)$ time for scanning with $O(|D|+m^2)$ time and space for preprocessing, where $n$ is the compressed text length, $R$ is the number of occurrences, $|D|$ is the dictionary size, and $m$ is the pattern length. Some experimental results show that the proposed code achieves the compression ratio of about $41\%\sim 50\%$, which is better than Tunstall code and Huffman code. Moreover a preliminary experiment on speed comparison of compressed pattern matching algorithms on Tunstall code and the proposed code are presented.


©Copyright 2008 Authors