TCS-TR-A-10-44

Date: Mon May 17 20:09:03 2010

Title: An Improvement of STVF Code by Almost Instantaneous Encoding

Authors: Takashi Uemura, Satoshi Yoshida, and Takuya Kida

Contact:

  • First name: Takuya
  • Last name: Kida
  • Address: Graduate School of Information Science and Technology, Hokkaido University. Kita 14-jo Nishi 9-chome, 060-0814 Sapporo, Japan. +81-11-706-7679.
  • Email: kida@ist.hokudai.ac.jp

Abstract. An improved STVF coding algorithm is presented in this paper. STVF code is a novel variable-length-to-fixed-length code (VF code) in which the parse tree is constructed by pruning a suffix tree for an input text. It gives a better compression ratio rather than the previous VF codes, and it is a promising scheme for compressed pattern matching and applications to large-scale text databases. However, its compression ratio is still worse than that of a state-of-the-art compression method. The improvement of the compression ratio of STVF code is required. In our method, all the incomplete inner nodes in addition to leaves in the parse tree are assigned codewords, while just leaves are assigned in the original STVF coding. This code assignment leads an improvement in compression ratio because we can prune the parse tree of infrequent leaves and extend the other frequent edges. In practice experimental results show that the proposed method improves more than 18% in compression ratio for natural language texts in comparison with the original STVF coding.


©Copyright 2010 Authors