TCS-TR-A-06-21Date: Wed Sep 13 08:02:51 2006 Authors: Tatsuya Asai, Seishi Okamoto, and Hiroki Arimura Contact:
Abstract. With a recent increase of text database and semi-structured database like XML documents, efficient algorithms for sorting large set of strings has been important. Sorting strings is a fundamental task of many processing, such as group-by aggregations and functional join, on text and semi-structured database. However, in such a large database, it is sometimes difficult to sort its attribute values of string type in internal memory. In this paper, we present an algorithm for sorting a large set of strings in external memory. The main idea of the algorithm are splitting a set of strings to some buckets by constructing synopsis trie of input strings. Experiments on real datasets show the effectiveness of our algorithm. ©Copyright 2006 Authors |