TCS-TR-A-06-21

Date: Wed Sep 13 08:02:51 2006

Title: An Adaptive Algorithm for Splitting Large Sets of Strings and Its Application to Efficient External Sorting

Authors: Tatsuya Asai, Seishi Okamoto, and Hiroki Arimura

Contact:

Abstract. With a recent increase of text database and semi-structured database like XML documents, efficient algorithms for sorting large set of strings has been important. Sorting strings is a fundamental task of many processing, such as group-by aggregations and functional join, on text and semi-structured database. However, in such a large database, it is sometimes difficult to sort its attribute values of string type in internal memory. In this paper, we present an algorithm for sorting a large set of strings in external memory. The main idea of the algorithm are splitting a set of strings to some buckets by constructing synopsis trie of input strings. Experiments on real datasets show the effectiveness of our algorithm.


©Copyright 2006 Authors