TCS-TR-A-07-30

Date: Wed Oct 10 11:42:44 2007

Title: Fast Generation of Very Large-Scale Frequent Itemsets Using a Compact Graph-Based Representation

Authors: Shin-ichi Minato, Takeaki Uno, and Hiroki Arimura

Contact:

First name: Shin-ichi
Last name: Minato
Address: Division of Computer Science, Hokkaido University, North 14 West 9, Sapporo, 060-0814 Japan.
Email: minato@ist.hokudai.ac.jp

Abstract. Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. In the last decade, a number of efficient algorithms for frequent itemset mining have been presented, but most of them focused on just enumerating the itemsets which satisfy the given conditions, and it was a different matter how to store and index the mining result for efficient data analysis. In this paper, we propose a fast algorithm for generating very large-scale all/closed/maximal frequent itemsets using Zero-suppressed BDDs (ZBDDs), a compact graph-based data structure. Our method, ``LCM over ZBDDs,'' is based on one of the most efficient state-of-the-art algorithms proposed before, and not only enumerating/listing the itemsets but also generating a compact output data structure on the main memory. The result can efficiently be post-processed by using algebraic ZBDD operations. The original LCM is known as an output linear time algorithm, but our new method requires a sub-linear time to the number of frequent patterns when the ZBDD-based data compression works well. Our method may greatly accelerate the data mining process and will lead a new style of on-memory processing for knowledge discovery problems.