TCS-TR-A-07-30Date: Wed Oct 10 11:42:44 2007 Title: Fast Generation of Very Large-Scale Frequent Itemsets Using a Compact Graph-Based Representation Authors: Shin-ichi Minato, Takeaki Uno, and Hiroki Arimura Contact:
Abstract. Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. In the last decade, a number of efficient algorithms for frequent itemset mining have been presented, but most of them focused on just enumerating the itemsets which satisfy the given conditions, and it was a different matter how to store and index the mining result for efficient data analysis. In this paper, we propose a fast algorithm for generating very large-scale all/closed/maximal frequent itemsets using Zero-suppressed BDDs (ZBDDs), a compact graph-based data structure. Our method, ``LCM over ZBDDs,'' is based on one of the most efficient state-of-the-art algorithms proposed before, and not only enumerating/listing the itemsets but also generating a compact output data structure on the main memory. The result can efficiently be post-processed by using algebraic ZBDD operations. The original LCM is known as an output linear time algorithm, but our new method requires a sub-linear time to the number of frequent patterns when the ZBDD-based data compression works well. Our method may greatly accelerate the data mining process and will lead a new style of on-memory processing for knowledge discovery problems. ©Copyright 2007 Authors |