TCS-TR-A-09-37Date: Mon Jan 12 09:39:23 2009 Authors: Shin-ichi Minato and Takeaki Uno Contact:
Abstract. Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. Recently, Minato et al. proposed a fast algorithm ``LCM over ZDDs'' for generating very large-scale frequent itemsets using Zero-suppressed BDDs (ZDDs), a compact graph-based data structure. Their method is based on LCM algorithm, one of the most efficient state-of-the-art techniques for itemset mining, and directly generates compact output data structures on the main memory, to be efficiently post-processed by using ZDD-based algebraic operations. In this paper, we propose a novel method of finding distinctive frequent itemsets from time segmented (e.g. daily, weekly, monthly) sequential transaction databases. We define ``frequency pattern chart'' using regular expressions for specifying distinctive frequency patterns in time segmented databases. Our method efficiently extracts all itemsets satisfying a given frequency pattern chart using LCM over ZDDs algorithm and ZDD-based symbolic processing of finite automata. Experimental results show that our method is applicable to very large-scale problems, for example, we can find a small number of distinctive itemsets from a huge number (more than $10^{44}$) of frequent itemsets in a few seconds. Time segmented databases often appear in many real-life problems, so our new method will have a significant impact to various practical applications. ©Copyright 2009 Authors |