TCS-TR-A-09-37

Date: Mon Jan 12 09:39:23 2009

Title: Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing

Authors: Shin-ichi Minato and Takeaki Uno

Contact:

  • First name: Shin-ichi
  • Last name: Minato
  • Address: Graduate School of Information Science and Technology, Hokkaido University, North 14 West 9, Sapporo, 060-0814 Japan.
  • Email: minato@ist.hokudai.ac.jp

Abstract. Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. Recently, Minato et al. proposed a fast algorithm ``LCM over ZDDs'' for generating very large-scale frequent itemsets using Zero-suppressed BDDs (ZDDs), a compact graph-based data structure. Their method is based on LCM algorithm, one of the most efficient state-of-the-art techniques for itemset mining, and directly generates compact output data structures on the main memory, to be efficiently post-processed by using ZDD-based algebraic operations. In this paper, we propose a novel method of finding distinctive frequent itemsets from time segmented (e.g. daily, weekly, monthly) sequential transaction databases. We define ``frequency pattern chart'' using regular expressions for specifying distinctive frequency patterns in time segmented databases. Our method efficiently extracts all itemsets satisfying a given frequency pattern chart using LCM over ZDDs algorithm and ZDD-based symbolic processing of finite automata. Experimental results show that our method is applicable to very large-scale problems, for example, we can find a small number of distinctive itemsets from a huge number (more than $10^{44}$) of frequent itemsets in a few seconds. Time segmented databases often appear in many real-life problems, so our new method will have a significant impact to various practical applications.


©Copyright 2009 Authors