Recent Developments in Pattern Mining
(invited lecture for DS 2012)
Author: Toon Calders
Affiliation:
Department of Mathematics and Computer Science
Eindhoven University of Technology
The Netherlands
Abstract.
Pattern mining is one of the most researched topics in the data mining
community. Literally hundreds of algorithms for efficiently
enumerating all frequent itemsets have been proposed. These exhaustive
algorithms, however, all suffer from the pattern explosion
problem. Depending on the minimal support threshold, even for
moderately sized databases, millions of patterns may be generated.
Although this problem is by now well recognized in the pattern mining
community, it has not yet been solved satisfactorily. In my talk I
will give an overview of the different approaches that have been
proposed to alleviate this problem. As a first step, constraint-based
mining and condensed representations such as the closed itemsets and
the non-derivable itemsets were introduced. These methods, however,
still produce too many, often redundant, results. More recently, promising
methods based upon the minimal description length principle,
information theory, and statistical models have been introduced. We
show the respective advantages and disadvantages of these approaches
and their connections, and illustrate their usefulness on real-life
data. After this overview we move from itemsets to more complex
patterns, such as sequences and graphs. Even though these extensions
seem trivial at first, they turn out to be quite challenging. I will
end my talk with an overview of what I consider to be important open
questions in this fascinating research area.
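
As a small illustration of the pattern explosion problem and of closed
itemsets as a condensed representation, the following is a minimal
sketch on a made-up toy database. It is not taken from the lecture: the
function names frequent_itemsets and closed_itemsets, the four
transactions, and the threshold of 2 are all assumptions chosen for the
example, and the brute-force enumeration merely stands in for the
efficient algorithms the abstract refers to.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force sketch: map each frequent itemset to its support.

    An itemset is frequent if at least min_support transactions contain it.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


# Hypothetical toy database: four transactions over the items a, b, c, d.
transactions = [
    frozenset("abcd"),
    frozenset("abcd"),
    frozenset("abc"),
    frozenset("ab"),
]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), len(closed))  # prints "15 3"
```

With only four transactions, every one of the 15 non-empty itemsets is
frequent, while three closed itemsets ({a, b}, {a, b, c} and
{a, b, c, d}) together with their supports are enough to recover the
support of all the others.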
His slides are available.
©Copyright Author