TCS-TR-A-10-41

Date: Mon Mar 1 10:53:04 2010

Title: Mining Approximate Patterns with Frequent Locally Optimal Occurrences

Authors: Atsuyoshi Nakamura, Hisashi Tosaka, Mineichi Kudo

Contact:

First name: Atsuyoshi
Last name: Nakamura
Address: Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
Email: atsu@main.ist.hokudai.ac.jp

Abstract. We propose a novel frequent approximate pattern mining problem that is well suited to estimating occurrence regions. Given a string s, our method enumerates the substrings of s that locally optimally match many substrings of s. We present an algorithm for this problem in which candidate patterns are generated without duplication using the suffix tree of s. The problem can be extended to enumerating approximate frequent subforests of a given ordered labeled tree T. We applied our method to extracting search result records from web pages returned by a search engine, and it performed well on benchmark data sets.
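The abstract does not define "locally optimal match", so the brute-force Python sketch below fills the gaps with assumptions that are not taken from the report. It uses edit distance with a hypothetical threshold k. It treats a window s[i:j] as locally optimal when no window obtained by moving either endpoint by at most one position has a strictly smaller distance (ties are all kept). It takes a pattern's frequency to be its number of such windows. It also substitutes an uncompressed suffix trie for the suffix tree; both have exactly one locus per distinct substring, which is what makes duplicate-free candidate generation possible. All names and parameters (`k`, `min_len`, `min_freq`) are hypothetical, and this is not the authors' algorithm.

```python
from collections.abc import Iterator


def build_suffix_trie(s: str) -> dict:
    """Uncompressed suffix trie: each node maps a character to a child node.

    A suffix tree is the path-compressed form of this trie; in both, every
    distinct substring of s has exactly one locus.
    """
    root: dict = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def distinct_substrings(s: str) -> Iterator[str]:
    """Yield each distinct non-empty substring of s exactly once (trie DFS)."""
    stack = [(build_suffix_trie(s), "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.items():
            pattern = prefix + ch
            yield pattern  # one trie node == one distinct substring
            stack.append((child, pattern))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def locally_optimal_occurrences(p: str, s: str, k: int) -> list[tuple[int, int]]:
    """Windows (i, j) with ed(p, s[i:j]) <= k that no neighbouring window beats.

    ASSUMPTION: "neighbouring" means shifting i and/or j by at most one.
    """
    n = len(s)
    dist = {(i, j): edit_distance(p, s[i:j])
            for i in range(n) for j in range(i + 1, n + 1)}
    result = []
    for (i, j), d in dist.items():
        if d > k:
            continue
        neighbours = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                      if (di, dj) != (0, 0)]
        # Missing keys (empty or out-of-range windows) never beat d.
        if all(dist.get(nb, d) >= d for nb in neighbours):
            result.append((i, j))
    return sorted(result)


def mine(s: str, k: int, min_len: int, min_freq: int) -> dict[str, int]:
    """Patterns (substrings of s) with at least min_freq locally optimal occurrences."""
    found = {}
    for p in distinct_substrings(s):  # candidates generated without duplication
        if len(p) < min_len:
            continue
        count = len(locally_optimal_occurrences(p, s, k))
        if count >= min_freq:
            found[p] = count
    return found
```

For example, `locally_optimal_occurrences("ab", "abab", 1)` keeps only the two exact matches `(0, 2)` and `(2, 4)`, because every distance-1 window such as `"aba"` sits next to a distance-0 window. The sketch recomputes every window's distance for every candidate, so it only shows what is being counted, not how an efficient algorithm would count it.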