Finding Total and Partial Orders from Data for Seriation
(invited lecture for ALT & DS 2008)
Author: Heikki Mannila
Affiliations: HIIT, Helsinki University of Technology and
Department of Computer Science,
University of Helsinki, Helsinki, Finland
Abstract.
Ordering and ranking items of different types (observations, web pages, etc.)
are important tasks in various applications, such as query processing and
scientific data mining. We consider different problems of inferring total or
partial orders from data, with special emphasis on applications arising from
paleontology.
Seriation can be viewed as the task of ordering rows of a 0-1 matrix so that
certain conditions hold. We review different approaches to this task,
including spectral ordering and MCMC techniques.
A total order for the items can sometimes be misleading, since there are
groups of items that have practically equal ranks. Bucket orders, i.e., total
orders with ties can be used to capture the essential order information
without overfitting the data: they form a useful concept class between total
orders and arbitrary partial orders. We review different methods for finding
partial orders and bucket orders; some of the methods have a provable
approximation guarantee, and scale well to large datasets.
Joint work with Antti Ukkonen, Aris Gionis, Mikael Fortelius, and Kai
Puolamäki.
©Copyright 2008 Author
|