A Theory of Similarity Functions for Learning and Clustering
(invited lecture for ALT 2007)
Author: Avrim Blum
Affiliation: Department of Computer Science,
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract.
Kernel methods have proven to be powerful tools in machine learning.
They perform well in many applications, and there is also a
well-developed theory of sufficient conditions for a kernel to be
useful for a given learning problem. However, while a kernel
can be thought of as just a pairwise similarity function that
satisfies additional mathematical properties, this theory requires
viewing kernels as implicit (and often difficult to characterize) maps
into high-dimensional spaces. In this talk I will describe work on
developing a theory that applies to more general similarity functions
(not just legal kernels) and furthermore describes the usefulness of a
given similarity function in terms of more intuitive, direct
properties, without the need to refer to any implicit spaces.
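As a concrete, hedged illustration of what such a direct property can look like, here is one definition in this spirit from the Balcan-Blum line of work; the abstract itself does not spell it out, so the specific form below is supplied only for intuition. A similarity function K is (epsilon, gamma)-good for a learning problem with labeling \ell if all but an epsilon fraction of examples are, on average, at least gamma more similar to examples sharing their label than to examples of the other label:

    \Pr_{x}\Big[ \mathbb{E}_{x'}\big[K(x,x') \mid \ell(x') = \ell(x)\big]
                 \;\ge\; \mathbb{E}_{x'}\big[K(x,x') \mid \ell(x') \neq \ell(x)\big] + \gamma \Big]
    \;\ge\; 1 - \varepsilon.

Note that this condition refers only to pairwise similarities under the data distribution; no positive semidefiniteness or implicit feature space is required.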
An interesting feature of the proposed framework is that it can also
be applied to learning from purely unlabeled data, i.e., clustering.
In particular, one can ask how much stronger the properties of a
similarity function should be (in terms of its relation to the unknown
desired clustering) so that it can be used to cluster well, that is, to
learn well without any label information at all. We find that if we
are willing to relax the objective a bit (for example, allow the
algorithm to produce a hierarchical clustering that we will call
successful if some pruning is close to the correct answer), then this
question leads to a number of interesting graph-theoretic and
game-theoretic properties that are sufficient to cluster well. This
work can be viewed as an approach to defining a PAC model for
clustering.
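To make the hierarchical relaxation concrete, the following is a minimal sketch, not the algorithm from the talk. It assumes a particularly strong property (strict separation: every point is more similar to every point of its own target cluster than to any point outside it), under which plain single-linkage merging produces a tree containing the target clustering as a pruning. The function names and the toy similarity are illustrative assumptions.

    # Minimal sketch (illustrative, not the talk's algorithm): single-linkage
    # agglomerative merging driven only by a pairwise similarity function.
    # Under strict separation, the target clustering appears as a pruning of
    # the resulting merge tree.

    def single_linkage_merges(points, sim):
        """Repeatedly merge the two clusters with the largest single-linkage
        similarity (max pairwise similarity across the two clusters); return
        the sequence of merges, which defines a hierarchy over the points."""
        clusters = [frozenset([p]) for p in points]
        merges = []
        while len(clusters) > 1:
            pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
            a, b = max(pairs, key=lambda ab: max(sim(x, y) for x in ab[0] for y in ab[1]))
            clusters = [c for c in clusters if c is not a and c is not b] + [a | b]
            merges.append((a, b))
        return merges

    if __name__ == "__main__":
        # Toy data: two well-separated groups on a line; similarity = -distance,
        # which satisfies strict separation for the intended two-cluster target.
        pts = [0.0, 0.1, 0.2, 5.0, 5.1]
        for a, b in single_linkage_merges(pts, lambda x, y: -abs(x - y)):
            print(sorted(a), "+", sorted(b))
        # Cutting the tree just below the final merge recovers the two groups,
        # i.e., the target clustering is a pruning of the hierarchy.

The framework then asks how far such a strong requirement can be weakened while still guaranteeing that some pruning of an efficiently computable tree is close to the target clustering.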
This talk is based on joint work with Maria-Florina Balcan and Santosh
Vempala.