A Theory of Similarity Functions for Learning and Clustering
(invited lecture for ALT 2007)

Author: Avrim Blum

Affiliations: Department of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213

Abstract. Kernel methods have proven to be powerful tools in machine learning. They perform well in many applications, and there is also a well-developed theory of sufficient conditions for a kernel to be useful for a given learning problem. However, while a kernel can be thought of as just a pairwise similarity function that satisfies additional mathematical properties, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe work on developing a theory that applies to more general similarity functions (not just legal kernels) and furthermore describes the usefulness of a given similarity function in terms of more intuitive, direct properties, without need to refer to any implicit spaces.

An interesting feature of the proposed framework is that it can also be applied to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to cluster well: to learn well without any label information at all. We find that if we are willing to relax the objective a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if some pruning is close to the correct answer), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. This work can be viewed as an approach to defining a PAC model for clustering.

This talk is based on work joint with Maria-Florina Balcan and Santosh Vempala.

©Copyright 2007 Author