Data-Driven Discovery using Probabilistic Hidden Variable Models
(invited lecture for DS 2006)

Author: Padhraic Smyth

Affiliation: Department of Computer Science, University of California, Irvine, U.S.A.

Abstract. Generative probabilistic models have proven to be a very useful framework for machine learning from scientific data. Key ideas underlying the generative approach include

(a) representing complex stochastic phenomena using the structured language of graphical models,

(b) using latent (hidden) variables to make inferences about unobserved phenomena, and

(c) leveraging Bayesian ideas for learning and prediction.

This talk will begin with a brief review of learning from data with hidden variables and then discuss some exciting recent work in this area that has direct application to a broad range of scientific problems. Several scientific data sets will be used as examples to illustrate the application of these ideas in probabilistic learning, including time-course microarray expression data, functional magnetic resonance imaging (fMRI) data of the human brain, text documents from the biomedical literature, and sets of cyclone trajectories.

©Copyright 2006 Author