Algorithms and Software for Collaborative Discovery from Autonomous,
Semantically Heterogeneous, Distributed Information Sources
(invited lecture for ALT 2005)
Author: Vasant Honavar
Affiliation:
Artificial Intelligence Research Laboratory,
Center for Computational Intelligence, Learning, and Discovery,
Department of Computer Science,
Iowa State University, Ames, Iowa, U.S.A.
Abstract.
The development of high-throughput data acquisition technologies,
together with advances in computing and communications, has resulted in
explosive growth in the number, size, and diversity of potentially
useful information sources. This growth has created unprecedented
opportunities for data-driven knowledge acquisition and decision-making
in a number of emerging, increasingly data-rich application domains such
as bioinformatics, environmental informatics, enterprise informatics,
and social informatics (among others). However, the massive size,
semantic heterogeneity, autonomy, and distributed nature of the data
repositories present significant hurdles in acquiring useful knowledge
from the available data. In this talk, I will introduce some of the
algorithmic and statistical problems that arise in such a setting. I
will describe algorithms for learning classifiers from distributed data
that offer rigorous performance guarantees (relative to their
centralized or batch counterparts). I will describe how this approach
can be extended to work with autonomous, and hence inevitably
semantically heterogeneous, data sources by making explicit the
ontologies (attributes and relationships between attributes) associated
with the data sources and reconciling the semantic differences among
the data sources from a user's point of view. This allows user- or
context-dependent exploration of semantically heterogeneous data
sources. The resulting algorithms have been implemented in INDUS, an
open-source software package for collaborative discovery from
autonomous, semantically heterogeneous, distributed data sources. I
will briefly describe some representative applications of INDUS to
data-driven knowledge acquisition tasks in bioinformatics and
computational biology. I will conclude the talk with a summary of the
main results, a brief discussion of related work, and an outline of some
directions for further research on this topic.
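
Illustrative sketch. The following is a minimal sketch, not the INDUS
implementation, of one way the two ideas above can fit together. It
assumes that the statistics a learner needs (here, the attribute-value
and class-label counts used by a Naive Bayes style classifier) can be
computed separately at each source and then summed, and that a
user-supplied mapping translates each source's vocabulary into the
user's own terms. The source names, mappings, data, and function names
are all hypothetical. Under these assumptions the combined counts equal
the counts obtained by pooling the data in one place, which is the sense
in which such a distributed learner can match its centralized
counterpart.

```python
from collections import Counter

# Hypothetical user-supplied mappings from each source's vocabulary
# to the user's own terms (a stand-in for ontology mappings).
MAPPINGS = {
    "site_a": {"hot": "warm", "cold": "cool"},
    "site_b": {"high": "warm", "low": "cool"},
}

# Hypothetical data held at two autonomous sources: (attribute value, class label).
SOURCES = {
    "site_a": [("hot", "yes"), ("cold", "no"), ("hot", "yes")],
    "site_b": [("high", "yes"), ("low", "no"), ("low", "yes")],
}


def local_counts(rows, mapping):
    """Computed at a data source: (value, label) counts in the user's vocabulary."""
    return Counter((mapping[value], label) for value, label in rows)


def combine(per_source_counts):
    """Computed by the learner: sum the counts returned by the sources."""
    total = Counter()
    for counts in per_source_counts:
        total.update(counts)
    return total


def centralized_counts(sources, mappings):
    """Reference computation: pool all rows in one place, then count."""
    pooled = [
        (mappings[name][value], label)
        for name, rows in sources.items()
        for value, label in rows
    ]
    return Counter(pooled)


def class_conditional(counts, value, label):
    """Maximum-likelihood estimate of P(value | label) from joint counts."""
    label_total = sum(n for (_, y), n in counts.items() if y == label)
    return counts[(value, label)] / label_total


distributed = combine(
    local_counts(rows, MAPPINGS[name]) for name, rows in SOURCES.items()
)
centralized = centralized_counts(SOURCES, MAPPINGS)
```

In this sketch only counts, never raw rows, leave a source, and a
different user could supply a different mapping to obtain counts under a
different view of the same data.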
Acknowledgements:
Much of this work has been carried out in
collaboration with members of the ISU Artificial Intelligence Research
Laboratory and has been supported in part by Iowa State University and
grants from the National Science Foundation (IIS 0219699) and the
National Institutes of Health (GM 0066387).
©Copyright 2005 Author