Tutorials for ALT/DS 2006

The 17th International Conference on Algorithmic Learning Theory

AND

The 9th International Conference on Discovery Science

Barcelona, Spain
October 7 - 10, 2006

TUTORIALS FOR ALT 2006 AND DS 2006

Organized by DS, with the support of Idescat, the Statistical Institute of Catalonia.

Michael May, Fraunhofer Institute for Autonomous Intelligent Systems, Germany

Geographic and Spatial Data Mining

The widespread use of ubiquitous and mobile technologies such as sensor networks, GPS, mobile phones and RFID, together with the recent success of Google Earth, has led to a situation in which a growing number of data mining applications must deal with non-trivial problems of spatio-temporal data analysis. Applications range from telecommunications, retail and market research to scientific applications in ecology and epidemiology.

Despite this importance, standard data mining tools and methods cannot adequately deal with spatial information. As a result, important information is discarded, leading to suboptimal results. Recent years have seen several lines of research that try to change this situation. Various classes of data mining algorithms, such as clustering, association rules, decision trees and subgroup discovery, have been extended to handle geographic objects such as points, lines and polygons, together with their spatial relationships. These approaches complement classical methods pioneered in geostatistics (e.g. Kriging, point pattern analysis) and are often rooted in some form of Multi-Relational Data Mining.

In this tutorial, we will first clarify the various data types relevant to geographic data mining and identify the specific characteristics and challenges of geographic data. Next, we will discuss several examples of algorithms that take advantage of these data types. Finally, we will present a wide range of applications to illustrate the potential, successes and shortcomings of current Spatial Data Mining approaches. We will conclude by pointing out some future challenges and directions.

Luis Torgo, University of Porto, Portugal

Using R for Data Mining and Scientific Discovery

R is a freely downloadable language and environment for data analysis. The R community has been growing very rapidly, as has the list of available add-on packages, which cover a very wide range of application domains. The main purpose of this tutorial is to illustrate R's capabilities on typical data mining and scientific discovery tasks. We aim to convince you that R is an excellent tool for implementing solutions to specific tasks in these areas. We will pursue this goal by presenting a set of concrete case studies. For each case study, we will describe the problem and provide all the steps needed to reach the results using R, both to introduce you to R and to allow you to continue, adapt, and change these "solutions" after attending the tutorial. In keeping with the open-source spirit of the R project, an associated web site will provide all the code and data needed to replicate what is shown in the tutorial.

Our presentation of R will be illustrated by three different case studies. The first is an ecological modelling task, whose main objective is to obtain models that can give early forecasts of harmful algae blooms in a river dam used to collect potable water. The second case study concerns stock market trading. We will show how to obtain models for these complex dynamic systems and how to use these models for decision making. Finally, the third case study addresses the exploratory analysis of microarray genomic data, which are common in bioinformatics applications.
