A Survey of Preference-based Online Learning with Bandit Algorithms
(invited tutorial for ALT 2014)
Author: Eyke Hüllermeier
Affiliation: Institut für Informatik, Universität Paderborn, Germany
Abstract.
In machine learning, the notion of multi-armed bandits refers
to a class of online learning problems, in which an agent is supposed
to simultaneously explore and exploit a given set of choice
alternatives in the course of a sequential decision process. In the
standard setting, the agent learns from stochastic feedback in the
form of real-valued rewards. In many applications, however, numerical
reward signals are not readily available; instead, only weaker
information is provided, in particular relative preferences in the
form of qualitative comparisons between pairs of alternatives. This
observation has motivated the study of variants of the multi-armed
bandit problem, in which more general representations are used both
for the type of feedback to learn from and the target of prediction.
The aim of this paper is to provide a survey of the state of the art
in this field, which we refer to as preference-based multi-armed
bandits. To this end, we provide an overview of problems that have
been considered in the literature as well as methods for tackling
them. Our systematization is mainly based on the assumptions made by
these methods about the data-generating process and, related to this,
the properties of the preference-based feedback.
This is joint work with Róbert Busa-Fekete.
Bio.
Eyke Hüllermeier was born in 1969. He holds MSc degrees in
mathematics and business informatics, both from the University of
Paderborn (Germany). From the Computer Science Department of the same
university he obtained his PhD in 1997 and a Habilitation degree in
2002. He worked as a researcher and teaching assistant in the fields
of computer science (artificial intelligence, knowledge-based systems)
and statistics at the University of Paderborn and the University of
Dortmund. From 1998 to 2000, he spent two years as a Marie Curie
fellow at the IRIT (Institut de Recherche en Informatique de
Toulouse).
Prior to joining the Department of Computer Science of the University
of Paderborn in April 2014, he held positions as a full professor at the
Department of Mathematics and Computer Science at Marburg University,
as an associate professor in the Faculty of Computer Science at the
Otto-von-Guericke-Universität Magdeburg (2004-2006), and as a
Juniorprofessor in Marburg (2002-2004).
©Copyright Author