A Survey of Preference-based Online Learning with Bandit Algorithms
(invited tutorial for ALT 2014)

Author: Eyke Hüllermeier

Affiliation: Institut für Informatik, Universität Paderborn, Germany

Abstract. In machine learning, the notion of multi-armed bandits refers to a class of online learning problems in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available; instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem in which more general representations are used both for the type of feedback to learn from and for the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, which we refer to as preference-based multi-armed bandits. To this end, we provide an overview of problems that have been considered in the literature as well as methods for tackling them. Our systematization is mainly based on the assumptions made by these methods about the data-generating process and, related to this, the properties of the preference-based feedback.

This is joint work with Róbert Busa-Fekete.

Bio. Eyke Hüllermeier was born in 1969. He holds MSc degrees in mathematics and business informatics, both from the University of Paderborn (Germany). From the Computer Science Department of the same university, he obtained his PhD in 1997 and a Habilitation degree in 2002. He worked as a researcher and teaching assistant in the fields of computer science (artificial intelligence, knowledge-based systems) and statistics at the University of Paderborn and the University of Dortmund. From 1998 to 2000, he spent two years as a Marie Curie fellow at IRIT (Institut de Recherche en Informatique de Toulouse). Prior to joining the Department of Computer Science of the University of Paderborn in April 2014, he held positions as a full professor at the Department of Mathematics and Computer Science at Marburg University, as an associate professor in the Faculty of Computer Science at the Otto-von-Guericke-Universität Magdeburg (2004-2006), and as a Juniorprofessor in Marburg (2002-2004).

©Copyright Author