On Following the Perturbed Leader in the Bandit Setting

Authors: Jussi Kujala and Tapio Elomaa

Source: Algorithmic Learning Theory, 16th International Conference, ALT 2005, Singapore, October 2005, Proceedings (Sanjay Jain, Hans Ulrich Simon, and Etsuji Tomita, Eds.), Lecture Notes in Artificial Intelligence 3734, pp. 371-385, Springer 2005.
Abstract.
In an online decision problem, an algorithm is required at each time step to choose
one of the feasible points without knowing the cost associated with it.
An adversary assigns the cost to possible decisions either obliviously or adaptively.
The online algorithm, naturally, attempts to collect as little cost as possible.
The difference between the cost incurred by the online algorithm and that of the best static decision in hindsight
is called the regret of the algorithm.
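In standard notation (our formulation, not quoted from the paper): if the algorithm plays decision x_t against cost function c_t over T rounds, its regret is

\[
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} c_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} c_t(x),
\]

where \(\mathcal{X}\) is the set of feasible decisions.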
In this paper we study Kalai and Vempala's model from the viewpoint of bandit algorithms. We show that the algorithm of McMahan and Blum attains a regret of O(T^{2/3}) against an oblivious adversary. Moreover, we show a tighter bound for the expert setting with m experts.
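To make the bandit mechanics concrete, here is a minimal Python sketch of the generic follow-the-perturbed-leader scheme with uniform exploration on which algorithms of this family, including McMahan and Blum's, are built. It is an illustration under our own assumptions: the function name fpl_bandit, the parameters gamma and epsilon, and the exponential perturbation are our choices, not the paper's exact algorithm or tuning.

```python
import random

def fpl_bandit(m, costs, T, gamma=0.1, epsilon=1.0):
    """Sketch: follow the perturbed leader with uniform exploration
    in an m-expert bandit setting.  Illustrative only; gamma and
    epsilon are hypothetical tuning parameters.

    m:     number of experts
    costs: costs[t][i] in [0, 1]; only the chosen expert's cost
           is observed each round (the bandit feedback model)
    """
    est = [0.0] * m        # importance-weighted cumulative cost estimates
    total_cost = 0.0
    for t in range(T):
        if random.random() < gamma:
            # Exploration round: sample an expert uniformly.
            i = random.randrange(m)
            # Probability that this round both explores and picks i
            # is gamma / m, so dividing by it keeps the estimate unbiased.
            est[i] += costs[t][i] / (gamma / m)
        else:
            # Exploitation: follow the leader on perturbed estimates,
            # here with fresh exponential perturbations per round.
            i = min(range(m),
                    key=lambda j: est[j] - random.expovariate(epsilon))
        total_cost += costs[t][i]
    return total_cost
```

Roughly speaking, the exploration rounds add about gamma*T to the cost, while the importance-weighted estimates (of magnitude up to m/gamma) govern how well the perturbed leader tracks the true best expert; balancing the two with gamma proportional to T^{-1/3} is what produces a regret of order T^{2/3}.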
© Copyright 2005, Springer