Authors: Peter Auer¹, Shiau Hong Lim¹, and Chris Watkins²

¹ Chair for Information Technology,
  Department of Mathematics and Information Technology,
  University of Leoben, Austria
² Department of Computer Science,
  Royal Holloway, University of London
                
 
Abstract.
One of the striking differences between current reinforcement learning
algorithms and early human learning is that animals and infants appear
to explore their environments with autonomous purpose, in a manner
appropriate to their current level of skills.
An important intuition for autonomously motivated exploration was
proposed by Schmidhuber [1, 2]: an agent should be interested in making
observations that reduce its uncertainty about future observations.
However, there is as yet no theoretical analysis of the usefulness of
autonomous exploration with respect to the overall performance of a
learning agent. We discuss models for a learning agent's autonomous
exploration and present some recent results. In particular, we
investigate the exploration time for navigating effectively in a
Markov Decision Process (MDP) without rewards, and we consider
extensions to MDPs with infinite state spaces.
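
The uncertainty-reduction principle above can be made concrete in a few
lines. The Python sketch below is illustrative only and is not the
algorithm analyzed in the paper: a one-step greedy, curiosity-driven
agent explores a small reward-free chain MDP, treating the drop in
entropy of its estimated next-state distribution as intrinsic reward.
The chain dynamics, the slip probability, and all other parameters are
hypothetical choices made for illustration.

import math
import random

N_STATES, N_ACTIONS = 5, 2   # hypothetical 5-state chain, two actions
EPISODES, EPISODE_LEN = 50, 30

def step(state, action):
    # Hypothetical chain dynamics: action 0 moves left, action 1 moves
    # right, slipping in the opposite direction 10% of the time.
    # There is no extrinsic reward anywhere in this MDP.
    move = -1 if action == 0 else 1
    if random.random() < 0.1:
        move = -move
    return min(max(state + move, 0), N_STATES - 1)

def entropy(counts):
    # Entropy of the posterior-mean next-state distribution under a
    # uniform Dirichlet prior (Laplace smoothing).
    total = sum(counts)
    probs = [(c + 1) / (total + len(counts)) for c in counts]
    return -sum(p * math.log(p) for p in probs)

# counts[s][a][s2] = number of times taking action a in state s led to s2
counts = [[[0] * N_STATES for _ in range(N_ACTIONS)]
          for _ in range(N_STATES)]
total_gain = 0.0

for _ in range(EPISODES):
    s = 0
    for _ in range(EPISODE_LEN):
        # One-step greedy curiosity: take the action whose outcome
        # distribution is currently most uncertain.  (A full treatment
        # would plan several steps ahead.)
        a = max(range(N_ACTIONS), key=lambda act: entropy(counts[s][act]))
        before = entropy(counts[s][a])
        s_next = step(s, a)
        counts[s][a][s_next] += 1
        total_gain += before - entropy(counts[s][a])  # uncertainty reduced
        s = s_next

visits = [sum(sum(row) for row in counts[s]) for s in range(N_STATES)]
print("transitions taken from each state:", visits)
print("cumulative uncertainty reduction: %.3f" % total_gain)

Running the sketch, the agent tends to spread its transitions across the
chain rather than looping at the start state, since well-observed
state-action pairs stop yielding uncertainty reduction: the qualitative
behavior that autonomous exploration is meant to produce.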
 
 
References
 
[1] J. Schmidhuber.
    A Possibility for Implementing Curiosity and Boredom in
    Model-Building Neural Controllers.
    In J. A. Meyer and S. W. Wilson, editors, International
    Conference on Simulation of Adaptive Behavior: From Animals to Animats,
    pages 222–227. MIT Press, 1991.
[2] J. Schmidhuber.
    Developmental Robotics, Optimal Artificial Curiosity, Creativity,
    Music, and the Fine Arts.
    Connection Science, 18(2):173–187, 2006.
© 2011 Springer
  
 