The Two Facets of the Exploration-Exploitation Dilemma
2006 IEEE/WIC/ACM International Confe ...
Kaifu Zhang, Tsinghua University, China
Wei Pan, Tsinghua University, China
This paper proposes an algorithm that better addresses the exploration-exploitation dilemma faced by model-less reinforcement learning agents. The contribution is twofold. (1) Two facets of the exploration-exploitation dilemma are distinguished: in some cases the agent faces a non-stationary environment, so it must choose the best moment to explore in order to adapt to changes; in other cases the agent faces a relatively large state-action space, so it must choose the most promising subset of states and actions to explore. Within this two-facet framework, we compare the relative advantages and limitations of two previously proposed algorithms in different situations. (2) We unify these two algorithms into a new algorithm that works fairly well in all tested situations.
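The first facet (a non-stationary environment) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, nor either of the two algorithms it compares, none of which are specified in the abstract. It assumes a plain epsilon-greedy agent on a hypothetical two-armed bandit whose better arm changes once; the function name `run_bandit`, the payoff probabilities, and the step size are all illustrative assumptions.

```python
import random


def run_bandit(epsilon, steps=2000, switch_at=1000, step_size=0.1, seed=0):
    """Epsilon-greedy agent on a two-armed bandit whose best arm changes once.

    Returns the total reward collected over `steps` pulls.
    All numeric settings are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Hypothetical payoff probabilities: arm 0 is better before the switch,
    # arm 1 is better afterwards (the environment is non-stationary).
    probs_before = [0.8, 0.2]
    probs_after = [0.2, 0.8]
    q = [0.0, 0.0]  # running value estimate for each arm
    total = 0.0
    for t in range(steps):
        probs = probs_before if t < switch_at else probs_after
        if rng.random() < epsilon:
            arm = rng.randrange(2)           # explore: pick an arm at random
        else:
            arm = 0 if q[0] >= q[1] else 1   # exploit: pick the best estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        # Constant step size so that old observations fade over time.
        q[arm] += step_size * (reward - q[arm])
        total += reward
    return total


if __name__ == "__main__":
    print("greedy (epsilon=0.0):  ", run_bandit(0.0))
    print("exploring (epsilon=0.1):", run_bandit(0.1))
```

In this sketch the purely greedy agent never samples the second arm, so it cannot notice the change, while the exploring agent pays a small constant cost to stay adaptive. Epsilon-greedy explores at a fixed rate regardless of whether the environment has changed, which is the kind of inefficiency that choosing "the best moment to explore" is meant to avoid.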
Citation:
Kaifu Zhang, Wei Pan, "The Two Facets of the Exploration-Exploitation Dilemma," iat, pp. 371-380, 2006 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT'06), 2006