One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's dynamics can be modeled as a Markov chain whose state is only partially observable to the agent and is affected by the agent's actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn.
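Formally, such a process is often summarized as a tuple of hidden states, actions, observations, a state transition model, an observation model, and a reward function. Below is a minimal sketch of that interface, assuming a discrete formulation; the class and field names are illustrative, not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Minimal sketch of a discrete POMDP (S, A, O, T, Omega, R).
# All names here are illustrative, not the dissertation's notation.
@dataclass
class POMDP:
    states: Sequence[int]        # hidden states S
    actions: Sequence[int]       # actions A
    observations: Sequence[int]  # observations O
    transition: Callable[[int, int], Sequence[float]]  # T: P(s' | s, a)
    observe: Callable[[int], Sequence[float]]          # Omega: P(o | s)
    reward: Callable[[int, int], float]                # R(s, a)
```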
In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as searching a set of policies while evaluating them by trial through interaction with the environment.
The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite-state controllers, and distributed controllers for multi-agent systems. For each of these controllers we work out the details of algorithms that learn by ascending the gradient of the expected cumulative reinforcement.
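As an illustration of gradient ascent on expected cumulative reinforcement, here is a minimal REINFORCE-style sketch for a tabular softmax policy over discrete observations. The `env.reset()`/`env.step()` interface and all names are assumptions made for the example, not the dissertation's notation.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def run_episode(env, theta, max_steps=100):
    """Roll out one episode; return the (obs, action) trace and total reward.
    `env` is a hypothetical object with reset() -> obs and
    step(action) -> (obs, reward, done)."""
    trace, total_reward = [], 0.0
    obs = env.reset()
    for _ in range(max_steps):
        probs = softmax(theta[obs])                    # action distribution
        action = np.random.choice(len(probs), p=probs)
        trace.append((obs, action))
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return trace, total_reward

def reinforce_update(theta, trace, total_reward, lr=0.01):
    """One step of gradient ascent on expected return, estimated from a
    single episode as grad log pi(a|o) * R, accumulated over the trace."""
    for obs, action in trace:
        probs = softmax(theta[obs])
        grad_log = -probs
        grad_log[action] += 1.0                        # d/d theta of log softmax
        theta[obs] += lr * total_reward * grad_log
    return theta
```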
Building on statistical learning theory and the theory of experiment design, we develop a policy evaluation algorithm for the case of experience re-use. We address the question of how much experience suffices for uniform convergence of policy evaluation, and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
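A common way to evaluate a policy from re-used experience is the importance-sampling (likelihood-ratio) estimator, which reweights returns collected under one policy by the probability ratio of the action sequence under another. A minimal sketch follows, with policies represented as functions mapping an observation to a vector of action probabilities; all names are illustrative, and the episode format matches the `run_episode` sketch above.

```python
import numpy as np

def importance_sampling_estimate(episodes, target_policy, behavior_policy):
    """Estimate the expected return of `target_policy` from episodes
    collected under `behavior_policy`. Each episode is a
    (trace, total_reward) pair, where trace is a list of (obs, action)."""
    estimates = []
    for trace, total_reward in episodes:
        ratio = 1.0
        for obs, action in trace:
            # Likelihood ratio of this action under the two policies.
            ratio *= target_policy(obs)[action] / behavior_policy(obs)[action]
        estimates.append(ratio * total_reward)
    return np.mean(estimates)
```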
Cited By
- Yu Y, Hou P, Da Q and Qian Y Boosting Nonparametric Policies Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (477-484)
- Boularias A and Chaib-draa B Predictive representations for policy gradient in POMDPs Proceedings of the 26th Annual International Conference on Machine Learning, (65-72)
- Oliehoek F, Spaan M and Vlassis N (2008). Optimal and approximate Q-value functions for decentralized POMDPs, Journal of Artificial Intelligence Research, 32:1, (289-353), Online publication date: 1-May-2008.
- Varshavskaya P, Kaelbling L and Rus D (2008). Automated Design of Adaptive Controllers for Modular Robots using Reinforcement Learning, International Journal of Robotics Research, 27:3-4, (505-526), Online publication date: 1-Mar-2008.
- Khoussainov R and Kushmerick N Automated index management for distributed web search Proceedings of the twelfth international conference on Information and knowledge management, (386-393)
Index Terms
- Reinforcement learning by policy search
Recommendations
Continuous-action reinforcement learning with fast policy search and adaptive basis function selection
Special issue on Recent advances on machine learning and Cybernetics. As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open ...
Policy Synthesis and Reinforcement Learning for Discounted LTL
Computer Aided Verification. The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in ...