
DOI: 10.1007/978-3-031-26412-2_5
Article

Reducing the Planning Horizon Through Reinforcement Learning

Published: 17 March 2023

Abstract

Planning is a computationally expensive process, which can limit the reactivity of autonomous agents. Planning problems are usually solved in isolation, independently of similar, previously solved problems. The depth of search that a planner requires to find a solution, known as the planning horizon, is a critical factor when integrating planners into reactive agents. We consider the case of an agent repeatedly carrying out a task from different initial states. We propose a combination of classical planning and model-free reinforcement learning to reduce the planning horizon over time. Control is smoothly transferred from the planner to the model-free policy as the agent compiles the planner’s policy into a value function. Local exploration by the model-free policy allows the agent to adapt to the environment and eventually overcome inaccuracies in the planner’s model. We evaluate the efficacy of our framework on symbolic PDDL domains and a stochastic grid world environment, and show that the planning horizon is significantly reduced while the learned policy corrects for model inaccuracies.
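
To make the control-transfer idea concrete, the following Python sketch shows one simple way the abstract's scheme could be instantiated. It is an illustration, not the authors' implementation: a plan callable (standing in for any classical planner queried for the first action of a plan from the current state) drives behaviour in rarely visited states, a tabular Q-function trained by standard Q-learning takes over once a state has been visited k times, and epsilon-greedy action selection provides the local exploration the abstract mentions. The visit-count threshold, the plan interface, and all hyperparameters are illustrative assumptions.

    import random
    from collections import defaultdict

    class PlannerToPolicyAgent:
        """Illustrative sketch only. States must be hashable and the action
        set finite. `plan` is an assumed interface (state -> action) wrapping
        a classical planner; it is not the paper's code."""

        def __init__(self, actions, plan, alpha=0.1, gamma=0.99,
                     epsilon=0.05, k=5):
            self.actions = list(actions)    # finite action set
            self.plan = plan                # state -> action, via a classical planner
            self.alpha, self.gamma = alpha, gamma
            self.epsilon, self.k = epsilon, k
            self.q = defaultdict(float)     # tabular Q-values, keyed by (state, action)
            self.visits = defaultdict(int)  # per-state visit counts

        def act(self, state):
            self.visits[state] += 1
            if self.visits[state] <= self.k:
                # Early visits: defer to the planner; its choices are then
                # compiled into the value function by the updates below.
                return self.plan(state)
            if random.random() < self.epsilon:
                # Local exploration lets the model-free policy correct for
                # inaccuracies in the planner's model.
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state, done):
            # Standard Q-learning backup. Over repeated episodes the planned
            # behaviour is absorbed into Q, shrinking the horizon for which
            # the planner is still consulted.
            best_next = 0.0 if done else max(self.q[(next_state, a)]
                                             for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

For the symbolic PDDL domains mentioned in the abstract, plan might wrap an off-the-shelf classical planner; the specific planner is immaterial to the sketch. The fixed visit-count threshold is the crudest possible transfer criterion, used here only to keep the example self-contained.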




Published In

Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part IV
Sep 2022
679 pages
ISBN: 978-3-031-26411-5
DOI: 10.1007/978-3-031-26412-2

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Planning
  2. Planning horizon
  3. Reinforcement learning
