Abstract
Planning is a computationally expensive process, which can limit the reactivity of autonomous agents. Planning problems are usually solved in isolation, independently of similar, previously solved problems. The depth of search that a planner requires to find a solution, known as the planning horizon, is a critical factor when integrating planners into reactive agents. We consider the case of an agent repeatedly carrying out a task from different initial states. We propose a combination of classical planning and model-free reinforcement learning to reduce the planning horizon over time. Control is smoothly transferred from the planner to the model-free policy as the agent compiles the planner’s policy into a value function. Local exploration of the model-free policy allows the agent to adapt to the environment and eventually overcome model inaccuracies. We evaluate the efficacy of our framework on symbolic PDDL domains and a stochastic grid world environment, and show that it significantly reduces the planning horizon while overcoming inaccuracies in the planner’s model.
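The abstract describes the mechanism only at a high level: the planner acts first, its decisions are compiled into a value function by a model-free learner, and control shifts to the learned policy over time. The sketch below is a minimal, illustrative Python example of that idea, not the authors' implementation: a breadth-first search over a toy grid world stands in for the classical planner, and a simple visit-count threshold ("trust") stands in for the paper's actual criterion for transferring control. All names, constants, and the environment are assumptions made for illustration.

import random
from collections import deque

SIZE, GOAL = 5, (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    # Deterministic grid dynamics, clipped at the borders.
    nxt = (min(max(s[0] + a[0], 0), SIZE - 1),
           min(max(s[1] + a[1], 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def plan(start):
    # Breadth-first search standing in for a classical planner:
    # returns the first action of a shortest path from start to GOAL.
    frontier, seen = deque([(start, None)]), {start}
    while frontier:
        s, first = frontier.popleft()
        if s == GOAL:
            return first if first is not None else random.choice(ACTIONS)
        for a in ACTIONS:
            n, _, _ = step(s, a)
            if n not in seen:
                seen.add(n)
                frontier.append((n, a if first is None else first))
    return random.choice(ACTIONS)

Q = {}        # state -> {action: value}; the value function being compiled
visits = {}   # state -> visit count; crude stand-in for the transfer criterion
alpha, gamma, eps, trust = 0.5, 0.95, 0.1, 10

for episode in range(300):
    s = (random.randrange(SIZE), random.randrange(SIZE))   # varying initial states
    for _ in range(50):
        qs = Q.setdefault(s, {a: 0.0 for a in ACTIONS})
        visits[s] = visits.get(s, 0) + 1
        if visits[s] < trust:           # values not yet trusted: defer to the planner
            a = plan(s)
        elif random.random() < eps:     # local exploration around the learned policy
            a = random.choice(ACTIONS)
        else:                           # trusted: act on the model-free policy
            a = max(qs, key=qs.get)
        s2, r, done = step(s, a)
        q2 = Q.setdefault(s2, {b: 0.0 for b in ACTIONS})
        # Q-learning update: whichever policy acted is compiled into the value function.
        qs[a] += alpha * (r + gamma * (0.0 if done else max(q2.values())) - qs[a])
        s = s2
        if done:
            break

q00 = Q.get((0, 0), {a: 0.0 for a in ACTIONS})
print("learned value of the greedy action at (0, 0):", max(q00.values()))

As learning progresses, fewer states fall below the trust threshold, so the planner is consulted for shorter and shorter prefixes of each episode, which is the sense in which the effective planning horizon shrinks.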
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dunbar, L., Rosman, B., Cohn, A.G., Leonetti, M. (2023). Reducing the Planning Horizon Through Reinforcement Learning. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_5
DOI: https://doi.org/10.1007/978-3-031-26412-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2