
DOI: 10.1007/978-3-031-26412-2_5
Article

Reducing the Planning Horizon Through Reinforcement Learning

Published: 17 March 2023

Abstract

Planning is a computationally expensive process, which can limit the reactivity of autonomous agents. Planning problems are usually solved in isolation, independently of similar, previously solved problems. The depth of search that a planner requires to find a solution, known as the planning horizon, is a critical factor when integrating planners into reactive agents. We consider the case of an agent repeatedly carrying out a task from different initial states. We propose a combination of classical planning and model-free reinforcement learning to reduce the planning horizon over time. Control is smoothly transferred from the planner to the model-free policy as the agent compiles the planner’s policy into a value function. Local exploration by the model-free policy allows the agent to adapt to the environment and eventually overcome inaccuracies in the planner’s model. We evaluate the efficacy of our framework on symbolic PDDL domains and a stochastic grid world environment, and show that the planning horizon is significantly reduced while the learned policy corrects for model inaccuracies.
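
To make the control-transfer idea concrete, the following Python sketch shows one simple way the abstract's scheme could be instantiated. It is an illustration, not the authors' implementation: a plan callable (standing in for any classical planner queried for the first action of a plan from the current state) drives behaviour in rarely visited states, a tabular Q-function trained by standard Q-learning takes over once a state has been visited k times, and epsilon-greedy action selection provides the local exploration the abstract mentions. The visit-count threshold, the plan interface, and all hyperparameters are illustrative assumptions.

    import random
    from collections import defaultdict

    class PlannerToPolicyAgent:
        """Illustrative sketch only. States must be hashable and the action
        set finite. `plan` is an assumed interface (state -> action) wrapping
        a classical planner; it is not the paper's code."""

        def __init__(self, actions, plan, alpha=0.1, gamma=0.99,
                     epsilon=0.05, k=5):
            self.actions = list(actions)    # finite action set
            self.plan = plan                # state -> action, via a classical planner
            self.alpha, self.gamma = alpha, gamma
            self.epsilon, self.k = epsilon, k
            self.q = defaultdict(float)     # tabular Q-values, keyed by (state, action)
            self.visits = defaultdict(int)  # per-state visit counts

        def act(self, state):
            self.visits[state] += 1
            if self.visits[state] <= self.k:
                # Early visits: defer to the planner; its choices are then
                # compiled into the value function by the updates below.
                return self.plan(state)
            if random.random() < self.epsilon:
                # Local exploration lets the model-free policy correct for
                # inaccuracies in the planner's model.
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state, done):
            # Standard Q-learning backup. Over repeated episodes the planned
            # behaviour is absorbed into Q, shrinking the horizon for which
            # the planner is still consulted.
            best_next = 0.0 if done else max(self.q[(next_state, a)]
                                             for a in self.actions)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

For the symbolic PDDL domains mentioned in the abstract, plan might wrap an off-the-shelf classical planner; the specific planner is immaterial to the sketch. The fixed visit-count threshold is the crudest possible transfer criterion, used here only to keep the example self-contained.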




Published In

Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part IV
Sep 2022
679 pages
ISBN: 978-3-031-26411-5
DOI: 10.1007/978-3-031-26412-2

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Planning
  2. Planning horizon
  3. Reinforcement learning
