Abstract
Planning is a computationally expensive process, which can limit the reactivity of autonomous agents. Planning problems are usually solved in isolation, independently of similar, previously solved problems. The depth of search that a planner requires to find a solution, known as the planning horizon, is a critical factor when integrating planners into reactive agents. We consider the case of an agent repeatedly carrying out a task from different initial states. We propose a combination of classical planning and model-free reinforcement learning to reduce the planning horizon over time. Control is smoothly transferred from the planner to the model-free policy as the agent compiles the planner’s policy into a value function. Local exploration of the model-free policy allows the agent to adapt to the environment and eventually overcome model inaccuracies. We evaluate the efficacy of our framework on symbolic PDDL domains and a stochastic grid world environment, and show that it significantly reduces the planning horizon while overcoming inaccuracies in the planner’s model.
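The abstract describes the mechanism only at a high level: the planner acts first, its decisions are compiled into a value function by a model-free learner, and control shifts to the learned policy over time. The sketch below is a minimal, illustrative Python example of that idea, not the authors' implementation: a breadth-first search over a toy grid world stands in for the classical planner, and a simple visit-count threshold ("trust") stands in for the paper's actual criterion for transferring control. All names, constants, and the environment are assumptions made for illustration.

import random
from collections import deque

SIZE, GOAL = 5, (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(s, a):
    # Deterministic grid dynamics, clipped at the borders.
    nxt = (min(max(s[0] + a[0], 0), SIZE - 1),
           min(max(s[1] + a[1], 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def plan(start):
    # Breadth-first search standing in for a classical planner:
    # returns the first action of a shortest path from start to GOAL.
    frontier, seen = deque([(start, None)]), {start}
    while frontier:
        s, first = frontier.popleft()
        if s == GOAL:
            return first if first is not None else random.choice(ACTIONS)
        for a in ACTIONS:
            n, _, _ = step(s, a)
            if n not in seen:
                seen.add(n)
                frontier.append((n, a if first is None else first))
    return random.choice(ACTIONS)

Q = {}        # state -> {action: value}; the value function being compiled
visits = {}   # state -> visit count; crude stand-in for the transfer criterion
alpha, gamma, eps, trust = 0.5, 0.95, 0.1, 10

for episode in range(300):
    s = (random.randrange(SIZE), random.randrange(SIZE))   # varying initial states
    for _ in range(50):
        qs = Q.setdefault(s, {a: 0.0 for a in ACTIONS})
        visits[s] = visits.get(s, 0) + 1
        if visits[s] < trust:           # values not yet trusted: defer to the planner
            a = plan(s)
        elif random.random() < eps:     # local exploration around the learned policy
            a = random.choice(ACTIONS)
        else:                           # trusted: act on the model-free policy
            a = max(qs, key=qs.get)
        s2, r, done = step(s, a)
        q2 = Q.setdefault(s2, {b: 0.0 for b in ACTIONS})
        # Q-learning update: whichever policy acted is compiled into the value function.
        qs[a] += alpha * (r + gamma * (0.0 if done else max(q2.values())) - qs[a])
        s = s2
        if done:
            break

q00 = Q.get((0, 0), {a: 0.0 for a in ACTIONS})
print("learned value of the greedy action at (0, 0):", max(q00.values()))

As learning progresses, fewer states fall below the trust threshold, so the planner is consulted for shorter and shorter prefixes of each episode, which is the sense in which the effective planning horizon shrinks.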
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dunbar, L., Rosman, B., Cohn, A.G., Leonetti, M. (2023). Reducing the Planning Horizon Through Reinforcement Learning. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_5
DOI: https://doi.org/10.1007/978-3-031-26412-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2