
Hierarchical Reinforcement Learning: A Comprehensive Survey

Published: 05 June 2021

Abstract

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. In recent years, the landscape of HRL research has grown considerably, resulting in a wide variety of approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate future research in HRL. Furthermore, in the Supplementary Material, we outline several suitable task domains for evaluating HRL approaches and interesting examples of practical applications of HRL.
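To make the "decomposition into subtasks" idea concrete, the following is a minimal illustrative sketch (not taken from the survey) of the two-level manager/worker structure that recurs throughout HRL: a high-level policy selects a subgoal every few steps, and a low-level policy is rewarded intrinsically for reaching that subgoal while the high level is trained on the environment reward. The toy corridor environment, the tabular Q-learning updates, and all names and hyperparameters are illustrative assumptions, and the high-level update is deliberately simplified (it ignores SMDP-style discounting over the subgoal's duration).

```python
# Minimal two-level HRL sketch (illustrative only): a "manager" picks subgoal
# states and a "worker" learns goal-conditioned behaviour, both via tabular
# Q-learning on an assumed toy 1-D corridor with a sparse reward at the end.
import numpy as np

N_STATES = 10          # corridor states 0..9; environment reward at state 9
ACTIONS = [-1, +1]     # primitive actions: move left / move right
K = 3                  # the manager picks a new subgoal every K worker steps
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

q_hi = np.zeros((N_STATES, N_STATES))                 # Q(state, subgoal)
q_lo = np.zeros((N_STATES, N_STATES, len(ACTIONS)))   # Q(state, subgoal, action)

def eps_greedy(q_row):
    """Epsilon-greedy choice with random tie-breaking."""
    if np.random.rand() < EPS:
        return int(np.random.randint(len(q_row)))
    best = np.flatnonzero(q_row == q_row.max())
    return int(np.random.choice(best))

def step(s, a):
    """One environment step; sparse reward only at the rightmost state."""
    s2 = int(np.clip(s + ACTIONS[a], 0, N_STATES - 1))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

for episode in range(300):
    s, done = 0, False
    for _ in range(100):                 # cap episode length so the sketch always terminates
        if done:
            break
        g = eps_greedy(q_hi[s])          # manager: choose a subgoal state
        s0, r_sum = s, 0.0
        for _ in range(K):               # worker: act toward the subgoal
            a = eps_greedy(q_lo[s, g])
            s2, r, done = step(s, a)
            r_intr = 1.0 if s2 == g else 0.0   # intrinsic reward for reaching the subgoal
            q_lo[s, g, a] += ALPHA * (r_intr + GAMMA * q_lo[s2, g].max() - q_lo[s, g, a])
            r_sum += r
            s = s2
            if done or s == g:
                break
        # manager update on the summed environment reward (simplified SMDP backup)
        q_hi[s0, g] += ALPHA * (r_sum + GAMMA * q_hi[s].max() - q_hi[s0, g])

print("Preferred subgoal per state:", q_hi.argmax(axis=1))
```

In this sketch the low level is reusable across subgoals because it is conditioned on the subgoal, which is the essential ingredient that the hierarchical-policy and subtask-discovery approaches surveyed in the article refine in much more sophisticated ways.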

Supplementary Material

a109-pateria-suppl.pdf (pateria.zip)
Supplemental movie, appendix, image, and software files for Hierarchical Reinforcement Learning: A Comprehensive Survey



    Published In

    ACM Computing Surveys, Volume 54, Issue 5
    June 2022, 719 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/3467690
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 June 2021
    Accepted: 01 February 2021
    Revised: 01 February 2021
    Received: 01 July 2020
    Published in CSUR Volume 54, Issue 5


    Author Tags

    1. Hierarchical reinforcement learning
    2. hierarchical reinforcement learning survey
    3. hierarchical reinforcement learning taxonomy
    4. skill discovery
    5. subtask discovery

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier-1
    • National Research Foundation, Singapore under its AI Singapore Programme AISG

