Diversity policy gradient for sample efficient quality-diversity optimization

Published: 08 July 2022

Abstract

A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QD-PG, which combines the strengths of Policy Gradient algorithms and Quality-Diversity approaches to produce a collection of diverse and high-performing neural policies in continuous control environments. The main contribution of this work is the introduction of a Diversity Policy Gradient (DPG) that exploits information at the time-step level to drive policies towards more diversity in a sample-efficient manner. Specifically, QD-PG selects neural controllers from a MAP-Elites grid and uses two gradient-based mutation operators to improve both quality and diversity. Our results demonstrate that QD-PG is significantly more sample-efficient than its evolutionary competitors.
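To make the loop described in the abstract concrete, here is a minimal, self-contained sketch of a QD-PG-style iteration: policies are selected from a MAP-Elites grid, varied by a quality operator and a diversity operator, then re-evaluated and inserted back if they fill or improve a cell. Everything here is illustrative rather than the authors' implementation: `evaluate` is a toy stand-in for an environment rollout, `GRID` and `DIM` are arbitrary, and both operators use a finite-difference surrogate (`fd_ascent`) in place of the paper's actual actor-critic quality and diversity policy gradients, which exploit time-step-level information.

```python
"""Minimal sketch of a QD-PG-style loop (illustration only, see lead-in).

Assumptions not taken from the paper: `evaluate` is a toy stand-in for an
environment rollout, GRID and DIM are arbitrary, and `fd_ascent` is a
finite-difference surrogate for the policy-gradient operators."""
import numpy as np

GRID = (10, 10)   # hypothetical MAP-Elites grid resolution
DIM = 16          # hypothetical flat policy-parameter dimensionality
rng = np.random.default_rng(0)

def evaluate(theta):
    """Toy rollout: return (fitness, behaviour descriptor in [0, 1]^2)."""
    fitness = -float(np.sum(theta ** 2))
    descriptor = 0.5 * (np.tanh(theta[:2]) + 1.0)
    return fitness, descriptor

def cell_of(descriptor):
    """Discretise a descriptor into a grid cell index."""
    idx = np.minimum((descriptor * GRID).astype(int), np.array(GRID) - 1)
    return tuple(int(i) for i in idx)

def fd_ascent(theta, score, lr=0.05, sigma=1e-2):
    """Two-point finite-difference ascent on `score`: a cheap surrogate
    for the gradient-based mutation operators in the paper."""
    eps = sigma * rng.standard_normal(DIM)
    slope = (score(theta + eps) - score(theta - eps)) / (2.0 * sigma)
    return theta + lr * slope * (eps / sigma)

def quality_op(theta, archive):
    """Quality operator: follow the (surrogate) fitness gradient."""
    return fd_ascent(theta, lambda t: evaluate(t)[0])

def diversity_op(theta, archive):
    """Diversity operator: increase the distance between this policy's
    descriptor and its nearest neighbour in the archive. (The paper's
    operator derives this reward from time-step-level state novelty.)"""
    descs = np.array([entry[2] for entry in archive.values()])
    def novelty(t):
        return float(np.min(np.linalg.norm(descs - evaluate(t)[1], axis=1)))
    return fd_ascent(theta, novelty)

archive = {}  # cell index -> (theta, fitness, descriptor)

for step in range(2000):
    if archive:
        # Select a random elite and apply one of the two operators.
        parent = archive[list(archive)[rng.integers(len(archive))]][0]
        child = (quality_op if step % 2 == 0 else diversity_op)(parent, archive)
    else:
        child = rng.standard_normal(DIM)  # bootstrap with a random policy
    fitness, descriptor = evaluate(child)
    cell = cell_of(descriptor)
    # Standard MAP-Elites insertion: keep the child if its cell is empty
    # or if it outperforms the current occupant.
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (child, fitness, descriptor)

print(f"cells filled: {len(archive)} / {GRID[0] * GRID[1]}, "
      f"best fitness: {max(e[1] for e in archive.values()):.3f}")
```

The one design point this sketch does preserve from the abstract is that diversity is rewarded relative to what the archive already contains, so the two operators push selected elites in complementary directions before the standard MAP-Elites insertion rule arbitrates.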

Supplemental Material

PDF file: supplemental material.




    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN:9781450392372
    DOI:10.1145/3512290
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MAP-elites
    2. neuroevolution
    3. policy gradient
    4. quality-diversity

