Diversity policy gradient for sample efficient quality-diversity optimization

Published: 08 July 2022

Abstract

A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QD-PG, which combines the strengths of Policy Gradient algorithms and Quality-Diversity approaches to produce a collection of diverse and high-performing neural policies in continuous control environments. The main contribution of this work is the introduction of a Diversity Policy Gradient (DPG) that exploits information at the time-step level to drive policies towards more diversity in a sample-efficient manner. Specifically, QD-PG selects neural controllers from a MAP-Elites grid and uses two gradient-based mutation operators to improve both quality and diversity. Our results demonstrate that QD-PG is significantly more sample-efficient than its evolutionary competitors.
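To make the loop described in the abstract concrete, here is a minimal, self-contained sketch of a QD-PG-style iteration: policies are selected from a MAP-Elites grid, varied by a quality operator and a diversity operator, then re-evaluated and inserted back if they fill or improve a cell. Everything here is illustrative rather than the authors' implementation: `evaluate` is a toy stand-in for an environment rollout, `GRID` and `DIM` are arbitrary, and both operators use a finite-difference surrogate (`fd_ascent`) in place of the paper's actual actor-critic quality and diversity policy gradients, which exploit time-step-level information.

```python
"""Minimal sketch of a QD-PG-style loop (illustration only, see lead-in).

Assumptions not taken from the paper: `evaluate` is a toy stand-in for an
environment rollout, GRID and DIM are arbitrary, and `fd_ascent` is a
finite-difference surrogate for the policy-gradient operators."""
import numpy as np

GRID = (10, 10)   # hypothetical MAP-Elites grid resolution
DIM = 16          # hypothetical flat policy-parameter dimensionality
rng = np.random.default_rng(0)

def evaluate(theta):
    """Toy rollout: return (fitness, behaviour descriptor in [0, 1]^2)."""
    fitness = -float(np.sum(theta ** 2))
    descriptor = 0.5 * (np.tanh(theta[:2]) + 1.0)
    return fitness, descriptor

def cell_of(descriptor):
    """Discretise a descriptor into a grid cell index."""
    idx = np.minimum((descriptor * GRID).astype(int), np.array(GRID) - 1)
    return tuple(int(i) for i in idx)

def fd_ascent(theta, score, lr=0.05, sigma=1e-2):
    """Two-point finite-difference ascent on `score`: a cheap surrogate
    for the gradient-based mutation operators in the paper."""
    eps = sigma * rng.standard_normal(DIM)
    slope = (score(theta + eps) - score(theta - eps)) / (2.0 * sigma)
    return theta + lr * slope * (eps / sigma)

def quality_op(theta, archive):
    """Quality operator: follow the (surrogate) fitness gradient."""
    return fd_ascent(theta, lambda t: evaluate(t)[0])

def diversity_op(theta, archive):
    """Diversity operator: increase the distance between this policy's
    descriptor and its nearest neighbour in the archive. (The paper's
    operator derives this reward from time-step-level state novelty.)"""
    descs = np.array([entry[2] for entry in archive.values()])
    def novelty(t):
        return float(np.min(np.linalg.norm(descs - evaluate(t)[1], axis=1)))
    return fd_ascent(theta, novelty)

archive = {}  # cell index -> (theta, fitness, descriptor)

for step in range(2000):
    if archive:
        # Select a random elite and apply one of the two operators.
        parent = archive[list(archive)[rng.integers(len(archive))]][0]
        child = (quality_op if step % 2 == 0 else diversity_op)(parent, archive)
    else:
        child = rng.standard_normal(DIM)  # bootstrap with a random policy
    fitness, descriptor = evaluate(child)
    cell = cell_of(descriptor)
    # Standard MAP-Elites insertion: keep the child if its cell is empty
    # or if it outperforms the current occupant.
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (child, fitness, descriptor)

print(f"cells filled: {len(archive)} / {GRID[0] * GRID[1]}, "
      f"best fitness: {max(e[1] for e in archive.values()):.3f}")
```

The one design point this sketch does preserve from the abstract is that diversity is rewarded relative to what the archive already contains, so the two operators push selected elites in complementary directions before the standard MAP-Elites insertion rule arbitrates.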

Supplemental Material

PDF file: supplemental material.




    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN:9781450392372
    DOI:10.1145/3512290
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. MAP-elites
    2. neuroevolution
    3. policy gradient
    4. quality-diversity

