DOI: 10.1145/3341105.3373966

Understanding features on evolutionary policy optimizations: feature learning difference between gradient-based and evolutionary policy optimizations

Published: 30 March 2020

Abstract

We analyze two deep reinforcement learning approaches, gradient-based policy optimization and evolutionary policy optimization, using a number of visualization techniques and supplementary experiments. Specifically, filter visualization and saliency maps are used to examine whether meaningful features are properly extracted by the two algorithms. In addition to this visual analysis, several experiments are devised to strengthen the validity of the analysis. We observe that evolutionary policy optimization tends to make use of prior knowledge and, through its powerful exploration ability, to learn the prior action distribution of the policy, which a gradient-based algorithm cannot do easily.
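As a rough illustration of the saliency analysis the abstract describes (not the authors' code), the sketch below computes a gradient saliency map over a small convolutional policy: the magnitude of the gradient of the strongest action's score with respect to the input pixels. The network shape, the Atari-style 84x84x4 observation, and all names here are illustrative assumptions; whether the paper uses this gradient recipe or a perturbation-based one is not stated in the abstract.

```python
# A minimal sketch, assuming a PyTorch conv policy over stacked Atari frames.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Tiny conv policy in the style of early Atari agents (hypothetical)."""
    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 9 * 9, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))  # raw per-action scores (logits)

def saliency_map(policy: nn.Module, obs: torch.Tensor) -> torch.Tensor:
    """|d(max_a score_a)/d(obs)|: which pixels move the chosen action's score."""
    obs = obs.clone().requires_grad_(True)
    scores = policy(obs)                          # shape (1, n_actions)
    scores.max(dim=1).values.sum().backward()     # backprop the top action's score
    return obs.grad.abs().amax(dim=1)             # collapse the frame-stack axis

policy = PolicyNet()
frame = torch.rand(1, 4, 84, 84)                  # four stacked grayscale frames
print(saliency_map(policy, frame).shape)          # torch.Size([1, 84, 84])
```

Filter visualization, the other technique the abstract names, needs no extra machinery in this setting: the first-layer kernels can be read directly from policy.features[0].weight (shape 16x4x8x8 in this sketch) and rendered as small grayscale tiles.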


Cited By

  • (2023) Multi-Lane Differential Variable Speed Limit Control via Deep Neural Networks Optimized by an Adaptive Evolutionary Strategy. Sensors 23, 10 (4659). DOI: 10.3390/s23104659. Online publication date: 11-May-2023.

Information

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
March 2020
2348 pages
ISBN:9781450368667
DOI:10.1145/3341105
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep reinforcement learning
  2. evolutionary strategy
  3. feature analysis
  4. filter visualization
  5. saliency map

Qualifiers

  • Research-article

Conference

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
March 30 - April 3, 2020
Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions (25%)
