Abstract
Deep Reinforcement Learning is known to be brittle with respect to the choice of hyperparameters. In particular, the structure of the employed Deep Neural Networks has been shown to be important, and overfitting has become a common problem in DRL approaches. This study, first, analyzes how severe overfitting is in DRL on standard continuous control problems. Second, we argue that this may be partially due to the centralized perspective on control, in which a single holistic controller has to learn from all possible sensory inputs. As this input space is usually high dimensional, it appears natural that a high-capacity Neural Network controller trained on limited data starts to pick up spurious correlations. As a consequence, large Neural Network controllers begin to base their decisions on unimportant inputs. In contrast, we offer a decentralized perspective in which control is distributed to local modules that act on local sensory inputs of much lower dimensionality. Such a prior of local inputs is biologically inspired; it leads, on the one hand, to much faster learning and, on the other hand, to greater robustness against overfitting. Last, as decentralization changes input (and output) dimensionality, we evaluated several common Neural Network initialization schemes and found that Glorot initialization provides the most robust results.
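To make the decentralized perspective concrete, the following is a minimal PyTorch sketch, not the paper's actual architecture: each leg of a hexapod walker gets its own small Glorot-initialized policy network that sees only that leg's local sensory inputs, instead of one holistic network over the full observation. The per-leg dimensions and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative assumptions, not taken from the paper:
LOCAL_OBS_DIM = 5    # e.g. joint angles/velocities of a single leg
LOCAL_ACT_DIM = 3    # e.g. torques for one leg's three joints
N_LEGS = 6

def make_local_policy():
    """Small per-leg controller with Glorot (Xavier) initialization."""
    net = nn.Sequential(
        nn.Linear(LOCAL_OBS_DIM, 32),
        nn.Tanh(),
        nn.Linear(32, LOCAL_ACT_DIM),
    )
    for m in net:
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)  # Glorot initialization
            nn.init.zeros_(m.bias)
    return net

class DecentralizedController(nn.Module):
    """Six independent local policies; their actions are concatenated."""
    def __init__(self):
        super().__init__()
        self.legs = nn.ModuleList(make_local_policy() for _ in range(N_LEGS))

    def forward(self, obs):
        # obs: (batch, N_LEGS * LOCAL_OBS_DIM); each module only ever
        # sees its own low-dimensional slice of the observation.
        local_obs = obs.split(LOCAL_OBS_DIM, dim=-1)
        return torch.cat(
            [leg(o) for leg, o in zip(self.legs, local_obs)], dim=-1
        )

controller = DecentralizedController()
actions = controller(torch.randn(1, N_LEGS * LOCAL_OBS_DIM))
print(actions.shape)  # torch.Size([1, 18])
```

Each module can then be trained with a standard on-policy method such as PPO; the point of the sketch is that every network's input dimensionality, and hence its opportunity to latch onto spurious correlations, is much smaller than that of a single holistic controller.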
References
Andrychowicz, M., et al.: What matters in on-policy reinforcement learning? A large-scale empirical study. arXiv:2006.05990 (2020)
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: A brief survey of deep reinforcement learning. IEEE Sig. Process. Mag. 34(6), 26–38 (2017)
Bidaye, S.S., Bockemühl, T., Büschges, A.: Six-legged walking in insects: how CPGs, peripheral feedback, and descending signals generate coordinated and adaptive motor rhythms. J. Neurophysiol. 119(2), 459–475 (2018)
Clary, K., Tosch, E., Foley, J., Jensen, D.: Let’s play again: variability of deep reinforcement learning agents in Atari environments. arXiv:1904.06312 (2019)
Clune, J., Mouret, J.B., Lipson, H.: The evolutionary origins of modularity. Proc. R. Soc. B Biol. Sci. 280(1755), 20122863 (2013). https://doi.org/10.1098/rspb.2012.2863
Dickinson, M.H., Farley, C.T., Full, R.J., Koehl, M.R., Kram, R., Lehman, S.: How animals move: an integrative view. Science 288(5463), 100–106 (2000)
Dürr, V., et al.: Integrative biomimetics of autonomous hexapedal locomotion. Front. Neurorobot. 13, 88 (2019)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 249–256 (2010)
Heess, N., et al.: Emergence of locomotion behaviours in rich environments. arXiv:1707.02286 (2017)
Hsu, C.C.Y., Mendler-Dünner, C., Hardt, M.: Revisiting design choices in proximal policy optimization. arXiv:2009.10897 (2020)
Hwangbo, J., et al.: Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4(26), eaau5872 (2019). https://doi.org/10.1126/scirobotics.aau5872
Khadka, S., et al.: Collaborative evolutionary reinforcement learning. arXiv:1905.00976 (2019)
Liang, E., et al.: RLlib: abstractions for distributed reinforcement learning. arXiv:1712.09381 (2018)
Reda, D., Tao, T., van de Panne, M.: Learning to locomote: Understanding how environment design matters for deep reinforcement learning. In: Proceedings of the ACM SIGGRAPH Conference on Motion, Interaction and Games (2020)
Schilling, M., Cruse, H.: Decentralized control of insect walking: a simple neural network explains a wide range of behavioral and neurophysiological results. PLOS Comput. Biol. 16(4), e1007804 (2020)
Schilling, M., Hoinville, T., Schmitz, J., Cruse, H.: Walknet, a bio-inspired controller for hexapod walking. Biol. Cybern. 107(4), 397–419 (2013)
Schilling, M., Konen, K., Ohl, F.W., Korthals, T.: Decentralized deep reinforcement learning for a distributed and adaptive locomotion controller of a hexapod robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2020)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. The MIT Press (2018). http://incompleteideas.net/book/the-book-2nd.html
Todorov, E., Erez, T., Tassa, Y.: MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5026–5033 (2012)
Zhang, C., Vinyals, O., Munos, R., Bengio, S.: A study on overfitting in deep reinforcement learning. arXiv:1804.06893 (2018)
Acknowledgements
This research was supported by the research training group “Dataninja” (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Schilling, M. (2021). Avoid Overfitting in Deep Reinforcement Learning: Increasing Robustness Through Decentralized Control. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_52
DOI: https://doi.org/10.1007/978-3-030-86380-7_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86379-1
Online ISBN: 978-3-030-86380-7