DOI: 10.1145/3583133.3596321

Research article · Open access

Empirical Loss Landscape Analysis of Neural Network Activation Functions

Published: 24 July 2023

Abstract

Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with the hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. The rectified linear unit is shown to yield the most convex loss landscape, while the exponential linear unit is shown to yield the least flat loss landscape and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.
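The paper's methodology is not reproduced on this page. As a rough illustration only, the sketch below defines the three activation functions named in the abstract and probes the loss along a random one-dimensional slice through the weight space of a toy network, using discrete second differences as a crude convexity/flatness proxy and a simple threshold as a saturation proxy. The 2-2-1 network, the XOR task, the 0.9 saturation threshold, and all function names are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch (assumptions throughout): tanh, ReLU, and ELU activations,
# a 1-D random-direction slice of the loss surface of a tiny 2-2-1 XOR
# network, second differences as a curvature proxy, and a crude tanh
# saturation proxy. This is NOT the paper's methodology.
import numpy as np

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def forward(w, x, act):
    """Forward pass of a 2-2-1 network; w is a flat vector of 9 parameters."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8].reshape(2, 1), w[8:]
    return act(x @ W1 + b1) @ W2 + b2

def mse_loss(w, x, y, act):
    return float(np.mean((forward(w, x, act) - y) ** 2))

rng = np.random.default_rng(0)
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])       # XOR targets

w0 = rng.normal(size=9)                      # random base point in weight space
d = rng.normal(size=9)
d /= np.linalg.norm(d)                       # random unit direction

ts = np.linspace(-2.0, 2.0, 101)             # slice: loss(w0 + t * d)
for name, act in [("tanh", tanh), ("relu", relu), ("elu", elu)]:
    losses = np.array([mse_loss(w0 + t * d, x, y, act) for t in ts])
    curvature = np.diff(losses, 2)           # discrete second differences
    print(f"{name}: mean |curvature| along slice = {np.mean(np.abs(curvature)):.4f}")

# Crude saturation proxy: fraction of tanh hidden activations near the
# asymptotes at the base point (the 0.9 threshold is an arbitrary choice).
z = x @ w0[:4].reshape(2, 2) + w0[4:6]
print(f"tanh saturation fraction: {np.mean(np.abs(np.tanh(z)) > 0.9):.2f}")
```

A full landscape analysis would aggregate many such samples, for example via random or gradient walks, rather than a single slice, but the same ingredients of sampling, curvature estimation, and saturation measurement apply.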

Cited By

  • Fitness Landscape Analysis of Product Unit Neural Networks. Algorithms 17(6), Article 241 (2024). https://doi.org/10.3390/a17060241
  • Characterising Deep Learning Loss Landscapes with Local Optima Networks. In 2024 IEEE Congress on Evolutionary Computation (CEC), 1-8 (2024). https://doi.org/10.1109/CEC60901.2024.10611772

Information

Published In

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation
July 2023
2519 pages
ISBN: 9798400701207
DOI: 10.1145/3583133

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. neural networks
  2. activation functions
  3. loss landscape
  4. fitness landscape analysis

Conference

GECCO '23 Companion

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%).

Article Metrics

  • Downloads (last 12 months): 140
  • Downloads (last 6 weeks): 21

Reflects downloads up to 24 September 2024.

