DOI: 10.1145/3583133.3596321

Research article · Open access

Empirical Loss Landscape Analysis of Neural Network Activation Functions

Published: 24 July 2023

Abstract

Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with the hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. The rectified linear unit is shown to yield the most convex loss landscape, while the exponential linear unit is shown to yield the least flat loss landscape and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.
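The paper's methodology is not reproduced on this page. As a rough illustration only, the sketch below defines the three activation functions named in the abstract and probes the loss along a random one-dimensional slice through the weight space of a toy network, using discrete second differences as a crude convexity/flatness proxy and a simple threshold as a saturation proxy. The 2-2-1 network, the XOR task, the 0.9 saturation threshold, and all function names are illustrative assumptions, not the paper's actual experimental setup.

```python
# Minimal sketch (assumptions throughout): tanh, ReLU, and ELU activations,
# a 1-D random-direction slice of the loss surface of a tiny 2-2-1 XOR
# network, second differences as a curvature proxy, and a crude tanh
# saturation proxy. This is NOT the paper's methodology.
import numpy as np

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def forward(w, x, act):
    """Forward pass of a 2-2-1 network; w is a flat vector of 9 parameters."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8].reshape(2, 1), w[8:]
    return act(x @ W1 + b1) @ W2 + b2

def mse_loss(w, x, y, act):
    return float(np.mean((forward(w, x, act) - y) ** 2))

rng = np.random.default_rng(0)
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])       # XOR targets

w0 = rng.normal(size=9)                      # random base point in weight space
d = rng.normal(size=9)
d /= np.linalg.norm(d)                       # random unit direction

ts = np.linspace(-2.0, 2.0, 101)             # slice: loss(w0 + t * d)
for name, act in [("tanh", tanh), ("relu", relu), ("elu", elu)]:
    losses = np.array([mse_loss(w0 + t * d, x, y, act) for t in ts])
    curvature = np.diff(losses, 2)           # discrete second differences
    print(f"{name}: mean |curvature| along slice = {np.mean(np.abs(curvature)):.4f}")

# Crude saturation proxy: fraction of tanh hidden activations near the
# asymptotes at the base point (the 0.9 threshold is an arbitrary choice).
z = x @ w0[:4].reshape(2, 2) + w0[4:6]
print(f"tanh saturation fraction: {np.mean(np.abs(np.tanh(z)) > 0.9):.2f}")
```

A full landscape analysis would aggregate many such samples, for example via random or gradient walks, rather than a single slice, but the same ingredients of sampling, curvature estimation, and saturation measurement apply.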

Cited By

  • Fitness Landscape Analysis of Product Unit Neural Networks. Algorithms 17(6), Article 241 (2024). https://doi.org/10.3390/a17060241
  • Characterising Deep Learning Loss Landscapes with Local Optima Networks. In 2024 IEEE Congress on Evolutionary Computation (CEC), 1-8 (2024). https://doi.org/10.1109/CEC60901.2024.10611772

Information

Published In

GECCO '23 Companion: Proceedings of the Companion Conference on Genetic and Evolutionary Computation
July 2023
2519 pages
ISBN: 9798400701207
DOI: 10.1145/3583133

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. neural networks
  2. activation functions
  3. loss landscape
  4. fitness landscape analysis

Conference

GECCO '23 Companion

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%).

Article Metrics

  • Downloads (last 12 months): 140
  • Downloads (last 6 weeks): 21

Reflects downloads up to 24 September 2024.

