Abstract
Optimizing the structure of neural networks remains a hard task. If the architecture is too small, it does not allow proper learning from the data, whereas if it is too large, learning leads to the well-known overfitting problem. This paper addresses this issue and proposes a new pruning approach to determine the optimal structure. Our algorithm is based on variance sensitivity analysis and prunes the different types of unit (hidden neurons, inputs, and weights) sequentially. The stopping criterion is based on evaluating the network's performance on both the learning and validation datasets. Four variants of this algorithm, using two different estimators of the variance, are proposed. They are tested and compared with four classical algorithms on three classification and three regression problems. The results show that the proposed algorithms outperform the classical approaches in terms of both computational time and accuracy.
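As a rough illustration of the variance-based sensitivity idea (not the paper's exact estimators or pruning schedule), the following minimal Python/NumPy sketch ranks the hidden neurons of a one-hidden-layer tanh network by the mean squared value plus the variance of the per-pattern output sensitivities; the toy network, its random weights, and the relevance measure are illustrative assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def forward(X, W1, b1, W2, b2):
        """One-hidden-layer MLP with tanh hidden units and a linear output."""
        Z = X @ W1 + b1             # hidden net inputs, shape (n_patterns, n_hidden)
        y = np.tanh(Z) @ W2 + b2    # scalar output per pattern
        return y, Z

    def hidden_relevance(X, W1, b1, W2, b2):
        """Relevance of each hidden neuron from the per-pattern sensitivity of
        the output to the neuron's net input: mean^2 + variance over the data.
        A near-zero value flags the neuron as a pruning candidate."""
        _, Z = forward(X, W1, b1, W2, b2)
        S = (1.0 - np.tanh(Z) ** 2) * W2   # dy/dz_j for every pattern and neuron j
        return S.mean(axis=0) ** 2 + S.var(axis=0)

    # Toy data and randomly initialised weights, purely for illustration.
    X = rng.normal(size=(200, 3))
    n_hidden = 6
    W1 = rng.normal(size=(3, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=n_hidden)
    b2 = 0.0

    relevance = hidden_relevance(X, W1, b1, W2, b2)
    print("pruning candidates (least relevant first):", np.argsort(relevance))
    # A complete pruning loop would remove the least relevant unit, retrain, apply
    # the same test to inputs and weights in turn, and stop as soon as performance
    # on the validation set (not only the learning set) starts to degrade.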
Cite this article
Thomas, P., Suhner, MC. A New Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications. Neural Process Lett 42, 437–458 (2015). https://doi.org/10.1007/s11063-014-9366-5