Abstract
Optimizing the structure of neural networks remains a hard task. If the architecture is too small, it does not allow proper learning from the data, whereas if it is too large, learning leads to the well-known overfitting problem. This paper addresses this issue and proposes a new pruning approach to determine the optimal structure. Our algorithm is based on variance sensitivity analysis and prunes the different types of unit (hidden neurons, inputs, and weights) sequentially. The stopping criterion is based on evaluating the network's performance on both the learning and validation datasets. Four variants of this algorithm, using two different estimators of the variance, are proposed. They are tested and compared with four classical algorithms on three classification and three regression problems. The results show that the proposed algorithms outperform the classical approaches in terms of both computational time and accuracy.
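As a rough illustration of the variance-based sensitivity idea (not the paper's exact estimators or pruning schedule), the following minimal Python/NumPy sketch ranks the hidden neurons of a one-hidden-layer tanh network by the mean squared value plus the variance of the per-pattern output sensitivities; the toy network, its random weights, and the relevance measure are illustrative assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def forward(X, W1, b1, W2, b2):
        """One-hidden-layer MLP with tanh hidden units and a linear output."""
        Z = X @ W1 + b1             # hidden net inputs, shape (n_patterns, n_hidden)
        y = np.tanh(Z) @ W2 + b2    # scalar output per pattern
        return y, Z

    def hidden_relevance(X, W1, b1, W2, b2):
        """Relevance of each hidden neuron from the per-pattern sensitivity of
        the output to the neuron's net input: mean^2 + variance over the data.
        A near-zero value flags the neuron as a pruning candidate."""
        _, Z = forward(X, W1, b1, W2, b2)
        S = (1.0 - np.tanh(Z) ** 2) * W2   # dy/dz_j for every pattern and neuron j
        return S.mean(axis=0) ** 2 + S.var(axis=0)

    # Toy data and randomly initialised weights, purely for illustration.
    X = rng.normal(size=(200, 3))
    n_hidden = 6
    W1 = rng.normal(size=(3, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=n_hidden)
    b2 = 0.0

    relevance = hidden_relevance(X, W1, b1, W2, b2)
    print("pruning candidates (least relevant first):", np.argsort(relevance))
    # A complete pruning loop would remove the least relevant unit, retrain, apply
    # the same test to inputs and weights in turn, and stop as soon as performance
    # on the validation set (not only the learning set) starts to degrade.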
Cite this article
Thomas, P., Suhner, MC. A New Multilayer Perceptron Pruning Algorithm for Classification and Regression Applications. Neural Process Lett 42, 437–458 (2015). https://doi.org/10.1007/s11063-014-9366-5