A novel weight pruning method for MLP classifiers based on the MAXCORE principle

Abstract

We introduce a novel weight pruning methodology for MLP classifiers that can be used for model and/or feature selection purposes. The main concept underlying the proposed method is the MAXCORE principle, which is based on the observation that relevant synaptic weights tend to generate higher correlations between error signals associated with the neurons of a given layer and the error signals propagated back to the previous layer. Nonrelevant (i.e. prunable) weights tend to generate smaller correlations. Using the MAXCORE as a guiding principle, we perform a cross-correlation analysis of the error signals at successive layers. Weights for which the cross-correlations are smaller than a user-defined error tolerance are gradually discarded. Computer simulations using synthetic and real-world data sets show that the proposed method performs consistently better than standard pruning techniques, with much lower computational costs.
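
To make the pruning rule described in the abstract more concrete, the following is a minimal, heavily hedged sketch of the general idea only; it is not the algorithm as specified in the paper. It assumes a single-hidden-layer MLP whose backpropagation pass already provides the output-layer error signals and the error signals propagated back to the hidden layer, and it flags as prunable those hidden-to-output weights whose normalized error cross-correlation falls below a user-defined tolerance. All names (maxcore_prunable_mask, delta_out, delta_hidden, tol) are illustrative assumptions.

    import numpy as np

    def maxcore_prunable_mask(delta_out, delta_hidden, tol=0.05):
        """Illustrative MAXCORE-style cross-correlation analysis (not the paper's exact algorithm).

        delta_out    : (N, K) error signals of the output-layer neurons over N training patterns.
        delta_hidden : (N, H) error signals back-propagated to the hidden-layer neurons.
        Returns a (K, H) boolean mask marking hidden-to-output weights whose
        error cross-correlation is below tol, i.e. candidates for pruning.
        """
        # Standardize both sets of error signals so the products below behave like correlations.
        d_o = (delta_out - delta_out.mean(0)) / (delta_out.std(0) + 1e-12)
        d_h = (delta_hidden - delta_hidden.mean(0)) / (delta_hidden.std(0) + 1e-12)
        corr = np.abs(d_o.T @ d_h) / delta_out.shape[0]  # (K, H) cross-correlation magnitudes
        return corr < tol                                # small correlation -> prunable weight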

Notes

  1. The AIC has the following structure: \(AIC=-2\ln(\varepsilon_{\rm train})+2N_c\) [23].

  2. Since the proposed approach is dependent on the classifier model, it belongs to the class of wrappers for feature subset selection ([16]).

  3. Recall that the task now is feature selection, not pattern classification. Thus, we can train the network with all the available pattern vectors.

References

  1. Aran O, Yildiz OT, Alpaydin E (2009) An incremental framework based on cross-validation for estimating the architecture of a multilayer perceptron. Int J Pattern Recogn Artif Intell 23(2):159–190

  2. Benardos PG, Vosniakos GC (2007) Optimizing feedforward artificial neural network architecture. Eng Appl Artif Intell 20(3):365–382

  3. Berthonnaud E, Dimnet J, Roussouly P, Labelle H (2005) Analysis of the sagittal balance of the spine and pelvis using shape and orientation parameters. J Spinal Disorders Tech 18(1):40–47

  4. Bishop CM (1992) Exact calculation of the Hessian matrix for the multilayer perceptron. Neural Comput 4(4):494–501

  5. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, Oxford

  6. Castellano G, Fanelli AM, Pelillo M (1997) An iterative pruning algorithm for feedforward neural networks. IEEE Trans Neural Netw 8(3):519–531

  7. Cataltepe Z, Abu-Mostafa YS, Magdon-Ismail M (1999) No free lunch for early stopping. Neural Comput 11(4):995–1009

  8. Curry B, Morgan PH (2006) Model selection in neural networks: some difficulties. Eur J Oper Res 170(2):567–577

  9. Dandurand F, Berthiaume V, Shultz TR (2007) A systematic comparison of flat and standard cascade-correlation using a student-teacher network approximation task. Connect Sci 19(3):223–244

  10. Delogu R, Fanni A, Montisci A (2008) Geometrical synthesis of MLP neural networks. Neurocomputing 71:919–930

  11. Engelbrecht AP (2001) A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans Neural Netw 12(6):1386–1399

  12. Fahlman SE, Lebiere C (1990) The cascade-correlation learning architecture. In: Touretzky DS (ed) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 2, pp 524–532

  13. Gómez I, Franco L, Jerez JM (2009) Neural network architecture selection: can function complexity help? Neural Process Lett 30:71–87

  14. Hammer B, Micheli A, Sperduti A (2006) Universal approximation capability of cascade correlation for structures. Neural Comput 17(5):1109–1159

  15. Hassibi B, Stork DG (1993) Second order derivatives for network pruning: optimal brain surgeon. In: Hanson SJ, Cowan JD, Giles CL (eds) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 5, pp 164–171

  16. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324

  17. Littmann E, Ritter H (1996) Learning and generalization in cascade network architectures. Neural Comput 8(7):1521–1539

  18. Luukka P (2011) Feature selection using fuzzy entropy measures with similarity classifier. Exp Syst Appl 38(4):4600–4607

  19. Moustakidis S, Theocharis J (2010) SVM-FuzCoC: a novel SVM-based feature selection method using a fuzzy complementary criterion. Pattern Recogn 43(11):3712–3729

  20. Nakamura T, Judd K, Mees AI, Small M (2006) A comparative study of information criteria for model selection. Int J Bifur Chaos 16(8):2153–2175

  21. Parekh R, Yang J, Honavar V (2000) Constructive neural-network learning algorithms for pattern classification. IEEE Trans Neural Netw 11(2):436–451

  22. Platt JC (1998) Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel methods: support vector learning. MIT Press, Cambridge, pp 185–208

  23. Principe JC, Euliano NR, Lefebvre WC (2000) Neural and adaptive systems. Wiley, London

  24. Reed R (1993) Pruning algorithms—a survey. IEEE Trans Neural Netw 4(5):740–747

  25. Rocha M, Cortez P, Neves J (2007) Evolution of neural networks for classification and regression. Neurocomputing 70(16–18):1054–1060

  26. Rocha Neto AR, Barreto GA (2009) On the application of ensembles of classifiers to the diagnosis of pathologies of the vertebral column: a comparative analysis. IEEE Latin Am Trans 7(4):487–496

  27. Saxena A, Saad A (2007) Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Appl Soft Comput 7(1):441–454

  28. Seghouane AK, Amari SI (2007) The AIC criterion and symmetrizing the Kullback–Leibler divergence. IEEE Trans Neural Netw 18(1):97–106

  29. Stathakis D, Kanellopoulos I (2008) Global optimization versus deterministic pruning for the classification of remotely sensed imagery. Photogrammetr Eng Remote Sens 74(10):1259–1265

  30. Trenn S (2008) Multilayer perceptrons: approximation order and necessary number of hidden units. IEEE Trans Neural Netw 19(5):836–844

  31. Wan W, Mabu S, Shimada K, Hirasawa K, Hu J (2009) Enhancing the generalization ability of neural networks through controlling the hidden layers. Appl Soft Comput 9(1):404–414

  32. Weigend AS, Rumelhart DE, Huberman BA (1990) Generalization by weight-elimination with application to forecasting. In: Lippmann RP, Moody J, Touretzky DS (eds) Advances in neural information processing systems. Morgan Kaufmann, San Mateo, vol 3, pp 875–882

  33. Xiang C, Ding SQ, Lee TH (2005) Geometric interpretation and architecture selection of the MLP. IEEE Trans Neural Netw 16(1):84–96

  34. Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447

  35. Yu J, Wang S, Xi L (2008) Evolving artificial neural networks using an improved PSO and DPSO. Neurocomputing 71(4–6):1054–1060

Acknowledgments

The authors thank Prof. Ajalmar Rêgo da Rocha Neto (Federal Institute of Ceará—IFCE) for running the experiments with the SVM classifiers on the vertebral column data set. We also thank the anonymous reviewers for their valuable suggestions for improving this paper.

Author information

Corresponding author

Correspondence to Guilherme A. Barreto.

Appendix

The WDE algorithm originates from a regularization method that modifies the error function by adding a term that penalizes large weights. As a consequence, Eqs. 7, 8 are now written as [23]

$$ \begin{aligned} m_{ki}(t+1) &= m_{ki}(t)\left( 1 - \frac{\lambda}{(1 + m_{ki}^2(t))^2}\right) + \eta \delta_{k}^{(o)}(t) y_{i}^{(h)}(t),\\ w_{ij}(t+1) &= w_{ij}(t)\left( 1 -\frac{ \lambda}{(1 + w_{ij}^2(t))^2}\right) + \eta \delta_{i}^{(h)}(t) x_j(t), \end{aligned} $$

where 0 < λ < 1 is a user-defined parameter.
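
As a rough illustration of the update above, here is a minimal NumPy sketch of one WDE step for the hidden-to-output weights \(m_{ki}\); the variable names (M, delta_o, y_h, eta, lam) are assumptions for the sketch, not code from the paper. The input-to-hidden update for \(w_{ij}\) has the same form, with \(\delta_i^{(h)}(t)\) and \(x_j(t)\) in place of \(\delta_k^{(o)}(t)\) and \(y_i^{(h)}(t)\).

    import numpy as np

    def wde_step(M, delta_o, y_h, eta=0.05, lam=1e-4):
        """One weight-decay/elimination (WDE) update of the hidden-to-output weights.

        M       : (K, H) weight matrix with entries m_ki.
        delta_o : (K,)   output-layer local gradients delta_k^(o) for the current pattern.
        y_h     : (H,)   hidden-layer outputs y_i^(h) for the current pattern.
        """
        shrink = 1.0 - lam / (1.0 + M**2) ** 2        # elementwise decay factor from the penalty term
        return M * shrink + eta * np.outer(delta_o, y_h)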

The OBS algorithm [15] requires that the weights be ranked according to their saliencies, which are defined as

$$ S_i = \Delta E_i = \frac{1}{2} \frac{\omega_i^2}{[{\mathbf{H}}^{-1}]_{ii}} \qquad (21) $$

where \(\omega_i\) is the ith weight (or bias) of interest and \([{\mathbf{H}}^{-1}]_{ii}\) is the ith diagonal entry of the inverse of the Hessian matrix \({\mathbf{H}}\), whose entries are \(H_{ij} = \frac{\partial^2 E}{\partial \omega_i \partial \omega_j}\).
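
For concreteness, a minimal sketch of Eq. 21 follows; it assumes the full Hessian of the training error with respect to the flattened weight vector is already available (e.g. computed exactly as in [4]) and simply ranks the weights by saliency. The function and argument names are illustrative, not taken from the original OBS code.

    import numpy as np

    def obs_saliencies(weights, hessian):
        """Compute OBS saliencies S_i = 0.5 * w_i^2 / [H^{-1}]_{ii} (Eq. 21).

        weights : (P,)   flattened weight/bias vector.
        hessian : (P, P) Hessian of the training error w.r.t. the weights.
        Returns the saliencies and the weight indices ordered from least to
        most salient, i.e. the order in which OBS would consider them for pruning.
        """
        H_inv = np.linalg.inv(hessian)      # a damped or iterative inverse is often used in practice
        S = 0.5 * weights ** 2 / np.diag(H_inv)
        return S, np.argsort(S)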

Pruning by weight magnitude (PWM) is a pruning method based on the elimination of small-magnitude weights [5]. The weights are sorted in increasing order of magnitude and, starting from the smallest one, each weight is pruned as long as its elimination does not reduce the classification rate on the training data set below a predefined threshold.
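
The loop below sketches one reading of this procedure; the train_accuracy callback (returning the classification rate of the network on the training set for a given weight vector) and the min_acc threshold are assumed interfaces introduced only for illustration.

    import numpy as np

    def prune_by_magnitude(weights, train_accuracy, min_acc):
        """Sketch of pruning by weight magnitude (PWM).

        weights        : 1-D array holding a flattened copy of the network weights.
        train_accuracy : callback returning the training-set classification rate
                         obtained with the given weight vector.
        min_acc        : smallest acceptable training classification rate.
        """
        w = weights.copy()
        for idx in np.argsort(np.abs(w)):    # visit weights from smallest to largest magnitude
            old = w[idx]
            w[idx] = 0.0                     # tentatively prune this weight
            if train_accuracy(w) < min_acc:  # elimination hurts too much:
                w[idx] = old                 # restore the weight and stop pruning
                break
        return w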

About this article

Cite this article

Medeiros, C.M.S., Barreto, G.A. A novel weight pruning method for MLP classifiers based on the MAXCORE principle. Neural Comput & Applic 22, 71–84 (2013). https://doi.org/10.1007/s00521-011-0748-6
