Abstract
A commonly encountered problem in MLP (multi-layer perceptron) classification is related to the prior probabilities of the individual classes: if the number of training examples per class varies significantly, the network may find the rarer classes harder to learn. This practical experience conflicts with theoretical results showing that MLPs approximate Bayesian a posteriori probabilities (independent of the prior class probabilities). Our investigation shows that the discrepancy between theory and practice stems from the assumptions made in the theory: accurate estimation of Bayesian a posteriori probabilities requires a sufficiently large network, training that converges to a global minimum, infinite training data, and training-set class priors that correctly represent those of the test set. In particular, the problem can often be traced to the fact that efficient MLP training mechanisms lead to sub-optimal solutions for most practical problems. In this chapter, we demonstrate the problem, discuss possible methods for alleviating it, and introduce new heuristics that are shown to perform well on a sample ECG classification problem. The heuristics may also serve as a simple means of adjusting for unequal misclassification costs.
Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).
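To make the kind of adjustment the abstract refers to concrete, below is a minimal sketch of the standard prior correction: if a trained network's outputs approximate the training-set posteriors, they can be rescaled by the ratio of test-set to training-set priors and renormalized. This is an illustration of the general technique, not code from the chapter; the names (adjust_for_priors, priors_train, priors_test) and the example numbers are hypothetical. Replacing the prior ratio with per-class cost weights gives a rough adjustment for unequal misclassification costs in the same spirit.

```python
import numpy as np

def adjust_for_priors(outputs, priors_train, priors_test):
    """Rescale per-class network outputs (assumed to approximate
    p_train(c | x)) by the ratio of test to training priors, then
    renormalize so each row sums to one."""
    scaled = outputs * (np.asarray(priors_test) / np.asarray(priors_train))
    return scaled / scaled.sum(axis=1, keepdims=True)

# Hypothetical 3-class example with heavily imbalanced training data.
outputs = np.array([[0.70, 0.20, 0.10]])   # network outputs for one input
priors_train = [0.90, 0.05, 0.05]          # class frequencies in training set
priors_test = [1 / 3, 1 / 3, 1 / 3]        # assumed equal priors at test time

print(adjust_for_priors(outputs, priors_train, priors_test))
# ~[[0.11, 0.59, 0.30]]: the rare classes' probabilities rise once the
# training-set bias toward the majority class is divided out.
```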
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Lawrence, S., Burns, I., Back, A., Tsoi, A.C., Giles, C.L. (2012). Neural Network Classification and Prior Class Probabilities. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_19
DOI: https://doi.org/10.1007/978-3-642-35289-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35288-1
Online ISBN: 978-3-642-35289-8