Abstract
Typically, measuring the generalization ability of a neural network relies on the well-known method of cross-validation, which statistically estimates the classification error of a network architecture and thus assesses its generalization ability. However, for a number of reasons, cross-validation does not constitute an efficient and unbiased estimator of generalization and cannot be used to assess the generalization of a neural network after training. In this paper, we introduce a new method for evaluating generalization based on a deterministic approach revealing and exploiting the network’s domain of validity. This is the area of the input space containing all the points for which a class-specific network output provides values higher than a certainty threshold. The proposed approach is a set membership technique which defines the network’s domain of validity by inverting its output activity on the input space. For a trained neural network, the result of this inversion is a set of hyper-boxes which constitute a reliable and \(\varepsilon\)-accurate computation of the domain of validity. Suitably defined metrics on the volume of the domain of validity provide a deterministic estimate of the generalization ability of the trained network, unaffected by the random test set selection inherent in cross-validation. The effectiveness of the proposed generalization measures is demonstrated on illustrative examples with artificial and real datasets, using shallow feed-forward neural networks such as multi-layer perceptrons.
Abbreviations
- HPD: Highest posterior density
- INTLAB: INTerval LABoratory
- IA: Interval analysis
- MLP: Multi-layer perceptron
- OTS: Off training set
- PDF: Probability density function
- SCS: Set computations with subpavings
- SIVIA: Set inversion via interval analysis
References
Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2015) Reliable estimation of a neural network’s domain of validity through interval analysis based inversion. In: 2015 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN.2015.7280794
Adam SP, Likas AC, Vrahatis MN (2017) Interval analysis based neural network inversion: a means for evaluating generalization. In: Boracchi G, Iliadis L, Jayne C, Likas A (eds) Engineering applications of neural networks. Springer International Publishing, Berlin, pp 314–326
Adam SP, Magoulas GD, Karras DA, Vrahatis MN (2016) Bounding the search space for global optimization of neural networks learning error: an interval analysis approach. J Mach Learn Res 17(169):1–40. http://jmlr.org/papers/v17/14-350.html
Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, Oxford
Courrieu P (1994) Three algorithms for estimating the domain of validity of feedforward neural networks. Neural Netw 7(1):169–174
Eberhart R, Dobbins R (1991) Designing neural network explanation facilities using genetic algorithms. In: 1991 IEEE international joint conference on neural networks, vol 2, pp 1758–1763
Hampshire II JB, Pearlmutter BA (1991) Equivalence proofs for multilayer perceptron classifiers and the Bayesian discriminant function. In: Proceedings of the 1990 connectionist models summer school, vol 1, pp 159–172
Hassoun MH (1995) Fundamentals of artificial neural networks. MIT Press, Cambridge
Haykin S (1999) Neural networks a comprehensive foundation, 2nd edn. Prentice-Hall, Upper Saddle River, NJ
Hernández-Espinosa C, Fernández-Redondo M, Ortiz-Gómez M (2003) Inversion of a neural network via interval arithmetic for rule extraction. In: Kaynak O, Alpaydin E, Oja E, Xu L (eds) Artificial neural networks and neural information processing – ICANN/ICONIP 2003. Lecture notes in computer science, vol 2714. Springer, Berlin, Heidelberg, pp 670–677
Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied interval analysis with examples in parameter and state estimation, robust control and robotics. Springer, London
Jaulin L, Walter E (1993) Set inversion via interval analysis for nonlinear bounded-error estimation. Automatica 29(4):1053–1064
Jensen C, Reed R, Marks R, El-Sharkawi M, Jung JB, Miyamoto R, Anderson G, Eggen C (1999) Inversion of feedforward neural networks: algorithms and applications. Proc IEEE 87(9):1536–1549
Kamimura R (2017) Mutual information maximization for improving and interpreting multi-layered neural networks. In: 2017 IEEE symposium series on computational intelligence (SSCI), pp 1–7
Karystinos GN, Pados DA (2000) On overfitting, generalization, and randomly expanded training sets. IEEE Trans Neural Netw 11(5):1050–1057
Kearfott RB (1996) Interval computations: introduction, uses, and resources. Euromath Bull 2(1):95–112
Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann Math Stat 23:462–466
Kindermann J, Linden A (1990) Inversion of neural networks by gradient descent. Parallel Comput 14(3):277–286
Likas A (2001) Probability density estimation using artificial neural networks. Comput Phys Commun 135(2):167–175
Liu Y (1995) Unbiased estimate of generalization error and model selection in neural network. Neural Netw 8(2):215–219
Lu BL, Kita H, Nishikawa Y (1999) Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans Neural Netw 10(6):1271–1290
Novak R, Bahri Y, Abolafia DA, Pennington J, Sohl-Dickstein J (2018) Sensitivity and generalization in neural networks: an empirical study. In: International conference on learning representations. https://openreview.net/forum?id=HJC2SzZCW
Reed R, Marks R (1995) An evolutionary algorithm for function inversion and boundary marking. In: IEEE international conference on evolutionary computation, 1995, vol 2, pp 794–797
Richard M, Lippmann R (1991) Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Comput 3(4):461–483. https://doi.org/10.1162/neco.1991.3.4.461
Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22:400–407
Rump SM (1999) INTLAB - INTerval LABoratory. In: Csendes T (ed) Developments in reliable computing. Kluwer Academic, Dordrecht, Netherlands, pp 77–104
Saad EW, Wunsch DC II (2007) Neural network explanation using inversion. Neural Netw 20(1):78–93
Theodoridis S, Pikrakis A, Koutroumbas K, Kavouras D (2010) Introduction to pattern recognition: a MATLAB approach. Academic Press, Burlington, MA 01803, USA
Thrun SB (1993) Extracting provably correct rules from artificial neural networks. Technical Report IAI–TR–93–5, Institut fur Informatik III, Bonn, Germany
Tornil-Sin S, Puig V, Escobet T (2010) Set computations with subpavings in MATLAB: the SCS toolbox. In: 2010 IEEE international symposium on computer-aided control system design (CACSD), pp 1403–1408
Wolpert DH (1990) A mathematical theory of generalization: part I. Complex Syst 4(2):151–200
Wolpert DH (1990) A mathematical theory of generalization: part II. Complex Syst 4(2):201–249
Wolpert DH (1992) On the connection between in-sample testing and generalization error. Complex Syst 6(1):47–94
Wolpert DH (1996) The existence of a priori distinctions between learning algorithms. Neural Comput 8(7):1391–1420. https://doi.org/10.1162/neco.1996.8.7.1391
Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390. https://doi.org/10.1162/neco.1996.8.7.1341
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable suggestions and comments on an earlier version of the manuscript, which helped to significantly improve the paper.
Ethics declarations
Conflict of interest:
The authors declare that they have no conflict of interest.
Appendices
Appendix A
In order to illustrate the impact of the \(\beta\)-cut on the domain of validity, let us first consider the two-dimensional classification dataset with two classes forming nine groups shown in Fig. 6a. A \(2-10-1\) MLP, using logistic sigmoid activation functions, has been trained on this dataset, and the contour plot of its output is shown in Fig. 6b. In this figure, the white regions (output greater than \(1-\beta\)) correspond to patterns classified by the MLP network into class 1 (red points), while the black regions (output lower than \(\beta\)) correspond to patterns classified into class 2 (blue points). The gray-level zone depicts the ambiguity of classification for patterns near the class boundaries, where the MLP output takes values in the interval \([\beta ,1-\beta ]\).
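The three-way decision rule described above can be sketched as follows. This is a minimal illustration of the \(\beta\)-cut on a single sigmoid output, with a hypothetical helper name and \(\beta = 0.1\) as in the example; it is not taken from the paper's implementation.

```python
def beta_cut_decision(y, beta=0.1):
    """Three-way beta-cut decision for a single sigmoid output y in [0, 1].

    Outputs above 1 - beta are assigned to class 1 (white regions),
    outputs below beta to class 2 (black regions), and everything in
    [beta, 1 - beta] is left ambiguous (gray zone near the boundary).
    """
    if y >= 1.0 - beta:
        return 1          # confidently class 1
    if y <= beta:
        return 2          # confidently class 2
    return 0              # ambiguous gray zone

decisions = [beta_cut_decision(y) for y in (0.95, 0.03, 0.5)]
# decisions == [1, 2, 0]
```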
The impact of this \(\beta\)-cut classification decision is better depicted in Figs. 7 and 8. For each one of these Figures, the red colored area corresponds to a specific domain of validity defined for some specific interval \([1-\beta ,1]\) of the network output, for the MLP trained on the above two-dimensional problem. Each area is determined using SIVIA to invert the MLP output interval \([1-\beta ,1]\) for class 1 in the input space.
The value of \(\beta\) clearly extends or restricts the input space area classified by the MLP into class 1. This can be verified by simple observation of Fig. 7a, b, while for higher-dimensional problems it can be confirmed by comparing the volumes of the respective domains of validity. This shows the importance of choosing the right value for \(\beta\), which here needs to be 0.1 if one wants to take the right classification decision for a significant part of the input space. As shown in Fig. 8a, b, the appropriate value of \(\beta\) also depends on the number of training epochs and the error threshold chosen to train the MLP.
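The inversion step itself can be sketched with the SIVIA recursion of Jaulin and Walter. The following is a toy illustration only: it inverts the output interval \([1-\beta, 1]\) for a single-neuron "network" with hypothetical weights, for which an exact interval extension is easy because the sigmoid is monotone. The paper's actual computation uses INTLAB/SCS on a full MLP; everything below (weights, \(\varepsilon\), the target interval) is an assumption for illustration.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def output_range(box, w=(3.0, -2.0), b=0.5):
    """Exact interval extension of y = sigmoid(w . x + b) over an
    axis-aligned box (toy stand-in for interval evaluation of an MLP)."""
    lo = b + sum(wi * (bx[0] if wi >= 0 else bx[1]) for wi, bx in zip(w, box))
    hi = b + sum(wi * (bx[1] if wi >= 0 else bx[0]) for wi, bx in zip(w, box))
    return sigmoid(lo), sigmoid(hi)  # sigmoid is increasing

def sivia(box, target=(0.9, 1.0), eps=0.05, inside=None, boundary=None):
    """SIVIA: bisect until each box is proved inside the inverted set,
    proved outside, or smaller than eps (undecided boundary box)."""
    inside = [] if inside is None else inside
    boundary = [] if boundary is None else boundary
    lo, hi = output_range(box)
    if lo >= target[0] and hi <= target[1]:
        inside.append(box)                      # whole box maps into target
    elif hi < target[0] or lo > target[1]:
        pass                                    # whole box maps outside: discard
    elif max(b[1] - b[0] for b in box) < eps:
        boundary.append(box)                    # undecided, below resolution
    else:
        k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
        mid = 0.5 * (box[k][0] + box[k][1])
        left = list(box); left[k] = (box[k][0], mid)
        right = list(box); right[k] = (mid, box[k][1])
        sivia(left, target, eps, inside, boundary)
        sivia(right, target, eps, inside, boundary)
    return inside, boundary

def volume(boxes):
    return sum(math.prod(hi - lo for lo, hi in b) for b in boxes)

# Invert the output interval [1 - beta, 1] with beta = 0.1 over [-2, 2]^2.
inner, border = sivia([(-2.0, 2.0), (-2.0, 2.0)])
print(volume(inner), volume(inner) + volume(border))
```

The two printed volumes bracket the true volume of the domain of validity from below and above, which is the \(\varepsilon\)-accurate guarantee exploited by the proposed metrics.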
Appendix B
In the extreme case of a pattern producing more than one valid output (i.e., it is assigned to more than one class), the current implementation computing the domain of validity considers this pattern misclassified. Strictly speaking, the pattern is correctly classified with respect to its proper class, while for every other class it is a misclassified pattern. A previous approach for determining the domain of validity considered such a pattern unclassified. However, in terms of the proposed metrics, both approaches compute the same result, given that unclassified and misclassified patterns have the same status for the computed metrics.
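This counting convention can be sketched as follows. The helper name, the toy per-class outputs, and the accuracy-style metric are hypothetical; the sketch only illustrates that a pattern valid for more than one class is not credited as correct, matching the "misclassified" treatment described above.

```python
def count_valid_correct(outputs, labels, beta=0.1):
    """Fraction of patterns with exactly one valid output, the correct one.

    A pattern whose outputs exceed 1 - beta for several classes is not
    credited, which is equivalent (for this metric) to calling it
    misclassified or unclassified.
    """
    correct = 0
    for out, label in zip(outputs, labels):
        valid = [c for c, y in enumerate(out) if y >= 1.0 - beta]
        if valid == [label]:   # exactly one valid output, and it is the right class
            correct += 1
    return correct / len(outputs)

acc = count_valid_correct([(0.95, 0.02), (0.93, 0.92), (0.10, 0.97)], [0, 0, 1])
# acc == 2/3: the second pattern activates both classes and is not credited
```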
In many cases a training algorithm results in either under-trained or over-trained networks. Under-training arises for many reasons (insufficient training, small-sized training data, inappropriate network architecture, etc.). In consequence, as shown in Fig. 9b, the domain of validity either covers only small regions of the input space or incorrectly classifies a large region. For instance, the validity domain of a 2–4–2 MLP, shown in Fig. 9a, exhibits a fairly regular coverage of the input space, while the 2–2–2 MLP, shown in Fig. 9b, manages to cover only a narrow strip of the input space. In general, it can be stated that the validity domain of an under-trained network is composed of a small number of large regions with regularly shaped boundaries.
Besides under-training, another issue affecting generalization is network over-training. Typically, an over-trained network fails to correctly classify unseen patterns, as it has learned the training data “exactly” and hence is not able to generalize well. In this case, the decision boundaries computed by the network delimit the regions of the input space as tightly as possible, and the network fails to interpolate even among close neighboring groups. The domain of validity then consists of smaller regions, and so its volume diminishes. An indicative example of such a validity domain is given in Fig. 9c. As a result, we may state that for a network that fits its training data well, the lower the volume of its domain of validity, the poorer the generalization achieved, due to over-training.
Another unfortunate consequence of over-training is that MLPs, especially those with a high number of nodes in the hidden layer(s), tend to fit outliers, noisy input patterns, and patterns with noisy class labels, see Fig. 10. In these cases, the network has the flexibility to form decision boundaries that discriminate the outlying or mislabeled patterns. In doing so, the network defines isolated regions, such as isles or lobes, in the input space, which delimit not only these very patterns but also important parts of the input space for which there is no information about the class or classes they belong to. In general, it can be stated that the validity domain of an over-trained network contains regions of small size with irregularly shaped boundaries.
Hence, the previous cases constitute different aspects of over-training that need to be taken into account when considering the volume of the domain of validity as a metric of the network’s generalization performance.
Cite this article
Adam, S.P., Likas, A.C. & Vrahatis, M.N. Evaluating generalization through interval-based neural network inversion. Neural Comput & Applic 31, 9241–9260 (2019). https://doi.org/10.1007/s00521-019-04129-5