Probabilistic Bounds for Binary Classification of Large Data Sets

Věra Kůrková &
Marcello Sanguineti

Part of the book series: Proceedings of the International Neural Networks Society (INNS, volume 1)

Included in the following conference series:

INNS Big Data and Deep Learning conference

Abstract

A probabilistic model for classification of task relevance is investigated. Correlations between randomly chosen functions and network input-output functions are estimated. The impact of large data sets is analyzed from the point of view of the concentration of measure phenomenon. The analysis exploits the Azuma-Hoeffding inequality, which also applies when the naive Bayes assumption is not satisfied (i.e., when assignments of class labels to feature vectors are not independent).
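
As a rough illustration of the concentration effect the abstract describes, the following sketch compares the textbook Azuma-Hoeffding tail bound, 2·exp(−m·t²/2) for a sum of m martingale differences bounded by 1 and normalized by m, with the simulated frequency of large correlations between a random ±1 labeling and a fixed ±1 classifier. It is not the authors' construction. The function names, the alternating-sign classifier, and the normalization by m are assumptions made for illustration, and the simulation draws labels independently, which is only the simplest special case of the dependent setting the inequality covers.

```python
import math
import random


def azuma_hoeffding_bound(m: int, t: float) -> float:
    """Standard two-sided tail bound 2*exp(-m*t^2/2), capped at 1.

    Assumes the correlation is an average of m martingale differences,
    each bounded by 1 in absolute value (an illustrative normalization).
    """
    return min(1.0, 2.0 * math.exp(-m * t * t / 2.0))


def normalized_correlation(f, h):
    """Average of the products f[i]*h[i] for two +/-1 valued label lists."""
    return sum(a * b for a, b in zip(f, h)) / len(f)


def empirical_tail(m: int, t: float, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of random labelings whose |correlation| with h is at least t."""
    rng = random.Random(seed)
    # Hypothetical fixed classifier: alternating +1/-1 outputs on m points.
    h = [1 if i % 2 == 0 else -1 for i in range(m)]
    hits = 0
    for _ in range(trials):
        # Independent uniform labels: the simplest case, chosen for illustration.
        f = [rng.choice((-1, 1)) for _ in range(m)]
        if abs(normalized_correlation(f, h)) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m in (50, 200, 800):
        print(m, empirical_tail(m, 0.2), azuma_hoeffding_bound(m, 0.2))
```

Under these assumptions, the bound decays exponentially in the number of data points m, and the simulated frequency is expected to stay below it for every m tried.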

Acknowledgments

V.K. was partially supported by the Czech Grant Foundation grant GA 18-23827S and by institutional support of the Institute of Computer Science RVO 67985807. M.S. was partially supported by a FFABR grant of the Italian Ministry of Education, University and Research (MIUR). He is a Research Associate at INM (Institute for Marine Engineering) of CNR (National Research Council of Italy) under the Project PDGP 2018/20 DIT.AD016.001 “Technologies for Smart Communities” and a member of GNAMPA-INdAM (Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni - Istituto Nazionale di Alta Matematica).

Author information

Authors and Affiliations

Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic
Věra Kůrková
Department of Computer Science, Bioengineering, Robotics, and Systems Engineering (DIBRIS), University of Genoa, Genoa, Italy
Marcello Sanguineti

Corresponding author

Correspondence to Marcello Sanguineti.

Editor information

Editors and Affiliations

Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Luca Oneto
Department of Mathematics, University of Padova, Padua, Italy
Nicolò Navarin
Department of Mathematics, University of Padova, Padua, Italy
Alessandro Sperduti
Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Davide Anguita

Cite this paper

Kůrková, V., Sanguineti, M. (2020). Probabilistic Bounds for Binary Classification of Large Data Sets. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Advances in Big Data and Deep Learning. INNSBDDL 2019. Proceedings of the International Neural Networks Society, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-16841-4_32

DOI: https://doi.org/10.1007/978-3-030-16841-4_32
Published: 03 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16840-7
Online ISBN: 978-3-030-16841-4
eBook Packages: Intelligent Technologies and Robotics (R0)
