General Convergence Results for Linear Discriminant Updates

Published: 01 June 2001

Abstract

The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including not only Perceptron and Winnow but also many new algorithms. Our proof hinges on analyzing a generic measure-of-progress construction that gives insight into when and how such algorithms converge.
Our measure-of-progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds), showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and we analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
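
To make the flavor of these update families concrete, the sketch below shows a minimal, illustrative Python learner in which the weight vector is obtained by applying a componentwise link function to a cumulative update vector: an identity link gives Perceptron-style additive behavior, an exponential link gives Winnow-style multiplicative behavior, and a power-law ("p-norm"-style) link interpolates between the two regimes. The function names, the learning rate eta, and the particular link functions are assumptions chosen for illustration; this is not the paper's exact construction and carries none of its mistake bounds.

import numpy as np

def quasi_additive_learner(examples, link, eta=1.0):
    # Illustrative sketch (not the paper's construction): maintain a
    # cumulative update vector z and predict with weights w = link(z).
    # On each mistake (nonpositive margin), add eta * y * x to z.
    n = len(examples[0][0])
    z = np.zeros(n)
    mistakes = 0
    for x, y in examples:                      # x: feature vector, y in {-1, +1}
        x = np.asarray(x, dtype=float)
        w = link(z)
        if y * np.dot(w, x) <= 0:              # mistake-driven: update only on errors
            z += eta * y * x
            mistakes += 1
    return link(z), mistakes

# Identity link: purely additive weights (Perceptron-like behavior).
perceptron_link = lambda z: z

# Exponential link: multiplicative weight growth (Winnow-like behavior).
winnow_link = lambda z: np.exp(z)

# Power-law link in the spirit of the p-norm algorithms: p = 2 stays close to
# the additive regime, while larger p behaves increasingly multiplicatively.
def p_norm_link(p):
    return lambda z: np.sign(z) * np.abs(z) ** (p - 1)

For example, quasi_additive_learner(data, p_norm_link(2.0)) runs the Perceptron-like member of this illustrative family on a list of (x, y) pairs, while quasi_additive_learner(data, winnow_link) runs the multiplicative one.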



    Information

    Published In

    Machine Learning, Volume 43, Issue 3
    June 2001
    147 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 June 2001

    Author Tags

    1. Bregman divergence
    2. Perceptron
    3. Winnow
    4. mistake bounds
    5. on-line learning

    Qualifiers

    • Article

    Cited By
    • (2022) Mirror descent maximizes generalized margin and can be implemented efficiently. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 31089-31101. DOI: 10.5555/3600270.3602524. Online publication date: 28-Nov-2022.
    • (2021) DCA for online prediction with expert advice. Neural Computing and Applications, 33:15, 9521-9544. DOI: 10.1007/s00521-021-05709-0. Online publication date: 1-Aug-2021.
    • (2020) The complexity of adversarially robust proper learning of halfspaces with agnostic noise. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 20449-20461. DOI: 10.5555/3495724.3497441. Online publication date: 6-Dec-2020.
    • (2020) Efficient active learning of sparse halfspaces with arbitrary bounded noise. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 7184-7197. DOI: 10.5555/3495724.3496327. Online publication date: 6-Dec-2020.
    • (2018) An Optimal Algorithm for Online Non-Convex Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2:2, 1-25. DOI: 10.1145/3224420. Online publication date: 13-Jun-2018.
    • (2016) Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2:3-4, 157-325. DOI: 10.1561/2400000013. Online publication date: 1-Aug-2016.
    • (2015) Patent Mining. ACM SIGKDD Explorations Newsletter, 16:2, 1-19. DOI: 10.1145/2783702.2783704. Online publication date: 21-May-2015.
    • (2015) Random-Walk Perturbations for Online Combinatorial Optimization. IEEE Transactions on Information Theory, 61:7, 4099-4106. DOI: 10.1109/TIT.2015.2428253. Online publication date: 1-Jul-2015.
    • (2014) Regret in Online Combinatorial Optimization. Mathematics of Operations Research, 39:1, 31-45. DOI: 10.1287/moor.2013.0598. Online publication date: 1-Feb-2014.
    • (2012) Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4:2, 107-194. DOI: 10.1561/2200000018. Online publication date: 1-Feb-2012.
