General Convergence Results for Linear Discriminant Updates

Published: 01 June 2001

Abstract

The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including not only Perceptron and Winnow but also many new algorithms. Our proof hinges on analyzing a generic measure-of-progress construction that gives insight into when and how such algorithms converge.
Our measure-of-progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds), showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and we analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
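
To make the flavor of these update families concrete, the sketch below shows a minimal, illustrative Python learner in which the weight vector is obtained by applying a componentwise link function to a cumulative update vector: an identity link gives Perceptron-style additive behavior, an exponential link gives Winnow-style multiplicative behavior, and a power-law ("p-norm"-style) link interpolates between the two regimes. The function names, the learning rate eta, and the particular link functions are assumptions chosen for illustration; this is not the paper's exact construction and carries none of its mistake bounds.

import numpy as np

def quasi_additive_learner(examples, link, eta=1.0):
    # Illustrative sketch (not the paper's construction): maintain a
    # cumulative update vector z and predict with weights w = link(z).
    # On each mistake (nonpositive margin), add eta * y * x to z.
    n = len(examples[0][0])
    z = np.zeros(n)
    mistakes = 0
    for x, y in examples:                      # x: feature vector, y in {-1, +1}
        x = np.asarray(x, dtype=float)
        w = link(z)
        if y * np.dot(w, x) <= 0:              # mistake-driven: update only on errors
            z += eta * y * x
            mistakes += 1
    return link(z), mistakes

# Identity link: purely additive weights (Perceptron-like behavior).
perceptron_link = lambda z: z

# Exponential link: multiplicative weight growth (Winnow-like behavior).
winnow_link = lambda z: np.exp(z)

# Power-law link in the spirit of the p-norm algorithms: p = 2 stays close to
# the additive regime, while larger p behaves increasingly multiplicatively.
def p_norm_link(p):
    return lambda z: np.sign(z) * np.abs(z) ** (p - 1)

For example, quasi_additive_learner(data, p_norm_link(2.0)) runs the Perceptron-like member of this illustrative family on a list of (x, y) pairs, while quasi_additive_learner(data, winnow_link) runs the multiplicative one.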



    Information

    Published In

    Machine Learning, Volume 43, Issue 3
    June 2001
    147 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 01 June 2001

    Author Tags

    1. Bregman divergence
    2. Perceptron
    3. Winnow
    4. mistake bounds
    5. on-line learning

    Qualifiers

    • Article

    Cited By
    • (2022) Mirror descent maximizes generalized margin and can be implemented efficiently. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 31089-31101. DOI: 10.5555/3600270.3602524. Online publication date: 28-Nov-2022.
    • (2021) DCA for online prediction with expert advice. Neural Computing and Applications, 33:15, 9521-9544. DOI: 10.1007/s00521-021-05709-0. Online publication date: 1-Aug-2021.
    • (2020) The complexity of adversarially robust proper learning of halfspaces with agnostic noise. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 20449-20461. DOI: 10.5555/3495724.3497441. Online publication date: 6-Dec-2020.
    • (2020) Efficient active learning of sparse halfspaces with arbitrary bounded noise. Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 7184-7197. DOI: 10.5555/3495724.3496327. Online publication date: 6-Dec-2020.
    • (2018) An Optimal Algorithm for Online Non-Convex Learning. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2:2, 1-25. DOI: 10.1145/3224420. Online publication date: 13-Jun-2018.
    • (2016) Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2:3-4, 157-325. DOI: 10.1561/2400000013. Online publication date: 1-Aug-2016.
    • (2015) Patent Mining. ACM SIGKDD Explorations Newsletter, 16:2, 1-19. DOI: 10.1145/2783702.2783704. Online publication date: 21-May-2015.
    • (2015) Random-Walk Perturbations for Online Combinatorial Optimization. IEEE Transactions on Information Theory, 61:7, 4099-4106. DOI: 10.1109/TIT.2015.2428253. Online publication date: 1-Jul-2015.
    • (2014) Regret in Online Combinatorial Optimization. Mathematics of Operations Research, 39:1, 31-45. DOI: 10.1287/moor.2013.0598. Online publication date: 1-Feb-2014.
    • (2012) Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4:2, 107-194. DOI: 10.1561/2200000018. Online publication date: 1-Feb-2012.
