DOI: 10.1145/1390156.1390268
research-article

Accurate max-margin training for structured output spaces

Published: 05 July 2008

Abstract

Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured spaces: margin scaling and slack scaling. While margin scaling has been extensively used since it requires the same kind of MAP inference as normal structured prediction, slack scaling is believed to be more accurate and better-behaved. We present an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling.
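For reference, the two formulations can be written as below. This is a sketch in commonly used notation, not necessarily the paper's own: the symbols $s_w(x, y) = w^\top f(x, y)$ (score of labeling $y$), $\Delta(y_i, y)$ (loss of $y$ against the true labeling $y_i$), $\xi_i$ (slack) and $C$ (trade-off constant) are assumptions introduced here.

```latex
\begin{align*}
&\min_{w,\ \xi \ge 0}\quad \tfrac{1}{2}\lVert w \rVert^2 + \frac{C}{N}\sum_{i=1}^{N}\xi_i \\
&\text{margin scaling:}\quad s_w(x_i,y_i) - s_w(x_i,y) \ \ge\ \Delta(y_i,y) - \xi_i
  \qquad \forall i,\ \forall y \ne y_i \\
&\text{slack scaling:}\quad s_w(x_i,y_i) - s_w(x_i,y) \ \ge\ 1 - \frac{\xi_i}{\Delta(y_i,y)}
  \qquad \forall i,\ \forall y \ne y_i \\[4pt]
&\text{most violating labeling (margin):}\quad
  \arg\max_{y}\ \bigl[\, s_w(x_i,y) + \Delta(y_i,y) \,\bigr] \\
&\text{most violating labeling (slack):}\quad
  \arg\max_{y}\ \Delta(y_i,y)\,\bigl[\, 1 + s_w(x_i,y) - s_w(x_i,y_i) \,\bigr]
\end{align*}
```

When the loss decomposes over positions in the same way as the score, the margin-scaling objective is a sum and can be maximized with the usual MAP routine. The slack-scaling objective is a product of loss and score terms, so it does not decompose in this way, which is the inference bottleneck the abstract refers to.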
We further argue that existing scaling approaches do not separate the true labeling comprehensively while generating violating constraints. We propose a new max-margin trainer PosLearn that generates violators to ensure separation at each position of a decomposable loss function. Empirical results on real datasets illustrate that PosLearn can reduce test error by up to 25% over margin scaling and 10% over slack scaling. Further, PosLearn violators can be generated more efficiently than slack violators; for many structured tasks the time required is just twice that of MAP inference.
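To make the notion of a violating labeling concrete, the following minimal sketch finds the most violating labeling of a tiny chain under margin scaling and under slack scaling by brute-force enumeration. It does not implement PosLearn or the paper's variational approximation. The function names, the Hamming loss, and the toy scores are all assumptions chosen for illustration.

```python
from itertools import product


def score(y, unary, pair):
    """Chain score: per-position label scores plus transition scores."""
    s = sum(unary[t][lab] for t, lab in enumerate(y))
    s += sum(pair[a][b] for a, b in zip(y, y[1:]))
    return s


def hamming(y_true, y):
    """Decomposable loss: number of positions labeled incorrectly."""
    return sum(a != b for a, b in zip(y_true, y))


def most_violating(y_true, unary, pair, scaling):
    """Brute-force search over all labelings (exponential; toy sizes only).

    margin scaling maximizes  loss + (score(y) - score(y_true))
    slack scaling maximizes   loss * (1 + score(y) - score(y_true))
    """
    y_true = tuple(y_true)
    n_labels = len(unary[0])
    s_true = score(y_true, unary, pair)
    best, best_val = None, float("-inf")
    for y in product(range(n_labels), repeat=len(y_true)):
        if y == y_true:
            continue
        loss = hamming(y_true, y)
        gap = score(y, unary, pair) - s_true
        val = loss + gap if scaling == "margin" else loss * (1 + gap)
        if val > best_val:
            best, best_val = y, val
    return best, best_val


# Hypothetical toy problem: 3 positions, 2 labels, true labeling all zeros.
unary = [[0.0, -0.5], [0.0, -0.75], [0.0, -0.75]]
pair = [[0.0, 0.0], [0.0, 0.0]]  # transition scores left at zero for simplicity
y_true = (0, 0, 0)

margin_violator = most_violating(y_true, unary, pair, "margin")
slack_violator = most_violating(y_true, unary, pair, "slack")
print(margin_violator)  # ((1, 1, 1), 1.0)
print(slack_violator)   # ((1, 0, 0), 0.5)
```

In this toy case margin scaling selects the labeling that is wrong at every position, even though its score already trails the true labeling by 2, while slack scaling selects the single-position error whose score trails by only 0.5. The example only shows that the two criteria can pick different violators; it makes no claim about accuracy on real data.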

References

[1] Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML (pp. 89–96).
[2] Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. NIPS.
[3] Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23, 1222–1239.
[4] Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3, 951–991.
[5] Joachims, T. (2006). Training linear SVMs in linear time. KDD.
[6] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. (1999). An introduction to variational methods for graphical models. In M. I. Jordan (Ed.), Learning in graphical models. MIT Press.
[7] Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the International Conference on Machine Learning (ICML-2001). Williamstown, MA.
[8] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. Predicting Structured Data. MIT Press.
[9] McCallum, A., Nigam, K., Reed, J., Rennie, J., & Seymore, K. (2000). Cora: Computer science research paper search engine. http://cora.whizbang.com/.
[10] McDonald, R., Crammer, K., & Pereira, F. (2005a). Flexible text segmentation with structured multilabel classification. HLT/EMNLP.
[11] McDonald, R., Crammer, K., & Pereira, F. (2005b). Online large-margin training of dependency parsers. ACL (pp. 91–98).
[12] Peng, F., & McCallum, A. (2004). Accurate information extraction from research papers using conditional random fields. HLT-NAACL (pp. 329–336).
[13] Ratliff, N., Bagnell, J., & Zinkevich, M. (2007). (Online) subgradient methods for structured prediction. AISTATS.
[14] Sarawagi, S., & Cohen, W. W. (2004). Semi-Markov conditional random fields for information extraction. NIPS.
[15] Taskar, B. (2004). Learning structured prediction models: A large margin approach. Doctoral dissertation, Stanford University.
[16] Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. EMNLP.
[17] Taskar, B., Lacoste-Julien, S., & Jordan, M. I. (2006). Structured prediction, dual extragradient and Bregman projections. J. Mach. Learn. Res., 7, 1627–1653.
[18] Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res., 6, 1453–1484.

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN:9781605582054
DOI:10.1145/1390156
Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICML '08
Sponsor:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Cited By

  • (2023) Polynomial-Time Constrained Message Passing for Exact MAP Inference on Discrete Models with Global Dependencies. Mathematics, 11:12 (2628). DOI: 10.3390/math11122628. Online publication date: 8-Jun-2023
  • (2018) Learning latent variable structured prediction models with Gaussian perturbations. Proceedings of the 32nd International Conference on Neural Information Processing Systems (3149-3159). DOI: 10.5555/3327144.3327236. Online publication date: 3-Dec-2018
  • (2017) Efficient Exact Inference With Loss Augmented Objective in Structured Learning. IEEE Transactions on Neural Networks and Learning Systems, 28:11 (2566-2579). DOI: 10.1109/TNNLS.2016.2598721. Online publication date: Nov-2017
  • (2017) Accurate Maximum-Margin Training for Parsing With Context-Free Grammars. IEEE Transactions on Neural Networks and Learning Systems, 28:1 (44-56). DOI: 10.1109/TNNLS.2015.2497149. Online publication date: Jan-2017
  • (2016) Discovering Structure in the Universe of Attribute Names. Proceedings of the 25th International Conference on World Wide Web (939-949). DOI: 10.1145/2872427.2882975. Online publication date: 11-Apr-2016
  • (2015) Importance sampling over sets. Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (355-364). DOI: 10.5555/3020847.3020885. Online publication date: 12-Jul-2015
  • (2014) Hierarchical multi-label classification of social text streams. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (213-222). DOI: 10.1145/2600428.2609595. Online publication date: 3-Jul-2014
  • (2014) Efficient Algorithms for Exact Inference in Sequence Labeling SVMs. IEEE Transactions on Neural Networks and Learning Systems, 25:5 (870-881). DOI: 10.1109/TNNLS.2013.2281761. Online publication date: May-2014
  • (2014) Knowing a Good HOG Filter When You See It: Efficient Selection of Filters for Detection. Computer Vision – ECCV 2014 (80-94). DOI: 10.1007/978-3-319-10590-1_6. Online publication date: 2014
  • (2012) Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification. IEEE Transactions on Audio, Speech, and Language Processing, 20:2 (585-598). DOI: 10.1109/TASL.2011.2162405. Online publication date: 1-Feb-2012
