Accurate max-margin training for structured output spaces

Authors: Sunita Sarawagi, Rahul Gupta

ICML '08: Proceedings of the 25th international conference on Machine learning

Pages 888 - 895

https://doi.org/10.1145/1390156.1390268

Published: 05 July 2008

Abstract

Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured spaces: margin scaling and slack scaling. While margin scaling has been extensively used since it requires the same kind of MAP inference as normal structured prediction, slack scaling is believed to be more accurate and better-behaved. We present an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling.
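The difference between the two formulations can be illustrated on a toy problem. In the standard statement from Tsochantaridis et al. (2005), margin scaling picks the labeling that maximizes loss + (score(y) - score(y_true)), while slack scaling picks the one that maximizes loss * (1 + score(y) - score(y_true)). The sketch below is not the paper's code: the function names, the Hamming loss, the chain-shaped score, and the toy numbers are all assumptions chosen for illustration, and the search is brute-force enumeration rather than MAP inference.

```python
from itertools import product

def hamming(y_true, y):
    """Decomposable loss: number of positions where y disagrees with y_true."""
    return sum(a != b for a, b in zip(y_true, y))

def score(node, edge, y):
    """Chain score: per-position label scores plus adjacent-pair scores."""
    s = sum(node[i][lab] for i, lab in enumerate(y))
    s += sum(edge[a][b] for a, b in zip(y, y[1:]))
    return s

def most_violated(node, edge, y_true, scaling):
    """Brute-force search for the most violating labeling (toy sizes only)."""
    n_labels = len(node[0])
    s_true = score(node, edge, y_true)
    best_y, best_val = None, float("-inf")
    for y in product(range(n_labels), repeat=len(y_true)):
        if y == tuple(y_true):
            continue
        loss = hamming(y_true, y)
        gap = score(node, edge, y) - s_true  # negative when y_true scores higher
        if scaling == "margin":
            val = loss + gap            # margin scaling: loss is added
        elif scaling == "slack":
            val = loss * (1.0 + gap)    # slack scaling: loss multiplies
        else:
            raise ValueError("scaling must be 'margin' or 'slack'")
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val

# Hypothetical toy model: 3 positions, 2 labels, no pairwise interaction.
node = [[0.75, 0.0], [0.875, 0.0], [0.875, 0.0]]
edge = [[0.0, 0.0], [0.0, 0.0]]
y_true = (0, 0, 0)

print(most_violated(node, edge, y_true, "margin"))
print(most_violated(node, edge, y_true, "slack"))
```

In this toy case margin scaling selects the labeling that is wrong everywhere, even though it is already separated from the truth by a score gap of 2.5, while slack scaling selects a labeling with a single error that is separated by less than 1. This is one way to see why slack scaling is regarded as better behaved.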

We further argue that existing scaling approaches do not separate the true labeling comprehensively while generating violating constraints. We propose a new max-margin trainer PosLearn that generates violators to ensure separation at each position of a decomposable loss function. Empirical results on real datasets illustrate that PosLearn can reduce test error by up to 25% over margin scaling and 10% over slack scaling. Further, PosLearn violators can be generated more efficiently than slack violators; for many structured tasks the time required is just twice that of MAP inference.
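The abstract does not spell out the PosLearn objective, so the following is only an illustrative reading of "separation at each position". For every position it computes the best score of any labeling that disagrees with the true label there. On a chain model these values can be read off max-marginals, which take one forward and one backward Viterbi-style pass. That is roughly twice the work of a single MAP pass, which is consistent with the cost quoted above, although attributing the cost to this mechanism is an assumption. The function names and the toy scores are hypothetical, and any loss scaling used by the actual trainer is omitted.

```python
def max_marginals(node, edge):
    """mm[i][a] = best total score over labelings with label a at position i."""
    n, k = len(node), len(node[0])
    fwd = [[0.0] * k for _ in range(n)]  # best prefix score ending in label at i
    bwd = [[0.0] * k for _ in range(n)]  # best suffix score after i, given label at i
    fwd[0] = list(node[0])
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = node[i][b] + max(fwd[i - 1][a] + edge[a][b] for a in range(k))
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = max(edge[a][b] + node[i + 1][b] + bwd[i + 1][b] for b in range(k))
    return [[fwd[i][a] + bwd[i][a] for a in range(k)] for i in range(n)]

def position_margins(node, edge, y_true):
    """Per position: score(y_true) minus the best score of any labeling
    that is wrong at that position. Small or negative values flag positions
    where the true labeling is not well separated."""
    mm = max_marginals(node, edge)
    s_true = sum(node[i][lab] for i, lab in enumerate(y_true))
    s_true += sum(edge[a][b] for a, b in zip(y_true, y_true[1:]))
    margins = []
    for i, t in enumerate(y_true):
        best_wrong = max(mm[i][a] for a in range(len(mm[i])) if a != t)
        margins.append(s_true - best_wrong)
    return margins

# Hypothetical toy chain: 3 positions, 2 labels, reward for equal neighbours.
node = [[0.75, 0.0], [0.875, 0.0], [0.875, 0.0]]
edge = [[0.5, 0.0], [0.0, 0.5]]
print(position_margins(node, edge, (0, 0, 0)))
```

A single most-violating labeling, as produced by margin or slack scaling, yields one constraint per example. A position-wise pass like this one yields one candidate violator per position, so a weakly separated position cannot hide behind a well-separated one.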

References

[1] Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML (pp. 89--96).

[2] Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. NIPS.

[3] Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23, 1222--1239.

[4] Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3, 951--991.

[5] Joachims, T. (2006). Training linear SVMs in linear time. KDD.

[6] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. (1999). An introduction to variational methods for graphical models. In M. I. Jordan (Ed.), Learning in graphical models. MIT Press.

[7] Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the International Conference on Machine Learning (ICML-2001). Williamstown, MA.

[8] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. Predicting Structured Data. MIT Press.

[9] McCallum, A., Nigam, K., Reed, J., Rennie, J., & Seymore, K. (2000). Cora: Computer science research paper search engine. http://cora.whizbang.com/.

[10] McDonald, R., Crammer, K., & Pereira, F. (2005a). Flexible text segmentation with structured multilabel classification. HLT/EMNLP.

[11] McDonald, R., Crammer, K., & Pereira, F. (2005b). Online large-margin training of dependency parsers. ACL (pp. 91--98).

[12] Peng, F., & McCallum, A. (2004). Accurate information extraction from research papers using conditional random fields. HLT-NAACL (pp. 329--336).

[13] Ratliff, N., Bagnell, J., & Zinkevich, M. (2007). (Online) subgradient methods for structured prediction. AISTATS.

[14] Sarawagi, S., & Cohen, W. W. (2004). Semi-Markov conditional random fields for information extraction. NIPS.

[15] Taskar, B. (2004). Learning structured prediction models: A large margin approach. Doctoral dissertation, Stanford University.

[16] Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. EMNLP.

[17] Taskar, B., Lacoste-Julien, S., & Jordan, M. I. (2006). Structured prediction, dual extragradient and Bregman projections. J. Mach. Learn. Res., 7, 1627--1653.

[18] Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6(Sep), 1453--1484.

Information

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning

July 2008, 1310 pages

ISBN: 9781605582054

DOI: 10.1145/1390156

General Chair: William Cohen (Carnegie Mellon University)

Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Pascal
University of Helsinki
Xerox
Federation of Finnish Learned Societies
Google Inc.
NSF
Machine Learning Journal/Springer
Microsoft Research
Intel
Yahoo!
Helsinki Institute for Information Technology
IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICML '08

Sponsor:

Microsoft Research
Intel
IBM

ICML '08: The 25th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

July 5 - 9, 2008

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics & Citations

Article Metrics

Total Citations: 15
Total Downloads: 209
Downloads (last 12 months): 4
Downloads (last 6 weeks): 1

Reflects downloads up to 20 Dec 2024

Cited By

Bauer A, Nakajima S, Müller K (2023). Polynomial-Time Constrained Message Passing for Exact MAP Inference on Discrete Models with Global Dependencies. Mathematics, 11:12 (2628). Online publication date: 8-Jun-2023. https://doi.org/10.3390/math11122628

Bello K, Honorio J (2018). Learning latent variable structured prediction models with Gaussian perturbations. Proceedings of the 32nd International Conference on Neural Information Processing Systems (3149-3159). Online publication date: 3-Dec-2018. https://dl.acm.org/doi/10.5555/3327144.3327236

Bauer A, Nakajima S, Muller K (2017). Efficient Exact Inference With Loss Augmented Objective in Structured Learning. IEEE Transactions on Neural Networks and Learning Systems, 28:11 (2566-2579). Online publication date: Nov-2017. https://doi.org/10.1109/TNNLS.2016.2598721

Bauer A, Braun M, Muller K (2017). Accurate Maximum-Margin Training for Parsing With Context-Free Grammars. IEEE Transactions on Neural Networks and Learning Systems, 28:1 (44-56). Online publication date: Jan-2017. https://doi.org/10.1109/TNNLS.2015.2497149

Halevy A, Noy N, Sarawagi S, Whang S, Yu X, Bourdeau J, Hendler J, Nkambou R, Horrocks I, Zhao B (2016). Discovering Structure in the Universe of Attribute Names. Proceedings of the 25th International Conference on World Wide Web (939-949). Online publication date: 11-Apr-2016. https://dl.acm.org/doi/10.1145/2872427.2882975

Hadjis S, Ermon S (2015). Importance sampling over sets. Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (355-364). Online publication date: 12-Jul-2015. https://dl.acm.org/doi/10.5555/3020847.3020885

Ren Z, Peetz M, Liang S, van Dolen W, de Rijke M, Geva S, Trotman A, Bruza P, Clarke C, Järvelin K (2014). Hierarchical multi-label classification of social text streams. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (213-222). Online publication date: 3-Jul-2014. https://dl.acm.org/doi/10.1145/2600428.2609595

Bauer A, Gornitz N, Biegler F, Muller K, Kloft M (2014). Efficient Algorithms for Exact Inference in Sequence Labeling SVMs. IEEE Transactions on Neural Networks and Learning Systems, 25:5 (870-881). Online publication date: May-2014. https://doi.org/10.1109/TNNLS.2013.2281761

Ahmed E, Shakhnarovich G, Maji S (2014). Knowing a Good HOG Filter When You See It: Efficient Selection of Filters for Detection. Computer Vision – ECCV 2014 (80-94). Online publication date: 2014. https://doi.org/10.1007/978-3-319-10590-1_6

Yun S, Yoo C (2012). Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification. IEEE Transactions on Audio, Speech, and Language Processing, 20:2 (585-598). Online publication date: 1-Feb-2012. https://dl.acm.org/doi/10.1109/TASL.2011.2162405
