Posterior Regularization for Structured Latent Variable Models

Authors: Kuzman Ganchev, Jennifer Gillenwater, Ben Taskar

The Journal of Machine Learning Research, Volume 11

Pages 2001 - 2049

Published: 01 August 2010

Abstract

We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates the complexity of the model from the complexity of the structural constraints the model is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large-scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.
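
To make "constraints hold in expectation" concrete, here is a minimal sketch in Python. The abstract does not specify the constraint form or the algorithm, so everything here is an assumption for illustration: a single constraint of the form E_q[phi(z)] <= b over a small discrete latent variable, a KL projection of the model posterior p onto the constraint set, and a simple bisection on the dual variable. The name `project_posterior` and its parameters are hypothetical and not taken from the article.

```python
import math

def project_posterior(p, phi, b, tol=1e-10, max_iter=200):
    """Sketch: find q minimizing KL(q || p) subject to E_q[phi] <= b.

    Assumes one constraint and a discrete latent variable (illustrative only).
    The minimizer has the form q(z) proportional to p(z) * exp(-lam * phi(z))
    with lam >= 0, so we search for lam by bisection.
    Returns (q, lam).
    """
    support = [f for pi, f in zip(p, phi) if pi > 0]
    m = min(support)  # shift keeps every exponent <= 0 (no overflow)
    if m >= b and sum(pi * f for pi, f in zip(p, phi)) > b:
        raise ValueError("constraint cannot be met with finite lam")

    def tilt(lam):
        # Exponentially tilt p against the feature, then renormalize.
        w = [pi * math.exp(-lam * (f - m)) if pi > 0 else 0.0
             for pi, f in zip(p, phi)]
        z = sum(w)
        return [wi / z for wi in w]

    def expect(q):
        return sum(qi * f for qi, f in zip(q, phi))

    # If the model posterior already satisfies the constraint, keep it.
    if expect(p) <= b:
        return list(p), 0.0

    # E_q[phi] decreases monotonically in lam, so bracket and bisect.
    lo, hi = 0.0, 1.0
    while expect(tilt(hi)) > b:
        lo, hi = hi, hi * 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if expect(tilt(mid)) > b:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilt(hi), hi  # hi is always on the feasible side
```

For example, with `p = [0.7, 0.2, 0.1]`, `phi = [1.0, 0.0, 0.0]` and `b = 0.5`, the projection moves mass away from the first outcome until its expected feature value meets the bound, while the relative proportions of the unconstrained outcomes are preserved. Only expectations under the model are needed, which is one way to read the abstract's claim that the efficiency of the unconstrained model is retained.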

Index Terms

Posterior Regularization for Structured Latent Variable Models
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Published In

The Journal of Machine Learning Research, Volume 11
3/1/2010
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928
Publisher: JMLR.org
