Posterior Regularization for Structured Latent Variable Models

Published: 01 August 2010

Abstract

We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of the structural constraints we wish the model to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring that the desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints, such as bijectivity, symmetry and group sparsity, in several large-scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.
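
To make the projection step concrete, here is a minimal Python sketch (using NumPy) for the simplest case. It assumes a single latent variable z with a small, explicitly enumerated state space and hard constraints E_q[phi(z)] <= b, with no slack penalty. Under these assumptions, the distribution q minimizing KL(q || p(z | x)) subject to the constraints has the form q(z) proportional to p(z | x) exp(-lam . phi(z)) with lam >= 0. The multipliers lam can be found by projected gradient ascent on the dual objective -b . lam - log Z(lam). The function name pr_project, the step size and the iteration count are hypothetical choices for illustration, not the authors' implementation. In a structured model, the explicit enumeration over z would be replaced by the model's own inference routines, which stay tractable when phi decomposes over the model's factors.

```python
import numpy as np


def pr_project(posterior, phi, b, step=0.5, iters=500):
    """Hypothetical helper: find q minimizing KL(q || p) subject to E_q[phi] <= b.

    posterior: shape (K,), strictly positive model posterior p(z | x) over K states.
    phi:       shape (m, K), constraint features phi_j(z) for each state z.
    b:         shape (m,), upper bounds on the expected features.
    Returns (q, lam) with q(z) proportional to p(z | x) * exp(-lam . phi(z)).
    """
    log_p = np.log(np.asarray(posterior, dtype=float))
    phi = np.atleast_2d(np.asarray(phi, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))

    def q_of(lam):
        # Tilt the posterior by exp(-lam . phi) and renormalize in log space.
        log_q = log_p - lam @ phi
        log_q -= log_q.max()
        q = np.exp(log_q)
        return q / q.sum()

    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        # Gradient of the dual -b.lam - log Z(lam) with respect to lam is E_q[phi] - b.
        grad = phi @ q_of(lam) - b
        # Ascent step, then projection onto the feasible set lam >= 0.
        lam = np.maximum(0.0, lam + step * grad)
    return q_of(lam), lam


# Toy posterior over three states; require q(z = 0) <= 0.4 in expectation.
p = np.array([0.7, 0.2, 0.1])
q, lam = pr_project(p, phi=[[1.0, 0.0, 0.0]], b=[0.4])
# Analytically, the projection is q = [0.4, 0.4, 0.2] with lam = ln(3.5).
print(np.round(q, 4), np.round(lam, 4))
```

The toy call caps the expected mass of the first state at 0.4. The projection moves the excess mass onto the remaining states in proportion to their original probabilities, because their features are zero. If a constraint is already satisfied, its multiplier stays at zero and the posterior is returned unchanged. In an EM-style learning loop, a projection like this would take the place of the usual E-step, and the M-step would then fit the parameters to q rather than to the unconstrained posterior.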

Published In

The Journal of Machine Learning Research, Volume 11, Issue 3/1/2010
3637 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published in JMLR Volume 11

Article Metrics

• Downloads (last 12 months): 65
• Downloads (last 6 weeks): 9
Reflects downloads up to 19 Nov 2024
