DOI: 10.5555/2390524.2390664

Exploring deterministic constraints: from a constrained English POS tagger to an efficient ILP solution to Chinese word segmentation

Published: 08 July 2012

Abstract

We show for both English POS tagging and Chinese word segmentation that, with a proper representation, a large number of deterministic constraints can be learned from training examples, and that these constraints are useful in constraining probabilistic inference. For tagging, the learned constraints are used directly to constrain Viterbi decoding. For segmentation, character-based tagging constraints can be learned with the same templates. However, they are better applied to a word-based model, so an integer linear programming (ILP) formulation is proposed. For both problems, the corresponding constrained solutions have advantages in both efficiency and accuracy.
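To illustrate how deterministic constraints can restrict Viterbi decoding, here is a minimal sketch. It is not the paper's implementation: it assumes a first-order model with additive log-scores, and the function name `constrained_viterbi`, the `allowed` argument (a per-position set of permitted tags, or `None` where no constraint fired), and the toy scores are all hypothetical.

```python
def constrained_viterbi(emit, trans, tags, allowed=None):
    """Best tag sequence under per-position tag constraints (illustrative).

    emit[i][t]  : log-score of tag t at position i (toy values, assumed)
    trans[p][t] : log-score of moving from tag p to tag t (assumed)
    allowed[i]  : set of tags permitted at position i, or None if unconstrained
    """
    n = len(emit)
    if n == 0:
        return []
    if allowed is None:
        allowed = [None] * n

    def candidates(i):
        # A deterministic constraint shrinks the tag set searched at position i.
        cands = [t for t in tags if allowed[i] is None or t in allowed[i]]
        if not cands:
            raise ValueError(f"no tag permitted at position {i}")
        return cands

    # best[i][t] = best log-score of any path ending in tag t at position i
    best = [{t: emit[0][t] for t in candidates(0)}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in candidates(i):
            # Only tags that survived the constraints at i-1 are considered.
            prev, score = max(
                ((p, s + trans[p][t]) for p, s in best[i - 1].items()),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + emit[i][t]
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy two-word example with hypothetical scores.
tags = ["N", "V"]
emit = [{"N": -0.1, "V": -2.0}, {"N": -1.0, "V": -0.5}]
trans = {"N": {"N": -1.0, "V": -0.5}, "V": {"N": -0.5, "V": -1.0}}

print(constrained_viterbi(emit, trans, tags))                       # ['N', 'V']
print(constrained_viterbi(emit, trans, tags, allowed=[None, {"N"}]))  # ['N', 'N']
```

In this sketch a constraint both changes the answer and reduces work: positions with a single permitted tag contribute one candidate instead of the full tag set, which is one way constrained decoding could be both faster and more accurate when the constraints are reliable.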



Published In

ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
July 2012
1100 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 101
  • Downloads (Last 12 months): 33
  • Downloads (Last 6 weeks): 7

Reflects downloads up to 13 Nov 2024
