
When Errors Become the Rule: Twenty Years with Transformation-Based Learning

Published: 01 April 2014

Abstract

Transformation-based learning (TBL) is a machine learning method, invented by Eric Brill [Brill 1993b, 1995a], that is used in particular for sequential classification. It is widely used within computational linguistics and natural language processing, but surprisingly little in other areas.
TBL is a simple yet flexible paradigm that achieves competitive or even state-of-the-art performance in several areas and does not overtrain easily. It is especially successful at catching local, fixed-distance dependencies, and it seamlessly exploits information from heterogeneous discrete feature types. The learned representation, an ordered list of transformation rules, is compact and efficient, with clear semantics. Individual rules are interpretable and often meaningful to humans.
The present article offers a survey of the most important theoretical work on TBL, addressing a perceived gap in the literature. Because the method should also be useful outside computational linguistics and natural language processing, a chief aim is to provide an informal but relatively comprehensive introduction that is readable by people coming from other specialities.
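
To make the idea of an ordered list of transformation rules concrete, the following is a minimal sketch of error-driven rule learning for tag sequences. It is not taken from the article: the `Rule` fields, the single rule template (the tag at a fixed offset), the scoring by net errors removed, and the choice to read contexts from the tags as they stood before the rule fired are all simplifying assumptions made for illustration.

```python
from typing import List, NamedTuple, Set


class Rule(NamedTuple):
    """Hypothetical rule: retag from_tag as to_tag when the tag at i + offset is context_tag."""
    from_tag: str
    to_tag: str
    offset: int       # fixed distance to the context position (-1 = previous token)
    context_tag: str


def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    """Apply one rule at every position; contexts are read from the input tags (assumption)."""
    out = list(tags)
    for i, tag in enumerate(tags):
        j = i + rule.offset
        if tag == rule.from_tag and 0 <= j < len(tags) and tags[j] == rule.context_tag:
            out[i] = rule.to_tag
    return out


def apply_rules(rules: List[Rule], tags: List[str]) -> List[str]:
    """Apply the learned rules in order; later rules see the output of earlier ones."""
    for rule in rules:
        tags = apply_rule(rule, tags)
    return tags


def score(rule: Rule, current: List[str], gold: List[str]) -> int:
    """Net number of errors removed by the rule (corrections minus newly introduced errors)."""
    new = apply_rule(rule, current)
    return sum((n == g) - (c == g) for n, c, g in zip(new, current, gold))


def candidate_rules(current: List[str], gold: List[str], offsets=(-1, 1)) -> Set[Rule]:
    """Instantiate the template only at positions where the current tag is wrong."""
    cands = set()
    for i, (c, g) in enumerate(zip(current, gold)):
        if c != g:
            for off in offsets:
                j = i + off
                if 0 <= j < len(current):
                    cands.add(Rule(c, g, off, current[j]))
    return cands


def learn(initial: List[str], gold: List[str], min_score: int = 1) -> List[Rule]:
    """Greedy loop: repeatedly pick the best-scoring rule, apply it, and append it to the list."""
    current, rules = list(initial), []
    while True:
        cands = candidate_rules(current, gold)
        if not cands:
            break
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(cands), key=lambda r: score(r, current, gold))
        if score(best, current, gold) < min_score:
            break
        rules.append(best)
        current = apply_rule(best, current)
    return rules


# Toy data for "to race the horse the trainer chose"; the baseline tags "race" as a noun.
initial = ["TO", "NN", "DT", "NN", "DT", "NN", "VBD"]
gold = ["TO", "VB", "DT", "NN", "DT", "NN", "VBD"]
learned = learn(initial, gold)
print(learned)  # [Rule(from_tag='NN', to_tag='VB', offset=-1, context_tag='TO')]
```

On this toy input the competing candidate "NN becomes VB when the next tag is DT" fixes one error but introduces another, so the learner prefers the rule conditioned on the previous tag. Real implementations use many templates over heterogeneous features and far more efficient scoring; this sketch recomputes every score from scratch for readability.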

Supplementary Material

a50-uneson-apndx.pdf (uneson.zip)
Supplemental movie, appendix, image, and software files for "When Errors Become the Rule: Twenty Years with Transformation-Based Learning"

References

[1]
Harold Abelson and Gerald J. Sussman. 1996. Structure and Interpretation of Computer Programs. MIT Press, Cambridge.
[2]
John Aberdeen, John Burger, David Day, Lynette Hirschman, Patricia Robinson, and Marc Vilain. 1995. MITRE: description of the Alembic system used for MUC-6. In Proceedings of the 6th Conference on Message Understanding. Association for Computational Linguistics, 141--155.
[3]
Chinatsu Aone and Kevin Hausman. 1996. Unsupervised learning of a rule-based Spanish part of speech tagger. In Proceedings of the 16th Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 53--58.
[4]
Nezip F. Ayan, Bonnie J. Dorr, and Christof Monz. 2005. Alignment link projection using transformation-based learning. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 185--192.
[5]
Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, and Robert L. Mercer. 1989. A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 37, 7 (1989), 1001--1008.
[6]
Markus Becker. 1998. Unsupervised part of speech tagging with extended templates. In Proceedings of ESSLLI 1998, Student Session.
[7]
Gosse Bouma. 2000. A finite state and data oriented method for grapheme to phoneme conversion. In NAACL-2000. 303--310.
[8]
Gosse Bouma. 2003. Finite state methods for hyphenation. Natural Language Engineering 9 (2003), 5--20.
[9]
Leo Breiman. 1996. Bagging predictors. Machine Learning 24, 2 (1996), 123--140.
[10]
Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.
[11]
Eric Brill. 1993a. Automatic grammar induction and parsing free text: A transformation-based approach. In Proceedings of the Workshop on Human Language Technology. Association for Computational Linguistics, 237--242.
[12]
Eric Brill. 1993b. A Corpus-Based Approach to Language Learning. Ph.D. Dissertation. University of Pennsylvania, Philadelphia, PA.
[13]
Eric Brill. 1994. Some advances in transformation-based part of speech tagging. In Proceedings of the 12th National Conference on Artificial Intelligence. Arxiv preprint cmp-lg/9406010 (1994), 722--727.
[14]
Eric Brill. 1995a. Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics 21, 4 (1995), 543--565.
[15]
Eric Brill. 1995b. Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the 3rd Workshop on Very Large Corpora, Vol. 30. 1--13.
[16]
Eric Brill. 1996. Learning to parse with transformations. In Recent Advances in Parsing Technology. Kluwer.
[17]
Eric Brill and Philip Resnik. 1994. A rule-based approach to prepositional phrase attachment disambiguation. In Proceedings of COLING'94. 1198--1204.
[18]
Eric Brill and Jun Wu. 1998. Classifier combination for improved lexical disambiguation. In Proceedings of the 17th International Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 191--195.
[19]
Björn Bringmann, Stefan Kramer, Friedrich Neubarth, Hannes Pirker, and Gerhard Widmer. 2002. Transformation-based regression. In Machine Learning: International Workshop then Conference. Citeseer, 59--66.
[20]
Sandra Carberry, K. Vijay-Shanker, Andrew Wilson, and Ken Samuel. 2001. Randomized rule selection in transformation-based learning: A comparative study. Natural Language Engineering 7, 2 (2001), 99--116.
[21]
John Carroll, Ted Briscoe, and Antonio Sanfilippo. 1998. Parser evaluation: A survey and a new proposal. In Proceedings of the 1st International Conference on Language Resources and Evaluation. 447--454.
[22]
Rich Caruana. 1997. Multitask learning. Machine Learning 28, 1 (1997), 41--75.
[23]
James R. Curran and Raymond K. Wong. 1999. Transformation-based learning for automatic translation from HTML to XML. In Proceedings of the 4th Australasian Document Computing Symposium (ADCS99). Citeseer.
[24]
James R. Curran and Raymond K. Wong. 2000. Formalization of transformation-based learning. In ACSC. IEEE Computer Society, 51--57.
[25]
Walter Daelemans. 1995. Memory-based lexical acquisition and processing. In Machine Translation and the Lexicon, P. Steffens (Ed.). Springer, Berlin, 85--98.
[26]
David Day, John Aberdeen, Lynette Hirschman, Robyn Kozierok, Patricia Robinson, and Marc Vilain. 1997. Mixed-initiative development of language processing systems. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Association for Computational Linguistics, 348--355.
[27]
Luca Dini, Vittorio Di Tomaso, and Frédérique Segond. 1998. Error driven word sense disambiguation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 320--324.
[28]
Cícero N. dos Santos. 2009. Entropy Guided Transformation Learning. Ph.D. Dissertation. Pontifícia Universidade Católica do Rio de Janeiro.
[29]
Cícero N. dos Santos and Ruy L. Milidiú. 2007. Probabilistic classifications with TBL. In Computational Linguistics and Intelligent Text Processing, Alexander Gelbukh (Ed.). Lecture Notes in Computer Science, Vol. 4394. Springer, Berlin, 196--207.
[30]
Cícero N. dos Santos and Ruy L. Milidiú. 2009. Entropy guided transformation learning. Foundations of Computational Intelligence 1 (2009), 159--184.
[31]
Cícero N. dos Santos, Ruy L. Milidiú, Carlos E. M. Crestana, and Eraldo R. Fernandes. 2010. ETL Ensembles for Chunking, NER and SRL. In Computational Linguistics and Intelligent Text Processing, Alexander Gelbukh (Ed.). Lecture Notes in Computer Science, Vol. 6008. Springer, Berlin, 100--112.
[32]
Cícero N. dos Santos, Ruy L. Milidiú, and Raúl Rentería. 2008. Portuguese part-of-speech tagging using entropy guided transformation learning. In Computational Processing of the Portuguese Language, António Teixeira, Vera de Lima, Luís de Oliveira, and Paulo Quaresma (Eds.). Lecture Notes in Computer Science, Vol. 5190. Springer, Berlin, 143--152.
[33]
Cícero N. dos Santos and Claudia Oliveira. 2005. Constrained atomic term: Widening the reach of rule templates in transformation based learning. In EPIA (Lecture Notes in Computer Science), Carlos Bento, Amílcar Cardoso, and Gaël Dias (Eds.), Vol. 3808. Springer, 622--633.
[34]
Philip Edmonds. 2002. SENSEVAL: The evaluation of word sense disambiguation systems. ELRA Newsletter 7, 3 (2002), 5--14.
[35]
Eraldo R. Fernandes, Cícero N. dos Santos, and Ruy L. Milidiú. 2010. A machine learning approach to Portuguese clause identification. Computational Processing of the Portuguese Language (2010), 55--64.
[36]
Radu Florian. 2002a. Named entity recognition as a house of cards: Classifier stacking. In Proceedings of the 6th Conference on Natural Language Learning. 1--4.
[37]
Radu Florian. 2002b. Transformation Based Learning and Data-Driven Lexical Disambiguation. Syntactic and Semantic Ambiguity Resolution. Ph.D. Dissertation, Johns Hopkins University.
[38]
Radu Florian, John Henderson, and Grace Ngai. 2000. Coaxing confidences from an old friend: Probabilistic classifications from transformation rule lists. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora. Association for Computational Linguistics, 26--34.
[39]
Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003, Vol. 4. Association for Computational Linguistics, 171.
[40]
Radu Florian and Grace Ngai. 2001. Multidimensional transformation-based learning. In Proceedings of the 5th Workshop on Computational Language Learning (CoNLL-2001). CoRR cs.CL/0107021 (2001).
[41]
Cameron Fordyce. 1998. Prosody Prediction for Speech Synthesis Using Transformational Rule-Based Learning. Master's Thesis, Boston University.
[42]
Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4 (2003), 933--969.
[43]
William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities 26, 5--6 (1992), 415--439.
[44]
Daniel Hardt. 1998. Improving ellipsis resolution with transformation-based learning. In AAAI Fall Symposium.
[45]
Daniel Hardt. 2001. Transformation-based learning of Danish grammar correction. In Proceedings of RANLP 2001, Tzigov Chark. Citeseer.
[46]
Per Hedelin, Anders Jonsson, and Per Lindblad. 1987. Svenskt uttalslexikon (3rd ed.). Technical report. Chalmers University of Technology.
[47]
Mark Hepple. 2000. Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 278.
[48]
Paul Hudak. 1996. Building domain-specific embedded languages. ACM Computing Surveys (CSUR) 28, 4 (1996).
[49]
Paul Hudak. 1998. Modular domain specific languages and tools. In Proceedings of the 5th International Conference on Software Reuse, P. Devanbu and J. Poulin (Eds.). IEEE Computer Society Press, 134--142.
[50]
Daniel Jurafsky and James H. Martin. 2008. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Prentice-Hall.
[51]
Fred Karlsson, Atro Voutilainen, Juha Heikkilä, and Arto Anttila (Eds.). 1995. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter.
[52]
Ergina Kavallieratou, Efstathios Stamatatos, Nikos Fakotakis, and George Kokkinakis. 2000. Handwritten character segmentation using transformation-based learning. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR'00). 634--637.
[53]
Joungbum Kim, Sarah E. Schwarm, and Mari Ostendorf. 2004. Detecting structural metadata with decision trees and transformation-based learning. In Proceedings of HLT-NAACL04. 137--144.
[54]
Ludmila I. Kuncheva and Christopher J. Whitaker. 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 2 (2003), 181--207.
[55]
Torbjörn Lager. 1999a. μ-TBL Lite: A small, extensible transformation-based learner. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL'99). Bergen. Poster paper.
[56]
Torbjörn Lager. 1999b. The μ-TBL system: Logic programming tools for transformation-based learning. In Proceedings of CoNLL, Vol. 99.
[57]
Torbjörn Lager. 2001. Transformation-based learning of rules for constraint grammar tagging. In 13th Nordic Conference in Computational Linguistics. Uppsala, Sweden, 21--22.
[58]
Torbjörn Lager and Natalia Zinovjeva. 1999. Training a dialogue act tagger with the μ-TBL System. In Proceedings of the 3rd Swedish Symposium on Multimodal Communication. Linköping University Natural Language Processing Laboratory (NLPLAB).
[59]
Niels Landwehr, Bernd Gutmann, Ingo Thon, Luc De Raedt, and Matthai Philipose. 2008. Relational transformation-based tagging for human activity recognition. Fundamenta Informaticae 89, 1 (2008), 111--129.
[60]
Xin Li, Xuan-Jing Huang, and Li-de Wu. 2006. Question classification by ensemble learning. IJCSNS 6, 3 (2006), 147.
[61]
Nikolaj Lindberg and Martin Eineborg. 1998. Learning constraint grammar-style disambiguation rules using inductive logic programming. In Proceedings of the 17th International Conference on Computational Linguistics. Association for Computational Linguistics, 775--779.
[62]
Lidia Mangu and Eric Brill. 1997. Automatic rule acquisition for spelling correction. In Machine Learning -- International Workshop then Conference. Citeseer, 187--194.
[63]
Christopher D. Manning and Hinrich Schütze. 2001. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
[64]
Andrei Mikheev. 1997. Automatic rule induction for unknown-word guessing. Computational Linguistics 23, 3 (1997), 405--423.
[65]
Ruy Luiz Milidiú, C. E. M. Crestana, and Cícero Nogueira dos Santos. 2010. A token classification approach to dependency parsing. In Proceedings of the 7th Brazilian Symposium on Information and Human Language Technology (STIL'09). IEEE, 80--88.
[66]
Ruy L. Milidiú, Cícero N. dos Santos, and Julio C. Duarte. 2008. Phrase chunking using entropy guided transformation learning. In Proceedings of ACL 2008. Citeseer.
[67]
Ruy L. Milidiú, Julio C. Duarte, and Cícero N. dos Santos. 2007. Evolutionary TBL template generation. Journal of the Brazilian Computer Society 13, 4 (2007), 39--50.
[68]
Tom Mitchell. 1997. Machine Learning. McGraw-Hill.
[69]
Un Yong Nahm. 2005. Transformation-based information extraction using learned meta-rules. Computational Linguistics and Intelligent Text Processing (2005), 535--538.
[70]
Lee Naish. 1996. Higher-order logic programming in Prolog. In Proceedings of the Workshop on Multi-Paradigm Logic Programming, JICSLP, Vol. 96.
[71]
Grace Ngai and Radu Florian. 2001a. Transformation-based learning in the fast lane. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies 2001. Association for Computational Linguistics, 8.
[72]
Grace Ngai and Radu Florian. 2001b. Transformation Based Learning in the Fast Lane: A Generative Approach. Technical Report. Center for Speech and Language Processing, Johns Hopkins University.
[73]
Kemal Oflazer and Gökhan Tür. 1996. Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 69--81.
[74]
Jonathan Oliver. 1992. Decision Graphs: An Extension of Decision Trees. Technical Report 92/173. Department of Computer Science, Monash University.
[75]
David D. Palmer. 1997. A trainable rule-based algorithm for word segmentation. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 321--328.
[76]
Seong-Bae Park, Jeong-Ho Chang, and Byoung-Tak Zhang. 2004. Korean compound noun decomposition using syllabic information only. Computational Linguistics and Intelligent Text Processing (2004), 146--157.
[77]
Fernando Pereira and Yves Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. In ACL.
[78]
J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
[79]
Lance A. Ramshaw and Mitchell P. Marcus. 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language. 128--135.
[80]
Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the ACL 3rd Workshop on Very Large Corpora, David Yarowsky and Kenneth W. Church (Eds.), Vol. cmp-lg/9505040. Association of Computational Linguistics, Somerset, NJ, 82--94.
[81]
Ronald Rivest. 1987. Learning decision lists. Machine Learning 2, 3 (1987), 229--246.
[82]
Emmanuel Roche and Yves Schabes. 1995. Deterministic part-of-speech tagging with finite-state transducers. Computational Linguistics 21, 2 (1995), 227--253.
[83]
Dan Roth. 1998. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of the National Conference on Artificial Intelligence. John Wiley & Sons Ltd., 806--813.
[84]
Tobias Ruland. 2000. A context-sensitive model for probabilistic LR parsing of spoken language with transformation-based postprocessing. In Proceedings of the 18th Conference on Computational Linguistics, Vol. 2. Association for Computational Linguistics, 677--683.
[85]
Ken Samuel. 1998a. Discourse learning: Dialogue act tagging with transformation-based learning. In Proceedings of the National Conference on Artificial Intelligence. John Wiley and Sons, Ltd., 1199--1199.
[86]
Ken Samuel. 1998b. Lazy transformation-based learning. In Proceedings of the 11th International Florida Artificial Intelligence Research Society Conference. AAAI Press, 235--239.
[87]
Ken Samuel, Sandra Carberry, and K. Vijay-Shanker. 1998. An investigation of transformation-based learning in discourse. In Machine Learning: Proceedings of the 15th International Conference.
[88]
Christer Samuelsson, Pasi Tapanainen, and Atro Voutilainen. 1996. Inducing constraint grammars. Grammatical Inference: Learning Syntax from Sentences (1996), 146--155.
[89]
Erik F. Tjong Kim Sang and Jorn Veenstra. 1999. Representing text chunks. In Proceedings of the 9th Conference on European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 173--179.
[90]
Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. Normalizing biomedical terms by minimizing ambiguity and variability. BMC Bioinformatics 9, Suppl 3 (2008), S2.
[91]
Leslie G. Valiant. 1984. A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134--1142.
[92]
Arie van Deursen, Paul Klint, and Joost Visser. 2000. Domain-specific languages: An annotated bibliography. ACM SIGPLAN Notices 35, 6 (2000), 26--36.
[93]
Ken Williams, Christopher Dozier, and Andrew McCulloh. 2004. Learning transformation rules for semantic role labeling. In Proceedings of CoNLL-2004.
[94]
Garnett Wilson and Malcolm Heywood. 2005. Use of a genetic algorithm in Brill's transformation-based part-of-speech tagger. In GECCO'05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation. ACM, New York, NY, 2067--2073.
[95]
David Wolpert. 1992. Stacked generalization. Neural Networks 5, 2 (1992), 241--259.
[96]
Dekai Wu, Grace Ngai, and Marine Carpuat. 2004. Raising the bar: Stacked conservative error correction beyond boosting. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004). Lisbon.
[97]
George K. Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.
[98]
Wim Zonneveld, Mieke Trommelen, Michael Jessen, Curtis Rice, Gösta Bruce, and Kristjan Arnason. 1999. Wordstress in West-Germanic and North-Germanic languages. In Word Prosodic Systems in the Languages of Europe, Harry van der Hulst (Ed.). Walter de Gruyter, Chapter 8, 477--604.

Published In

ACM Computing Surveys, Volume 46, Issue 4
April 2014
463 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/2597757
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2014
Accepted: 01 October 2013
Revised: 01 September 2013
Received: 01 October 2012
Published in CSUR Volume 46, Issue 4

Author Tags

  1. Transformation-based learning
  2. Brill tagging
  3. computational linguistics
  4. error-driven rule learning
  5. natural language processing
  6. sequential classification
  7. supervised learning

Qualifiers

  • Research-article
  • Research
  • Refereed
