ACL Conference Proceedings · Article · DOI: 10.3115/981863.981873

Combining Trigram-based and feature-based methods for context-sensitive spelling correction

Published: 24 June 1996

Abstract

This paper addresses the problem of correcting spelling errors that result in valid, though unintended words (such as peace and piece, or quiet and quite) and also the problem of correcting particular word usage errors (such as amount and number, or among and between). Such corrections require contextual information and are not handled by conventional spelling programs such as Unix spell. First, we introduce a method called Trigrams that uses part-of-speech trigrams to encode the context. This method uses a small number of parameters compared to previous methods based on word trigrams. However, it is effectively unable to distinguish among words that have the same part of speech. For this case, an alternative feature-based method called Bayes performs better; but Bayes is less effective than Trigrams when the distinction among words depends on syntactic constraints. A hybrid method called Tribayes is then introduced that combines the best of the previous two methods. The improvement in performance of Tribayes over its components is verified experimentally. Tribayes is also compared with the grammar checker in Microsoft Word, and is found to have substantially higher performance.
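The abstract names three methods without showing how they fit together. The Python sketch below is a minimal, hypothetical illustration under stated assumptions, not the paper's implementation. It assumes that every word has a single part-of-speech tag with a fixed P(word | tag). The lexicon, tag-trigram table, and context-word counts are invented toy numbers. Bayes is reduced to a naive Bayes classifier over context words within ±k positions. Tribayes routes to Bayes when all words in the confusion set share a tag and to Trigrams otherwise, which is a simplification suggested by the abstract rather than the paper's exact criterion. The names `trigram_score`, `bayes_score`, and `tribayes` are illustrative only. The sketch also shows why Trigrams needs few parameters: its transition table is indexed by tags, so a tagset of T tags gives at most T^3 entries, whereas a word-trigram model over a V-word vocabulary can have up to V^3.

```python
import math

# Hypothetical toy lexicon: word -> (tag, P(word | tag)).
# Simplification: each word has exactly one tag; a real tagger allows several.
LEXICON = {
    "i": ("PRON", 0.5), "saw": ("VERB", 0.1), "car": ("NOUN", 0.05),
    "their": ("DET", 0.2), "there": ("ADV", 0.2),
    "peace": ("NOUN", 0.01), "piece": ("NOUN", 0.01),
}

# Hypothetical tag-trigram probabilities P(t3 | t1, t2); unseen trigrams get FLOOR.
TAG_TRIGRAMS = {
    ("<s>", "<s>", "PRON"): 0.3,
    ("<s>", "PRON", "VERB"): 0.5,
    ("PRON", "VERB", "DET"): 0.4,
    ("PRON", "VERB", "ADV"): 0.05,
    ("VERB", "DET", "NOUN"): 0.7,
    ("VERB", "ADV", "NOUN"): 0.01,
}
FLOOR = 1e-4

# Hypothetical Bayes training statistics: candidate frequencies and
# co-occurrence counts of context words with each candidate.
PRIORS = {"peace": 40, "piece": 60}
CONTEXT_COUNTS = {
    "peace": {"war": 20, "treaty": 15},
    "piece": {"cake": 25, "of": 30},
}


def trigram_score(words):
    """Log-probability of a sentence and its tags under the toy tag-trigram model."""
    tags = ["<s>", "<s>"]
    logp = 0.0
    for w in words:
        tag, p_word = LEXICON[w]
        logp += math.log(TAG_TRIGRAMS.get((tags[-2], tags[-1], tag), FLOOR))
        logp += math.log(p_word)
        tags.append(tag)
    return logp


def bayes_score(word, context):
    """Naive Bayes log-score of a candidate given surrounding context words."""
    counts = CONTEXT_COUNTS[word]
    vocab = {c for cs in CONTEXT_COUNTS.values() for c in cs}
    total = sum(counts.values())
    logp = math.log(PRIORS[word] / sum(PRIORS.values()))
    for c in context:
        # Add-one smoothing keeps unseen context words from zeroing the score.
        logp += math.log((counts.get(c, 0) + 1) / (total + len(vocab)))
    return logp


def tribayes(words, i, confusion_set, window=3):
    """Pick the best candidate for position i of the sentence `words`."""
    if len({LEXICON[w][0] for w in confusion_set}) == 1:
        # Same tag for every candidate: tag trigrams cannot tell them apart.
        context = [w for j, w in enumerate(words) if j != i and abs(j - i) <= window]
        return max(confusion_set, key=lambda w: bayes_score(w, context))
    # Tags differ: score each candidate sentence with the trigram model.
    return max(confusion_set,
               key=lambda w: trigram_score(words[:i] + [w] + words[i + 1:]))


print(tribayes(["i", "saw", "there", "car"], 2, ["their", "there"]))  # their
print(tribayes(["a", "peace", "of", "cake"], 1, ["peace", "piece"]))  # piece
```

Because peace and piece are both nouns in this toy lexicon, their tag-trigram scores would be identical, so the sketch falls back to context words such as cake. Their and there carry different tags, so the syntactic context decides.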




Published In

ACL '96: Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics
June 1996, 399 pages
Program Chairs: Aravind Joshi, Martha Palmer

Publisher

Association for Computational Linguistics
United States


Qualifiers: Article

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions (19%)

Article Metrics

Downloads (last 12 months): 50
Downloads (last 6 weeks): 13
Reflects downloads up to 26 Sep 2024.

Cited By

• (2018) “UTTAM”. ACM Transactions on Asian and Low-Resource Language Information Processing 18:1 (1-26). DOI: 10.1145/3264620. Online publication date: 19-Nov-2018.
• (2016) A Seed-Based Method for Generating Chinese Confusion Sets. ACM Transactions on Asian and Low-Resource Language Information Processing 16:1 (1-16). DOI: 10.1145/2933396. Online publication date: 22-Jul-2016.
• (2015) Blind Recognition of Text Input on Mobile Devices via Natural Language Processing. Proceedings of the 2015 Workshop on Privacy-Aware Mobile Computing (19-24). DOI: 10.1145/2757302.2757304. Online publication date: 22-Jun-2015.
• (2013) Detection of semantic errors in Arabic texts. Artificial Intelligence 195 (249-264). DOI: 10.1016/j.artint.2012.07.002. Online publication date: 1-Feb-2013.
• (2012) HOO 2012 shared task. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP (302-306). DOI: 10.5555/2390384.2390422. Online publication date: 7-Jun-2012.
• (2012) Measuring contextual fitness using error contexts extracted from the Wikipedia revision history. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (529-538). DOI: 10.5555/2380816.2380880. Online publication date: 23-Apr-2012.
• (2011) Correcting different types of errors in texts. Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence (192-203). DOI: 10.5555/2018192.2018215. Online publication date: 25-May-2011.
• (2011) Syntactic error detection and correction in date expressions using finite-state transducers. Natural Language Engineering 17:2 (145-161). DOI: 10.1017/S1351324911000088. Online publication date: 1-Apr-2011.
• (2010) Exploring web scale language models for search query processing. Proceedings of the 19th International Conference on World Wide Web (451-460). DOI: 10.1145/1772690.1772737. Online publication date: 26-Apr-2010.
• (2009) Short and informal documents. Proceedings of the 7th International Conference on Next Generation Information Technologies and Systems (109-120). DOI: 10.5555/1813323.1813341. Online publication date: 16-Jun-2009.
