Class-based n-gram models of natural language

Published: 01 December 1992

Abstract

We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
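
To make the idea of an n-gram model over word classes concrete, here is a minimal sketch for the bigram case. It assumes the common formulation in which each word belongs to exactly one class and P(w | w_prev) is approximated as P(class(w) | class(w_prev)) × P(w | class(w)). The function name `train_class_bigram`, the toy corpus, and the hand-written class map are all hypothetical illustrations. The classes here are not learned by the statistical algorithms the abstract mentions, and the estimates are plain relative frequencies with no smoothing.

```python
from collections import Counter


def train_class_bigram(tokens, word_to_class):
    """Estimate a class-based bigram model by relative frequency (no smoothing)."""
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    # Count class transitions over adjacent positions.
    class_bigrams = Counter(zip(classes, classes[1:]))
    # How often each class occurs as the left element of a bigram.
    left_counts = Counter(classes[:-1])

    def prob(word, prev_word):
        c, c_prev = word_to_class[word], word_to_class[prev_word]
        if left_counts[c_prev] == 0:
            return 0.0
        p_class = class_bigrams[(c_prev, c)] / left_counts[c_prev]
        p_word = word_counts[word] / class_counts[c]
        return p_class * p_word

    return prob


# Toy data: the class labels are assigned by hand, purely for illustration.
tokens = "the cat sat on the mat the dog sat on the rug".split()
word_to_class = {
    "the": "DET", "on": "PREP", "sat": "VERB",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
}
prob = train_class_bigram(tokens, word_to_class)

# "rug sat" never occurs in the corpus, yet it receives nonzero probability
# because other nouns were seen before "sat".
p_unseen = prob("sat", "rug")
p_seen = prob("dog", "the")
```

The point of the sketch is the sharing of statistics: a word-level bigram estimate would assign zero to the unseen pair "rug sat", whereas the class-level estimate borrows evidence from "cat sat" and "dog sat".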


Published In

Computational Linguistics, Volume 18, Issue 4
December 1992
175 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press

Cambridge, MA, United States

Qualifiers

  • Article
