DOI: 10.3115/980691.980696

Automatic retrieval and clustering of similar words

Published: 10 August 1998

Abstract

Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget Thesaurus is.
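To make the idea of distributional similarity concrete, here is a minimal sketch. The abstract does not state the paper's formula, so this example swaps in plain cosine similarity over context-feature counts. It is not the measure defined in the paper. The toy triples and the names `TRIPLES`, `feature_vectors`, `cosine`, and `most_similar` are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (word, relation, other word) triples of the kind a
# dependency parser might emit. Illustrative only, not taken from the paper.
TRIPLES = [
    ("beer", "obj-of", "drink"),
    ("beer", "mod", "cold"),
    ("wine", "obj-of", "drink"),
    ("wine", "mod", "red"),
    ("wine", "mod", "cold"),
    ("car", "obj-of", "drive"),
    ("car", "mod", "red"),
]


def feature_vectors(triples):
    """Map each word to a Counter of its (relation, other word) contexts."""
    vectors = {}
    for word, relation, other in triples:
        vectors.setdefault(word, Counter())[(relation, other)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors.

    Assumption: cosine is a stand-in here, not the paper's similarity measure.
    """
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word, vectors):
    """Rank every other word by similarity to `word`, most similar first."""
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    vecs = feature_vectors(TRIPLES)
    for score, neighbour in most_similar("beer", vecs):
        print(f"{neighbour}\t{score:.3f}")
```

Listing the top-ranked neighbours of each word in this way yields a thesaurus-like entry for that word. In this toy data, "wine" ranks above "car" for "beer" because the two share more contexts.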

Published In

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2
August 1998
768 pages

Sponsors

  • Government of Canada
  • Université de Montréal

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
