DOI: 10.3115/1220175.1220295

Accurate collocation extraction using a multilingual parser

Published: 17 July 2006

Abstract

This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing for detecting accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over using a traditional window method (which ignores the syntactic information) is first theoretically motivated, then empirically validated by a comparative evaluation experiment.
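
As a rough illustration of the window method that the abstract contrasts with full parsing, the sketch below pairs each word with the words that follow it inside a fixed-size window and then ranks the pairs with an association score. The log-likelihood ratio used here, the window size, and all function names are assumptions made for illustration; the abstract does not say which statistical methods the system applies.

```python
import math
from collections import Counter


def window_pairs(tokens, window=5):
    """Yield ordered (w1, w2) pairs where w2 follows w1 within `window` tokens.

    No syntactic information is used: any two nearby words form a candidate.
    """
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + 1 + window]:
            yield (w1, w2)


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) of a 2x2 contingency table of pair counts."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22  # row totals
    c1, c2 = k11 + k21, k12 + k22  # column totals

    def term(observed, row, col):
        # A cell with zero observations contributes nothing to the sum.
        if observed == 0:
            return 0.0
        return observed * math.log(observed * n / (row * col))

    return 2 * (
        term(k11, r1, c1) + term(k12, r1, c2)
        + term(k21, r2, c1) + term(k22, r2, c2)
    )


def score_pairs(pairs):
    """Score every distinct pair from an iterable of (w1, w2) candidates."""
    pairs = list(pairs)
    pair_freq = Counter(pairs)
    left = Counter(w1 for w1, _ in pairs)
    right = Counter(w2 for _, w2 in pairs)
    n = len(pairs)
    scores = {}
    for (w1, w2), k11 in pair_freq.items():
        k12 = left[w1] - k11   # w1 paired with some other second word
        k21 = right[w2] - k11  # w2 paired with some other first word
        k22 = n - k11 - k12 - k21
        scores[(w1, w2)] = log_likelihood(k11, k12, k21, k22)
    return scores
```

Because `score_pairs` accepts any iterable of word pairs, the same scoring step could in principle be fed pairs taken from a parser's syntactic relations instead of `window_pairs`; how the paper's system actually selects and scores its candidates is not stated on this page.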

Published In

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
July 2006
1214 pages

Publisher

Association for Computational Linguistics, United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Cited By

  • (2011) Two-Word Collocation Extraction Using Monolingual Word Alignment Method. ACM Transactions on Intelligent Systems and Technology 3:1 (1-29). DOI: 10.1145/2036264.2036280. Online publication date: 1-Oct-2011
  • (2010) Book review. Computational Linguistics 36:4 (777-780). DOI: 10.1162/coli_r_00024. Online publication date: 1-Dec-2010
  • (2009) Collocation extraction using monolingual word alignment method. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 (487-495). DOI: 10.5555/1699571.1699575. Online publication date: 6-Aug-2009
  • (2009) Deep linguistic multilingual translation and bilingual dictionaries. Proceedings of the Fourth Workshop on Statistical Machine Translation (90-94). DOI: 10.5555/1626431.1626450. Online publication date: 30-Mar-2009
  • (2008) Various criteria of collocation cohesion in internet. Proceedings of the 9th international conference on Computational linguistics and intelligent text processing (64-72). DOI: 10.5555/1787578.1787585. Online publication date: 17-Feb-2008
  • (2008) Efficient multi-word expressions extractor using suffix arrays and related structures. Proceedings of the 2nd ACM workshop on Improving non english web searching (1-8). DOI: 10.1145/1460027.1460029. Online publication date: 30-Oct-2008
  • (2007) Fips, a "deep" linguistic multilingual parser. Proceedings of the Workshop on Deep Linguistic Processing (120-127). DOI: 10.5555/1608912.1608931. Online publication date: 28-Jun-2007
  • (2006) Multilingual collocation extraction. Proceedings of the Workshop on Multilingual Language Resources and Interoperability (40-49). DOI: 10.5555/1613162.1613168. Online publication date: 23-Jul-2006
