DOI: 10.3115/1073012.1073020

Extracting paraphrases from a parallel corpus

Published: 06 July 2001

Abstract

While paraphrasing is critical both for the interpretation and for the generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identifying paraphrases in a corpus of multiple English translations of the same source text. Our approach yields phrasal and single-word lexical paraphrases as well as syntactic paraphrases.
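The abstract does not spell out the algorithm itself. As a rough illustration of why multiple translations of one source text are useful input, the Python sketch below proposes candidate single-word and phrasal paraphrase pairs from sentence-aligned translations, treating identical surrounding words as anchors. It is a simplified, assumed procedure and not the authors' algorithm: the function name `candidate_paraphrases`, the `max_len` cutoff, and the use of `difflib.SequenceMatcher` for token alignment are illustrative choices, and the input is assumed to be already sentence-aligned and tokenized.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def candidate_paraphrases(
    aligned_pairs: Iterable[Tuple[List[str], List[str]]],
    max_len: int = 3,
) -> Counter:
    """Count (phrase_a, phrase_b) candidates from sentence-aligned translations.

    Hypothetical helper: a candidate is a span where the two token sequences
    differ, flanked on both sides by identical tokens (the "anchors"), with
    neither side longer than max_len tokens.
    """
    counts: Counter = Counter()
    for sent_a, sent_b in aligned_pairs:
        # Disable the popularity heuristic so common words can still act as anchors.
        matcher = SequenceMatcher(a=sent_a, b=sent_b, autojunk=False)
        ops = matcher.get_opcodes()
        for i, (tag, a0, a1, b0, b1) in enumerate(ops):
            if tag != "replace":
                continue
            # Require an identical block immediately before and after the span.
            has_left = i > 0 and ops[i - 1][0] == "equal"
            has_right = i + 1 < len(ops) and ops[i + 1][0] == "equal"
            if not (has_left and has_right):
                continue
            # Skip spans too long to be plausible lexical or phrasal paraphrases.
            if a1 - a0 > max_len or b1 - b0 > max_len:
                continue
            counts[(tuple(sent_a[a0:a1]), tuple(sent_b[b0:b1]))] += 1
    return counts


pairs = [
    ("he was very sad at the news".split(), "he was extremely sad at the news".split()),
    ("she began to cry".split(), "she started to cry".split()),
    ("he passed away yesterday".split(), "he died yesterday".split()),
]
for (left, right), n in candidate_paraphrases(pairs).items():
    print(" ".join(left), "<->", " ".join(right), n)
```

For example, aligning "he passed away yesterday" with "he died yesterday" yields the phrasal candidate `(("passed", "away"), ("died",))`, because "he" and "yesterday" match on both sides. Identical context alone does not guarantee equivalent meaning, so a practical system would still need to filter or rank such candidates.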

Published In

ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
July 2001
562 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 56
  • Downloads (last 6 weeks): 12
Reflects downloads up to 03 Oct 2024.

Cited By

  • (2023) OSPT: European Portuguese Paraphrastic Dataset with Machine Translation. Progress in Artificial Intelligence, pp. 454-466. DOI: 10.1007/978-3-031-49008-8_36. Online publication date: 5-Sep-2023.
  • (2020) A Memory-Based Sentence Split and Rephrase Model with Multi-task Training. Neural Information Processing, pp. 643-654. DOI: 10.1007/978-3-030-63830-6_54. Online publication date: 18-Nov-2020.
  • (2019) A task in a suit and a tie. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 7176-7183. DOI: 10.1609/aaai.v33i01.33017176. Online publication date: 27-Jan-2019.
  • (2019) PARABANK. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 6521-6528. DOI: 10.1609/aaai.v33i01.33016521. Online publication date: 27-Jan-2019.
  • (2019) Cleaning StackOverflow for machine translation. Proceedings of the 16th International Conference on Mining Software Repositories, pp. 79-83. DOI: 10.1109/MSR.2019.00021. Online publication date: 26-May-2019.
  • (2019) Graph-based clustering of extracted paraphrases for labelling crime reports. Knowledge-Based Systems, 179:C, pp. 55-76. DOI: 10.1016/j.knosys.2019.05.004. Online publication date: 1-Sep-2019.
  • (2018) Deep text classification can be fooled. Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 4208-4215. DOI: 10.5555/3304222.3304355. Online publication date: 13-Jul-2018.
  • (2018) A Globalization-Semantic Matching Neural Network for Paraphrase Identification. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2067-2075. DOI: 10.1145/3269206.3272004. Online publication date: 17-Oct-2018.
  • (2018) Expanding Paraphrase Lexicons by Exploiting Generalities. ACM Transactions on Asian and Low-Resource Language Information Processing, 17:2, pp. 1-36. DOI: 10.1145/3160488. Online publication date: 30-Jan-2018.
  • (2017) Text rewriting improves semantic role labeling. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 5095-5099. DOI: 10.5555/3171837.3172021. Online publication date: 19-Aug-2017.
