DOI: 10.1145/1088622.1088644
Article

Collecting paraphrase corpora from volunteer contributors

Published: 02 October 2005

Abstract

Extensive and deep paraphrase corpora are important for a variety of natural language processing and user interaction tasks. In this paper, we present an approach that (i) collects multiple paraphrases per given item from volunteers and (ii) incentivises responsible contributions. We solicit paraphrases from Web volunteers in two ways: collecting new paraphrases with no prompting, and asking contributors to guess partially obfuscated paraphrases. To test the approach, we implemented an online game, 1001 Paraphrases (http://ai-games.org/paraphrase.html), and deployed it to collect 20,944 entries focused on paraphrases of 400 statements. The approach complements existing text extraction methods and has some inherent advantages of its own. We present and motivate our design and share preliminary observations and lessons learned about the performance of the approach.
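The abstract does not specify how paraphrases are partially obfuscated or how guesses are scored, so the following is only an illustrative sketch of the guessing step, not the game's actual implementation. The function names (`obfuscate`, `is_correct_guess`), the `reveal_ratio` parameter, the rule of always showing each word's first letter, and the example sentence are all assumptions made for this example.

```python
import random


def obfuscate(paraphrase, reveal_ratio=0.4, seed=None):
    """Hide part of each word, always keeping its first character as a hint.

    Hypothetical scheme: roughly (1 - reveal_ratio) of the remaining
    alphanumeric characters in every word are replaced with "_".
    """
    rng = random.Random(seed)
    masked_words = []
    for word in paraphrase.split():
        chars = list(word)
        # Candidates for hiding: alphanumeric characters after the first one.
        hideable = [i for i, c in enumerate(chars) if i > 0 and c.isalnum()]
        n_hide = round(len(hideable) * (1 - reveal_ratio))
        for i in rng.sample(hideable, n_hide):
            chars[i] = "_"
        masked_words.append("".join(chars))
    return " ".join(masked_words)


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def is_correct_guess(guess, target):
    """A guess counts as correct if it matches the hidden paraphrase after normalization."""
    return normalize(guess) == normalize(target)


if __name__ == "__main__":
    target = "this can hold things"  # made-up example paraphrase
    print(obfuscate(target, reveal_ratio=0.5, seed=1))
    print(is_correct_guess("This can hold things", target))
```

Under this sketch, a contributor would see the masked string and type a guess; a matching guess confirms an existing paraphrase, while a non-matching one could be stored as a candidate new paraphrase.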




      Information & Contributors

      Information

      Published In

      K-CAP '05: Proceedings of the 3rd international conference on Knowledge capture
      October 2005
      234 pages
      ISBN:1595931635
      DOI:10.1145/1088622
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. interfaces for knowledge elicitation
      2. paraphrase corpora
      3. volunteer contributor-based knowledge acquisition

      Qualifiers

      • Article

      Conference

      K-Cap05

      Acceptance Rates

      Overall Acceptance Rate 55 of 198 submissions, 28%


      Citations

      Cited By

      • (2023) Juegos con propósito para la anotación del Corpus Oral Sonoro del Español rural. Dialectologia et Geolinguistica 31:1 (135-164). DOI: 10.1515/dialect-2023-0007. Online publication date: 22-Dec-2023
      • (2022) Gamified crowdsourcing for idiom corpora construction. Natural Language Engineering (1-33). DOI: 10.1017/S1351324921000401. Online publication date: 20-Jan-2022
      • (2020) User Utterance Acquisition for Training Task-Oriented Bots: A Review of Challenges, Techniques and Opportunities. IEEE Internet Computing 24:3 (30-38). DOI: 10.1109/MIC.2020.2978157. Online publication date: 1-May-2020
      • (2019) Assessing the Robustness of Conversational Agents using Paraphrases. 2019 IEEE International Conference On Artificial Intelligence Testing (AITest) (55-62). DOI: 10.1109/AITest.2019.000-7. Online publication date: Apr-2019
      • (2017) Crowdsourcing. Handbook of Linguistic Annotation (277-295). DOI: 10.1007/978-94-024-0881-2_10. Online publication date: 17-Jun-2017
      • (2015) Bibliography. Games with a Purpose (Gwaps) (127-134). DOI: 10.1002/9781119136309.biblio. Online publication date: 3-Jul-2015
      • (2014) Annotation Game for Textual Entailment Evaluation. Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8403 (340-350). DOI: 10.1007/978-3-642-54906-9_28. Online publication date: 6-Apr-2014
      • (2014) Gaming with Purpose: Heuristic Understanding of Ubiquitous Game Development and Design for Human Computation. Handbook of Digital Games (645-666). DOI: 10.1002/9781118796443.ch24. Online publication date: 7-Mar-2014
      • (2013) Crowdsourced Knowledge Acquisition. International Journal on Semantic Web & Information Systems 9:3 (14-41). DOI: 10.4018/ijswis.2013070102. Online publication date: 1-Jul-2013
      • (2013) Phrase detectives. ACM Transactions on Interactive Intelligent Systems 3:1 (1-44). DOI: 10.1145/2448116.2448119. Online publication date: 24-Apr-2013
