ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation

  • Conference paper
The Semantic Web (ESWC 2021)

Abstract

This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KGs). The dataset was created using a semi-automated framework that generates diverse paraphrases of the answers using techniques such as back-translation. Existing datasets for conversational question answering over KGs (single-turn or multi-turn) focus on question paraphrasing and provide at most one answer verbalization. In contrast, ParaQA contains 5000 question-answer pairs with a minimum of two and a maximum of eight unique paraphrased responses for each question. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available at a persistent URI for broader usage and adaptation in the research community.
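To make concrete why several reference answers can help under n-gram metrics, the following minimal sketch computes the clipped n-gram precision that BLEU is built on, against one reference and then against two. It is not the authors' evaluation code and it is not full BLEU (there is no brevity penalty and no averaging over n-gram orders). The function names and the example sentences are hypothetical and are not taken from ParaQA.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # Maximum count of every n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical system output and reference verbalizations (illustration only).
candidate = "the author of dune is frank herbert"
single_ref = ["frank herbert wrote dune"]
multi_ref = single_ref + ["frank herbert is the author of dune"]

for n in (1, 2):
    p_single = clipped_precision(candidate, single_ref, n)
    p_multi = clipped_precision(candidate, multi_ref, n)
    print(f"{n}-gram precision: single={p_single:.3f} multi={p_multi:.3f}")
```

Because the clipping uses the maximum count over all references, adding a reference can only keep this precision the same or raise it, so a correct answer phrased differently from one reference is penalized less when more paraphrases are available.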

Notes

  1. https://docs.microsoft.com/en-us/cortana/skills/mva31-understanding-conversations

  2. http://qald.aksw.org/

  3. https://www.mturk.com/

  4. http://www.statmt.org/wmt18/translation-task.html

  5. https://github.com/barshana-banerjee/ParaQA

  6. https://creativecommons.org/licenses/by/4.0/

  7. https://github.com/barshana-banerjee/ParaQA_Experiments

  8. https://opensource.org/licenses/MIT

  9. http://cleopatra-project.eu/

  10. https://sda.tech/

  11. https://www.iais.fraunhofer.de/

  12. https://github.com/barshana-banerjee/ParaQA_Experiments

Acknowledgments

The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 812997 (Cleopatra).

Author information

Corresponding author

Correspondence to Endri Kacupaj.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Kacupaj, E., Banerjee, B., Singh, K., Lehmann, J. (2021). ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_36

  • DOI: https://doi.org/10.1007/978-3-030-77385-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77384-7

  • Online ISBN: 978-3-030-77385-4

  • eBook Packages: Computer Science, Computer Science (R0)
