ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12731)

Included in the following conference series:

European Semantic Web Conference

Abstract

This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KGs). The dataset was created with a semi-automated framework that generates diverse paraphrases of the answers using techniques such as back-translation. Existing datasets for conversational question answering over KGs (single-turn or multi-turn) focus on question paraphrasing and provide at most one answer verbalization. In contrast, ParaQA contains 5000 question-answer pairs, each with a minimum of two and a maximum of eight unique paraphrased responses. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available at a persistent URI for broader usage and adaptation in the research community.
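To illustrate why several reference answers can matter for n-gram metrics such as BLEU, the following minimal sketch computes the clipped (modified) n-gram precision that forms the core of BLEU against one reference and against two. It is not the paper's evaluation code and it omits BLEU's brevity penalty and geometric averaging. The function names and the example sentences are hypothetical and are not taken from ParaQA.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references.

    Each candidate n-gram count is clipped by the maximum count of that
    n-gram in any single reference, as in the BLEU definition.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For every n-gram, keep the largest count seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical system response and reference verbalizations (illustrative only).
candidate = "the capital of germany is berlin".split()
single_ref = ["berlin is the capital city of germany".split()]
multi_refs = single_ref + ["the capital of germany is called berlin".split()]

p_single = modified_precision(candidate, single_ref, n=2)
p_multi = modified_precision(candidate, multi_refs, n=2)
print(f"bigram precision, 1 reference:  {p_single:.2f}")
print(f"bigram precision, 2 references: {p_multi:.2f}")
```

With a single reference, a correct answer phrased differently from that reference is penalized. Adding a second paraphrase can only keep or raise the clipped precision, which is the intuition behind providing multiple verbalizations per question.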

Acknowledgments

The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 812997 (Cleopatra).

Author information

Authors and Affiliations

University of Bonn, Bonn, Germany
Endri Kacupaj, Barshana Banerjee & Jens Lehmann
Zerotha Research and Cerence GmbH, Aachen, Germany
Kuldeep Singh
Fraunhofer IAIS, Dresden, Germany
Jens Lehmann

Authors

Endri Kacupaj
Barshana Banerjee
Kuldeep Singh
Jens Lehmann

Corresponding author

Correspondence to Endri Kacupaj.

Editor information

Editors and Affiliations

Ghent University, Ghent, Belgium
Ruben Verborgh
Aalborg University, Aalborg, Denmark
Katja Hose
University of Mannheim, Mannheim, Germany
Heiko Paulheim
ERCIM, Sophia Antipolis, France
Pierre-Antoine Champin
University of Siegen, Siegen, Germany
Maria Maleshkova
Universidad Politécnica de Madrid, Boadilla del Monte, Spain
Oscar Corcho
eBay Inc., San Jose, CA, USA
Petar Ristoski
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Eggenstein-Leopoldshafen, Germany
Mehwish Alam

About this paper

Cite this paper

Kacupaj, E., Banerjee, B., Singh, K., Lehmann, J. (2021). ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation. In: Verborgh, R., et al. (eds.) The Semantic Web. ESWC 2021. Lecture Notes in Computer Science, vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_36

DOI: https://doi.org/10.1007/978-3-030-77385-4_36
Published: 31 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77384-7
Online ISBN: 978-3-030-77385-4
eBook Packages: Computer Science, Computer Science (R0)
