Abstract
This paper presents ParaQA, a question answering (QA) dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KG). The dataset was created using a semi-automated framework for generating diverse paraphrasing of the answers using techniques such as back-translation. The existing datasets for conversational question answering over KGs (single-turn/multi-turn) focus on question paraphrasing and provide only up to one answer verbalization. However, ParaQA contains 5000 question-answer pairs with a minimum of two and a maximum of eight unique paraphrased responses for each question. We complement the dataset with baseline models and illustrate the advantage of having multiple paraphrased answers through commonly used metrics such as BLEU and METEOR. The ParaQA dataset is publicly available on a persistent URI for broader usage and adaptation in the research community.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
References
Abujabal, A., Roy, R.S., Yahya, M., Weikum, G.: Comqa: a community-sourced dataset for complex factoid question answering with paraphrase clusters. In: NAACL (Long and Short Papers), pp. 307–317 (2019)
Banerjee, S., Lavie, A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72. Association for Computational Linguistics (2005)
Bao, J., Duan, N., Yan, Z., Zhou, M., Zhao, T.: Constraint-based question answering with knowledge graph. In: COLING, pp. 2503–2514 (2016)
Berant, J., Chou, A., Frostig, R., Liang, P.: Semantic parsing on freebase from question-answer pairs. In: EMNLP, pp. 1533–1544 (2013)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: A collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247–1250. ACM (2008)
Bordes, A., Usunier, N., Chopra, S., Weston, J.: Large-scale simple question answering with memory networks. arXiv preprint arXiv:1506.02075 (2015)
Boussaha, B.E.A., Hernandez, N., Jacquin, C., Morin, E.: Deep retrieval-based dialogue systems: A short review. arXiv preprint arXiv:1907.12878 (2019)
Cai, Q., Yates, A.: Large-scale semantic parsing via schema matching and lexicon extension. In: ACL (2013)
Christmann, P., Saha Roy, R., Abujabal, A., Singh, J., Weikum, G.: Look before you hop: Conversational question answering over knowledge graphs using judicious context expansion. In: CIKM, pp. 729–738 (2019)
Dubey, M., Banerjee, D., Abdelkawi, A., Lehmann, J.: LC-QuAD 2.0: a large dataset for complex question answering over Wikidata and DBpedia. In: Ghidini, C., et al. (eds.) ISWC 2019, Part II. LNCS, vol. 11779, pp. 69–78. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30796-7_5
Edunov, S., Ott, M., Auli, M., Grangier, D.: Understanding back-translation at scale. In: ACL (2018)
Egonmwan, E., Chali, Y.: Transformer and seq2seq model for paraphrase generation. In: Proceedings of the 3rd Workshop on Neural Generation and Translation, pp. 249–255. ACL, Hong Kong (Nov 2019)
Ell, B., Harth, A., Simperl, E.: SPARQL query verbalization for explaining semantic search engine queries. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 426–441. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07443-6_29
Federmann, C., Elachqar, O., Quirk, C.: Multilingual whispers: generating paraphrases with translation. In: Proceedings of the 5th Workshop on Noisy User-Generated Text (W-NUT 2019), pp. 17–26. ACL (2019)
Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional Sequence to Sequence Learning. arXiv e-prints arXiv:1705.03122 (May 2017)
Gupta, P., Mehri, S., Zhao, T., Pavel, A., Eskenazi, M., Bigham, J.: Investigating evaluation of open-domain dialogue systems with human generated multiple references. In: SIGdial, pp. 379–391. ACL (2019)
Hasan, S.A., et al.: Neural clinical paraphrase generation with attention. In: Clinical Natural Language Processing Workshop (ClinicalNLP), pp. 42–53 (Dec 2016)
Honnibal, M., Montani, I.: spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017)
Huang, M., Zhu, X., Gao, J.: Challenges in building intelligent open-domain dialog systems. ACM Trans. Inf. Syst. (TOIS) 38(3), 1–32 (2020)
Kacupaj, E., Zafar, H., Lehmann, J., Maleshkova, M.: VQuAnDa: verbalization question answering dataset. In: Harth, A., et al. (eds.) ESWC 2020. LNCS, vol. 12123, pp. 531–547. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49461-2_31
Lehmann, J., et al.: Dbpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6, 167–195 (2015)
Liu, K., Feng, Y.: Deep learning in question answering. In: Deng, L., Liu, Y. (eds.) Deep Learning in Natural Language Processing, pp. 185–217. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-5209-5_7
Lowe, R., Pow, N., Serban, I., Pineau, J.: The Ubuntu dialogue corpus: a large dataset for research in unstructured multi-turn dialogue systems. In: SigDial, pp. 285–294. Association for Computational Linguistics (2015)
Luong, M.T., Pham, H., Manning, C.D.: Effective Approaches to Attention-based Neural Machine Translation. arXiv e-prints arXiv:1508.04025 (Aug 2015)
McKeown, K.R.: Paraphrasing questions using given and new information. Am. J. Comput. Linguist. 9(1), 1–10 (1983)
Ngonga Ngomo, A.C., Bühmann, L., Unger, C., Lehmann, J., Gerber, D.: Sorry, i don’t speak SPARQL: translating SPARQL queries into natural language. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 977–988 (2013)
Ngonga Ngomo, A.C., Bühmann, L., Unger, C., Lehmann, J., Gerber, D.: SPARQL2NL: verbalizing SPARQL queries. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 329–332. ACM (2013)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (2002)
Prakash, A., et al.: Neural paraphrase generation with stacked residual LSTM networks. In: COLING (Dec 2016)
Quirk, C., Brockett, C., Dolan, W.: Monolingual machine translation for paraphrase generation. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 142–149. Association for Computational Linguistics (Jul 2004)
Saha, A., Pahuja, V., Khapra, M.M., Sankaranarayanan, K., Chandar, S.: Complex sequential question answering: Towards learning to converse over linked question answer pairs with a knowledge graph. In: Thirty-Second AAAI Conference (2018)
Serban, I.V., et al.: Generating factoid questions with recurrent neural networks: the 30 m factoid question-answer corpus. arXiv preprint arXiv:1603.06807 (2016)
Shen, T., et al.: Multi-task learning for conversational question answering over a large-scale knowledge base. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2442–2451. Association for Computational Linguistics (2019)
Singh, K., et al.: Why reinvent the wheel: let’s build question answering systems together. In: Proceedings of the 2018 World Wide Web Conference, pp. 1247–1256 (2018)
Su, Y., et al.: On generating characteristic-rich question sets for QA evaluation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: Proceedings of the 16th International Conference on World Wide Web, pp. 697–706 (2007)
Talmor, A., Berant, J.: The web as a knowledge-base for answering complex questions. arXiv preprint arXiv:1803.06643 (2018)
Trivedi, P., Maheshwari, G., Dubey, M., Lehmann, J.: LC-QuAD: a corpus for complex question answering over knowledge graphs. In: d’Amato, C., et al. (eds.) ISWC 2017, Part II. LNCS, vol. 10588, pp. 210–218. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68204-4_22
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp. 5998–6008 (2017)
Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
Wubben, S., van den Bosch, A., Krahmer, E.: Paraphrase generation as monolingual translation: data and evaluation. In: Proceedings of the 6th International Natural Language Generation Conference (2010)
Yih, W.T., Richardson, M., Meek, C., Chang, M.W., Suh, J.: The value of semantic parse labeling for knowledge base question answering. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (2016)
Zamanirad, S., Benatallah, B., Rodriguez, C., Yaghoubzadehfard, M., Bouguelia, S., Brabra, H.: State machine based human-bot conversation model and services. In: Dustdar, S., Yu, E., Salinesi, C., Rieu, D., Pant, V. (eds.) CAiSE 2020. LNCS, vol. 12127, pp. 199–214. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49435-3_13
Acknowledgments
The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 812997 (Cleopatra).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kacupaj, E., Banerjee, B., Singh, K., Lehmann, J. (2021). ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation. In: Verborgh, R., et al. The Semantic Web. ESWC 2021. Lecture Notes in Computer Science(), vol 12731. Springer, Cham. https://doi.org/10.1007/978-3-030-77385-4_36
Download citation
DOI: https://doi.org/10.1007/978-3-030-77385-4_36
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77384-7
Online ISBN: 978-3-030-77385-4
eBook Packages: Computer ScienceComputer Science (R0)