Abstract
This article presents a system that translates natural language questions into SPARQL queries. The question answering system comprises a syntax parser that generates a parse tree of the input sentence, a component that generates a SPARQL query template from the parse tree, and models that identify the entities and relations to be inserted into the template. Entity extraction and relation ranking are performed using BERT. Training BERT for a Russian-language question answering system is hampered by the small amount of available training data. To address this issue, we investigate whether multilingual BERT pretrained on the LC-QuAD 2.0 dataset can be trained to perform entity extraction and relation ranking using a small number of Russian-language samples from the RuBQ dataset. On the RuBQ dataset, the proposed question answering system achieves higher accuracy than previous approaches.
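The template-filling step described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the template text, the function names `fill_template` and `pick_relation`, the Wikidata identifiers, and the scores are all hypothetical, and the score dictionary merely stands in for the output of the BERT-based relation ranker.

```python
from string import Template
from typing import Mapping

# Hypothetical template for a one-hop question; the article's actual
# templates are derived from the parse tree and are not reproduced here.
ONE_HOP = Template("SELECT ?answer WHERE { wd:$entity wdt:$relation ?answer . }")


def pick_relation(scores: Mapping[str, float]) -> str:
    """Return the relation ID with the highest score.

    The scores stand in for the output of a relation-ranking model.
    """
    return max(scores, key=scores.get)


def fill_template(template: Template, entity_id: str, relation_id: str) -> str:
    """Insert an extracted entity and a ranked relation into a query template."""
    return template.substitute(entity=entity_id, relation=relation_id)


# Illustrative values only: an entity ID and two candidate relations with made-up scores.
query = fill_template(ONE_HOP, "Q649", pick_relation({"P1082": 0.91, "P17": 0.07}))
print(query)
```

Keeping the template separate from the entity and relation models mirrors the division of labor the abstract describes: the query structure comes from the syntax of the question, while the identifiers come from the learned models.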
REFERENCES
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P., SQuAD: 100,000+ questions for machine comprehension of text, Proc. 2016 Conf. on Empirical Methods in Natural Language Processing, Austin, Texas, 2016, Association for Computational Linguistics, 2016, pp. 2383–2392. https://doi.org/10.18653/v1/D16-1264
Chen, D., Fisch, A., Weston, J., and Bordes, A., Reading Wikipedia to answer open-domain questions, Proc. 55th Annu. Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017, Association for Computational Linguistics, 2017, vol. 1, pp. 1870–1879. https://doi.org/10.18653/v1/P17-1171
Seo, M., Lee, J., Kwiatkowski, T., Parikh, A., Farhadi, A., and Hajishirzi, H., Real-time open-domain question answering with dense-sparse phrase index, Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, 2019, Association for Computational Linguistics, 2019, pp. 4430–4441. https://doi.org/10.18653/v1/P19-1436
Bordes, A., Usunier, N., Chopra, S., and Weston, J., Large-scale simple question answering with memory networks, 2015. arXiv:1506.02075 [cs.LG].
Vakulenko, S., Garcia, J.D.F., Polleres, A., de Rijke, M., and Cochez, M., Message passing for complex question answering over knowledge graphs, CIKM ’19: Proc. 28th ACM Int. Conf. on Information and Knowledge Management, Beijing, 2019, New York: Association for Computing Machinery, 2019, pp. 1431–1440. https://doi.org/10.1145/3357384.3358026
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K., BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minn., 2019, Association for Computational Linguistics, 2019, vol. 1, pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423
Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., and Li, J., Dice loss for data-imbalanced NLP tasks, Proc. 58th Annu. Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 465–476. https://doi.org/10.18653/v1/2020.acl-main.45
Korablinov, V. and Braslavski, P., RuBQ: A Russian dataset for question answering over Wikidata, The Semantic Web—ISWC 2020, Pan, J.Z., Tamma, V., d’Amato, C., Janowicz, K., Fu, Bo, Polleres, A., Seneviratne, O., and Kagal, L., Eds., Lecture Notes in Computer Science, vol. 12607, Cham: Springer, 2020, pp. 97–110. https://doi.org/10.1007/978-3-030-62466-8_7
Konovalov, V.P., Gulyaev, P.A., Sorokin, A.A., Kuratov, Y.M., and Burtsev, M.S., Exploring the BERT cross-lingual transfer for reading comprehension, Computational Linguistics and Intellectual Technologies, Selegei, V.P., Ed., Moscow: Ross. Gos. Gumanit. Univ., 2020, pp. 445–453. https://doi.org/10.28995/2075-7182-2020-19-445-453
Pires, T., Schlinger, E., and Garrette, D., How multilingual is multilingual BERT?, Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, 2019, Association for Computational Linguistics, 2019, pp. 4996–5001. https://doi.org/10.18653/v1/P19-1493
Dubey, M., Banerjee, D., Abdelkawi, A., and Lehmann, J., LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia, The Semantic Web—ISWC 2019, Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., and Gandon, F., Eds., Lecture Notes in Computer Science, vol. 11779, Cham: Springer, 2019, pp. 69–78. https://doi.org/10.1007/978-3-030-30796-7_5
Dai, Z., Li, L., and Xu, W., CFO: Conditional focused neural question answering with large-scale knowledge bases, Proc. 54th Annu. Meeting of the Association for Computational Linguistics, Berlin, 2016, Association for Computational Linguistics, 2016, vol. 1, pp. 800–810. https://doi.org/10.18653/v1/P16-1076
Ture, F. and Jojic, O., No need to pay attention: Simple recurrent neural networks work!, Proc. 2017 Conf. on Empirical Methods in Natural Language Processing, Copenhagen, 2017, Association for Computational Linguistics, 2017, pp. 2866–2872. https://doi.org/10.18653/v1/D17-1307
Mohammed, S., Shi, P., and Lin, J., Strong baselines for simple question answering over knowledge graphs with and without neural networks, Proc. 2018 Conf. of the North American Chapter of the Association for Computational Linguistics, New Orleans, 2018, Association for Computational Linguistics, 2018, pp. 291–296. https://doi.org/10.18653/v1/N18-2047
Maheshwari, G., Trivedi, P., Lukovnikov, D., Chakraborty, N., Fischer, A., and Lehmann, J., Learning to rank query graphs for complex question answering over knowledge graphs, The Semantic Web—ISWC 2019, Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., and Gandon, F., Eds., Lecture Notes in Computer Science, vol. 11779, Cham: Springer, 2019, pp. 487–504. https://doi.org/10.1007/978-3-030-30793-6_28
Zafar, H., Napolitano, G., and Lehmann, J., Formal query generation for question answering over knowledge bases, The Semantic Web. ESWC 2018, Gangemi, A., Navigli, R., Vidal, M.-E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., and Alam, M., Eds., Lecture Notes in Computer Science, vol. 10843, Cham: Springer, 2018, pp. 714–728. https://doi.org/10.1007/978-3-319-93417-4_46
Ochieng, P., PAROT: Translating natural language to SPARQL, Expert Syst. Appl.: X, 2020, vol. 5, p. 100024. https://doi.org/10.1016/j.eswax.2020.100024
Evseev, D.A. and Arkhipov, M.Y., SPARQL query generation for complex question answering with BERT and BiLSTM-based model, Computational Linguistics and Intellectual Technologies, Moscow: Ross. Gos. Gumanit. Univ., 2020, pp. 270–282. https://doi.org/10.28995/2075-7182-2020-19-270-282
Diefenbach, D., Both, A., Singh, K., and Maret, P., Towards a question answering system over the semantic web, Semantic Web, 2020, vol. 11, no. 3, pp. 421–439. https://doi.org/10.3233/SW-190343
CONFLICT OF INTEREST
The author declares that he has no conflicts of interest.
Additional information
Translated by A. Ovchinnikova
About this article
Cite this article
Evseev, D.A. Query Generation for Answering Complex Questions in Russian Using a Syntax Parser. Sci. Tech. Inf. Proc. 49, 310–316 (2022). https://doi.org/10.3103/S0147688222050045
DOI: https://doi.org/10.3103/S0147688222050045