Query Generation for Answering Complex Questions in Russian Using a Syntax Parser

D. A. Evseev¹

Abstract

This article presents a system that translates natural language questions into SPARQL queries. The question answering system comprises three components: a syntax parser that generates a parse tree of the input sentence; a component that generates a SPARQL query template from the parse tree; and models that identify the entities and relations to be inserted into the template. Entity extraction and relation ranking are performed using BERT. Training BERT for a Russian-language question answering system is hampered by the small volume of available training data. To address this, we investigate whether multilingual BERT, pretrained on the LC-QuAD 2.0 dataset, can be trained to perform entity extraction and relation ranking on a small number of Russian-language samples from the RuBQ dataset. Tested on the RuBQ dataset, the proposed question answering system achieves higher accuracy than previous approaches.
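The last stage of the pipeline described above, inserting identified entities and relations into a query template, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `QueryTemplate`, `TEMPLATES`, and `build_query` are hypothetical, the template is chosen by the number of relations as a stand-in for the article's parse-tree-based template generation, and the Wikidata-style identifiers are example values not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    """A SPARQL pattern with slots e0, e1, ... (entities) and r0, r1, ... (relations)."""
    pattern: str

    def fill(self, entities: list[str], relations: list[str]) -> str:
        # Prefix the raw identifiers in the Wikidata style: wd: for entities,
        # wdt: for direct-claim relations.
        slots = {f"e{i}": f"wd:{e}" for i, e in enumerate(entities)}
        slots.update({f"r{i}": f"wdt:{r}" for i, r in enumerate(relations)})
        return self.pattern.format(**slots)


# Hypothetical templates keyed by the number of relations in the question.
# Doubled braces are literal SPARQL braces; single braces are slots.
TEMPLATES = {
    1: QueryTemplate("SELECT ?answer WHERE {{ {e0} {r0} ?answer . }}"),
    2: QueryTemplate("SELECT ?answer WHERE {{ {e0} {r0} ?x . ?x {r1} ?answer . }}"),
}


def build_query(entities: list[str], relations: list[str]) -> str:
    """Pick a template by relation count (a simplification) and fill its slots."""
    template = TEMPLATES[len(relations)]
    return template.fill(entities, relations)


if __name__ == "__main__":
    # Illustrative identifiers only; in the described system they would come
    # from the entity extraction and relation ranking models.
    print(build_query(["Q161531"], ["P50"]))
```

Separating template selection from slot filling mirrors the division of labour in the abstract: the template depends only on the structure of the question, while the slot values come from separate models.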

REFERENCES

Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P., SQuAD: 100,000+ questions for machine comprehension of text, Proc. 2016 Conf. on Empirical Methods in Natural Language Processing, Austin, Texas, 2016, Association for Computational Linguistics, 2016, pp. 2383–2392. https://doi.org/10.18653/v1/D16-1264
Chen, D., Fisch, A., Weston, J., and Bordes, A., Reading Wikipedia to answer open-domain questions, Proc. 55th Annu. Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017, Association for Computational Linguistics, 2017, vol. 1, pp. 1870–1879. https://doi.org/10.18653/v1/P17-1171
Seo, M., Lee, J., Kwiatkowski, T., Parikh, A., Farhadi, A., and Hajishirzi, H., Real-time open-domain question answering with dense-sparse phrase index, Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, 2019, Association for Computational Linguistics, 2019, pp. 4430–4441. https://doi.org/10.18653/v1/P19-1436
Bordes, A., Usunier, N., Chopra, S., and Weston, J., Large-scale simple question answering with memory networks, 2015. arXiv:1506.02075 [cs.LG].
Vakulenko, S., Garcia, J.D.F., Polleres, A., de Rijke, M., and Cochez, M., Message passing for complex question answering over knowledge graphs, CIKM ’19: Proc. 28th ACM Int. Conf. on Information and Knowledge Management, Beijing, 2019, New York: Association for Computing Machinery, 2019, pp. 1431–1440. https://doi.org/10.1145/3357384.3358026
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K., BERT: Pre-training of deep bidirectional transformers for language understanding, Proc. 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minn., 2019, Association for Computational Linguistics, 2019, vol. 1, pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423
Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., and Li, J., Dice loss for data-imbalanced NLP tasks, Proc. 58th Annu. Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2020, pp. 465–476. https://doi.org/10.18653/v1/2020.acl-main.45
Korablinov, V. and Braslavski, P., RuBQ: A Russian dataset for question answering over Wikidata, The Semantic Web—ISWC 2020, Pan, J.Z., Tamma, V., d’Amato, C., Janowicz, K., Fu, Bo, Polleres, A., Seneviratne, O., and Kagal, L., Eds., Lecture Notes in Computer Science, vol. 12607, Cham: Springer, 2020, pp. 97–110. https://doi.org/10.1007/978-3-030-62466-8_7
Konovalov, V.P., Gulyaev, P.A., Sorokin, A.A., Kuratov, Y.M., and Burtsev, M.S., Exploring the BERT cross-lingual transfer for reading comprehension, Computational Linguistics and Intellectual Technologies, Selegei, V.P., Ed., Moscow: Ross. Gos. Gumanit. Univ., 2020, pp. 445–453. https://doi.org/10.28995/2075-7182-2020-19-445-453
Pires, T., Schlinger, E., and Garrette, D., How multilingual is multilingual BERT?, Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, 2019, Association for Computational Linguistics, 2019, pp. 4996–5001. https://doi.org/10.18653/v1/P19-1493
Dubey, M., Banerjee, D., Abdelkawi, A., and Lehmann, J., LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia, The Semantic Web—ISWC 2019, Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., and Gandon, F., Eds., Lecture Notes in Computer Science, vol. 11779, Cham: Springer, 2019, pp. 69–78. https://doi.org/10.1007/978-3-030-30796-7_5
Dai, Z., Li, L., and Xu, W., CFO: Conditional focused neural question answering with large-scale knowledge bases, Proc. 54th Annu. Meeting of the Association for Computational Linguistics, Berlin, 2016, Association for Computational Linguistics, 2016, vol. 1, pp. 800–810. https://doi.org/10.18653/v1/P16-1076
Ture, F. and Jojic, O., No need to pay attention: Simple recurrent neural networks work!, Proc. 2017 Conf. on Empirical Methods in Natural Language Processing, Copenhagen, 2017, Association for Computational Linguistics, 2017, pp. 2866–2872. https://doi.org/10.18653/v1/D17-1307
Mohammed, S., Shi, P., and Lin, J., Strong baselines for simple question answering over knowledge graphs with and without neural networks, Proc. 2018 Conf. of the North American Chapter of the Association for Computational Linguistics, New Orleans, 2018, Association for Computational Linguistics, 2018, pp. 291–296. https://doi.org/10.18653/v1/N18-2047
Maheshwari, G., Trivedi, P., Lukovnikov, D., Chakraborty, N., Fischer, A., and Lehmann, J., Learning to rank query graphs for complex question answering over knowledge graphs, The Semantic Web—ISWC 2019, Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., and Gandon, F., Eds., Lecture Notes in Computer Science, vol. 11779, Cham: Springer, 2019, pp. 487–504. https://doi.org/10.1007/978-3-030-30793-6_28
Zafar, H., Napolitano, G., and Lehmann, J., Formal query generation for question answering over knowledge bases, The Semantic Web. ESWC 2018, Gangemi, A., Navigli, R., Vidal, M.-E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., and Alam, M., Eds., Lecture Notes in Computer Science, vol. 10843, Cham: Springer, 2018, pp. 714–728. https://doi.org/10.1007/978-3-319-93417-4_46
Ochieng, P., PAROT: Translating natural language to SPARQL, Expert Syst. Appl.: X, 2020, vol. 5, p. 100024. https://doi.org/10.1016/j.eswax.2020.100024
Evseev, D.A. and Arkhipov, M.Y., SPARQL query generation for complex question answering with BERT and BiLSTM-based model, Computational Linguistics and Intellectual Technologies, Moscow: Ross. Gos. Gumanit. Univ., 2020, pp. 276–282. https://doi.org/10.28995/2075-7182-2020-19-270-282
Diefenbach, D., Both, A., Singh, K., and Maret, P., Towards a question answering system over the semantic web, Semantic Web, 2020, vol. 11, no. 3, pp. 421–439. https://doi.org/10.3233/SW-190343

CONFLICT OF INTEREST

The author declares that he has no conflicts of interest.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, 141701, Moscow, Russia
D. A. Evseev

Corresponding author

Correspondence to D. A. Evseev.

Additional information

Translated by A. Ovchinnikova

About this article

Cite this article

Evseev, D.A. Query Generation for Answering Complex Questions in Russian Using a Syntax Parser. Sci. Tech. Inf. Proc. 49, 310–316 (2022). https://doi.org/10.3103/S0147688222050045

Received: 14 December 2020
Revised: 12 May 2021
Accepted: 03 June 2021
Published: 06 March 2023
Issue Date: December 2022
DOI: https://doi.org/10.3103/S0147688222050045
