Gar: Natural Language to SQL Translation with Efficient Generate-and-Rank
Pages 411–427
Abstract
Web applications depend heavily on databases, yet conventional database interfaces often make efficient data use difficult: a large population of end users needs to state their requirements easily and retrieve query results without friction. Natural Language (NL) interfaces to databases make databases accessible to these users. Mainstream approaches build translation models that convert NL queries into SQL queries, while a recently proposed generate-and-rank approach instead produces candidate queries and then ranks them. Although it yields superior translation results on public benchmarks, this generate-and-rank approach suffers from efficiency issues that may impede its practical application. In this paper, we introduce Gar, which extends the existing generate-and-rank approach with a more efficient generation procedure and a more robust ranking procedure. Specifically, Gar uses a Bloom filter to accelerate the data generation process by eliminating unnecessary function calls. In addition, Gar provides a new implementation of the ranking module, a re-ranking model with enhanced language-understanding ability. We evaluate Gar on three public benchmarks: GEO, Spider, and MT-Teql. Gar achieves an overall accuracy of 66.6% on GEO, 80.6% on Spider, and 78.4% on MT-Teql.
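The abstract does not detail how the Bloom filter reduces unnecessary function calls. The following is a minimal sketch of the general idea, assuming the filter is used to skip an expensive generation call for inputs that have (almost certainly) been processed before; `generate` and its behavior are hypothetical, not the paper's actual interface:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic membership, no false negatives."""

    def __init__(self, size=8192, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes independent bit positions from salted SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

calls = 0          # counts how often the expensive path actually runs
seen = BloomFilter()

def generate(query):
    """Hypothetical expensive generation call, guarded by the filter."""
    global calls
    if seen.might_contain(query):
        return None  # almost certainly handled before: skip the call
    calls += 1
    seen.add(query)
    return f"-- generated output for: {query}"

generate("list all cities")
generate("list all cities")  # filtered out: no second expensive call
```

Because a Bloom filter can report false positives but never false negatives, skipping on a positive answer trades a tiny chance of dropping a novel input for never repeating work already done, at a fixed, small memory cost.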
Information
Published In
Aug 2024
524 pages
ISBN:978-981-97-7237-7
DOI:10.1007/978-981-97-7238-4
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 31 August 2024
Qualifiers
- Article