Gar: Natural Language to SQL Translation with Efficient Generate-and-Rank
Pages 411–427
Abstract
Web applications depend heavily on databases, yet conventional database interfaces often make efficient data use difficult: a large population of end users needs to state their requirements easily and retrieve query results without friction. Natural Language (NL) interfaces to databases make databases accessible to these users. Mainstream approaches build translation models that convert NL queries into SQL queries, while a recently proposed generate-and-rank approach instead produces candidate queries and then ranks them. Although it yields superior translation results on public benchmarks, this generate-and-rank approach suffers from efficiency issues that may impede its practical application. In this paper, we introduce Gar, which extends the existing generate-and-rank approach with a more efficient generation procedure and a more robust ranking procedure. Specifically, Gar uses a Bloom filter to accelerate the data generation process by eliminating unnecessary function calls. In addition, Gar provides a new implementation of the ranking module, a re-ranking model with enhanced language-understanding ability. We evaluate Gar on three public benchmarks: GEO, Spider, and MT-Teql. Gar achieves an overall accuracy of 66.6% on GEO, 80.6% on Spider, and 78.4% on MT-Teql.
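The abstract does not detail how the Bloom filter reduces unnecessary function calls. The following is a minimal sketch of the general idea, assuming the filter is used to skip an expensive generation call for inputs that have (almost certainly) been processed before; `generate` and its behavior are hypothetical, not the paper's actual interface:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic membership, no false negatives."""

    def __init__(self, size=8192, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes independent bit positions from salted SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

calls = 0          # counts how often the expensive path actually runs
seen = BloomFilter()

def generate(query):
    """Hypothetical expensive generation call, guarded by the filter."""
    global calls
    if seen.might_contain(query):
        return None  # almost certainly handled before: skip the call
    calls += 1
    seen.add(query)
    return f"-- generated output for: {query}"

generate("list all cities")
generate("list all cities")  # filtered out: no second expensive call
```

Because a Bloom filter can report false positives but never false negatives, skipping on a positive answer trades a tiny chance of dropping a novel input for never repeating work already done, at a fixed, small memory cost.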
Information
Published In
Aug 2024
524 pages
ISBN:978-981-97-7237-7
DOI:10.1007/978-981-97-7238-4
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 31 August 2024
Qualifiers
- Article