research-article

LLMs as an Interactive Database Interface for Designing Large Queries

Authors:

Yilin Li,

Deddy JobsonAuthors Info & Claims

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

Pages 1 - 7

https://doi.org/10.1145/3665939.3665969

Published: 18 June 2024 Publication History

Get Access

Abstract

Text2SQL is typically considered a one-shot process where the user gives a natural language query and receives an SQL query in return. This approach is fraught with potential concerns, such as syntactical errors, logical mismatches, and schema hallucination, which often require time-consuming validations by end users. These challenges are exacerbated by the complexity of large queries typical in industry settings and the inherent ambiguity of natural language. To address these limitations, we propose a system that employs an iterative process for both query creation and validation, ensuring that the resulting data set meets the user's expectations. We tested this system against existing text-to-SQL LLM approaches using a standard industry use case, showcasing our system's ability to deliver coherent and accurate outcomes. Opportunities for future research to further refine this approach are also discussed¹.

References

[1]

G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. 2002. Keyword searching and browsing in databases using BANKS. In Proceedings 18th International Conference on Data Engineering. 431--440. ISSN: 1063-6382.

Crossref

Google Scholar

[2]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs].

Crossref

Google Scholar

[3]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2022. PaLM: Scaling Language Modeling with Pathways. arXiv:2204.02311 [cs].

Crossref

Google Scholar

[4]

Vagelis Hristidis, Luis Gravano, and Yannis Papakonstantinou. 2003. Efficient IR-style keyword search over relational databases. In Proceedings of the 29th international conference on Very large data bases - Volume 29 (VLDB '03, Vol. 29). VLDB Endowment, Berlin, Germany, 850--861.

Crossref

Google Scholar

[5]

George Katsogiannis-Meimarakis and Georgia Koutrika. 2023. A survey on deep learning approaches for text-to-SQL. The VLDB Journal 32, 4 (July 2023), 905--936.

Digital Library

Google Scholar

[6]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459--9474.

Google Scholar

[7]

Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, et al. 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems 36 (2024).

Google Scholar

[8]

Aiwei Liu, Xuming Hu, Lijie Wen, and Philip S. Yu. 2023. A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability. arXiv:2303.13547 [cs].

Crossref

Google Scholar

[9]

Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang. 2023. Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies. arXiv:2308.03188 [cs].

Crossref

Google Scholar

[10]

Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. arXiv:2304.11015 [cs].

Crossref

Google Scholar

[11]

Nitarshan Rajkumar, Raymond Li, and Dzmitry Bahdanau. 2022. Evaluating the Text-to-SQL Capabilities of Large Language Models. arXiv:2204.00498 [cs].

Crossref

Google Scholar

[12]

Chang-You Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, and Huan Sun. 2023. Exploring Chain-of-Thought Style Prompting for Text-to-SQL. arXiv:2305.14215 [cs].

Crossref

Google Scholar

[13]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824--24837.

Google Scholar

[14]

J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA, 1--21.

Digital Library

Google Scholar

Index Terms

LLMs as an Interactive Database Interface for Designing Large Queries

Recommendations

Answering imprecise database queries: a novel approach
WIDM '03: Proceedings of the 5th ACM international workshop on Web information and data management

A growing number of databases especially those published on the Web are becoming available to external users. Users of these databases are provided simple form-based query interfaces that hide the underlying schematic details. Constrained by the ...
Global Top-k Aggregate Queries Based on X-tuple in Uncertain Database
WAINA '10: Proceedings of the 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops

A Top-k aggregate query, which is a powerful technique when dealing with large quantity of data, ranks groups of tuples by their aggregate values and returns k groups with the highest aggregate values. However, compared to Top-k in traditional databases,...
Two Novel Semantics of Top-k Queries Processing in Uncertain Database
CIT '10: Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology

Top-k query is a powerful technique in uncertain databases because of the existence of exponential possible worlds, and it is necessary to combine score and confidence of tuples to derive top k answers. Different semantics, the combination methods of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

June 2024

91 pages

ISBN:9798400706936

DOI:10.1145/3665939

Program Chairs:
Jean-Daniel Fekete
Inria & Université Paris-Saclay
,
Behrooz Omidvar-Tehrani
AWS AI Labs
,
Kexin Rong
Georgia Institute of Technology
,
Roee Shraga
Worcester Polytechnic Institute

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2024

Check for updates

Author Tags

Qualifiers

Research-article

Conference

HILDA 24

Sponsor:

SIGMOD

HILDA 24: 2024 Workshop on Human-In-the-Loop Data Analytics

June 14, 2024

AA, Santiago, Chile

Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
122
Total Downloads

Downloads (Last 12 months)122
Downloads (Last 6 weeks)26

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Index Terms

Recommendations

Answering imprecise database queries: a novel approach

Global Top-k Aggregate Queries Based on X-tuple in Uncertain Database

Two Novel Semantics of Top-k Queries Processing in Uncertain Database