research-article

Public Access

Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database

Authors:

Ping WangAuthors Info & Claims

BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Article No.: 54, Pages 1 - 10

https://doi.org/10.1145/3584371.3613008

Published: 04 October 2023 Publication History

BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database

Pages 1 - 10

Abstract
References

Abstract

The Vaccine Adverse Event Reporting System (VAERS) contains detailed reports of adverse events following vaccine administration. However, efficiently and accurately searching for specific information from VAERS poses significant challenges, especially for medical experts. Natural language querying (NLQ) methods tackle the challenge by translating the input questions into executable queries, allowing for the exploration of complex databases with large amounts of information. Most existing studies focus on the relational database and solve the Text-to-SQL task. However, the capability of full-text for Text-to-SQL is greatly limited by the data structures and functionality of the SQL databases. In addition, the potential of natural language querying has not been comprehensively explored in the healthcare domain. To overcome these limitations, we investigate the potential of NoSQL databases, specifically Elasticsearch, and forge a new research direction for NLQ, which we refer to as Text-to-ESQ generation. This exploration requires us to re-design various aspects of NLQ, such as the target application and the advantages of NoSQL database. In our approach, we develop a two-stage controllable (TSC) framework consisting of a question-to-question (Q2Q) translation module and an ESQ condition extraction (ECE) module. These modules are carefully designed to efficiently retrieve information from the VEARS data stored in a NoSQL database. Additionally, we construct a dedicated question-ESQ pair dataset called VAERSESQ, to support the task in the healthcare domain. Extensive experiments were conducted on the VAERSESQ dataset to evaluate the proposed methods. The results, both quantitative and qualitative, demonstrate the accuracy and efficiency of our approach in generating queries for NoSQL databases, thus enabling efficient retrieval of VEARS data.

References

[1]

2000. Resource Description Framework (RDF). https://www.elastic.co/cn/.

[2]

Asma Ben Abacha, Sadid A Hasan, Vivek V Datla, Joey Liu, Dina Demner-Fushman, and Henning Müller. 2019. VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019. CLEF (Working Notes) 2, 6 (2019).

[3]

Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. Vqa: Visual question answering. In Proceedings of the IEEE international conference on computer vision. 2425--2433.

Digital Library

[4]

Asma Ben Abacha and Pierre Zweigenbaum. 2012. Medical question answering: translating medical questions into sparql queries. In Proceedings of the 2nd ACM SIGHIT international health informatics symposium. 41--50.

Digital Library

[5]

Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, and Kai Yu. 2021. LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and NonLocal Relations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2541--2555.

[6]

James Clifford. 1988. Natural language querying of historical databases. Computational Linguistics 14, 4 (1988), 10--34.

Digital Library

[7]

Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, and Wei Wang. 2019. KBQA: learning question answering over QA corpora and knowledge bases. In Proceedings of the VLDB Endowment. 565--576.

[8]

Sasa M Dekleva. 1994. Is natural language querying practical? ACM SIGMIS Database: the DATABASE for Advances in Information Systems 25, 2 (1994), 24--36.

Digital Library

[9]

Li Dong and Mirella Lapata. 2018. Coarse-to-Fine Decoding for Neural Semantic Parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 731--742.

[10]

Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, et al. 2021. Beyond english-centric multilingual machine translation. The Journal of Machine Learning Research 22, 1 (2021), 4839--4886.

Digital Library

[11]

Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. 2016. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 457--468.

[12]

Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R Woodward, John Drake, and Qiaofu Zhang. 2021. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications. In Findings of the Association for Computational Linguistics: EMNLP 2021. 2030--2042.

[13]

Sadid A Hasan, Yuan Ling, Oladimeji Farri, Joey Liu, Henning Müller, and Matthew P Lungren. 2018. Overview of ImageCLEF 2018 Medical Domain Visual Question Answering Task. In CLEF (Working Notes).

[14]

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780.

Digital Library

[15]

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015).

[16]

Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Yanyang Li, Bowen Li, Jian Sun, and Yongbin Li. 2022. S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. In Findings of the Association for Computational Linguistics: ACL 2022. 1254--1262.

[17]

Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, and Xinghua Lu. 2019.QA: A Dataset for Biomedical Research Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2567--2577.

[18]

Qiao Jin, Zheng Yuan, Guangzhi Xiong, Qianlan Yu, Huaiyuan Ying, Chuanqi Tan, Mosha Chen, Songfang Huang, Xiaozhong Liu, and Sheng Yu. 2022. Biomedical question answering: A survey of approaches and challenges. ACM Computing Surveys (CSUR) 55, 2 (2022), 1--36.

Digital Library

[19]

Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan. 2018. Dvqa: Understanding data visualizations via question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5648--5656.

[20]

Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Are you smarter than a sixth grader? textbook question answering for multimodal machine comprehension. In Proceedings of the IEEE Conference on Computer Vision and Pattern recognition. 4999--5007.

[21]

Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT. 4171--4186.

[22]

Hyeonji Kim, Byeong-Hoon So, Wook-Shin Han, and Hongrae Lee. 2020. Natural language to SQL: where are we today? Proceedings of the VLDB Endowment 13, 10 (2020), 1737--1750.

Digital Library

[23]

Chuan Lei, Fatma Özcan, Abdul Quamar, Ashish R Mittal, Jaydeep Sen, Diptikalyan Saha, and Karthik Sankaranarayanan. 2018. Ontology-based natural language query interfaces for data exploration. (2018).

[24]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).

[25]

Fei Li and Hosagrahar V Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment 8, 1 (2014), 73--84.

Digital Library

[26]

Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2020. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020. 4870--4888.

[27]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692 (2019).

[28]

Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 1412--1421.

[29]

Anusri Pampari, Preethi Raghavan, Jennifer Liang, and Jian Peng. 2018. emrqa: A large corpus for question answering on electronic medical records. arXiv preprint arXiv:1809.00732 (2018).

[30]

Junwoo Park, Youngwoo Cho, Haneol Lee, Jaegul Choo, and Edward Choi. 2021. Knowledge graph-based question answering with electronic health records. In Machine Learning for Healthcare Conference. PMLR, 36--53.

[31]

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017).

[32]

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th international conference on Intelligent user interfaces. 149--157.

[33]

Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, et al. 2022. A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. arXiv preprint arXiv:2208.13629 (2022).

[34]

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2383--2392.

[35]

Diptikalyan Saha, Avrilia Floratou, Karthik Sankaranarayanan, Umar Farooq Minhas, Ashish R Mittal, and Fatma Özcan. 2016. ATHENA: an ontology-driven system for natural language querying over relational data stores. Proceedings of the VLDB Endowment 9, 12 (2016), 1209--1220.

Digital Library

[36]

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).

[37]

Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 9895--9901.

[38]

Jaydeep Sen, Fatma Ozcan, Abdul Quamar, Greg Stager, Ashish Mittal, Manasa Jammi, Chuan Lei, Diptikalyan Saha, and Karthik Sankaranarayanan. 2019. Natural language querying of complex business intelligence queries. In Proceedings of the 2019 International Conference on Management of Data. 1997--2000.

Digital Library

[39]

Saeedeh Shekarpour, Soren Auer, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Sebastian Hellmann, and Claus Stadler. 2011. Keyword-driven sparql query generation leveraging background knowledge. In 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1. IEEE, 203--210.

Digital Library

[40]

Simon Suster and Walter Daelemans. 2018. CliCR: a Dataset of Clinical Case Reports for Machine Reading Comprehension. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 1551--1563.

[41]

George Tsatsaronis, Georgios Balikas, Prodromos Malakasiotis, Ioannis Partalas, Matthias Zschunke, Michael R Alvers, Dirk Weissenborn, Anastasia Krithara, Sergios Petridis, Dimitris Polychronopoulos, et al. 2015. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC bioinformatics 16, 1 (2015), 138.

[42]

Ping Wang, Tian Shi, Khushbu Agarwal, Sutanay Choudhury, and Chandan K Reddy. 2022. Attention-based aspect reasoning for knowledge base question answering on clinical notes. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 1--6.

Digital Library

[43]

Ping Wang, Tian Shi, and Chandan K Reddy. 2020. Text-to-SQL generation for question answering on electronic medical records. In Proceedings of The Web Conference 2020. 350--361.

Digital Library

[44]

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir R Radev, Richard Socher, and Caiming Xiong. 2021. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In ICLR.

[45]

Tao Yu, Rui Zhang, Heyang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, et al. 2019. CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 1962--1979.

[46]

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, et al. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 3911--3921.

[47]

Hamid Zafar, Giulio Napolitano, and Jens Lehmann. 2018. Formal query generation for question answering over knowledge bases. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3--7, 2018, Proceedings 15. Springer, 714--728.

Digital Library

[48]

Rui Zhang, Tao Yu, Heyang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, and Dragomir Radev. 2019. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 5338--5349.

[49]

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103 (2017).

Cited By

Özer ACekinel RToroslu IKaragoz P(2025)DocSpider: a dataset of cross-domain natural language querying for MongoDBNatural Language Processing10.1017/nlp.2024.63(1-32)Online publication date: 12-Feb-2025
https://doi.org/10.1017/nlp.2024.63
Hornsteiner MKreussel MSteindl CEbner FEmpl PSchönig S(2024)Real-Time Text-to-Cypher Query Generation with Large Language Models for Graph DatabasesFuture Internet10.3390/fi1612043816:12(438)Online publication date: 22-Nov-2024
https://doi.org/10.3390/fi16120438
Zhang WShi TWang P(2024)Natural Language Querying on NoSQL Databases: Opportunities and Challenges [Vision Paper]2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825998(3353-3357)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825998
Show More Cited By

Index Terms

Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database

Recommendations

Incorporating NoSQL into a database course

This article introduces the concepts of Big Data and NoSQL and describes a semester long web-based project that uses both a relational database (Oracle 11g) and a NoSQL (MongoDB) database for an undergraduate database course. The relational database ...
Model Transformation From Object Relational Database to NoSQL Document Database
NISS '19: Proceedings of the 2nd International Conference on Networking, Information Systems & Security

With the high increase of data growing, NoSQL databases play a key role in storing large amount of data, to gain flexibility and scalability many industries are now replacing their relational and object relational databases by adopting NoSQL database ...
Model Transformation From Object Relational Database to NoSQL Column Based Database
NISS '20: Proceedings of the 3rd International Conference on Networking, Information Systems & Security

NoSQL databases play an important role in saving a huge amount of data. To benefit from the advantages of horizontal scalability and flexibility and with the fast data growing many companies are now replacing their traditional database management ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

September 2023

626 pages

ISBN:9798400701269

DOI:10.1145/3584371

General Chairs:
May D. Wang
SGeorgia Institute of Technology
,
Byung-Jun Yoon

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGBio: ACM Special Interest Group on Bioinformatics

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 October 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

BCB '23

Sponsor:

SIGBio

BCB '23: 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

September 3 - 6, 2023

TX, Houston, USA

Acceptance Rates

Overall Acceptance Rate 254 of 885 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
186
Total Downloads

Downloads (Last 12 months)130
Downloads (Last 6 weeks)36

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Özer ACekinel RToroslu IKaragoz P(2025)DocSpider: a dataset of cross-domain natural language querying for MongoDBNatural Language Processing10.1017/nlp.2024.63(1-32)Online publication date: 12-Feb-2025
https://doi.org/10.1017/nlp.2024.63
Hornsteiner MKreussel MSteindl CEbner FEmpl PSchönig S(2024)Real-Time Text-to-Cypher Query Generation with Large Language Models for Graph DatabasesFuture Internet10.3390/fi1612043816:12(438)Online publication date: 22-Nov-2024
https://doi.org/10.3390/fi16120438
Zhang WShi TWang P(2024)Natural Language Querying on NoSQL Databases: Opportunities and Challenges [Vision Paper]2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825998(3353-3357)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825998
Yang EHu PHan XNing Y(2024)MPLite: Multi-Aspect Pretraining for Mining Clinical Health Records2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825511(5096-5102)Online publication date: 15-Dec-2024
https://doi.org/10.1109/BigData62323.2024.10825511
Zhang WHe CYang GBandyopadhyay DShi TWang P(2024)Natural Language Querying on Domain-Specific NoSQL Database with Large Language Models2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)10.1109/BIBM62325.2024.10822485(5174-5181)Online publication date: 3-Dec-2024
https://doi.org/10.1109/BIBM62325.2024.10822485

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten