DOI: 10.1145/3605098.3636026
research-article
Open access

Knowledge Synthesis using Large Language Models for a Computational Biology Workflow Ecosystem

Published: 21 May 2024

Abstract

An understanding of the molecular basis of musculoskeletal pain is necessary for the development of therapeutics, their management, and possible personalization. One in three Americans uses over-the-counter (OTC) painkillers, and one in ten uses prescription drugs to manage pain. The CDC also estimates that about 20% of Americans suffer from chronic pain. Because the experience of acute or chronic pain varies with individual genetics and physiology, it is imperative that researchers continue to find novel therapeutics to treat or manage symptoms. In this paper, our goal is to develop a seed knowledgebase computational platform, called BioNursery, that will allow biologists to computationally hypothesize, define, and test molecular mechanisms underlying pain. In our knowledge ecosystem, we accumulate curated information from users about the relationships among biological databases, analysis tools, and database contents to generate biological analysis modules, called π-graphs or process graphs. We propose a mapping function from a natural language description of a hypothesized molecular model to a computational workflow for testing in BioNursery. We use a crowd computing feedback and curation system, called Explorer, to improve the proposed computational models for molecular mechanism discovery and to grow the knowledge ecosystem. Since the pain knowledge ecosystem does not yet exist, we validate our approach on a similar application in fertility research.
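
As a rough illustration of the process graph idea, the sketch below treats a workflow as a directed acyclic graph whose nodes are analysis steps and whose edges are data dependencies, then derives a valid execution order. It is not BioNursery's actual representation: the step names, the dictionary layout, and the execution_order helper are all assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical process graph. Each key is a workflow step and each value is
# the set of steps whose output it consumes. All names are illustrative.
process_graph = {
    "fetch_gene_list": set(),
    "map_gene_ids": {"fetch_gene_list"},
    "query_pathway_db": {"map_gene_ids"},
    "enrichment_analysis": {"map_gene_ids", "query_pathway_db"},
}


def execution_order(graph):
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(process_graph)
print(order)
```

A topological sort is only one plausible way to schedule such a graph; a real system would also need to bind each step to a concrete database or tool and handle failures, which this sketch leaves out.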

Cited By

  • (2024) Smart Science Needs Linked Open Data with a Dash of Large Language Models and Extended Relations. Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, pp. 1-11. DOI: 10.1145/3663742.3663971. Online publication date: 14-Jun-2024
  • (2024) Supporting Data Foragers in Scientific Computing Community Ecosystems for Life Sciences. Information Integration and Web Intelligence, pp. 118-123. DOI: 10.1007/978-3-031-78093-6_10. Online publication date: 4-Dec-2024

Information & Contributors

Information

Published In

SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
April 2024
1898 pages
ISBN: 9798400702433
DOI: 10.1145/3605098
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. knowledge ecosystem
  2. crowdsourcing
  3. query reformulation

Qualifiers

  • Research-article

Conference

SAC '24

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Upcoming Conference

SAC '25
The 40th ACM/SIGAPP Symposium on Applied Computing
March 31 - April 4, 2025
Catania, Italy

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 168
  • Downloads (Last 6 weeks): 36
Reflects downloads up to 18 Dec 2024
