Exploiting Genomic Relations in Big Data Repositories by Graph-Based Search Methods
Abstract
1. Introduction
2. Preliminaries
3. The Pharmacogenomics Data Repository LINCS
4. Technical Considerations
5. Conceptual Idea
6. Further Applications
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Musa, A.; Dehmer, M.; Yli-Harja, O.; Emmert-Streib, F. Exploiting Genomic Relations in Big Data Repositories by Graph-Based Search Methods. Mach. Learn. Knowl. Extr. 2019, 1, 205-210. https://doi.org/10.3390/make1010012