Abstract
Next Generation Sequencing is a 10-year-old technology for reading DNA that can produce massive amounts of genomic data and is, in turn, reshaping genomic computing. In particular, tertiary data analysis is concerned with the integration of heterogeneous regions of the genome; this is an emerging and increasingly important problem in genomic computing, because regions carry important signals, and creating new biological or clinical knowledge requires integrating these signals into meaningful messages. We focus on how the GeCo project contributes to tertiary data analysis, overviewing the main results of the project so far and describing its future scenarios.
Acknowledgment
This research is funded by the ERC Advanced Grant project GeCo (Data-Driven Genomic Computing), No. 693174, 2016-2021.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ceri, S. et al. (2018). Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome. In: Kalinichenko, L., Manolopoulos, Y., Malkov, O., Skvortsov, N., Stupnikov, S., Sukhomlin, V. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2017. Communications in Computer and Information Science, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-319-96553-6_4
DOI: https://doi.org/10.1007/978-3-319-96553-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96552-9
Online ISBN: 978-3-319-96553-6
eBook Packages: Computer Science, Computer Science (R0)