Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome

Stefano Ceri (ORCID: orcid.org/0000-0003-0671-2415),
Anna Bernasconi,
Arif Canakoglu,
Andrea Gulino,
Abdulrahman Kaitoua,
Marco Masseroli,
Luca Nanni &
Pietro Pinoli

Part of the book series: Communications in Computer and Information Science (CCIS, volume 822)

Included in the following conference series:

International Conference on Data Analytics and Management in Data Intensive Domains


Abstract

Next Generation Sequencing is a 10-year-old technology for reading DNA that can produce massive amounts of genomic data and is, in turn, reshaping genomic computing. In particular, tertiary data analysis is concerned with the integration of heterogeneous regions of the genome. This is an emerging and increasingly important problem in genomic computing, because regions carry important signals, and creating new biological or clinical knowledge requires integrating these signals into meaningful messages. We specifically focus on how the GeCo project is contributing to tertiary data analysis, giving an overview of the project's main results so far and describing its future scenarios.
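To make "integration of heterogeneous regions" concrete, the following is a minimal Python sketch of one elementary integration step. It assumes regions are half-open (chrom, start, stop) intervals and, for each region of a reference dataset (for instance, genes), counts how many regions of another dataset (for instance, peaks from a sequencing experiment) overlap it on the same chromosome. The Region type and the count_overlaps function are hypothetical illustrations. They are not part of GeCo's software and do not reproduce its implementation.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic region: half-open interval [start, stop) on chrom."""
    chrom: str
    start: int
    stop: int


def count_overlaps(reference: List[Region], experiment: List[Region]) -> List[int]:
    """For each reference region, count the experiment regions that overlap it.

    Returns one count per reference region, in the same order as `reference`.
    """
    # Group experiment regions by chromosome, sorted by start coordinate.
    by_chrom: Dict[str, List[Region]] = {}
    for r in experiment:
        by_chrom.setdefault(r.chrom, []).append(r)
    for regs in by_chrom.values():
        regs.sort(key=lambda r: r.start)
    starts = {c: [r.start for r in regs] for c, regs in by_chrom.items()}

    counts: List[int] = []
    for ref in reference:
        regs = by_chrom.get(ref.chrom, [])
        # Only regions starting before ref.stop can overlap the reference region.
        hi = bisect_left(starts.get(ref.chrom, []), ref.stop)
        # Of those, keep the ones that end after ref.start (half-open overlap test).
        counts.append(sum(1 for r in regs[:hi] if r.stop > ref.start))
    return counts


if __name__ == "__main__":
    genes = [Region("chr1", 100, 200), Region("chr1", 500, 600), Region("chr2", 0, 50)]
    peaks = [Region("chr1", 150, 160), Region("chr1", 190, 510),
             Region("chr1", 700, 800), Region("chr2", 50, 60)]
    print(count_overlaps(genes, peaks))  # prints [2, 1, 0]
```

In this example, the chr2 peak starting at 50 does not count for the gene ending at 50 because the intervals are half-open. The binary search only discards candidates that start after the reference region ends. A tool meant for genome-scale data would typically use an interval index so that the scan is bounded from both sides.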


Notes

1. http://www.bioinformatics.deib.polimi.it/geco


Acknowledgment

This research is funded by the ERC Advanced Grant project GeCo (Data-Driven Genomic Computing), No. 693174, 2016-2021.

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy
Stefano Ceri, Anna Bernasconi, Arif Canakoglu, Andrea Gulino, Abdulrahman Kaitoua, Marco Masseroli, Luca Nanni & Pietro Pinoli


Corresponding author

Correspondence to Stefano Ceri.

Editor information

Editors and Affiliations

Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Leonid Kalinichenko
Open University of Cyprus, Latsia, Cyprus
Yannis Manolopoulos
Institute of Astronomy, Russian Academy of Sciences, Moscow, Russia
Oleg Malkov
Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Nikolay Skvortsov
Federal Research Center “Computer Science and Control”, Russian Academy of Sciences, Moscow, Russia
Sergey Stupnikov
Moscow State University, Moscow, Russia
Vladimir Sukhomlin


About this paper

Cite this paper

Ceri, S. et al. (2018). Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome. In: Kalinichenko, L., Manolopoulos, Y., Malkov, O., Skvortsov, N., Stupnikov, S., Sukhomlin, V. (eds) Data Analytics and Management in Data Intensive Domains. DAMDID/RCDL 2017. Communications in Computer and Information Science, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-319-96553-6_4


DOI: https://doi.org/10.1007/978-3-319-96553-6_4
Published: 13 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96552-9
Online ISBN: 978-3-319-96553-6
eBook Packages: Computer Science, Computer Science (R0)
