Abstract
Docker is one of the most popular Linux container technologies. Docker images can provide ready-to-use software packages in which all required dependencies are already installed, and they can be deployed on any operating system where Docker is installed. They are also a convenient way to store immutable, working software packages, thus contributing to reproducibility. Moreover, Docker images greatly ease the development of complex pipelines, of standalone applications with graphical user interfaces that require external software, and even of databases. Not surprisingly, Docker images are now ubiquitous in computational biology and bioinformatics. Here, we present the pegi3s Bioinformatics Docker Images Project (https://pegi3s.github.io/dockerfiles/), a constantly growing collection of more than 70 Docker images for commonly used software in genomics, transcriptomics, proteomics, phylogenetics, and sequence handling, among other fields. Several features distinguish this project from much larger ones: 1) tools are listed and classified into broad categories, making it easier to find the most adequate tool(s) for a given project; 2) hyperlinks to the software manuals are provided, which helps users find the parameter combinations best suited to a given processing step; and 3) most importantly, we provide clear instructions on how to run each image, test data that can be used to quickly evaluate it, and full details on how it was built. We routinely use all images in our research and teaching activities, meaning that they have been extensively tested. We therefore believe that this project, which is offered as a service in the context of the European ELIXIR program, is of interest to many researchers, whether or not they have a background in informatics.
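As a rough illustration of what running a containerized tool typically involves, the following minimal Python sketch assembles a `docker run` command that mounts a host directory into the container so the tool can read and write files there. The image name, directory paths, tool name, and tool arguments are hypothetical placeholders and are not taken from the project; the instructions published for each image should be consulted for the actual invocation.

```python
import shlex


def build_docker_run_command(image, host_dir, tool_args, container_dir="/data"):
    """Assemble a `docker run` command that mounts a host directory into the container."""
    return [
        "docker", "run",
        "--rm",                               # remove the container once the tool exits
        "-v", f"{host_dir}:{container_dir}",  # expose host files inside the container
        image,
        *tool_args,
    ]


# Hypothetical image name, paths and tool arguments, for illustration only.
command = build_docker_run_command(
    image="example/aligner:1.0",
    host_dir="/home/user/project",
    tool_args=["aligner", "--in", "/data/input.fasta", "--out", "/data/aligned.fasta"],
)

# Print a copy-pastable shell command; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Building the command as a list of arguments, rather than a single string, avoids quoting problems when paths contain spaces and lets the same list be handed directly to `subprocess.run`.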
H. López-Fernández and P. Ferreira contributed equally to this work.
Acknowledgments
This work was financed by the National Funds through FCT—Fundação para a Ciência e a Tecnologia, I.P., under the project UIDB/04293/2020 and through the individual scientific employment program-contract with Hugo López-Fernández (2020.00515.CEECIND), and also by BioData.pt (project 22231/01/SAICT/2016). This work was also partially supported by the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding ED431C2018/55-GRC Competitive Reference Group.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
López-Fernández, H., Ferreira, P., Reboiro-Jato, M., Vieira, C.P., Vieira, J. (2022). The pegi3s Bioinformatics Docker Images Project. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_4
DOI: https://doi.org/10.1007/978-3-030-86258-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86257-2
Online ISBN: 978-3-030-86258-9
eBook Packages: Intelligent Technologies and Robotics (R0)