Abstract
In scientific collaboration, data sharing and the exchange of ideas and results are essential to knowledge construction and the development of science. Hence, we must guarantee interoperability, privacy, traceability (reinforcing transparency), and trust. Provenance is widely recognized as a way to record the history of the steps taken in scientific experiments, so it supports traceability and assists the reproducibility of scientific results. Blockchain is one of the technologies that can enhance trust in collaborative scientific experimentation. This work proposes an architecture, named BlockFlow, based on blockchain, provenance, and cloud infrastructure to bring trust and traceability to the execution of collaborative scientific experiments. The proposed architecture is implemented on Hyperledger and evaluated with a scenario on the genomic sequencing of the SARS-CoV-2 coronavirus, which we use to discuss the benefits of providing traceability and trust in collaborative scientific experimentation. Furthermore, the architecture addresses the heterogeneity of shared data, facilitating the interpretation and analysis of such data by geographically distributed researchers. By combining blockchain with provenance support, the architecture can enhance data sharing, traceability, and trust in collaborative scientific experiments.
Data Availability
The datasets generated and/or analyzed during the current study are available on GitHub at https://github.com/RaianeQC/blockflow-trust-provenance.
Funding
This work was partially funded by UFJF/Brazil, CAPES/Brazil, CNPq/Brazil (grant: 311595/2019-7), and FAPEMIG/Brazil (grants: APQ-02685-17 and APQ-02194-18).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1 Criteria for Inclusion and Exclusion of Articles
Inclusion criteria were: (IC1) - The study proposes a solution that uses the blockchain as a mechanism for the storage and management of provenance data; (IC2) - The study was written in English; (IC3) - The study was published from 2008 to 2022; (IC4) - The study is available as a full paper in digital libraries.
Exclusion criteria were: (EC1) - The study matches the keywords in the search string, but its context differs from the search purposes; (EC2) - The abstract does not address any aspect of the research questions; (EC3) - The study is duplicated, that is, the work has already been retrieved from another digital library; (EC4) - The article does not contain an abstract; (EC5) - It is not a primary study; (EC6) - The study is not accessible with the university (UFJF) credentials; (EC7) - The study was published as a short paper; (EC8) - The study is not written in English; (EC9) - The study was not published in a conference or journal related to Computer Science; (EC10) - The study was not published in a peer-reviewed venue; (EC11) - The study was published before 2008; (EC12) - The study does not propose a solution that uses blockchain as a mechanism for the storage and management of provenance data.
To assist in the mapping, the Parsif.al tool was used. Some exclusion criteria were applied directly within this tool, since Parsif.al provides filters such as the publication year filter.
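Several of the criteria above are mechanical (duplicates, missing abstract, short paper, language, publication year) and could be applied automatically before the manual screening of the remaining ones. The following is a minimal Python sketch of such a pre-filter. The `Study` record, its field names, and both functions are hypothetical and introduced only for illustration; they do not reflect Parsif.al's export format or the procedure the authors actually ran.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Study:
    # Hypothetical record; the field names are illustrative assumptions.
    title: str
    year: int
    language: str
    abstract: str = ""
    is_short_paper: bool = False


def first_failed_criterion(study: Study, seen_titles: Set[str]) -> Optional[str]:
    """Return the first mechanical criterion the study fails, or None."""
    if study.title.strip().lower() in seen_titles:
        return "EC3"   # duplicate already retrieved from another library
    if not study.abstract.strip():
        return "EC4"   # no abstract
    if study.is_short_paper:
        return "EC7"   # published as a short paper
    if study.language.strip().lower() != "english":
        return "EC8"   # not written in English
    if study.year < 2008:
        return "EC11"  # published before 2008
    if study.year > 2022:
        return "IC3"   # outside the 2008-2022 inclusion window
    return None


def screen(studies: List[Study]) -> Tuple[List[Study], List[Tuple[Study, str]]]:
    """Split studies into kept ones and (study, failed criterion) pairs."""
    seen: Set[str] = set()
    kept: List[Study] = []
    excluded: List[Tuple[Study, str]] = []
    for study in studies:
        reason = first_failed_criterion(study, seen)
        if reason is None:
            kept.append(study)
        else:
            excluded.append((study, reason))
        # Titles are normalized so that case and padding do not hide duplicates.
        seen.add(study.title.strip().lower())
    return kept, excluded
```

Criteria that depend on reading the study (for example EC1, EC2, EC5, and EC12) are deliberately left out of the sketch, since they require human judgment.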
Appendix 2 Analysis of Mapping Research Questions
Table 7
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Coelho, R., Braga, R., David, J.M.N. et al. A Blockchain-Based Architecture for Trust in Collaborative Scientific Experimentation. J Grid Computing 20, 35 (2022). https://doi.org/10.1007/s10723-022-09626-x
DOI: https://doi.org/10.1007/s10723-022-09626-x