Do machine learning platforms provide out-of-the-box reproducibility?

Published: 01 January 2022

Abstract

Science is experiencing an ongoing reproducibility crisis. In light of this crisis, our objective is to investigate whether machine learning platforms provide out-of-the-box reproducibility. Our method is twofold: First, we survey machine learning platforms for whether they provide features that simplify making experiments reproducible out-of-the-box. Second, we conduct the exact same experiment on four different machine learning platforms, thereby varying only the processing unit and ancillary software. The survey shows that no machine learning platform supports the feature set described by the proposed framework, while the experiment reveals statistically significant differences in results when the exact same experiment is conducted on different machine learning platforms. The surveyed machine learning platforms do not on their own enable users to achieve the full reproducibility potential of their research, and the platforms with the most users provide less functionality for achieving it. Furthermore, results differ when the same experiment is executed on the different platforms, so wrong conclusions can be inferred at the 95% confidence level. Hence, we conclude that machine learning platforms do not provide reproducibility out-of-the-box and that results generated on one machine learning platform alone cannot be fully trusted.
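
To make the experimental comparison concrete: the claim is that the same experiment, repeated on different platforms, yields results that differ at the 95% confidence level. Below is a minimal sketch of such a significance test, assuming hypothetical per-run accuracy samples from two platforms; it is an illustration, not the authors' code, and the paper may use a different test.

# Minimal sketch, not the authors' code: comparing result samples from
# two ML platforms with Welch's t-test at the 95% confidence level.
# The accuracy values are hypothetical placeholders.
from scipy.stats import ttest_ind

platform_a = [0.912, 0.915, 0.909, 0.914, 0.911]  # repeated runs, platform A
platform_b = [0.921, 0.924, 0.919, 0.923, 0.920]  # repeated runs, platform B

t_stat, p_value = ttest_ind(platform_a, platform_b, equal_var=False)

if p_value < 0.05:
    print(f"Results differ significantly at 95% confidence (p = {p_value:.4f})")
else:
    print(f"No significant difference detected (p = {p_value:.4f})")

Welch's t-test is chosen here only as a common default for comparing two means with unequal variances; any test at the 0.05 level expresses the same 95% confidence criterion.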

Highlights

A framework for comparing the support for reproducibility of machine learning platforms is proposed.
Machine learning platforms are surveyed for how well they support reproducibility.
The features that surveyed platforms should implement in order to improve reproducibility support are identified (see the sketch after this list).
An investigation into the degree to which results differ when conducting the exact same experiment on four different machine learning platforms is carried out.
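
As an illustration of what out-of-the-box support could require in practice (a sketch under assumed field names, not the framework proposed in the paper), a platform would need to record per run at least the random seed, the ancillary software versions, and the hardware it executed on, since these are exactly the factors the experiment varies:

# Illustrative sketch only: run metadata a platform could capture
# automatically so that an experiment can be re-run and compared later.
# All field names are assumptions for illustration.
import json
import platform
import random
import sys

import numpy as np

SEED = 42  # hypothetical fixed seed for the run
random.seed(SEED)
np.random.seed(SEED)

run_record = {
    "seed": SEED,
    "python": sys.version,          # ancillary software: interpreter
    "numpy": np.__version__,        # ancillary software: library version
    "os": platform.platform(),      # ancillary software: OS build
    "machine": platform.machine(),  # rough proxy for the processing unit
}

with open("run_metadata.json", "w") as f:
    json.dump(run_record, f, indent=2)

A platform that wrote such a record for every run would let users detect, rather than silently absorb, the platform-dependent variation the experiment demonstrates.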

Information

Published In

Future Generation Computer Systems, Volume 126, Issue C, January 2022, 340 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Publication History

Published: 01 January 2022

Author Tags

1. Reproducibility
2. Reproducible AI
3. Machine learning
4. Survey
5. Reproducibility experiment

Qualifiers

• Research-article

Cited By

• (2024) Advancing Research Reproducibility in Machine Learning through Blockchain Technology. Informatica 35(2), 227–253. https://doi.org/10.15388/24-INFOR553. Online publication date: 1-Jan-2024.
• (2024) Improving Dropout Prediction for Informatics Bachelor Students. Proceedings of the 2024 Innovation and Technology in Computer Science Education V. 2, 830–831. https://doi.org/10.1145/3649405.3659472. Online publication date: 8-Jul-2024.
• (2024) PARMA: a Platform Architecture to enable Automated, Reproducible, and Multi-party Assessments of AI Trustworthiness. Proceedings of the 2nd International Workshop on Responsible AI Engineering, 20–27. https://doi.org/10.1145/3643691.3648585. Online publication date: 16-Apr-2024.
• (2024) Pseudo-unknown uncertainty learning for open set object detection. Knowledge-Based Systems 303(C). https://doi.org/10.1016/j.knosys.2024.112414. Online publication date: 4-Nov-2024.
• (2024) MLSea: A Semantic Layer for Discoverable Machine Learning. The Semantic Web, 178–198. https://doi.org/10.1007/978-3-031-60635-9_11. Online publication date: 26-May-2024.
• (2023) Towards Risk-Free Trustworthy Artificial Intelligence. International Journal of Intelligent Systems 2023. https://doi.org/10.1155/2023/4459198. Online publication date: 1-Jan-2023.
• (2023) ClayRS. Information Systems 119(C). https://doi.org/10.1016/j.is.2023.102273. Online publication date: 1-Oct-2023.
• (2022) A retrospective study of one decade of artifact evaluations. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 145–156. https://doi.org/10.1145/3540250.3549172. Online publication date: 7-Nov-2022.
• (2022) Reproducibility Crisis in the LOD Cloud? Studying the Impact of Ontology Accessibility and Archiving as a Counter Measure. The Semantic Web – ISWC 2022, 91–107. https://doi.org/10.1007/978-3-031-19433-7_6. Online publication date: 23-Oct-2022.
