Do machine learning platforms provide out-of-the-box reproducibility?

Published: 01 January 2022

Abstract

Science is experiencing an ongoing reproducibility crisis. In light of this crisis, our objective is to investigate whether machine learning platforms provide out-of-the-box reproducibility. Our method is twofold: First, we survey machine learning platforms for whether they provide features that simplify making experiments reproducible out-of-the-box. Second, we conduct the exact same experiment on four different machine learning platforms, thereby varying only the processing unit and ancillary software. The survey shows that no machine learning platform supports the feature set described by the proposed framework, while the experiment reveals statistically significant differences in results when the exact same experiment is conducted on different machine learning platforms. The surveyed machine learning platforms do not on their own enable users to achieve the full reproducibility potential of their research, and the platforms with the most users provide less functionality for achieving it. Furthermore, results differ when the same experiment is executed on the different platforms, so wrong conclusions can be inferred at the 95% confidence level. Hence, we conclude that machine learning platforms do not provide reproducibility out-of-the-box and that results generated on one machine learning platform alone cannot be fully trusted.
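
To make the experimental comparison concrete: the claim is that the same experiment, repeated on different platforms, yields results that differ at the 95% confidence level. Below is a minimal sketch of such a significance test, assuming hypothetical per-run accuracy samples from two platforms; it is an illustration, not the authors' code, and the paper may use a different test.

# Minimal sketch, not the authors' code: comparing result samples from
# two ML platforms with Welch's t-test at the 95% confidence level.
# The accuracy values are hypothetical placeholders.
from scipy.stats import ttest_ind

platform_a = [0.912, 0.915, 0.909, 0.914, 0.911]  # repeated runs, platform A
platform_b = [0.921, 0.924, 0.919, 0.923, 0.920]  # repeated runs, platform B

t_stat, p_value = ttest_ind(platform_a, platform_b, equal_var=False)

if p_value < 0.05:
    print(f"Results differ significantly at 95% confidence (p = {p_value:.4f})")
else:
    print(f"No significant difference detected (p = {p_value:.4f})")

Welch's t-test is chosen here only as a common default for comparing two means with unequal variances; any test at the 0.05 level expresses the same 95% confidence criterion.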

Highlights

A framework for comparing the support for reproducibility of machine learning platforms is proposed.
Machine learning platforms are surveyed for how well they support reproducibility.
The features that surveyed platforms should implement in order to improve reproducibility support are identified (see the sketch after this list).
An investigation into the degree to which results differ when conducting the exact same experiment on four different machine learning platforms is carried out.
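
As an illustration of what out-of-the-box support could require in practice (a sketch under assumed field names, not the framework proposed in the paper), a platform would need to record per run at least the random seed, the ancillary software versions, and the hardware it executed on, since these are exactly the factors the experiment varies:

# Illustrative sketch only: run metadata a platform could capture
# automatically so that an experiment can be re-run and compared later.
# All field names are assumptions for illustration.
import json
import platform
import random
import sys

import numpy as np

SEED = 42  # hypothetical fixed seed for the run
random.seed(SEED)
np.random.seed(SEED)

run_record = {
    "seed": SEED,
    "python": sys.version,          # ancillary software: interpreter
    "numpy": np.__version__,        # ancillary software: library version
    "os": platform.platform(),      # ancillary software: OS build
    "machine": platform.machine(),  # rough proxy for the processing unit
}

with open("run_metadata.json", "w") as f:
    json.dump(run_record, f, indent=2)

A platform that wrote such a record for every run would let users detect, rather than silently absorb, the platform-dependent variation the experiment demonstrates.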

Information

Published In

Future Generation Computer Systems, Volume 126, Issue C, January 2022, 340 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Publication History

Published: 01 January 2022

Author Tags

1. Reproducibility
2. Reproducible AI
3. Machine learning
4. Survey
5. Reproducibility experiment

Qualifiers

• Research-article

Cited By

• (2024) Advancing Research Reproducibility in Machine Learning through Blockchain Technology. Informatica 35(2), 227–253. https://doi.org/10.15388/24-INFOR553. Online publication date: 1-Jan-2024.
• (2024) Improving Dropout Prediction for Informatics Bachelor Students. Proceedings of the 2024 Innovation and Technology in Computer Science Education V. 2, 830–831. https://doi.org/10.1145/3649405.3659472. Online publication date: 8-Jul-2024.
• (2024) PARMA: a Platform Architecture to enable Automated, Reproducible, and Multi-party Assessments of AI Trustworthiness. Proceedings of the 2nd International Workshop on Responsible AI Engineering, 20–27. https://doi.org/10.1145/3643691.3648585. Online publication date: 16-Apr-2024.
• (2024) Pseudo-unknown uncertainty learning for open set object detection. Knowledge-Based Systems 303(C). https://doi.org/10.1016/j.knosys.2024.112414. Online publication date: 4-Nov-2024.
• (2024) MLSea: A Semantic Layer for Discoverable Machine Learning. The Semantic Web, 178–198. https://doi.org/10.1007/978-3-031-60635-9_11. Online publication date: 26-May-2024.
• (2023) Towards Risk-Free Trustworthy Artificial Intelligence. International Journal of Intelligent Systems 2023. https://doi.org/10.1155/2023/4459198. Online publication date: 1-Jan-2023.
• (2023) ClayRS. Information Systems 119(C). https://doi.org/10.1016/j.is.2023.102273. Online publication date: 1-Oct-2023.
• (2022) A retrospective study of one decade of artifact evaluations. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 145–156. https://doi.org/10.1145/3540250.3549172. Online publication date: 7-Nov-2022.
• (2022) Reproducibility Crisis in the LOD Cloud? Studying the Impact of Ontology Accessibility and Archiving as a Counter Measure. The Semantic Web – ISWC 2022, 91–107. https://doi.org/10.1007/978-3-031-19433-7_6. Online publication date: 23-Oct-2022.
