research-article
Open access

MLflow2PROV: Extracting Provenance from Machine Learning Experiments

Published: 18 June 2023

Abstract

Supporting iterative and explorative workflows for developing machine learning (ML) models, ML experiment management systems (ML EMSs), such as MLflow, are increasingly used to simplify the structured collection and management of ML artifacts, such as ML models, metadata, and code. However, EMSs typically suffer from limited provenance capabilities. As a consequence, it is hard to analyze provenance information and gain knowledge that can be used to improve both ML models and their development workflows. We propose a W3C-PROV-compliant provenance model capturing ML experiment activities that originate from Git and MLflow usage. Moreover, we present the tool MLflow2PROV that extracts provenance graphs according to our model, enabling querying, analyzing, and further processing of collected provenance information.
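As a rough illustration of what a provenance graph over Git and MLflow activity can look like, the following minimal sketch maps one commit and one experiment run onto W3C PROV concepts (entities, activities, agents) and then queries the result. It is not the MLflow2PROV model or its API: the `ProvGraph` class, the record fields (`sha`, `author`, `run_id`, `git_commit`), and the identifier scheme are hypothetical and chosen only for this example. Only the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`) come from PROV-DM.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny in-memory stand-in for a PROV document (illustrative only)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    relations: list = field(default_factory=list)  # (relation, subject, object)

    def relate(self, relation, subject, obj):
        self.relations.append((relation, subject, obj))


def build_graph(commit, run):
    """Map a hypothetical Git commit record and MLflow run record to PROV."""
    g = ProvGraph()
    author = f"user:{commit['author']}"
    commit_act = f"commit:{commit['sha']}"
    code = f"code:{commit['sha']}"
    run_act = f"run:{run['run_id']}"
    model = f"model:{run['run_id']}"

    g.agents.add(author)
    g.activities.update({commit_act, run_act})
    g.entities.update({code, model})

    # The commit activity is attributed to its author and produces a code version.
    g.relate("wasAssociatedWith", commit_act, author)
    g.relate("wasGeneratedBy", code, commit_act)
    # The run produces a model; link it to the code only if the run names that commit.
    g.relate("wasGeneratedBy", model, run_act)
    if run.get("git_commit") == commit["sha"]:
        g.relate("used", run_act, code)
        g.relate("wasDerivedFrom", model, code)
    return g


def models_derived_from(g, sha):
    """Query: which model entities were derived from a given code version?"""
    target = f"code:{sha}"
    return sorted(s for rel, s, o in g.relations
                  if rel == "wasDerivedFrom" and o == target)


commit = {"sha": "abc123", "author": "alice"}
run = {"run_id": "r1", "git_commit": "abc123"}
graph = build_graph(commit, run)
print(models_derived_from(graph, "abc123"))
```

Once both sources are expressed in the same vocabulary, a question such as "which models came from this commit?" becomes a plain graph lookup, which is the kind of querying the abstract refers to.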


Cited By

  • (2024) DLProv: A Data-Centric Support for Deep Learning Workflow Analyses. Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 77-85. DOI: 10.1145/3650203.3663337. Online publication date: 9-Jun-2024
  • (2024) Collaboration Management for Federated Learning. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), 291-300. DOI: 10.1109/ICDEW61823.2024.00043. Online publication date: 13-May-2024
  • (2024) Everything Everyway All at Once - Time Traveling Debugging for Stream Processing Applications. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1606-1618. DOI: 10.1109/ICDE60146.2024.00131. Online publication date: 13-May-2024

Published In

DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning
June 2023
51 pages
ISBN: 9798400702044
DOI: 10.1145/3595360
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning experiments
  2. MLflow
  3. provenance
  4. W3C PROV

Qualifiers

  • Research-article

Conference

DEEM '23
Acceptance Rates

DEEM '23 Paper Acceptance Rate 9 of 13 submissions, 69%;
Overall Acceptance Rate 44 of 67 submissions, 66%

Article Metrics

  • Downloads (last 12 months): 754
  • Downloads (last 6 weeks): 87
Reflects downloads up to 23 Nov 2024
