research-article

Automatically debugging AutoML pipelines using maro: ML automated remediation oracle

Authors:

Martin Hirzel

MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming

Pages 60–69

https://doi.org/10.1145/3520312.3534868

Published: 13 June 2022

Abstract

Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This paper describes an approach for automatically debugging an ML pipeline, explaining the failures, and producing a remediation. We implemented our approach, which builds on a combination of AutoML and SMT, in a tool called Maro. Maro works seamlessly with the familiar data science ecosystem including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt. We empirically evaluate our tool and find that for most cases, a single remediation automatically fixes errors, produces no additional faults, and does not significantly impact optimal accuracy nor time to convergence.
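To make the failure mode in the abstract concrete, the following is a minimal toy sketch, not Maro's implementation or API. It assumes a hypothetical single-operator search space in which one hyperparameter combination always fails, runs every configuration, and then excludes the failing combination so a second run is fault-free. The names SEARCH_SPACE, toy_fit, run_trials, and remediate are illustrative assumptions, and the exhaustive loop stands in for the AutoML search and SMT-based reasoning that the paper describes.

```python
from itertools import product

# Hypothetical search space for one toy operator (illustrative only).
SEARCH_SPACE = {
    "solver": ["lbfgs", "saga"],
    "penalty": ["l1", "l2"],
}


def toy_fit(config):
    """Stand-in for training: rejects one hyperparameter combination."""
    if config["solver"] == "lbfgs" and config["penalty"] == "l1":
        raise ValueError("solver 'lbfgs' does not support penalty 'l1'")
    # Made-up scores, only so that successful trials return something.
    return 0.9 if config["penalty"] == "l2" else 0.8


def run_trials(space, excluded=()):
    """Try every configuration that is not excluded.

    Returns (successes, failures), where successes holds (config, score)
    pairs and failures holds (config, error message) pairs.
    """
    keys = sorted(space)
    successes, failures = [], []
    for values in product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        if config in excluded:
            continue
        try:
            successes.append((config, toy_fit(config)))
        except ValueError as err:
            failures.append((config, str(err)))
    return successes, failures


def remediate(failures):
    """Turn observed failures into configurations to exclude from the space."""
    return [config for config, _ in failures]


# First run: one of the four configurations fails.
_, failures = run_trials(SEARCH_SPACE)
# Remediation: exclude the failing combination, then search again.
constraints = remediate(failures)
successes_after, failures_after = run_trials(SEARCH_SPACE, excluded=constraints)
```

In this sketch the remediation is simply a list of configurations to skip. A real tool would need to generalize from a few observed failures to a constraint over the whole search space, which is where a solver becomes useful.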

Published In

MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming

June 2022

79 pages

ISBN: 9781450392730

DOI: 10.1145/3520312

General Chairs: Swarat Chaudhuri (University of Texas at Austin, USA), Charles Sutton (Google Research, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Artifacts Available / v1.1

Conference

MAPS '22

MAPS '22: 6th ACM SIGPLAN International Symposium on Machine Programming

June 13, 2022

San Diego, CA, USA
