DOI:10.1145/3520312.3534868
research-article

Automatically debugging AutoML pipelines using Maro: ML automated remediation oracle

Published: 13 June 2022 Publication History

Abstract

Machine learning in practice often involves complex pipelines for data cleansing, feature engineering, preprocessing, and prediction. These pipelines are composed of operators, which have to be correctly connected and whose hyperparameters must be correctly configured. Unfortunately, it is quite common for certain combinations of datasets, operators, or hyperparameters to cause failures. Diagnosing and fixing those failures is tedious and error-prone and can seriously derail a data scientist's workflow. This paper describes an approach for automatically debugging an ML pipeline, explaining the failures, and producing a remediation. We implemented our approach, which builds on a combination of AutoML and SMT, in a tool called Maro. Maro works seamlessly with the familiar data science ecosystem including Python, Jupyter notebooks, scikit-learn, and AutoML tools such as Hyperopt. We empirically evaluate our tool and find that in most cases a single remediation automatically fixes errors, produces no additional faults, and does not significantly impact either optimal accuracy or time to convergence.
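To make the idea of a failing hyperparameter combination concrete, the following is a minimal sketch in plain Python. It is not Maro's API or algorithm: the search space, the `fails` predicate, and the `remediate` helper are all hypothetical names chosen for illustration, and the exhaustive filter stands in for the AutoML and SMT machinery the paper actually uses. The failing combination mirrors a restriction documented for scikit-learn's `LogisticRegression`, where the `lbfgs` solver does not support the `l1` penalty.

```python
from itertools import product

# Hypothetical search space for a single operator (illustrative only).
SEARCH_SPACE = {
    "solver": ["lbfgs", "liblinear", "saga"],
    "penalty": ["l1", "l2"],
}


def fails(config):
    """Toy stand-in for running a pipeline: True when the combination errors out."""
    return config["solver"] == "lbfgs" and config["penalty"] == "l1"


def all_configs(space):
    """Enumerate every hyperparameter combination in the search space."""
    keys = sorted(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def remediate(space, is_failing):
    """Return the configurations that remain once failing ones are excluded."""
    return [config for config in all_configs(space) if not is_failing(config)]


failing = [config for config in all_configs(SEARCH_SPACE) if fails(config)]
remaining = remediate(SEARCH_SPACE, fails)
print("failing:", failing)
print("remaining configurations:", len(remaining))
```

In this toy setting the remediation is simply the search space minus the failing combination, so a later search never proposes it again. A real tool would have to discover the failing region from observed errors rather than from a hand-written predicate.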

Cited By

  • (2023) Negation-Closure for JSON Schema. Theoretical Computer Science. 10.1016/j.tcs.2023.113823 (113823). Online publication date: Mar-2023

Published In

MAPS 2022: Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming
June 2022
79 pages
ISBN:9781450392730
DOI:10.1145/3520312
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI Debugging
  2. AutoML
  3. Automated Debugging
  4. Automated Remediation

Qualifiers

  • Research-article

Conference

MAPS '22