DOI: 10.1145/3663529.3663785
Research article · Open access

Using Run-Time Information to Enhance Static Analysis of Machine Learning Code in Notebooks

Published: 10 July 2024

Abstract

A prevalent method for developing machine learning (ML) prototypes is the use of notebooks: sequences of cells containing both code and natural-language documentation. When executed during development, these code cells produce valuable run-time information. Nevertheless, current static analyzers for notebooks do not leverage this run-time information to detect ML bugs. Our primary proposition in this paper is therefore that harvesting this run-time information in notebooks can significantly improve the effectiveness of static analysis in detecting ML bugs. To substantiate our claim, we focus on bugs related to tensor shapes and conduct experiments with two static analyzers: (1) PYTHIA, a traditional rule-based static analyzer, and (2) GPT-4, a large language model that can also be used as a static analyzer. The results demonstrate that using run-time information enhances the bug-detection performance of both analyzers, and it also helped reveal a hidden bug in a public dataset.
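The paper targets tensor-shape bugs. As a hypothetical illustration (not the paper's artifact), the sketch below uses NumPy to show the class of bug in question: a matrix-multiplication shape mismatch that purely static analysis of a notebook cell may miss, but that becomes trivial to flag once the concrete shapes observed at run time are available to the analyzer.

```python
import numpy as np

# A typical notebook cell: create data and a layer's weights.
X = np.random.rand(100, 64)   # 100 samples, 64 features
W = np.random.rand(32, 64)    # weight matrix for a 32-unit layer

# Run-time information an analyzer could harvest after the cell executes:
#   X.shape == (100, 64), W.shape == (32, 64)

# Bug: X @ W fails because the inner dimensions (64 and 32) do not match.
try:
    y = X @ W
except ValueError as e:
    print("shape error:", e)

# Intended computation: transpose the weights first.
y = X @ W.T                   # (100, 64) @ (64, 32) -> (100, 32)
print(y.shape)                # (100, 32)
```

With only the source text, an analyzer must infer both shapes through the whole notebook; with the harvested shapes attached to `X` and `W`, the mismatch in `X @ W` is a direct dimension check.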


Cited By

  • (2024) Contract-based Validation of Conceptual Design Bugs for Engineering Complex Machine Learning Software. In Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 155–161. https://doi.org/10.1145/3652620.3688201

Published In
FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering
July 2024
715 pages
ISBN:9798400706585
DOI:10.1145/3663529
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. large language models
  2. machine learning bugs
  3. notebook
  4. run-time information
  5. static analysis

Funding Sources

  • The Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation
  • The Software Center

Conference

FSE '24
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (Last 12 months)129
  • Downloads (Last 6 weeks)32
Reflects downloads up to 12 Nov 2024

