DOI: 10.1145/3663529.3663785
Research article · Open access

Using Run-Time Information to Enhance Static Analysis of Machine Learning Code in Notebooks

Published: 10 July 2024

Abstract

A prevalent method for developing machine learning (ML) prototypes is the use of notebooks: sequences of cells containing both code and natural-language documentation. When executed during development, these code cells produce valuable run-time information. Nevertheless, current static analyzers for notebooks do not leverage this run-time information to detect ML bugs. Our primary proposition in this paper is therefore that harvesting this run-time information in notebooks can significantly improve the effectiveness of static analysis in detecting ML bugs. To substantiate our claim, we focus on bugs related to tensor shapes and conduct experiments with two static analyzers: (1) PYTHIA, a traditional rule-based static analyzer, and (2) GPT-4, a large language model that can also be used as a static analyzer. The results demonstrate that using run-time information enhances the bug-detection performance of both analyzers, and it also helped reveal a hidden bug in a public dataset.
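The paper targets tensor-shape bugs. As a hypothetical illustration (not the paper's artifact), the sketch below uses NumPy to show the class of bug in question: a matrix-multiplication shape mismatch that purely static analysis of a notebook cell may miss, but that becomes trivial to flag once the concrete shapes observed at run time are available to the analyzer.

```python
import numpy as np

# A typical notebook cell: create data and a layer's weights.
X = np.random.rand(100, 64)   # 100 samples, 64 features
W = np.random.rand(32, 64)    # weight matrix for a 32-unit layer

# Run-time information an analyzer could harvest after the cell executes:
#   X.shape == (100, 64), W.shape == (32, 64)

# Bug: X @ W fails because the inner dimensions (64 and 32) do not match.
try:
    y = X @ W
except ValueError as e:
    print("shape error:", e)

# Intended computation: transpose the weights first.
y = X @ W.T                   # (100, 64) @ (64, 32) -> (100, 32)
print(y.shape)                # (100, 32)
```

With only the source text, an analyzer must infer both shapes through the whole notebook; with the harvested shapes attached to `X` and `W`, the mismatch in `X @ W` is a direct dimension check.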


Cited By

  • (2024) Contract-based Validation of Conceptual Design Bugs for Engineering Complex Machine Learning Software. In Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 155–161. https://doi.org/10.1145/3652620.3688201

Published In
FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering
July 2024
715 pages
ISBN:9798400706585
DOI:10.1145/3663529
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. large language models
  2. machine learning bugs
  3. notebook
  4. run-time information
  5. static analysis

Funding Sources

  • The Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation
  • The Software Center

Conference

FSE '24
Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Article Metrics

  • Downloads (Last 12 months)129
  • Downloads (Last 6 weeks)32
Reflects downloads up to 12 Nov 2024

