Toward Understanding Deep Learning Framework Bugs

Published: 29 September 2023

Abstract

DL frameworks are the basis for constructing all DL programs and models; thus, their bugs can lead to unexpected behavior in any DL program or model that relies on them. This wide-reaching effect demonstrates the necessity and importance of guaranteeing the quality of DL frameworks. Understanding the characteristics of DL framework bugs is a fundamental step toward this quality-assurance task, as it facilitates the design of effective bug detection and debugging approaches. Hence, in this work, we conduct the largest study to date, covering 1,000 bugs from four popular and diverse DL frameworks (i.e., TensorFlow, PyTorch, MXNet, and DL4J). By analyzing the root causes and symptoms of DL framework bugs associated with five components decomposed from DL frameworks, and by measuring the test coverage achieved by three state-of-the-art testing techniques, we obtain 12 major findings that provide a comprehensive understanding of DL framework bugs and of the current state of DL framework testing practice, and we then derive a series of actionable guidelines for better DL framework bug detection and debugging. Finally, based on these guidelines, we design and implement a prototype DL framework testing tool, called TenFuzz, which is evaluated to be effective and finds three previously unknown bugs in the latest TensorFlow framework in a preliminary study, indicating the significance of our guidelines.
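The TenFuzz implementation itself is not reproduced on this page, but the kind of framework-level testing the abstract refers to can be illustrated. The sketch below is a minimal, hypothetical example (not the authors' tool): it fuzzes a single TensorFlow operation with random inputs and uses a differential oracle, comparing eager execution against graph execution via tf.function. The choice of tf.math.log1p, the input ranges, and the tolerances are all illustrative assumptions.

```python
# Minimal sketch of differential testing for a DL framework.
# Illustrative only; this is NOT the TenFuzz tool described in the article.
import numpy as np
import tensorflow as tf

# The op under test and its graph-compiled counterpart; log1p is an
# arbitrary illustrative choice, not an API singled out by the article.
op = tf.math.log1p
graph_op = tf.function(op)

rng = np.random.default_rng(0)
for trial in range(100):
    # Random shapes and values exercise shape inference and kernel paths.
    shape = tuple(int(n) for n in rng.integers(1, 8, size=2))
    x = tf.constant(rng.standard_normal(shape), dtype=tf.float32)
    try:
        eager_out = op(x).numpy()
        graph_out = graph_op(x).numpy()
        # Eager and graph execution should agree; NaN==NaN is tolerated.
        if not np.allclose(eager_out, graph_out,
                           rtol=1e-5, atol=1e-5, equal_nan=True):
            print(f"Trial {trial}: eager/graph mismatch for shape {shape}")
    except Exception as e:
        # A crash on a valid input is also a bug candidate worth reporting.
        print(f"Trial {trial}: crash candidate: {type(e).__name__}: {e}")
```

Comparable differential oracles (cross-backend, cross-framework, or cross-version comparisons) underlie much of the DL framework testing literature surveyed in the article.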




Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 6
November 2023
949 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3625557
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 September 2023
Online AM: 16 March 2023
Accepted: 07 February 2023
Revised: 13 January 2023
Received: 04 April 2022
Published in TOSEM Volume 32, Issue 6


Author Tags

  1. Deep learning frameworks
  2. bug analysis
  3. empirical study
  4. deep learning testing

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China


Cited By

  • (2024) Hybrid whale optimized crow search algorithm and multi-SVM classifier for effective system level test case selection. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 46:2, 4191–4207. DOI: 10.3233/JIFS-232700. Online publication date: 14-Feb-2024.
  • (2024) Dependency-Aware Code Naturalness. Proceedings of the ACM on Programming Languages 8:OOPSLA2, 2355–2377. DOI: 10.1145/3689794. Online publication date: 8-Oct-2024.
  • (2024) Contexts Matter: An Empirical Study on Contextual Influence in Fairness Testing for Deep Learning Systems. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 107–118. DOI: 10.1145/3674805.3686673. Online publication date: 24-Oct-2024.
  • (2024) Fairness Testing of Machine Translation Systems. ACM Transactions on Software Engineering and Methodology 33:6, 1–27. DOI: 10.1145/3664608. Online publication date: 27-Jun-2024.
  • (2024) Fault Diagnosis for Test Alarms in Microservices through Multi-source Data. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 115–125. DOI: 10.1145/3663529.3663833. Online publication date: 10-Jul-2024.
  • (2024) An Empirical Study on Kubernetes Operator Bugs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1746–1758. DOI: 10.1145/3650212.3680396. Online publication date: 11-Sep-2024.
  • (2024) Large Language Models Can Connect the Dots: Exploring Model Optimization Bugs with Domain Knowledge-Aware Prompts. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1579–1591. DOI: 10.1145/3650212.3680383. Online publication date: 11-Sep-2024.
  • (2024) Towards Understanding the Bugs in Solidity Compiler. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1312–1324. DOI: 10.1145/3650212.3680362. Online publication date: 11-Sep-2024.
  • (2024) A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We? Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 223–234. DOI: 10.1145/3650212.3652123. Online publication date: 11-Sep-2024.
  • (2024) Software Systems Compliance with the AI Act. Proceedings of the 2nd International Workshop on Responsible AI Engineering, 44–51. DOI: 10.1145/3643691.3648589. Online publication date: 16-Apr-2024.
