COMET: Coverage-guided Model Generation For Deep Learning Library Testing

Published: 21 July 2023

Abstract

Recent deep learning (DL) applications are mostly built on top of DL libraries, so the quality assurance of these libraries is critical to the dependable deployment of DL applications. Techniques have been proposed to generate various DL models and use them to test these libraries. However, their test effectiveness is constrained by the limited diversity of layer API calls in the generated DL models. Our study reveals that these techniques cover at most 34.1% of layer inputs, 25.9% of layer parameter values, and 15.6% of layer sequences. As a result, many bugs that arise from specific layer API calls (i.e., specific layer inputs, parameter values, or layer sequences) can be missed by existing techniques.
To address this limitation, we propose COMET to effectively generate DL models with diverse layer API calls for DL library testing. COMET (1) designs a set of mutation operators and a coverage-based search algorithm to diversify layer inputs, layer parameter values, and layer sequences in DL models, and (2) proposes a model synthesis method to boost test efficiency without compromising layer API call diversity. Our evaluation shows that COMET outperforms the state-of-the-art baselines by covering twice as many layer inputs (69.7% vs. 34.1%), layer parameter values (50.2% vs. 25.9%), and layer sequences (39.0% vs. 15.6%). Moreover, COMET covers 3.4% more library branches than existing techniques. Finally, COMET detects 32 new bugs in the latest versions of eight popular DL libraries, including TensorFlow and MXNet; 21 of them have been confirmed by DL library developers, and seven of the confirmed bugs have been fixed.
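
The idea of a coverage-based search over mutated models can be illustrated with a small, self-contained sketch. This is not COMET's algorithm or its coverage definition, which the abstract does not spell out. The sketch assumes that a model is simply a list of layer-type names, that layer sequence coverage is the fraction of adjacent layer-type pairs exercised, and that a mutant is kept only when it covers a new pair. The layer vocabulary and all function names (LAYER_TYPES, sequence_pairs, sequence_coverage, mutate, coverage_guided_search) are hypothetical, and the toy mutation operator ignores tensor-shape validity, which a real model generator would have to respect.

```python
import random

# Hypothetical layer vocabulary; real DL libraries expose far more layer types.
LAYER_TYPES = ["Conv2D", "Dense", "ReLU", "Flatten"]


def sequence_pairs(model):
    """Return the set of adjacent layer-type pairs in a model (a list of names)."""
    return set(zip(model, model[1:]))


def sequence_coverage(models):
    """Fraction of all possible adjacent layer-type pairs exercised by the models.

    Assumption: coverage is counted over ordered pairs of LAYER_TYPES.
    """
    covered = set()
    for model in models:
        covered |= sequence_pairs(model)
    return len(covered) / (len(LAYER_TYPES) ** 2)


def mutate(model, rng):
    """Toy mutation operator: replace, insert, or delete one layer at random."""
    child = list(model)
    op = rng.choice(["replace", "insert", "delete"])
    pos = rng.randrange(len(child))
    if op == "replace":
        child[pos] = rng.choice(LAYER_TYPES)
    elif op == "insert":
        child.insert(pos, rng.choice(LAYER_TYPES))
    elif len(child) > 2:  # delete, but always keep at least two layers
        del child[pos]
    return child


def coverage_guided_search(seed, iterations, rng):
    """Keep a mutant only if it covers a layer pair that no kept model covers."""
    corpus = [list(seed)]
    covered = sequence_pairs(seed)
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = mutate(parent, rng)
        new_pairs = sequence_pairs(child) - covered
        if new_pairs:
            corpus.append(child)
            covered |= new_pairs
    return corpus


if __name__ == "__main__":
    rng = random.Random(0)
    seed = ["Conv2D", "ReLU", "Flatten", "Dense"]
    corpus = coverage_guided_search(seed, iterations=200, rng=rng)
    print("seed coverage:  ", sequence_coverage([seed]))
    print("corpus coverage:", sequence_coverage(corpus))
    print("models kept:    ", len(corpus))
```

Because the seed always stays in the corpus and a mutant is kept only when it adds a new pair, coverage in this sketch can only grow as the search proceeds. Analogous counters for layer inputs and parameter values could be added in the same way, under similarly explicit assumptions about how each is measured.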

Information & Contributors

Information

Published In

ACM Transactions on Software Engineering and Methodology  Volume 32, Issue 5
September 2023
905 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3610417
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 July 2023
Online AM: 08 February 2023
Accepted: 10 January 2023
Revised: 07 January 2023
Received: 01 October 2022
Published in TOSEM Volume 32, Issue 5

Author Tags

  1. Deep learning testing
  2. library testing
  3. model generation
  4. model diversity

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • National Key Research and Development Program of China
  • Hong Kong RGC/GRF
  • Hong Kong ITF
  • MSRA Collaborative Research Grant
