
Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

Published: 31 December 2024

Abstract

In the testing-retraining pipeline for enhancing the robustness of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented: the pipeline generates adversarial examples as test cases via such a DL testing technique and retrains the DL model under test with test suites containing these test cases. On the one hand, the strategies of these fuzzing techniques are tightly coupled to the key characteristics of their testing metrics. On the other hand, when generating a test case, they are often unaware of whether the test case differs from the samples surrounding it and whether relevant test cases of other seeds already exist. We propose a novel testing metric called Contextual Confidence (CC). CC measures a test case by the mean probability that the samples surrounding the test case are predicted to have the test case's prediction label. Based on this metric, we further propose a novel fuzzing technique, Clover, as a DL testing technique for the pipeline. In each fuzzing round, Clover first finds the set of seeds whose labels are the same as the label of the seed under fuzzing. For each seed in this set, it then locates the test case that achieves the highest CC value among that seed's existing test cases and shares the prediction label of the highest-CC existing test case of the seed under fuzzing. Clover computes the difference between each such pair of a seed and a test case. It incrementally applies these differences to perturb the current highest-CC test case of the seed under fuzzing, and further perturbs the resulting samples along the gradient, to generate new test cases for the seed under fuzzing. Clover finally selects test cases from among the generated test cases of all seeds, covering as many of them as possible and preferring test cases with higher CC values, to improve model robustness. The experiments show that Clover outperforms the state-of-the-art coverage-based fuzzing technique Adapt and the loss-based fuzzing technique RobOT by 67%–129% and 48%–100%, respectively, in terms of the robustness improvement ratio delivered through the same testing-retraining pipeline. For test case generation, in terms of the numbers of unique adversarial labels and unique categories in the constructed test suites, Clover outperforms Adapt by \(2.0\times\) and \(3.5\times\) and RobOT by \(1.6\times\) and \(1.7\times\) when fuzzing clean models, and outperforms Adapt by \(3.4\times\) and \(4.5\times\) and RobOT by \(9.8\times\) and \(11.0\times\) when fuzzing adversarially trained models, respectively.
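The CC metric described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the neighborhood-sampling scheme (uniform noise within a small radius), the radius and sample-count values, and all function names are assumptions made for the sketch.

```python
import numpy as np

def contextual_confidence(predict_proba, x, n_samples=50, radius=0.05, rng=None):
    """Sketch of Contextual Confidence (CC): the mean probability that
    samples surrounding a test case are predicted to have the test case's
    own prediction label. The uniform-noise neighborhood is an assumption."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Prediction label of the test case itself.
    label = int(np.argmax(predict_proba(x[None, :])[0]))
    # Draw surrounding samples within a small L-infinity ball, clipped to [0, 1].
    noise = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
    neighbors = np.clip(x[None, :] + noise, 0.0, 1.0)
    # Mean probability the neighbors assign to the test case's label.
    probs = predict_proba(neighbors)            # shape: (n_samples, n_classes)
    return label, float(probs[:, label].mean())

# Toy stand-in for a DL classifier: softmax over fixed linear logits.
W = np.array([[2.0, -1.0], [-1.0, 2.0]])
def predict_proba(batch):
    z = batch @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

label, cc = contextual_confidence(predict_proba, np.array([0.9, 0.1]))
```

A test case whose surrounding samples are mostly predicted to have its own label yields a CC value near 1; a test case sitting near a decision boundary yields a lower value, which is what makes CC usable as a selection preference.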



Published In

ACM Transactions on Software Engineering and Methodology  Volume 34, Issue 1
January 2025
967 pages
EISSN: 1557-7392
DOI: 10.1145/3703005

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 December 2024
Online AM: 24 July 2024
Accepted: 11 July 2024
Revised: 24 May 2024
Received: 28 February 2023
Published in TOSEM Volume 34, Issue 1


Author Tags

  1. context-awareness
  2. fuzzing algorithm
  3. robustness
  4. assessment
  5. metric

Qualifiers

  • Research-article

Funding Sources

  • CityU MF_EXT

