CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm
<p>Complete Flow Chart of CVDF DYNAMIC Fuzzy Testing sample generation.</p> "> Figure 2
<p>Training of neural network.</p> "> Figure 3
<p>The Basic Structure Diagram Of Skip-Gram Model.</p> "> Figure 4
<p>The Specific Structure of LSTM Neuron.</p> "> Figure 5
<p>The Complete Structure of the bi-LSTM neural network.</p> "> Figure 6
<p>General Flow Chart of Generating Test Cases By Genetic Algorithm.</p> "> Figure 7
<p>The Relationship Between the Accuracy and Loss Of bi-LSTM And Epochs.</p> "> Figure 8
<p>Relationship Between Evaluation Indices And Layer Numbers.</p> "> Figure 9
<p>Comparative Test Results Of Genetic Algorithm And AFLFast.</p> "> Figure 10
<p>Execution Time Comparison.</p> "> Figure 11
<p>Comparison Of Compression Ratio Between Heuristic Genetic Algorithm And Approximation Algorithm.</p> "> Figure 12
<p>Comparison of Test Time Between Heuristic Genetic Algorithm And Approximation Algorithm.</p> ">
Abstract
:1. Introduction and Background
- (1)
- The strategy of sample generation based on a genetic algorithm;
- (2)
- The strategy of sample generation based on a bi-LSTM neural network;
- (3)
- The strategy of sample reduction based on a heuristic genetic algorithm.
2. Related Work
3. Algorithm Description
3.1. An Introduction of Existing Fuzzy Testing Sample Generation Methods
3.2. Formal Definition
- Definition 1 PUT (input sample)
- Definition 2 Set Covering Problem (SCP)
- Definition 3 Path Depth Detection Ability
3.3. CVDF DYNAMIC Fuzzy Testing Sample Generation
3.3.1. Theoretical Model and Training Process of BI-LSTM Neural Network
- (a)
- Preprocessing and Vectorization
Algorithm 1. Extracting program execution path |
Start Func Func ExtractPath(binary-source-code) 1: Start = LoadBinaryProgram(binary-source-code) 2: ProgStaddr = GetProgramEntry(Start) 3: ExecutionPath = [] 4: while True: 5: PackagePath = LoadCurrentPackage(ProgStaddr) 6: ExecutionPath +|= PackagePath 7: If ProgStaddr == JumpNextInstrument() 8: ProgStaddr = GetNextInstruAddr() 9: If ProgStaddr == EndOfMemSpace() 10: break 11: Return ExecutionPath End Func |
- (b)
- BI-LSTM neural network structure and parameter optimization
3.3.2. Genetic Algorithm for Constructing Test Cases
- (a)
- Population initialization
- (b)
- Tracking and executing the program under test
- Monitor whether the current test data will cause the tested program to crash;
- Record the execution path of the program
- (c)
- Fitness calculation
- (d)
- Individual selection, crossover and variation
Algorithm 2. Control Mutation |
Start Func Func ControlPROC(X,Y) 1: A = 1, B = 1 2: IF Y >= B THEN 3: FORK1: A = A × X, B = B + 1 4: ELSE: 5: IF X >= A THEN 6: FORK2: A = A + X, B = B − 1 7: ELSE: 8: FORK3: A = A − X, B = B/2 9: RETURN A End Func |
3.3.3. Integrating New Test Data with Integration Idea
3.3.4. Using Heuristic Genetic Algorithm to Reduce Sample Set
- (a)
- Using a compression matrix to represent chromosomes
- (b)
- Using heuristic genetic algorithm to improve chromosome
- (c)
- Paternal selection
- (d)
- Cross rate selection
- (e)
- Variation rate selection
- (f)
- Elite ratio
- (g)
- Stopping Criteria
4. Experiment and Evaluation
4.1. Data Sources
4.2. Evaluate the Validity of CVDF DYNAMIC’s Theoretical Model
4.3. Performance Comparison between CVDF DYNAMIC and Existing Fuzzy Testing Tools
4.4. Performance Overhead of CVDF DYNAMIC and Effectiveness of Sample Set Reduction
5. Discussion on Security and Privacy of CVDF DYNAMIC Model
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zou, Q.; Zhang, T.; Wu, R.; Ma, J.; Li, M.; Chen, C.; Hou, C. From automation to intelligence: Survey of research on vulnerability discovery technique. J. Tsinghua Univ. (Sci. Technol.) 2018, 58, 1079–1094. [Google Scholar] [CrossRef]
- Borzacchiello, L.; Coppa, E.; Demetrescu, C. FUZZOLIC: Mixing fuzzing and concolic execution. Comput. Secur. 2021, 108, 102368. [Google Scholar] [CrossRef]
- Liang, J.; Jiang, Y.; Wang, M.; Jiao, X.; Chen, Y.; Song, H.; Choo, K.R. DeepFuzzer: Accelerated Deep Greybox Fuzzing. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2675–2688. [Google Scholar] [CrossRef] [Green Version]
- Huang, H.; Yao, P.; Wu, R.; Shi, Q.; Zhang, C. PANGOLIN: Incremental Hybrid Fuzzing with Polyhedral Path Abstraction. In Proceedings of the 2020 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 18–21 May 2020. [Google Scholar] [CrossRef]
- Gan, S.; Zhang, C.; Chen, P.; Zhao, B.; Qin, X.; Wu, D.; Chen, Z. GreyOne: Data Flow Sensitive Fuzzing. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/gan (accessed on 25 January 2022).
- Bohme, M.; Pham, V.-T.; Roychoudhury, A. Coverage-Based Greybox Fuzzing as Markov Chain. IEEE Trans. Softw. Eng. 2019, 5, 489–506. [Google Scholar] [CrossRef]
- Chen, Y.; Jiang, Y.; Ma, F.; Liang, J.; Wang, M.; Zhou, C.; Jiao, X.; Su, Z. EnFuzz: Ensemble Fuzzing with Seed Synchronization among Diverse Fuzzers. In Proceedings of the 28th USENIX Security Symposium, Santa Clara, CA, USA, 14–16 August 2019; Available online: https://www.usenix.org/conference/usenixsecurity19/presentation/chen-yuanliang (accessed on 24 January 2022).
- Wang, Y.; Wu, Z.; Wei, Q.; Wang, A.Q. NeuFuzz: Efficient Fuzzing with Deep Neural Network. IEEE Access 2019, 7, 36340–36352. [Google Scholar] [CrossRef]
- Lin, P.; Hong, Z.; Li, Y.; Wu, L. A priority based path searching method for improving hybrid fuzzing. Comput. Secur. 2021, 105, 102242. [Google Scholar] [CrossRef]
- Chen, P.; Chen, H. Angora: Efficient Fuzzing by Principled Search. In Proceedings of the 2018 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–24 May 2018. [Google Scholar] [CrossRef] [Green Version]
- Zhang, X.; Li, Z. Survey of Fuzz Testing Technology. Comput. Sci. 2016, 43, 5. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhao, L.; Jin, Y. Sensitive Region Prediction based on neuralnetwork in Fuzzy Test Algorithm Research. J. Cyber Secur. 2020, 5, 1. [Google Scholar] [CrossRef]
- Xie, X.F.; Li, X.H.; Chen, X.; Meng, G.Z.; Liu, Y. Hybrid testing based on symbolic execution and fuzzing. Ruan Jian Xue Bao/J. Softw. 2019, 30, 3071–3089. (In Chinese) [Google Scholar]
- Xu, P.; Liu, J.; Lin, B.; Sun, H.; Lei, B. Generation of fuzzing test case based on recurrent neural networks. Appl. Res. Comput. 2019, 36, 2679–2685. [Google Scholar] [CrossRef]
- Nagy, S.; Hicks, M. Full-speed Fuzzing: Reducing Fuzzing Overhead through Coverage-guided Tracing. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019. [Google Scholar] [CrossRef] [Green Version]
- Yang, M.F.; Huo, W.; Zou, Y.Y.; Yin, J.W.; Liu, B.X.; Gong, X.R.; Jia, X.Q.; Zou, W. Programmable fuzzing technology. Ruan Jian Xue Bao/J. Softw. 2018, 29, 1258–1274. (In Chinese) [Google Scholar]
- Godefroid, P.; Peleg, H.; Singh, R. Learn&Fuzz: Maching Learning for Input Fuzzing. In Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 30 October−3 November 2017. [Google Scholar]
- Li, M.-L.; Huang, H.; Lu, Y.-L.; Zhu, K.-L. SymFuzz: Vulnerability Detection Technology under Complex Path Conditions. Comput. Sci. 2021, 48, 7. [Google Scholar] [CrossRef]
- Cheng, L.; Zhang, Y.; Zhang, Y.; Wu, C.; Li, Z.; Fu, Y.; Li, H. Optimizing seed inputs in fuzzing with machine learning. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Montreal, QC, Canada, 25−31 May 2019. [Google Scholar]
- Ma, J.; Zhang, T.; Li, Z.; Zhang, J. Improved fuzzy analysis methods. J. Tsinghua Univ. (Sci. Technol.) 2016, 56, 478–483. [Google Scholar] [CrossRef]
- Aschermann, C.; Schumilo, S.; Abbasi, A.; Holz, T. IJON: Exploring Deep State Spaces via Fuzzing. In Proceedings of the 2020 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 18–21 May 2020. [Google Scholar] [CrossRef]
- Perl, H.; Arp, D.; Fahl, S. VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits. In Proceedings of the CCS ‘15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015. [Google Scholar] [CrossRef]
- Stephens, N.; Grosen, J.; Salls, C. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. NDSS 2016, 16, 21–24. [Google Scholar] [CrossRef]
- She, D.; Pei, K.; Epstein, D.; Yang, J.; Ray, B.; Jana, S. NEUZZ: Efficient Fuzzing with Neural Program Smoothing. In Proceedings of the 2019 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 19–23 May 2019. [Google Scholar] [CrossRef] [Green Version]
- Chen, Y.; Zhong, R.; Hu, H.; Zhang, H.; Yang, Y.; Wu, D.; Lee, W. One Engine to Fuzz’em All: Generic Language Processor Testing with Semantic Validation. In Proceedings of the 2021 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 24–27 May 2021. [Google Scholar] [CrossRef]
- Zhang, Z.; You, W.; Tao, G.; Aafer, Y.; Liu, X.; Zhang, X. STOCHFUZZ: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting. In Proceedings of the 2021 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 24–27 May 2021. [Google Scholar] [CrossRef]
- Yue, T.; Wang, P.; Tang, Y.; Wang, E.; Yu, B.; Lu, K.; Zhou, X. EcoFuzz: Adaptive Energy-Saving Greybox Fuzzing as a Variant of the Adversarial Multi-Armed Bandit. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/yue (accessed on 17 January 2022).
- Zong, P.; Lv, T.; Wang, D.; Deng, Z.; Liang, R.; Chen, K. FuzzGuard: Filtering out Unreachable Inputs in Directed Grey-Box Fuzzing through Deep Learning. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/zong (accessed on 17 January 2022).
- Österlund, S.; Razavi, K.; Bos, H.; Giuffrida, C. ParmeSan: Sanitizer-guided Greybox Fuzzing. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/osterlund (accessed on 17 January 2022).
- Oleksenko, O.; Trach, B. Mark Silberstein, Christof Fetzer, SpecFuzz: Bringing Spectre-type vulnerabilities to the surface. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; Available online: https://www.usenix.org/conference/usenixsecurity20/presentation/oleksenko (accessed on 5 January 2022).
- Lee, G.; Shim, W.; Lee, B. Constraint-Guided Directed Greybox Fuzzing. In Proceedings of the 30th USENIX Security Symposium, Vancouver, BC, Canada, 11–13 August 2021; Available online: https://www.usenix.org/conference/usenixsecurity21/presentation/lee-gwangmu (accessed on 13 January 2022).
- Salls, C.; Jindal, C.; Corina, J.; Kruegel, C.; Vigna, G. Token-Level Fuzzing. In Proceedings of the 30th USENIX Security Symposium, Vancouver, BC, Canada, 11–13 August 2021; Available online: https://www.usenix.org/conference/usenixsecurity21/presentation/salls (accessed on 10 January 2022).
- Liu, X.; Xie, L.; Wang, Y.; Zou, J.; Xiong, J.; Ying, Z.; Vasilakos, A.V. Privacy and Security Issues in Deep Learning: A Survey. IEEE Access 2021, 9, 4566–4593. [Google Scholar] [CrossRef]
- Mollah, M.B.; Azad, M.A.K.; Vasilakos, A. Secure data sharing and searching at the edge of cloud-assisted internet of things. IEEE Cloud Comput. 2017, 4, 34–42. [Google Scholar] [CrossRef]
- Yi, G.; Yang, X.; Huang, P.; Wang, Y. A Coverage-Guided Fuzzing Framework based on Genetic Algorithm for Neural Networks. In Proceedings of the 2021 8th International Conference on Dependable Systems and Their Applications (DSA), Yinchuan, China, 5–6 August 2021. [Google Scholar] [CrossRef]
- Lin, G.; Guan, J. An Adaptive Memetic Algorithm for Solving the Set Covering Problem. J. Zhejiang Univ. (Sci. Ed.) 2016, 43, 168–174. [Google Scholar] [CrossRef]
- Alyahya, K.; Rowe, J.E. Landscape Analysis of a Class of NP-Hard Binary Packing Problems. Evol. Comput. 2019, 27, 47–73. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Hoesen, D.; Purwarianti, A. Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger. In Proceedings of the 2018 International Conference on Asian Language Processing (ALP), Bandung, Indonesia, 15–17 November 2018. [Google Scholar]
- Zhao, Y.W.; Wang, J.; Guo, M.Z.; Zhang, Z.L.; Yu, G.X. Prediction of protein function based on 0–1 matrix decomposition. Sci. Sin. Inf. Sci. 2019, 49, 1159–1174. (In Chinese) [Google Scholar]
- Github. Available online: https://github.com/ (accessed on 31 December 2021).
- NIST Test Suites. Available online: https://samate.nist.gov/SRD/testsuite.php (accessed on 31 December 2021).
- FFmpeg. FFmpeg [EB/OL]. (2016-10-01) [2017-04-13]. Available online: https://www.ffmpeg.org/ (accessed on 31 December 2021).
Data Sources | Types of Vulnerabilities |
---|---|
SARD | (CWE-119, CWE-120, CWE-131, CWE-134 etc) |
Security Focus | (CWE-119, CWE-120, CWE-189, CWE-369 etc) |
Github | (CWE-415, CWE-476, CWE-119, CWE-763 etc) |
Evaluation Indicator | Code Coverage | WDC | ||||
---|---|---|---|---|---|---|
Tool | ||||||
CVDF DYNAMIC | 5.6% | 92.3% | 88.9% | 89.6% | 2.76 | |
NeuFuzz | 10.2% | 79.8% | 83.4% | 24.7% | 2.78 | |
VDiscover | 8.5% | 86.7% | 85.8% | 86.5% | 2.33 | |
AFLFast | 11.2% | 88.7% | 82.9% | 80.1% | 1.94 |
Number of Samples | Compression Ratio | Execution Times/s | Code Coverage | WDC | |
---|---|---|---|---|---|
Initial sample set | 6308 | 54.6% | 46,184 | 89.6% | 2.76 |
Compressed sample set | 2864 | 32,428 | 89.6% | 2.76 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ma, M.; Han, L.; Qian, Y. CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm. Sensors 2022, 22, 1265. https://doi.org/10.3390/s22031265
Ma M, Han L, Qian Y. CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm. Sensors. 2022; 22(3):1265. https://doi.org/10.3390/s22031265
Chicago/Turabian StyleMa, Mingrui, Lansheng Han, and Yekui Qian. 2022. "CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm" Sensors 22, no. 3: 1265. https://doi.org/10.3390/s22031265
APA StyleMa, M., Han, L., & Qian, Y. (2022). CVDF DYNAMIC—A Dynamic Fuzzy Testing Sample Generation Framework Based on BI-LSTM and Genetic Algorithm. Sensors, 22(3), 1265. https://doi.org/10.3390/s22031265