survey

A Comprehensive Analysis of Explainable AI for Malware Hunting

Authors:

Samaneh Mahdavifar,

Benjamin C. M. Fung,

Philippe Charland

ACM Computing Surveys, Volume 56, Issue 12

Article No.: 314, Pages 1 - 40

https://doi.org/10.1145/3677374

Published: 03 October 2024

Abstract

In the past decade, the number of malware variants has increased rapidly. Many researchers have proposed detecting malware with intelligent techniques, such as Machine Learning (ML) and Deep Learning (DL), which achieve high accuracy and precision. These methods, however, are opaque in their decision-making process. Artificial Intelligence (AI)-based models therefore need to be explainable, interpretable, and transparent in order to be reliable and trustworthy. In this survey, we review articles on Explainable AI (XAI) and its application to malware detection. The article provides a comprehensive examination of the XAI algorithms employed in malware analysis. Moreover, we address the characteristics, challenges, and requirements of malware analysis that standard XAI methods cannot accommodate. We discuss how Explainable Malware Detection (EMD) models, although they provide explainability, make an AI-based model more vulnerable to adversarial attacks. We also propose a framework that assigns a level of explainability to each XAI malware analysis model, based on the security features involved in each method. In summary, this survey combines XAI and malware analysis, applying XAI models to scrutinize the opaque nature of AI systems and their applications to malware analysis.


Digital Library

[139]

Anli Yan, Zhenxiang Chen, Haibo Zhang, Lizhi Peng, Qiben Yan, Muhammad Umair Hassan, Chuan Zhao, and Bo Yang. 2021. Effective detection of mobile malware behavior based on explainable deep neural network. Neurocomputing 453 (2021), 482–492. DOI:

Digital Library

[140]

J. Yang, T. Li, G. Liang, Y. Wang, T. Gao, and F. Zhu. 2020. Spam transaction attack detection model based on GRU and WGAN-div. Computer Communications 161 (2020), 172–182. DOI:

[141]

Wei Yang, Deguang Kong, Tao Xie, and Carl A. Gunter. 2017. Malware detection in adversarial settings: Exploiting feature evolutions and confusions in android apps. In Proceedings of the 33rd Annual Computer Security Applications Conference.Association for Computing Machinery, New York, NY, USA, 288–302. DOI:

Digital Library

[142]

Xin Yao. 1993. A review of evolutionary artificial neural networks. International Journal of Intelligent Systems 8, 4 (1993), 539–567. DOI:

Digital Library

[143]

Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, and Yue Zhang. 2024. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confidence Computing (2024), 100211. DOI:

[144]

Yanfang Ye, Lingwei Chen, Shifu Hou, William Hardy, and Xin Li. 2018. DeepAM: a heterogeneous deep learning framework for intelligent malware detection. Knowledge and Information Systems 54 (2018), 265–285. DOI:

Digital Library

[145]

M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula. 2017. Autoencoder-based feature learning for cyber security applications. International Joint Conference on Neural Networks (2017), 3854–3861. DOI:

[146]

J. Yuan, C. Chen, W. Yang, M. Liu, J. Xia, and S. Liu. 2021. A survey of visual analytics techniques for machine learning. Computational Visual Media 7, 1 (2021), 3–36. DOI:

[147]

H. Zhang, L. Huang, C. Q. Wu, and Z. Li. 2020. An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset. Computer Networks 177 (2020), 107315. DOI:

[148]

Xinyang Zhang, Ningfei Wang, Hua Shen, Shouling Ji, Xiapu Luo, and Ting Wang. 2020. Interpretable deep learning under fire. In Proceedings of the 29th USENIX Conference on Security Symposium.USENIX Association, USA, 18 pages.

[149]

M. Zheng, M. Sun, and J. C. S. Lui. 2013. Droid analytics: A signature based analytic system to collect, extract, analyze and associate android malware. In Proceedings of the 12th IEEE International Conference on Trust, Security and Privacy in Computing and. Communications, 163–171. DOI:

Digital Library

[150]

Yajin Zhou and Xuxian Jiang. 2012. Dissecting android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy. IEEE, 95–109. DOI:

Digital Library

[151]

D. Zhu, H. Jin, Y. Yang, D. Wu, and W. Chen. 2017. DeepFlow: Deep learning-based malware detection by mining android application for abnormal usage of sensitive data. IEEE Symposium on Computers and Communications (2017), 438–443. DOI:

Index Terms

A Comprehensive Analysis of Explainable AI for Malware Hunting
1. Computing methodologies
  1. Artificial intelligence
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation

Information & Contributors

Information

Published In

ACM Computing Surveys Volume 56, Issue 12

December 2024

966 pages

EISSN: 1557-7341

DOI: 10.1145/3613718

This article was authored by employees of the Government of Canada. As such, the Canadian government retains all interest in the copyright to this work and grants to ACM a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, provided that clear attribution is given both to the authors and the Canadian government agency employing them. Permission to make digital or hard copies for personal or classroom use is granted. Copies must bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the Canadian Government must be honored. To copy otherwise, distribute, republish, or post, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 October 2024

Online AM: 11 July 2024

Accepted: 05 July 2024

Revised: 14 June 2024

Received: 20 August 2023

Qualifiers

Survey

Funding Sources

BlackBerry Ltd.
Defence Research and Development Canada
NSERC Alliance
NSERC Discovery
Canada Research Chairs Program
