Abstract
The Android OS has recently gained immense popularity among smartphone users. It has also attracted many malware developers, leading to countless malicious applications in the ecosystem. Many recent reports suggest that the conventional signature-based malware detection technique fails to protect android smartphones from new and sophisticated malware attacks. Therefore, researchers are exploring machine learning-based malware detection systems that can successfully discriminate between malware and benign applications: effectively and efficiently. Existing literature suggests that many machine learning-based models use large feature sets for malware detection. However, classification models based on a large number of features are computationally expensive, time-consuming, and have poor generalizability. Therefore, this paper proposes a reliable feature reduction approach to select the most prominent features for effective and efficient malware detection. The proposed approach is tested on two different datasets, three distinct features, and twenty-six unique classifiers. The twenty-six baseline malware detection models based on 724 features and thirteen classification algorithms achieved an average accuracy and average AUC of \(94.73\%\) and \(94.49\%\), respectively. Later we performed feature reduction that works with mutually exclusive and merged feature spaces of android permissions, intents, and opcodes. The proposed feature reduction approach reduced the number of features from 724 to 72 (\(10\%\) of the original features). We also list the reduced set of 72 features comprising android permissions, intent, and opcode used for malware detection. The reduced features based twenty-six malware detection models achieved an average accuracy and average AUC of \(93.12\%\) and \(92.97\%\), respectively. The feature reduction leads to less than \(2\%\) reduction in average accuracy and AUC. However, it leads to \(85.25\%\) and \(91.45\%\) reduction in average test and average training time for twenty-six android malware detection models. Therefore, the feature reduction leads to a minute reduction in the effectiveness but results in massively efficient (w.r.t time) malware detection models.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
McAfee Mobile Threat Report. https://www.mcafee.com/en-us/consumer-support/2020-mobile-threat-report.html (2020). Accessed Jan 2023
AVTEST. https://portal.av-atlas.org/malware/statistics (2022). Accessed Jan 2023
IDC Smartphone Market Share. https://www.idc.com/promo/smartphone-market-share (2022). Accessed Jan 2023
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K.: Drebin: effective and explainable detection of android malware in your pocket. In: Network and Distributed System Security (NDSS) Symposium, vol. 14, pp. 23–26 (2014)
Chen, T., Mao, Q., Yang, Y., Zhu, J.: TinyDroid: a lightweight and efficient model for android malware detection & classification. Mobile Inf. Syst. 2018, 1–9 (2018)
Li, C., Mills, K., Niu, D., Zhu, R., Zhang, H., Kinawi, H.: Android malware detection based on factorization machine. IEEE Access 7, 184008–184019 (2019)
Li, J., Sun, L., Yan, Q., Li, Z., Srisa-An, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225 (2018)
Liu, Y., Tantithamthavorn, C., Li, L., Liu, Y.: Deep learning for android malware defenses: a systematic literature review. arXiv preprint arXiv:2103.05292 (2021)
McLaughlin, N., et al.: Deep android malware detection. In: ACM Conference On Data and Application Security and PrivacY (CODASPY), pp. 301–308 (2017)
Pushpa Latha, D.: Bat optimization algorithm for wrapper-based feature selection and performance improvement of android malware detection (2021)
Qiu, J., Zhang, J., Luo, W., Pan, L., Nepal, S., Xiang, Y.: A survey of android malware detection with deep neural models. ACM Comput. Surv. (CSUR) 53(6), 1–36 (2020)
Rathore, H., Nikam, P., Sahay, S.K., Sewak, M.: Identification of adversarial android intents using reinforcement learning. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2021)
Rathore, H., Sahay, S.K., Thukral, S., Sewak, M.: Detection of malicious android applications: classical machine learning vs. deep neural network integrated with clustering. In: Gao, H., J. Durán Barroso, R., Shanchen, P., Li, R. (eds.) BROADNETS 2020. LNICST, vol. 355, pp. 109–128. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68737-3_7
Rathore, H., Samavedhi, A., Sahay, S.K., Sewak, M.: Robust malware detection models: learning from adversarial attacks and defenses. Foren. Sci. Int. Digit. Investig. 37, 301183 (2021)
Sewak, M., Sahay, S.K., Rathore, H.: An investigation of a deep learning based malware detection system. In: 13th International Conference on Availability, Reliability and Security (ARES), pp. 1–5 (2018)
Sewak, M., Sahay, S.K., Rathore, H.: Assessment of the relative importance of different hyper-parameters of LSTM for an IDS. In: IEEE Region 10 Conference (TENCON), pp. 414–419. IEEE (2020)
Sewak, M., Sahay, S.K., Rathore, H.: DeepIntent: implicitintent based android IDS with E2E deep learning architecture. In: International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–6. IEEE (2020)
Sewak, M., Sahay, S.K., Rathore, H.: Value-approximation based deep reinforcement learning techniques: an overview. In: International Conference on Computing Communication and Automation, pp. 379–384. IEEE (2020)
Sewak, M., Sahay, S.K., Rathore, H.: Deep reinforcement learning for cybersecurity threat detection and protection: a review. In: Krishnan, R., Rao, H.R., Sahay, S.K., Samtani, S., Zhao, Z. (eds.) Secure Knowledge Management In The Artificial Intelligence Era. SKM 2021. Communications in Computer and Information Science, vol. 1549, pp. 51–72. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-97532-6_4
Sewak, M., Sahay, S.K., Rathore, H.: DRLDO: a novel DRL based de-obfuscation system for defence against metamorphic malware. Def. Sci. J. 71(1), 55–65 (2021)
Sewak, M., Sahay, S.K., Rathore, H.: DRo: a data-scarce mechanism to revolutionize the performance of DL-based security systems. In: IEEE 46th Conference on Local Computer Networks (LCN), pp. 581–588. IEEE (2021)
Sun, L., Li, Z., Yan, Q., Srisa-an, W., Pan, Y.: SigPID: significant permission identification for android malware detection. In: 11th International Conference on Malicious and unwanted software (MALWARE), pp. 1–8. IEEE (2016)
Wang, W., Wang, X., Feng, D., Liu, J., Han, Z., Zhang, X.: Exploring permission-induced risk in android applications for malicious application detection. IEEE Trans. Inf. Forensics Secur. 9(11), 1869–1882 (2014)
Wang, X., Li, C.: Android malware detection through machine learning on kernel task structures. Neurocomputing 435, 126–150 (2021)
Wei, F., Li, Y., Roy, S., Ou, X., Zhou, W.: Deep ground truth analysis of current android malware. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 252–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60876-1_12
Ye, Y., Li, T., Adjeroh, D., Iyengar, S.S.: A survey on malware detection using data mining techniques. ACM Comput. Surv. (CSUR) 50(3), 1–40 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Rathore, H., Kharat, A., T, R., Manickavasakam, A., Sahay, S.K., Sewak, M. (2023). MalEfficient10%: A Novel Feature Reduction Approach for Android Malware Detection. In: Wang, W., Wu, J. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-40467-2_5
Download citation
DOI: https://doi.org/10.1007/978-3-031-40467-2_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40466-5
Online ISBN: 978-3-031-40467-2
eBook Packages: Computer ScienceComputer Science (R0)