Abstract
Mobile device manufacturers are rapidly releasing a wide variety of Android versions worldwide. At the same time, cyber criminals are carrying out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals profit because so many people rely on Android for their daily routines, including important communications. With this in mind, security practitioners have conducted static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis requires a minimal set of features to classify malware efficiently. Therefore, we used genetic search (GS), a search method based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers: Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT gave the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
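The abstract gives no implementation details, so the following is only a minimal sketch of how GA-based feature selection of this kind can work, not the authors' method. The toy data, the "any selected feature present" classifier, the size penalty, and all parameter values (population size, crossover and mutation rates) are assumptions made for illustration. Each individual is a boolean mask over the candidate features, and the fitness rewards accuracy while penalizing large feature sets.

```python
import random


def accuracy_and_tpr(y_true, y_pred):
    """Return (accuracy, true positive rate) for binary labels, 1 = malware."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    acc = (tp + tn) / len(y_true)
    tpr = tp / positives if positives else 0.0
    return acc, tpr


def make_toy_data(n_samples=200, n_features=12, informative=(0, 3, 7), seed=1):
    """Hypothetical data: only the `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [int(rng.random() < 0.3) for _ in range(n_features)]
        for j in informative:
            row[j] = int(rng.random() < (0.6 if label else 0.05))
        X.append(row)
        y.append(label)
    return X, y


def make_fitness(X, y, penalty=0.01):
    """Fitness = accuracy of a toy rule minus a penalty per selected feature."""
    def fitness(mask):
        selected = [j for j, keep in enumerate(mask) if keep]
        if not selected:
            return 0.0
        # Toy classifier (assumption): flag as malware if any selected feature is set.
        preds = [int(any(row[j] for j in selected)) for row in X]
        acc, _ = accuracy_and_tpr(y, preds)
        return acc - penalty * len(selected)
    return fitness


def genetic_search(n_features, fitness, pop_size=20, generations=30,
                   crossover_rate=0.6, mutation_rate=0.033, seed=0):
    """Evolve boolean feature masks; return the best mask found."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def pick():
            # Binary tournament selection: the fitter of two random individuals.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [best[:]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation on each gene independently.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return best


X, y = make_toy_data()
fitness = make_fitness(X, y)
best = genetic_search(12, fitness, seed=0)
print("selected feature indices:", [j for j, keep in enumerate(best) if keep])
```

In a realistic setting the toy rule inside `fitness` would be replaced by the cross-validated accuracy of an actual classifier, which is much more expensive per evaluation; the penalty term is one simple way to push the search toward small subsets such as the six features reported above.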
Additional information
Project supported by the Ministry of Science, Technology and Innovation of Malaysia, under the Grant eScienceFund (No. 01-01-03-SF0914)
About this article
Cite this article
Firdaus, A., Anuar, N.B., Karim, A. et al. Discovering optimal features using static analysis and a genetic search based method for Android malware detection. Frontiers Inf Technol Electronic Eng 19, 712–736 (2018). https://doi.org/10.1631/FITEE.1601491
DOI: https://doi.org/10.1631/FITEE.1601491