Robust malware family classification using effective features and classifiers

BT Hammad, N Jamil, IT Ahmed, ZM Zain, S Basheer - Applied Sciences, 2022 - mdpi.com
Malware development has increased significantly in recent years, posing a serious security risk to both consumers and businesses. Malware developers continually find new ways to circumvent security researchers’ ongoing efforts to guard against malware attacks. Malware classification (MC) entails assigning a specific sample to a class of malware, while malware detection merely entails finding malware without identifying which kind of malware it is. There are two main reasons why the most popular MC techniques have a low classification rate. First, finding and developing accurate features requires highly specialized domain expertise. Second, data imbalance makes it challenging to classify and correctly identify malware. The proposed MC method consists of the following steps: (i) Dataset preparation: 2D malware images are created from the malware binary files; (ii) Visualized malware pre-processing: the malware images are scaled to fit the CNN model’s input size; (iii) Feature extraction: both hand-engineered (Tamura) and deep learning (GoogLeNet) techniques are used to extract the features; (iv) Classification: to perform malware classification, we employed k-Nearest Neighbor (KNN), Support Vector Machines (SVM), and Extreme Learning Machine (ELM). The proposed method is tested on the standard, imbalanced Malimg dataset. The accuracy of the proposed method was extremely high, making it the most efficient option available. The proposed method outperformed both the hand-crafted feature and deep feature techniques, at 95.42 and 96.84 percent.
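The image-based steps in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the row width, the nearest-neighbour scaling, the Euclidean distance, and the function names `bytes_to_image`, `resize_nearest`, and `knn_predict` are all assumptions made for illustration. Feature extraction (Tamura, GoogLeNet) is omitted, so the classifier here would operate on whatever feature vectors are supplied to it.

```python
import math
from collections import Counter


def bytes_to_image(data, width=16):
    """Step (i), sketched: treat each byte as one grayscale pixel (0-255).

    The row width is a hypothetical choice; the abstract does not state one.
    A trailing partial row is dropped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]


def resize_nearest(img, out_h, out_w):
    """Step (ii), sketched: nearest-neighbour rescale to a fixed input size.

    The abstract only says images are scaled to the CNN input size; the
    interpolation method used here is an assumption.
    """
    if not img or not img[0]:
        raise ValueError("image must be non-empty")
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def knn_predict(train, labels, query, k=3):
    """Step (iv), sketched: majority vote among the k nearest vectors.

    Euclidean distance and k=3 are assumptions, not values from the paper.
    """
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    image = bytes_to_image(bytes(range(8)), width=4)   # 2 rows of 4 pixels
    scaled = resize_nearest(image, 4, 4)               # upscale to 4 x 4
    print(scaled)
    # Toy feature vectors and family labels, purely illustrative.
    feats = [[0, 0], [0, 1], [5, 5], [6, 5]]
    fams = ["a", "a", "b", "b"]
    print(knn_predict(feats, fams, [0.2, 0.1], k=3))
```

In practice the same steps would more likely be written with an array library and an image library, and the scaled images would be passed through a feature extractor before classification; the pure-Python version is used here only to keep the sketch self-contained.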