Abstract
In network security data, normal samples usually far outnumber malicious samples, so the data are imbalanced. When trained on imbalanced samples, a classification model needs careful sampling and feature selection methods to counter its bias toward the majority classes. Simple data sampling methods and incomplete feature selection techniques cannot improve the accuracy of intrusion detection models. In addition, a single intrusion detection model cannot accurately classify all attack types when faced with massive, imbalanced security data. Moreover, existing model integration methods based on stacking or voting suffer from high coupling, which undermines their stability and reliability. To address these issues, we propose a Multiple Integration Model (MIM) for feature selection and attack classification. First, MIM reconstructs the data using random Oversampling, random Undersampling and Washing Methods (OUWM). Then, a modified simulated annealing algorithm generates candidate features. Finally, an integrated model based on Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost) and gradient Boosting with Categorical features support (CatBoost) performs intrusion detection and attack classification. MIM uses a Rule-based and Priority-based Ensemble Strategy (RPES) to combine the high accuracy of LightGBM with the high effectiveness of XGBoost and CatBoost, improving the stability and reliability of the integrated model. We evaluate our approach on four datasets: two publicly available intrusion detection datasets, a dataset created by researchers from the University of New Brunswick, and a dataset collected by the Australian Centre for Cyber Security. In our experiments, MIM significantly outperforms several existing intrusion detection models in terms of accuracy.
Specifically, compared with two recently proposed methods, namely a reinforcement learning method based on an adaptive sample distribution dual-experience replay pool mechanism (ASD2ER) and a method that combines an Auto Encoder, Principal Component Analysis and Long Short-Term Memory (AE+PCA+LSTM), MIM improved intrusion detection accuracy by 1.35% and 1.16%, respectively.
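The three stages named in the abstract (resampling, simulated-annealing feature selection, and a rule- and priority-based ensemble) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' MIM implementation: the function names `resample`, `anneal_features` and `rule_priority_vote` are hypothetical; the "washing" step of OUWM is omitted because the abstract does not define it; the annealing is plain (unmodified) simulated annealing; and the "trusted label" rule followed by a priority-tie-broken majority vote is a hypothetical stand-in for RPES, whose actual rules are not given here.

```python
import math
import random
from collections import Counter


def resample(X, y, target, seed=0):
    """Random over-/undersampling so every class ends up with `target` rows.
    Assumption: OUWM's 'washing' step is not reproduced here."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            # Undersample: draw `target` rows without replacement.
            picked = rng.sample(rows, target)
        else:
            # Oversample: keep every row, then duplicate randomly chosen ones.
            picked = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(picked)
        y_out.extend([label] * target)
    return X_out, y_out


def anneal_features(n_features, score, steps=200, t0=1.0, cooling=0.95, seed=0):
    """Plain simulated annealing over feature masks (tuples of booleans).
    `score(mask)` is a placeholder objective to maximise."""
    rng = random.Random(seed)
    current = tuple(rng.random() < 0.5 for _ in range(n_features))
    cur_score = score(current)
    best, best_score = current, cur_score
    t = t0
    for _ in range(steps):
        # Neighbour move: flip one randomly chosen feature in or out.
        i = rng.randrange(n_features)
        cand = current[:i] + (not current[i],) + current[i + 1:]
        cand_score = score(cand)
        delta = cand_score - cur_score
        # Accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        t = max(t * cooling, 1e-6)  # geometric cooling with a small floor
    return best, best_score


def rule_priority_vote(predictions, priority, trusted=None):
    """Combine per-model labels for one sample (hypothetical RPES stand-in).
    predictions: {model_name: predicted_label}
    priority:    every model name, highest priority first
    trusted:     {model_name: set of labels that model is trusted on}"""
    trusted = trusted or {}
    # Rule stage: the highest-priority model predicting a trusted label wins.
    for name in priority:
        if predictions[name] in trusted.get(name, set()):
            return predictions[name]
    # Fallback: majority vote, ties broken by model priority.
    counts = Counter(predictions.values())
    top = max(counts.values())
    for name in priority:
        if counts[predictions[name]] == top:
            return predictions[name]
```

In a full pipeline, `score` would plausibly train one of the boosting models on the selected columns and return its validation accuracy, and the trusted sets could come from per-class validation results; both are assumptions left out to keep the sketch self-contained.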
References
Yan, J., Zhaoquan, G., Zhihao, J., Cuiyun, G., Jianye, Y.: Persistent graph stream summarization for real-time graph analytics. World Wide Web 26, 2647–2667 (2023)
Uno, F., Jianxin, L., Naveed, A., Man, L., Yan, J.: GoMIC: Multi-view image clustering via self-supervised contrastive heterogeneous graph co-learning. World Wide Web 26, 1667–1683 (2023)
Abhilash, S., Seyed, M.H.M., Jaiprakash, N.: F-TLBO-ID: Fuzzy fed teaching learning based optimisation algorithm to predict the number of k-barriers for intrusion detection. Appl. Soft Comput. 151, 111163 (2024)
Bhawana, S., Lokesh, S., Chhagan, L., Satyabrata, R.: Explainable artificial intelligence for intrusion detection in IoT networks: A deep learning based approach. Expert Syst. Appl. 238, 121751 (2024)
Zhiqiang, Z., Le, W., Guangyao, C., Zhaoquan, G., Zhihong, T., Xiaojiang, D., Mohsen, G.: STG2P: A two-stage pipeline model for intrusion detection based on improved LightGBM and K-means. Simul. Model. Pract. Theory 120, 102614 (2022)
Giuseppina, A., Annalisa, A., Luca, D.R., Donato, M.: GAN augmentation to deal with imbalance in imaging-based intrusion detection. Futur. Gener. Comput. Syst. 123, 108–127 (2021)
Iman, S., Arash, H.L., Ali, A.G.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceed. 4th Int. Conf. Inf. Syst. Sec. Priv., 108–116 (2018)
Giuseppina, A., Annalisa, A., Donato, M.: Autoencoder-based deep metric learning for network intrusion detection. Inf. Sci. 569, 706–727 (2021)
Mohammed, A.A., Xiangjian, H., Priyadarsi, N., Zhiyuan, T.: Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016)
Joffrey, L., Taghi, K.: A survey and analysis of intrusion detection models based on CSE-CIC-IDS2018 Big Data. J. Big Data 7, 104 (2020)
Mahbub, E.K., Joarder, K., Iqbal, G., Tasadduq, I., Ashfaqur, R.: Malware detection in edge devices with fuzzy oversampling and dynamic class weighting. Appl. Soft Comput. 112, 107783 (2021)
Florian, G., Elizabeth, C., Tharam, D.: CorrCorr: A feature selection method for multivariate correlation network anomaly detection techniques. Comput. Sec. 83, 234–245 (2019)
Fatemeh, A., MohammadMahdi, R.Y., Caro, L., Azadeh, S., Nasser, Y.: Mutual information-based feature selection for intrusion detection systems. J. Netw. Comput. Appl. 34, 1184–1199 (2011)
Luming, Y., Shaojing, F., Xuyun, Z., Shize, G., Yongjun, W., Chi, Y.: FlowSpectrum: a concrete characterization scheme of network traffic behavior for anomaly detection. World Wide Web 25, 2139–2161 (2022)
Varma, P.R.K., Kumari, V.V., Kumar, S.S.: Feature Selection Using Relative Fuzzy Entropy and Ant Colony Optimization Applied to Real-time Intrusion Detection System. Procedia Comput. Sci. 85, 503–510 (2016)
Anjum, N., Rizwan, A.K.: A novel combinatorial optimization based feature selection method for network intrusion detection. Comput. Sec. 102, 102164 (2021)
Nour, M., Benjamin, T., Kim-Kwang, R.C.: An Ensemble Intrusion Detection Technique Based on Proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things. IEEE Internet Things J. 6(3), 4815–4830 (2019)
Eduardo, D.H., Emiro, D.H., Andrés, O., Julio, O., Beatriz, P.: PCA filtering and probabilistic SOM for network intrusion detection. Neurocomputing 164(21), 71–81 (2015)
Rong, Z., Minqi, Z., Xueqing, G., Xiaofeng, H., Weining, Q., Shouke, Q., Aoying, Z.: Detecting anomaly in data streams by fractal model. World Wide Web 18, 1419–1441 (2015)
Mahsa, M., Jafar, T.: A Density-based Undersampling Approach to Intrusion Detection. In: Proceed. 2021 5th Int. Conf. Patt. Recog. Image Anal. (IPRIA), 1–7 (2021)
Qusyairi, R.S.F., Kalamullah, R.: Implementation of Ensemble Learning and Feature Selection for Performance Improvements in Anomaly-Based Intrusion Detection Systems. In: Proceed. 2020 IEEE Int. Conf. Ind. 4.0, Artif. Intell. Commun. Tech. (IAICT), 118–124 (2020)
Selvakumar, B., Muneeswaran, K.: Firefly algorithm based feature selection for network intrusion detection. Comput. Sec. 81, 148–155 (2019)
Darshana, U., Jaume, M., Marzia, Z., Srinivas, S.: Intrusion Detection in SCADA Based Power Grids: Recursive Feature Elimination Model With Majority Vote Ensemble Algorithm. IEEE Trans. Netw. Sci. Eng. 8(3), 2559–2574 (2021)
Zhaoquan, G., Le, W., Xiaolong, C., Yunyi, T., Xingang, W., Xiaojiang, D., Mohsen, G., Zhihong, T.: Epidemic Risk Assessment by a Novel Communication Station Based Method. IEEE Trans. Netw. Sci. Eng. 9(1), 332–344 (2022)
Hao, Z., JieLing, L., XiMeng, L., Chen, D.: Multi-dimensional feature fusion and stacking ensemble mechanism for network intrusion detection. Futur. Gener. Comput. Syst. 122, 130–143 (2021)
Hongwei, D., Leiyang, C., Liang, D., Zhongwang, F., Xiaohui, C.: Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection. Futur. Gener. Comput. Syst. 131, 240–254 (2022)
Hongpo, Z., Lulu, H., Chase, Q.W., Zhanbo, L.: An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset. Comput. Netw. 177, 107315 (2020)
Zhihao, W., Dingde, J., Liuwei, H., Wei, Y.: An efficient network intrusion detection approach based on deep learning. Wirel. Netw. 27, 1–14 (2021)
Xing, X., Jie, L., Yang, Y., Fumin, S.: Toward Effective Intrusion Detection Using Log-Cosh Conditional Variational Autoencoder. IEEE Internet Things J. 8(8), 6187–6196 (2021)
Samed, A., Murat, D.: STL-HDL: A new hybrid network intrusion detection system for imbalanced dataset on big data environment. Comput. Sec. 110, 102435 (2021)
Desale, K.S., Ade, R.: Genetic algorithm based feature selection approach for effective intrusion detection system. In: Proceed. 2015 Int. Conf. Comput. Commun. Inform. (ICCCI), 1–6 (2015)
Shadi, A., Monther, A., Muneer, B.Y.: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 25, 152–160 (2018)
Sara, M., Hamid, M., Mostafa, G.A., Hadis, K.: Cyber intrusion detection by combined feature selection algorithm. J. Inform. Sec. Appl. 44, 80–88 (2019)
Bayu, A.T., Kyung, H.R.: A Combination of PSO-Based Feature Selection and Tree-Based Classifiers Ensemble for Intrusion Detection Systems. Adv. Comput. Sci. Ubiquit. Comput., 489–495 (2015)
Zhiqiang, Z., Le, W., Jiongsong, H.: Principle and Application Research of Particle Swarm Optimization. In: Proceed. 2020 5th Int. Conf. Mech. Control Comput. Eng. (ICMCCE), 1638–1642 (2020)
Hu, L., Ye, W., Hua, W., Bin, Z.: Multi-window based ensemble learning for classification of imbalanced streaming data. World Wide Web 20, 1507–1525 (2017)
Dickson, K.W., XiangJun, S., Yong, D., Liangjun, W., ShuCheng, H.: Co-regularized kernel ensemble regression. World Wide Web 22, 717–734 (2019)
Jinping, L., Jiezhou, H., Wuxia, Z., Tianyu, M., Zhaohui, T., Jean, P.N., Weihua, G.: ANID-SEoKELM: Adaptive network intrusion detection based on selective ensemble of kernel ELMs with random features. Knowl.-Based Syst. 177(1), 104–116 (2019)
Ying, Z., Thomas, M., Shahram, S.: M-AdaBoost-A Based Ensemble System for Network Intrusion Detection. Expert Syst. Appl. 162, 113864 (2020)
Saikat, D., Mohammad, A., Frederick, T.S., Sajjan, S.: Network Intrusion Detection using Natural Language Processing and Ensemble Machine Learning. In: Proceed. 2020 IEEE Symp. Ser. Comput. Intell. (SSCI), 829–835 (2020)
Enkhtur, T., Monowar, H.B., Yuzo, T., Doudou, F., Khishigjargal, G., Erik, E., Youki, K.: DeL-IoT: A Deep Ensemble Learning Approach to Uncover Anomalies in IoT. Internet of Things 14, 100391 (2021)
Prabhat, K., Govind, G., Rakesh, T.: An ensemble learning and fog-cloud architecture-driven cyber-attack detection framework for IoMT networks. Comput. Commun. 166, 110–124 (2021)
Mahbod, T., Ebrahim, B., Wei, L., Ali, A.G.: A detailed analysis of the KDD CUP 99 data set. In: Proceed. 2009 IEEE Symp. Comput. Intell. Sec. Def. Appl., 1–6 (2009)
Nour, M., Jill, S.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Proceed. 2015 Mil. Commun. Inform. Syst. Conf. (MilCIS), 1–6 (2015)
Saharon, R., Aron, I.: KDD-cup 99: knowledge discovery in a charitable organization’s donor database. ACM SIGKDD Explor. Newsl. 1, 85–90 (2000)
Hongyu, Y., Renyun, Z., Guangquan, X., Liang, Z.: A network security situation assessment method based on adversarial deep learning. Appl. Soft Comput. 102, 107096 (2021)
Al, Y., Wathiq, L., Ali, K.I., Faezah, H.A.: Wrapper feature selection method based differential evolution and extreme learning machine for intrusion detection system. Patt. Recog. 132, 108912 (2022)
Haonan, T., Le, W., Dong, Z., Jianyu, D.: Intrusion Detection Based on Adaptive Sample Distribution Dual-Experience Replay Reinforcement Learning. Math. 12(7), 948 (2024)
Thakkar, A., Nandish, K., Rebakah, G.: Fusion of linear and non-linear dimensionality reduction techniques for feature reduction in LSTM-based Intrusion Detection System. Appl. Soft Comput. 154, 111378 (2024)
Jianlei, G., Senchun, C., Baihai, Z., Yuanqing, X.: Research on Network Intrusion Detection Based on Incremental Extreme Learning Machine and Adaptive Principal Component Analysis. Energ. 12(7), 1207–1223 (2019)
Earum, M., Aneela, Z., Muhammad, U., Asima, A.A.: A two-stage intrusion detection system with auto-encoder and LSTMs. Appl. Soft Comput. 121, 108768 (2022)
Hooshmand, M.K., Doreswamy, H.: Network anomaly detection using deep learning techniques. CAAI Trans. Intell. Tech. 7(2), 228–243 (2022)
Funding
This work is supported by the Guangdong Basic and Applied Basic Research Foundation (2023A1515011698), the Major Key Project of PCL (PCL2022A03), the Guangdong High-level University Foundation Program (SL2022A03J00918), and the National Natural Science Foundation of China (Grant No. 62372137).
Author information
Contributions
Zhiqiang Zhang and Le Wang wrote the main manuscript text. Zhiqiang Zhang proposed the main technical ideas for the methods in the manuscript and conducted the experiments. Junyi Zhu and Dong Zhu performed research and analyzed data. Zhaoquan Gu and Yanchun Zhang prepared all figures and tables in the manuscript. All authors reviewed the manuscript.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Z., Wang, L., Zhu, J. et al. MIM: A multiple integration model for intrusion detection on imbalanced samples. World Wide Web 27, 47 (2024). https://doi.org/10.1007/s11280-024-01285-0
DOI: https://doi.org/10.1007/s11280-024-01285-0