Abstract
The growing frequency and sophistication of cyber threats challenge traditional intrusion detection methods. Signature-based misuse detection, anomaly detection, and hybrid detection are each effective in isolation, but they often struggle to keep pace with attackers' evolving tactics. This research addresses the need for intrusion detection models that are more accurate, adaptable, and robust against rapidly changing attack vectors. We conduct a comprehensive set of experiments on diverse datasets, including N-BaIoT, NSL-KDD, and CICIDS2017, to evaluate and compare machine learning algorithms such as Random Forest, XGBoost, and decision trees. Building on these comparisons, we develop a hybrid intrusion detection model that combines the strengths of these algorithms. In our experiments, the hybrid model, particularly the combination of Random Forest and XGBoost, outperforms the individual algorithms and reaches an accuracy of 97% in certain cases. We attribute this result to ensemble learning, which capitalizes on the consensus of diverse classifiers. Our research demonstrates the effectiveness of ensemble learning for intrusion detection and underscores the importance of continuous adaptation to evolving threats. Using network traffic data, the hybrid model offers a promising avenue for improving intrusion classification in Big Data environments.
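The phrase "consensus of diverse classifiers" can be made concrete with a small sketch. The abstract does not state which combination rule the hybrid model uses, so the weighted soft voting below is an assumption chosen for illustration, and the function name `soft_vote` and the probability values are hypothetical rather than taken from the study.

```python
from typing import Dict, Optional, Sequence

# One classifier's output for one traffic record: class label -> probability.
Proba = Dict[str, float]


def soft_vote(probas: Sequence[Proba],
              weights: Optional[Sequence[float]] = None) -> str:
    """Return the label with the highest weighted mean probability.

    Assumption: soft voting is used only as an example of classifier
    consensus; the paper's actual combination rule may differ.
    """
    if not probas:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * len(probas)
    if len(weights) != len(probas):
        raise ValueError("one weight per classifier is required")

    total = sum(weights)
    labels = set().union(*(p.keys() for p in probas))
    scores = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, probas)) / total
        for label in labels
    }
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=scores.__getitem__)


# Hypothetical outputs of two base models for a single record.
rf_out = {"benign": 0.55, "attack": 0.45}   # stand-in for Random Forest
xgb_out = {"benign": 0.20, "attack": 0.80}  # stand-in for XGBoost

print(soft_vote([rf_out, xgb_out]))
```

With equal weights, the second model's more confident prediction decides the outcome, which is the sense in which an ensemble can correct a weakly confident base classifier. Passing `weights` lets one model count for more, for example when it validates better on a given dataset.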
Data availability
The datasets generated during and/or analyzed during the current study are available from the authors.
Author information
Contributions
Farah Jemili and Rahma Meddeb wrote the main manuscript text. Farah Jemili prepared tables and figures. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors state that there is no conflict of interest.
About this article
Cite this article
Jemili, F., Meddeb, R. & Korbaa, O. Intrusion detection based on ensemble learning for big data classification. Cluster Comput 27, 3771–3798 (2024). https://doi.org/10.1007/s10586-023-04168-7
DOI: https://doi.org/10.1007/s10586-023-04168-7