Intrusion detection based on ensemble learning for big data classification

Cluster Computing

Abstract

The escalating frequency and sophistication of cyber threats pose significant challenges to traditional intrusion detection methods. Signature-based misuse detection, hybrid detection, and anomaly detection, while effective in isolation, often struggle to keep pace with the evolving tactics of attackers. This research is motivated by the need for intrusion detection models that offer improved accuracy, adaptability, and robustness against rapidly changing attack vectors. Our study draws on a comprehensive set of experiments conducted on diverse datasets, including N-BaIoT, NSL-KDD, and CICIDS2017. The primary focus is the evaluation and comparison of machine learning algorithms such as Random Forest, XGBoost, and decision trees. The research culminates in a hybrid intrusion detection model that combines the strengths of these algorithms. Our experiments indicate that the hybrid model, particularly when it combines Random Forest and XGBoost, outperforms the individual algorithms, achieving an accuracy of 97% in certain cases. We attribute this result to the ensemble learning approach, which capitalizes on the consensus of diverse classifiers. In conclusion, our research demonstrates the effectiveness of ensemble learning in enhancing intrusion detection and underscores the importance of continuous adaptation to evolving threats. By leveraging network traffic data, our hybrid model offers a promising avenue for improving intrusion classification in Big Data environments.
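To make the idea of classifier consensus concrete, the following minimal sketch shows soft voting, one common way of combining the outputs of two models such as Random Forest and XGBoost. The abstract does not state which combination rule the authors use, so soft voting, the function names `soft_vote` and `predict`, the two-class label set, and the probability values are all assumptions chosen for illustration. The sketch uses only the Python standard library; the per-class probabilities stand in for what trained models would output.

```python
from typing import List, Optional, Sequence


def soft_vote(
    member_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the weighted average of per-class probabilities.

    member_probs holds one probability vector per ensemble member
    (hypothetical stand-ins for Random Forest and XGBoost outputs).
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("all members must score the same classes")
    if weights is None:
        weights = [1.0] * len(member_probs)  # equal say for every member
    if len(weights) != len(member_probs):
        raise ValueError("need exactly one weight per member")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, member_probs)) / total
        for c in range(n_classes)
    ]


def predict(
    member_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Pick the label with the highest averaged probability."""
    averaged = soft_vote(member_probs, weights)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best]


# Illustrative values only: the two members disagree on one traffic record.
LABELS = ["benign", "attack"]
rf_probs = [0.60, 0.40]   # assumed Random Forest output
xgb_probs = [0.20, 0.80]  # assumed XGBoost output
print(predict([rf_probs, xgb_probs], LABELS))
```

In this example the second member is more confident than the first, so the averaged probabilities favor the "attack" label even though the members disagree. Passing `weights` lets a more trusted member carry more influence, which is one possible design choice rather than a detail taken from the paper.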

Data availability

The datasets generated and/or analyzed during the current study are available from the authors.

Author information

Contributions

Farah Jemili and Rahma Meddeb wrote the main manuscript text. Farah Jemili prepared tables and figures. All authors reviewed the manuscript.

Corresponding author

Correspondence to Farah Jemili.

Ethics declarations

Conflict of interest

The authors state that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jemili, F., Meddeb, R. & Korbaa, O. Intrusion detection based on ensemble learning for big data classification. Cluster Comput 27, 3771–3798 (2024). https://doi.org/10.1007/s10586-023-04168-7

  • DOI: https://doi.org/10.1007/s10586-023-04168-7
