Abstract
Intrusion and anomaly detection are essential for protecting computer networks and mitigating communication vulnerabilities. This research aims to identify experimentally the error-based machine learning algorithm that offers the highest accuracy and the shortest model build time for anomaly detection and anomaly attack categorization. A two-stage anomaly detection and categorization framework has been set up for experimental evaluation. The first stage identifies whether a network flow is normal or anomalous, and the second stage identifies the type of attack if the first-stage result is anomalous. The goal is eventually to use the best algorithm in an online stream model of network intrusion detection. To this end, five research propositions are defined, four sets of experiments are set up, and four research questions are asked. The UNSW-NB15 network anomaly dataset is used for training and testing in the experiments. Machine learning algorithms are classified into four learning approaches: information-based, similarity-based, probability-based, and error-based. This paper focuses on error-based learning models, specifically the following algorithms: Winnow, Logistic, Perceptron, Support Vector Machine (SVM), and Deep Learning. The results are also compared with those of non-error-based machine learning algorithms. The results show that, overall, the error-based algorithm Winnow is the best for network anomaly detection, with 100% accuracy and a model build time of 0.47 s. For network anomaly attack categorization, SVM ranks first in terms of accuracy alone, but Simple Logistic is the best when accuracy and build time are considered together.
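The two-stage framework described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary features, uses a textbook Winnow2 learner (multiplicative weight updates on mistakes) for stage one, and uses a hypothetical rule-based `toy_categorizer` with placeholder attack labels for stage two. The toy data, the class and function names, and the default `alpha` and threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class Winnow:
    """Textbook Winnow2 for binary features (illustrative, not the paper's setup)."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha                    # promotion/demotion factor (assumed default)
        self.threshold = float(n_features)    # common choice: threshold = number of features
        self.weights: List[float] = [1.0] * n_features

    def predict(self, x: Sequence[int]) -> int:
        # Sum the weights of the active features and compare with the threshold.
        score = sum(w for w, xi in zip(self.weights, x) if xi)
        return 1 if score >= self.threshold else 0

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int], epochs: int = 10) -> None:
        for _ in range(epochs):
            mistakes = 0
            for x, label in zip(X, y):
                if self.predict(x) == label:
                    continue  # error-based learning: update only on a mistake
                mistakes += 1
                # Promote active weights on a missed anomaly, demote on a false alarm.
                factor = self.alpha if label == 1 else 1.0 / self.alpha
                self.weights = [w * factor if xi else w
                                for w, xi in zip(self.weights, x)]
            if mistakes == 0:
                break  # every training flow is classified correctly


def classify_flow(
    flow: Sequence[int],
    detector: Winnow,
    categorize: Callable[[Sequence[int]], str],
) -> Tuple[str, Optional[str]]:
    """Stage 1: normal vs. anomalous. Stage 2 runs only for anomalous flows."""
    if detector.predict(flow) == 0:
        return ("normal", None)
    return ("anomalous", categorize(flow))


def toy_categorizer(flow: Sequence[int]) -> str:
    # Hypothetical stand-in for a trained stage-two classifier.
    return "DoS" if flow[0] else "Exploits"


# Toy data (assumption): a flow is anomalous when feature 0 or feature 1 is set.
X_train = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
           [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
y_train = [1, 1, 0, 0, 1, 0]

detector = Winnow(n_features=4)
detector.fit(X_train, y_train)

print(classify_flow([1, 0, 0, 0], detector, toy_categorizer))
print(classify_flow([0, 0, 1, 0], detector, toy_categorizer))
```

Keeping the two stages behind separate interfaces means the stage-one detector and the stage-two categorizer can be swapped independently, which matches the abstract's finding that different algorithms perform best at each stage.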
Funding
The authors received support from the Natural Sciences and Engineering Research Council of Canada (NSERC).
About this article
Cite this article
Ajila, S.A., Lung, CH. & Das, A. Analysis of error-based machine learning algorithms in network anomaly detection and categorization. Ann. Telecommun. 77, 359–370 (2022). https://doi.org/10.1007/s12243-021-00836-0
DOI: https://doi.org/10.1007/s12243-021-00836-0