Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set

Authors:

Maheshkumar Sabhnani,

Gursel Serpen

Intelligent Data Analysis, Volume 8, Issue 4

Pages 403 - 415

Published: 01 September 2004

Abstract

A large set of machine learning and pattern classification algorithms trained and tested on the KDD intrusion detection data set failed to identify most of the user-to-root and remote-to-local attacks, as reported by many researchers in the literature. In light of this observation, this paper aims to expose the deficiencies and limitations of the KDD data set and to argue that it should not be used to train pattern recognition or machine learning algorithms for misuse detection in these two attack categories. Multiple analysis techniques are employed to demonstrate, both objectively and subjectively, that the KDD training and testing data subsets represent dissimilar target hypotheses for the user-to-root and remote-to-local attack categories. These techniques consist of switching the roles of the original training and testing data subsets to develop a decision tree classifier, cross-validation on the merged training and testing data subsets, and qualitative and comparative analysis of rules generated independently on the training and testing data subsets through the C4.5 decision tree algorithm. Analysis results clearly suggest that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for the user-to-root or remote-to-local attack categories. It is further noted that the analysis techniques employed to assess the similarity between the two target hypotheses represented by the training and the testing data subsets can readily be generalized to data set pairs in other problem domains.
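The first two analysis techniques named in the abstract, role switching and cross-validation on the merged subsets, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic two-feature records rather than KDD records, a 1-nearest-neighbour rule stands in for the C4.5 decision tree, and the names `make_split`, `attack_recall` and `cv_attack_recall` are hypothetical. The toy "training" and "testing" subsets deliberately place the attack class in different regions of feature space, so they encode dissimilar target hypotheses.

```python
import random


def make_split(attack_center, n=60, seed=0):
    """Synthetic records (hypothetical data): label 0 = normal, 1 = attack."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Normal traffic clusters around the origin in both subsets.
        data.append(((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0))
        # Attack records cluster around a subset-specific centre.
        data.append(((rng.gauss(attack_center[0], 0.5),
                      rng.gauss(attack_center[1], 0.5)), 1))
    return data


def predict_1nn(train, x):
    """Label of the nearest training record (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda rec: (rec[0][0] - x[0]) ** 2 + (rec[0][1] - x[1]) ** 2,
    )
    return nearest[1]


def attack_recall(train, test):
    """Fraction of attack records in `test` detected by 1-NN built on `train`."""
    attacks = [x for x, y in test if y == 1]
    hits = sum(predict_1nn(train, x) == 1 for x in attacks)
    return hits / len(attacks)


def cv_attack_recall(data, k=5, seed=0):
    """k-fold cross-validated attack recall on a shuffled copy of `data`."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    hits = total = 0
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        attacks = [x for x, y in held_out if y == 1]
        hits += sum(predict_1nn(train, x) == 1 for x in attacks)
        total += len(attacks)
    return hits / total


# Two subsets whose attack class lives in different regions (assumed setup).
train_set = make_split(attack_center=(3, 3), seed=1)
test_set = make_split(attack_center=(-3, 3), seed=2)

forward = attack_recall(train_set, test_set)     # original roles
swapped = attack_recall(test_set, train_set)     # roles switched
merged = cv_attack_recall(train_set + test_set)  # cross-validation on merged data

print(f"train->test attack recall: {forward:.2f}")
print(f"test->train attack recall: {swapped:.2f}")
print(f"merged 5-fold attack recall: {merged:.2f}")
```

In this toy setting, low recall in both directions combined with high recall under merged cross-validation is the pattern one would expect when the two subsets encode different target hypotheses. The sketch says nothing about the figures reported in the paper itself.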

Published In

Intelligent Data Analysis Volume 8, Issue 4

September 2004

112 pages

ISSN:1088-467X

Publisher

IOS Press

Netherlands
