Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set

Published: 01 September 2004

Abstract

A large set of machine learning and pattern classification algorithms trained and tested on the KDD intrusion detection data set failed to identify most of the user-to-root and remote-to-local attacks, as reported by many researchers in the literature. In light of this observation, this paper aims to expose the deficiencies and limitations of the KDD data set and to argue that it should not be used to train pattern recognition or machine learning algorithms for misuse detection in these two attack categories. Multiple analysis techniques are employed to demonstrate, both objectively and subjectively, that the KDD training and testing data subsets represent dissimilar target hypotheses for the user-to-root and remote-to-local attack categories. These techniques consist of switching the roles of the original training and testing data subsets to develop a decision tree classifier, cross-validation on the merged training and testing data subsets, and qualitative and comparative analysis of rules generated independently on the training and testing data subsets through the C4.5 decision tree algorithm. The analysis results clearly suggest that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for the user-to-root or remote-to-local attack categories. It is further noted that the analysis techniques employed to assess the similarity between the two target hypotheses represented by the training and testing data subsets can readily be generalized to data set pairs in other problem domains.
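The three checks named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than KDD records, a one-level decision tree (a stump) stands in for C4.5, and the helper names (`fit_stump`, `cross_validate`, `make_subset`) are hypothetical. The within-subset cross-validation is added here only as a baseline for comparison. The two subsets are built so that a different feature signals the "attack" label in each, which is one way two subsets can represent dissimilar target hypotheses.

```python
import random


def fit_stump(rows):
    """Fit a one-level decision tree on (features, label) rows.

    Returns (feature index, threshold, label if <= threshold, label otherwise).
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if x[f] <= threshold else right) != y for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, left, right)
    return best[1:]


def predict(stump, x):
    f, threshold, left, right = stump
    return left if x[f] <= threshold else right


def accuracy(stump, rows):
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


def cross_validate(rows, k=5, seed=0):
    """Mean accuracy over k folds of a shuffled copy of rows."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train_rows = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit_stump(train_rows), folds[i]))
    return sum(scores) / k


def make_subset(n, attack_feature, seed):
    """Synthetic stand-in: label 1 is decided by a single chosen feature."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        rows.append((x, int(x[attack_feature] > 0.5)))
    return rows


# Two subsets whose labels follow different rules (an assumption of this sketch).
train = make_subset(200, attack_feature=0, seed=1)
test = make_subset(200, attack_feature=1, seed=2)

# 1. Original roles, then switched roles.
original = accuracy(fit_stump(train), test)
swapped = accuracy(fit_stump(test), train)

# 2. Cross-validation within each subset (baseline) and on the merged data.
within_train = cross_validate(train)
within_test = cross_validate(test)
merged = cross_validate(train + test)

# 3. Compare the rules learned independently on each subset.
rule_train = fit_stump(train)
rule_test = fit_stump(test)

print(f"train->test {original:.2f}, test->train {swapped:.2f}")
print(f"CV train {within_train:.2f}, CV test {within_test:.2f}, CV merged {merged:.2f}")
print(f"rule on train splits feature {rule_train[0]}, rule on test splits feature {rule_test[0]}")
```

In this toy setting, each subset is easy to learn on its own, yet accuracy drops in both directions when one subset is used to predict the other, cross-validation on the merged data falls short of the within-subset baselines, and the two learned rules split on different features. A pattern of this kind is what would point to mismatched target hypotheses rather than to a weak learner.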

      Published In

      Intelligent Data Analysis  Volume 8, Issue 4
      September 2004
      112 pages

      Publisher

      IOS Press

      Netherlands

      Author Tags

      1. KDD data set
      2. cross validation
      3. decision trees
      4. intrusion detection
      5. machine learning

      Cited By

      • (2021) Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model. Complexity 2021. DOI: 10.1155/2021/6634811. Online publication date: 1-Jan-2021
      • (2019) A critique of imbalanced data learning approaches for big data analytics. International Journal of Business Intelligence and Data Mining 14:4 (419-457). DOI: 10.1504/ijbidm.2019.099961. Online publication date: 1-Jan-2019
      • (2019) A Survey of Intrusion Detection Systems Leveraging Host Data. ACM Computing Surveys 52:6 (1-35). DOI: 10.1145/3344382. Online publication date: 14-Nov-2019
      • (2017) Ramp loss K-Support Vector Classification-Regression; a robust and sparse multi-class approach to the intrusion detection problem. Knowledge-Based Systems 126:C (113-126). DOI: 10.1016/j.knosys.2017.03.012. Online publication date: 15-Jun-2017
      • (2017) Designing vulnerability testing tools for web services. International Journal of Information Security 16:4 (435-457). DOI: 10.1007/s10207-016-0334-0. Online publication date: 1-Aug-2017
      • (2014) A distance sum-based hybrid method for intrusion detection. Applied Intelligence 40:1 (178-188). DOI: 10.1007/s10489-013-0452-6. Online publication date: 1-Jan-2014
      • (2013) Evaluating performance of long short-term memory recurrent neural networks on intrusion detection data. Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference (218-224). DOI: 10.1145/2513456.2513490. Online publication date: 7-Oct-2013
      • (2013) Evaluation of an adaptive genetic-based signature extraction system for network intrusion detection. Pattern Analysis & Applications 16:4 (549-566). DOI: 10.1007/s10044-011-0255-5. Online publication date: 1-Nov-2013
      • (2012) Multi-class pattern classification using single, multi-dimensional feature-space feature extraction evolved by multi-objective genetic programming and its application to network intrusion detection. Genetic Programming and Evolvable Machines 13:1 (33-63). DOI: 10.5555/2159284.2159295. Online publication date: 1-Mar-2012
      • (2012) A comparative study of use of Shannon, Rényi and Tsallis entropy for attribute selecting in network intrusion detection. Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning (492-501). DOI: 10.1007/978-3-642-32639-4_60. Online publication date: 29-Aug-2012
