KAGO: an approximate adaptive grid-based outlier detection approach using kernel density estimate

  • Industrial and Commercial Application
Pattern Analysis and Applications

Abstract

Outlier detection approaches are effective at extracting unforeseen knowledge in domains such as intrusion detection, e-commerce, and fraudulent transactions. A prominent method, the K-Nearest Neighbor (KNN)-based outlier detection (KNNOD) technique, relies on distance measures to extract anomalies from a dataset. However, KNNOD is ill-equipped to handle dynamic data environments efficiently because of its quadratic time complexity and its sensitivity to changes in the dataset. As a result, redundant computation caused by frequent updates may make outlier detection inefficient. To address these challenges, we propose KAGO, an approximate adaptive grid-based outlier detection technique that finds point density using a kernel density estimate instead of a distance measure. Upon insertion of a new point, the proposed technique prunes the inlier grids and filters the candidate grids that contain local outliers. The grids containing potential outliers are then aggregated to converge incrementally on at most the top-N global outliers. Experimental evaluation showed that KAGO outperformed KNNOD by more than an order of \(\approx\)3.9 across large relevant datasets at about half the memory consumption.
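The general idea of scoring points by a grid-local kernel density estimate, rather than by nearest-neighbor distances, can be illustrated with a minimal sketch. This is not the KAGO algorithm: it uses a fixed uniform grid, recomputes everything in one batch, and omits the adaptive grids, inlier pruning, and incremental aggregation described in the abstract. The function names, the Gaussian kernel, and the `cell_width` and `bandwidth` defaults are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cell_of(point, cell_width):
    """Map a point to the integer index of the uniform grid cell containing it."""
    return tuple(math.floor(x / cell_width) for x in point)


def local_density(point, cell_points, total_points, bandwidth):
    """Gaussian kernel density at `point`, summing only over points in its own cell.

    Dividing by the size of the whole dataset (not the cell) keeps isolated
    points at a low density. Ignoring other cells makes this an approximation.
    """
    d = len(point)
    norm = (2.0 * math.pi) ** (d / 2.0) * bandwidth ** d
    total = 0.0
    for q in cell_points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, q))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (total_points * norm)


def top_n_outliers(points, n, cell_width=1.0, bandwidth=0.5):
    """Return up to n points with the lowest grid-local density, lowest first."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_width)].append(p)

    scored = []
    for members in grid.values():
        for p in members:
            scored.append((local_density(p, members, len(points), bandwidth), p))

    scored.sort(key=lambda item: item[0])
    return [p for _, p in scored[:n]]


if __name__ == "__main__":
    data = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.1), (0.2, 0.1), (0.1, 0.2), (5.5, 5.5)]
    print(top_n_outliers(data, 1))
```

Because each density is computed only over the points sharing a cell, the cost per point depends on cell occupancy rather than on the full dataset size, which is the intuition behind avoiding the quadratic pairwise distance computation of a KNN-based detector.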


Notes

  1. In this paper, we use the terms anomaly and outlier interchangeably.

  2. Base dataset refers to the dataset before any change is made to it.

  3. The density at a point here signifies the local density, since it is computed with respect to the grid cell that acts as the local neighborhood of the point concerned. We use the terms density and local density interchangeably when describing concepts related to the KAGO algorithm.

  4. Kernel centers are data points sampled from input dataset. A detailed definition of kernel center is presented in Sect. 2.

  5. The point within \(g_{c}\) whose coordinate in each dimension is the minimum, in that dimension, over all points currently in \(g_{c}\).

  6. The point within \(g_{c}\) whose coordinate in each dimension is the maximum, in that dimension, over all points currently in \(g_{c}\).

  7. After a new point is inserted, a grid that was previously part of the COG might no longer be part of it.

  8. With repeated insertions, the number of existing outliers may be less than N.

  9. Please refer to the file ‘KAGO_SVDD_comparison.pdf’ for further details.

  10. https://nlp.stanford.edu/IR-book/html/htmledition/.
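Notes 5 and 6 describe two reference points of a grid cell \(g_{c}\), obtained by taking the per-dimension minimum and maximum over the points currently in the cell. A minimal sketch of that computation follows, together with one way to refresh both points when a new point enters the cell. The names `cell_corners` and `update_corners` are hypothetical and do not come from the paper, and the resulting coordinates need not coincide with an actual data point.

```python
def cell_corners(points_in_cell):
    """Return (min_corner, max_corner) over the points currently in a cell.

    Each corner takes, in every dimension, the minimum (or maximum)
    coordinate among the cell's current points.
    """
    if not points_in_cell:
        raise ValueError("cell has no points")
    # zip(*points) groups the coordinates dimension by dimension.
    min_corner = tuple(min(coords) for coords in zip(*points_in_cell))
    max_corner = tuple(max(coords) for coords in zip(*points_in_cell))
    return min_corner, max_corner


def update_corners(min_corner, max_corner, new_point):
    """Refresh both corners after inserting new_point, without rescanning the cell."""
    new_min = tuple(min(a, b) for a, b in zip(min_corner, new_point))
    new_max = tuple(max(a, b) for a, b in zip(max_corner, new_point))
    return new_min, new_max


if __name__ == "__main__":
    lo, hi = cell_corners([(1, 5), (3, 2), (2, 4)])
    print(lo, hi)
    print(update_corners(lo, hi, (0, 6)))
```

Updating the corners from the new point alone costs time proportional to the number of dimensions, which suits a setting where points are inserted one at a time.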

Author information

Corresponding author

Correspondence to Panthadeep Bhattacharjee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bhattacharjee, P., Garg, A. & Mitra, P. KAGO: an approximate adaptive grid-based outlier detection approach using kernel density estimate. Pattern Anal Applic 24, 1825–1846 (2021). https://doi.org/10.1007/s10044-021-00998-6

  • DOI: https://doi.org/10.1007/s10044-021-00998-6
