Abstract
Outlier detection approaches show their efficacy while extracting unforeseen knowledge in domains such as intrusion detection, e-commerce, and fraudulent transactions. A prominent method like the K-Nearest Neighbor (KNN)-based outlier detection (KNNOD) technique relies on distance measures to extract the anomalies from the dataset. However, KNNOD is ill-equipped to deal with dynamic data environment efficiently due to its quadratic time complexity and sensitivity to changes in the dataset. As a result, any form of redundant computation due to frequent updates may lead to inefficiency while detecting outliers. In order to address these challenges, we propose an approximate adaptive grid-based outlier detection technique by finding point density using kernel density estimate (KAGO) instead of any distance measure. The proposed technique prunes the inlier grids and filters the candidate grids with local outliers upon a new point insertion. The grids containing potential outliers are aggregated to converge on to at most top-N global outliers incrementally. Experimental evaluation showed that KAGO outperformed KNNOD by more than an order of \(\approx\)3.9 across large relevant datasets at about half the memory consumption.
Similar content being viewed by others
Notes
In this paper, we use the term anomaly and outlier interchangeably.
Base dataset refers to the dataset before any change is inflicted upon it.
The density at a point here signifies the local density since it is computed wrt. (with respect to) the grid cell behaving as local neighborhood of the concerned point. We use the term density and local density interchangeably while describing concepts related to the KAGO algorithm.
Kernel centers are data points sampled from input dataset. A detailed definition of kernel center is presented in Sect. 2.
The point within \(g_{c}\) where each co-ordinate in a given dimension is the minimum of all the current points \(\in g_{c}\) in that dimension.
The point within \(g_{c}\) where each co-ordinate in a given dimension is the maximum of all the current points \(\in g_{c}\) in that dimension.
Post entry of new point, any grid previously a part of COG might not be a part of it anymore.
With repeated insertions, the number of existing outliers may be less than N.
Please refer to the file ‘KAGO_SVDD_comparison.pdf‘ for further details.
References
Aggarwal CC (2015) Outlier analysis. Data mining. Springer, Cham, pp 237–263
Baldoni R, Montanari L, Rizzuto M (2015) On-line failure prediction in safety-critical systems. Future Gener Comput Syst 45:123–132
Brabazon A, Cahill J, Keenan P, Walsh D (2010) Identifying online credit card fraud using artificial immune systems. In: IEEE Congress on Evolutionary Computation, IEEE, pp 1–7
Breunig MM, Kriegel HP, Ng RT, Sander J (2000) Lof: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of data, pp. 93–104
Cao L, Yang D, Wang Q, Yu Y, Wang J, Rundensteiner EA (2014) Scalable distance-based outlier detection over high-volume data streams. In: 2014 IEEE 30th International Conference on Data Engineering, IEEE, pp. 76–87
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
Dang TT, Ngan HY, Liu W (2015) Distance-based k-nearest neighbors outlier detection method in large-scale traffic data. In: 2015 IEEE International Conference on Digital Signal Processing (DSP), IEEE, pp. 507–510
Djenouri Y, Belhadi A, Lin JCW, Cano A (2019) Adapted k-nearest neighbors for detecting anomalies on spatio-temporal traffic flow. IEEE Access 7:10015–10027
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd 96:226–231
Haque S, Rahman M, Aziz S (2015) Sensor anomaly detection in wireless sensor networks for healthcare. Sensors 15(4):8764–8786
Hassanat AB, Abbadi MA, Altarawneh GA, Alhasanat AA (2014) Solving the problem of the k parameter in the knn classifier using an ensemble learning approach. arXiv preprint arXiv:14090919
Hero AO (2007) Geometric entropy minimization (gem) for anomaly detection and localization. In: Advances in Neural Information Processing Systems, pp. 585–592
Karimian SH, Kelarestaghi M, Hashemi S (2012) I-inclof: improved incremental local outlier detection for data streams. In: The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), pp. 023–028, https://doi.org/10.1109/AISP.2012.6313711
Khalastchi E, Kaminka GA, Kalech M, Lin R (2011) Online anomaly detection in unmanned vehicles. In: The 10th International Conference on Autonomous Agents and Multiagent Systems-Vol 1, International Foundation for Autonomous Agents and Multiagent Systems, pp. 115–122
Kirchner M (2010) A framework for detecting anomalies in http traffic using instance-based learning and k-nearest neighbor classification. In: 2010 2nd International Workshop on Security and Communication Networks (IWSCN), pp. 1–8, https://doi.org/10.1109/IWSCN.2010.5497997
Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. VLDB, Citeseer 98:392–403
Latecki LJ, Lazarevic A, Pokrajac D (2007) Outlier detection with kernel density functions. In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, pp. 61–75
Li Y, Fang B, Guo L, Chen Y (2007) Network anomaly detection based on tcm-knn algorithm. In: Proceedings of the 2nd ACM Symposium on Information, Computer and Communications Security, ACM, pp. 13–19
Mitchell R, Chen R (2013) Behavior-rule based intrusion detection systems for safety critical smart grid applications. IEEE Trans Smart Grid 4(3):1254–1263
Na GS, Kim D, Yu H (2018) Dilof: effective and memory efficient local outlier detection in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1993–2002
Pokrajac D, Lazarevic A, Latecki LJ (2007) Incremental local outlier detection for data streams. In: 2007 IEEE Symposium on Computational Intelligence and Data Mining, pp. 504–515, https://doi.org/10.1109/CIDM.2007.368917
Qian G, Sural S, Gu Y, Pramanik S (2004) Similarity between euclidean and cosine angle distance for nearest neighbor queries. In: Proceedings of the 2004 ACM Symposium on Applied Computing, ACM, New York, NY, USA, SAC ’04, pp. 1232–1237, https://doi.org/10.1145/967900.968151, http://doi.acm.org/10.1145/967900.968151
Qin X, Cao L, Rundensteiner EA, Madden S (2019) Scalable kernel density estimation-based local outlier detection over large data streams. In: EDBT, pp. 421–432
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. ACM Sigmod Rec ACM 29:427–438
Salehi M, Leckie C, Bezdek JC, Vaithianathan T, Zhang X (2016) Fast memory efficient local outlier detection in data streams. IEEE Trans Knowl Data Eng 28(12):3246–3260
Salem O, Liu Y, Mehaoua A, Boutaba R (2014) Online anomaly detection in wireless body area networks for reliable healthcare monitoring. IEEE J Biomed Health Inf 18(5):1541–1551
Schubert E, Zimek A, Kriegel HP (2014) Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the 2014 SIAM International Conference on Data Mining, SIAM, pp. 542–550
Silverman BW (2018) Density estimation for statistics and data analysis. Routledge, London
Srivastava A, Kundu A, Sural S, Majumdar A (2008) Credit card fraud detection using hidden markov model. IEEE Trans Dependable Secur Comput 5(1):37–48
Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D (2006) Online outlier detection in sensor data using non-parametric models. In: Proceedings of the 32nd International Conference on Very large data bases, VLDB Endowment, pp. 187–198
Tang B, He H (2017) A local density-based approach for outlier detection. Neurocomputing 241:171–180
Tax DM, Duin RP (2004) Support vector data description. Mach Learn 54(1):45–66
Thottan M, Ji C (2003) Anomaly detection in ip networks. IEEE Trans Sig Process 51(8):2191–2204
Wang H, Bah MJ, Hammad M (2019) Progress in outlier detection techniques: a survey. IEEE Access 7:107964–108000
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Xie M, Hu J, Han S, Chen HH (2012) Scalable hypergrid k-nn-based online anomaly detection in wireless sensor networks. IEEE Trans Parallel Distrib Syst 24(8):1661–1670
Xu X, Liu H, Yao M (2019) Recent progress of anomaly detection. Complexity. https://doi.org/10.1155/2019/2686378
Zhang S, Bar-Shalom Y (2009) Robust kernel-based object tracking with multiple kernel centers. In: 2009 12th International Conference on Information Fusion, IEEE, pp. 1014–1021
Zill D, Wright WS, Cullen MR (2011) Advanced engineering mathematics. Jones & Bartlett Learning
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Bhattacharjee, P., Garg, A. & Mitra, P. KAGO: an approximate adaptive grid-based outlier detection approach using kernel density estimate. Pattern Anal Applic 24, 1825–1846 (2021). https://doi.org/10.1007/s10044-021-00998-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-021-00998-6