Abstract
Most existing anomaly detection algorithms are designed for static data, where all data are available at detection time, and cannot handle dynamic data streams. In this study, we propose an improved iLOF (incremental local outlier factor) algorithm based on the landmark window model, which provides an efficient method for anomaly detection in data streams and outperforms conventional methods. Data windows are introduced as the updating units to reduce the false alarm rate, and multiple tests are applied to distinguish candidate anomalies from real anomalies. The improved iLOF shows a clear advantage in false positive rate. Furthermore, the proposed algorithm immediately deletes data points identified as real anomalies. We analyze the performance of the improved algorithm and its sensitivity to certain parameters through empirical experiments on synthetic and real data sets. The experimental results show that the proposed algorithm achieves a higher detection rate and a lower false alarm rate than the original iLOF algorithm and its earlier improvements.
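The window-wise idea described above can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes a plain batch LOF recomputed on each window (not the incremental update of iLOF), a hypothetical fixed threshold, and a hypothetical confirmation rule in which a point flagged in two consecutive window tests is treated as a real anomaly and deleted. The names lof_scores and detect_stream are illustrative only.

```python
import math

def lof_scores(points, k):
    """Batch LOF score for every point. O(n^2), for illustration only.
    Simplification: ties at the k-distance are truncated to exactly k neighbours."""
    n = len(points)
    # For each point: the k nearest (distance, index) pairs, excluding itself.
    neigh = [sorted((math.dist(points[i], p), j)
                    for j, p in enumerate(points) if j != i)[:k]
             for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_dist[j], d) for d, j in neigh[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)
    # LOF: mean ratio of the neighbours' density to the point's own density.
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):        # duplicate points: treat as an inlier
            scores.append(1.0)
        else:
            scores.append(sum(lrd[j] / lrd[i] for _, j in neigh[i]) / k)
    return scores

def detect_stream(stream, window_size, k, threshold):
    """Process a stream window by window and return the confirmed anomalies.

    Assumed rule (hypothetical): a point whose LOF exceeds `threshold` becomes
    a candidate; if it is still above the threshold after the next window has
    arrived, it is confirmed and deleted from the retained data.
    Points must be hashable tuples and are assumed to be distinct.
    """
    retained, candidates, confirmed = [], set(), []
    for start in range(0, len(stream), window_size):
        # Landmark window: everything since the landmark (here, the stream
        # start) is retained, minus the points already deleted as anomalies.
        retained.extend(stream[start:start + window_size])
        if len(retained) <= k:
            continue
        scores = lof_scores(retained, k)
        flagged = {p for p, s in zip(retained, scores) if s > threshold}
        real = flagged & candidates   # flagged in two consecutive tests
        confirmed.extend(sorted(real))
        retained = [p for p in retained if p not in real]
        candidates = flagged - real
    return confirmed

# An isolated point stays isolated after the second window: confirmed.
isolated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10),
            (0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
# The same point turns out to be the first member of a new cluster: not confirmed.
new_cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10),
               (10, 11), (11, 10), (11, 11), (0, 2), (1, 2)]
print(detect_stream(isolated, window_size=5, k=2, threshold=2.0))     # [(10, 10)]
print(detect_stream(new_cluster, window_size=5, k=2, threshold=2.0))  # []
```

In the second stream the point (10, 10) is flagged when its own window arrives but is cleared once later data show it belongs to a new cluster. This is the kind of false alarm that testing a candidate more than once is meant to avoid.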
Notes
The experiment was run on a Core i5-4200M CPU at 2.50 GHz running Windows 10.
Acknowledgements
We are grateful to the editors and anonymous reviewers for useful comments and suggestions. The authors also acknowledge the financial support of the National Natural Science Foundation (71932008, 71401188) and the Engineering Research Center of National Financial Security of Ministry of Education.
Funding
This work was supported by the National Natural Science Foundation [Grant Nos. 71932008, 71401188].
Ethics declarations
Conflict of interest
The authors declare they have no financial interests.
About this article
Cite this article
Li, A., Xu, W., Liu, Z. et al. Improved incremental local outlier detection for data streams based on the landmark window model. Knowl Inf Syst 63, 2129–2155 (2021). https://doi.org/10.1007/s10115-021-01585-1
DOI: https://doi.org/10.1007/s10115-021-01585-1