Improved incremental local outlier detection for data streams based on the landmark window model

  • Regular Paper
Knowledge and Information Systems

Abstract

Most existing anomaly detection algorithms are designed for static data, where all data are available at detection time, and cannot handle dynamic data streams. In this study, we propose an improved iLOF (incremental local outlier factor) algorithm based on the landmark window model, which provides an efficient method for anomaly detection in data streams and outperforms conventional methods. In addition, data windows are introduced as the updating units to reduce the false alarm rate, and multiple tests are applied to identify candidate anomalies and real anomalies. The improved iLOF shows a clear advantage in false positive rate. Furthermore, the proposed algorithm immediately deletes the data points identified as real anomalies. We analyze the performance of the improved algorithm and its sensitivity to certain parameters through empirical experiments on synthetic and real data sets. The experimental results show that the proposed algorithm achieves a higher detection rate and a lower false alarm rate than the original iLOF algorithm and its earlier improvements.
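The following Python sketch illustrates the ideas named in the abstract. It is not the authors' algorithm, since only the abstract is available here. It assumes that points are kept from a landmark onward, that the stream is processed one data window at a time, that a point flagged in one test is only a candidate, and that a candidate flagged in several consecutive tests is treated as a real anomaly and deleted at once. The names `lof_scores` and `detect` and the parameters `k`, `window_size`, `threshold` and `tests` are hypothetical. LOF is recomputed in batch after each window as a simple stand-in for the incremental update of iLOF.

```python
import math


def lof_scores(points, k):
    """Batch LOF for a small list of distinct points (O(n^2), ties ignored)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in knn[i]) / k
        lrd.append(1.0 / reach)
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]


def detect(stream, k=3, window_size=5, threshold=1.5, tests=2):
    """Hypothetical window-wise detector with candidate and real anomalies."""
    kept = []       # [point, consecutive_flags] records kept since the landmark
    anomalies = []  # confirmed ("real") anomalies, in order of confirmation
    for start in range(0, len(stream), window_size):
        # Assumption: one data window is the unit of update.
        kept.extend([p, 0] for p in stream[start:start + window_size])
        if len(kept) <= k:
            continue  # not enough points for a k-neighbourhood yet
        scores = lof_scores([rec[0] for rec in kept], k)
        survivors = []
        for rec, score in zip(kept, scores):
            # A flagged point is a candidate; an unflagged one is cleared.
            rec[1] = rec[1] + 1 if score > threshold else 0
            if rec[1] >= tests:
                anomalies.append(rec[0])  # confirmed: dropped from kept data
            else:
                survivors.append(rec)
        kept = survivors
    return anomalies, [rec[0] for rec in kept]


# Usage: a 4 x 4 grid of normal points with one distant point in the stream.
normal = [(float(i % 4), float(i // 4)) for i in range(16)]
stream = normal[:4] + [(50.0, 50.0)] + normal[4:]
anomalies, kept = detect(stream)
print(anomalies)
```

Recomputing every score after each window costs quadratic time in the number of kept points, so this form is only suitable for illustration. An incremental method such as iLOF instead updates only the points affected by an insertion or deletion.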

Notes

  1. The experiment was done on a Core i5-4200M CPU (2.50 GHz) running Windows 10.

  2. http://www.cs.umn.edu/~aleks/inclof.

  3. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

  4. http://kdd.ics.uci.edu/databases/covertype/covertype.html.

Acknowledgements

We are grateful to the editors and anonymous reviewers for useful comments and suggestions. The authors also acknowledge the financial support of the National Natural Science Foundation (71932008, 71401188) and the Engineering Research Center of National Financial Security of Ministry of Education.

Funding

This work was supported by the National Natural Science Foundation [Grant Nos. 71932008, 71401188].

Author information

Corresponding author

Correspondence to Weijia Xu.

Ethics declarations

Conflict of interest

The authors declare they have no financial interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, A., Xu, W., Liu, Z. et al. Improved incremental local outlier detection for data streams based on the landmark window model. Knowl Inf Syst 63, 2129–2155 (2021). https://doi.org/10.1007/s10115-021-01585-1

  • DOI: https://doi.org/10.1007/s10115-021-01585-1
