Impact of the Neighborhood Parameter on Outlier Detection Algorithms

  • Conference paper
  • First Online:
Similarity Search and Applications (SISAP 2024)

Abstract

We study the impact and stability of the neighborhood parameter for a selection of popular outlier detection algorithms: kNN, LOF, ABOD, LoOP and SDO. We conduct a sensitivity analysis with data undergoing controlled changes related to: cardinality, dimensionality, global outlier ratio, local outlier ratio, layers of density, density differences between inliers and outliers, and zonification. Experiments reveal how each type of data variation affects the algorithms differently in terms of accuracy and runtime, and disclose the performance dependence on the neighborhood parameter. This serves not only to guide the selection of its value, but also to assess accuracy robustness against common data phenomena and the algorithms' tolerance to adjustment variations. kNN, ABOD and SDO stand out: kNN is the most accurate, ABOD the most suitable for global and local outliers at the same time, and SDO the most stable under parameterization. The findings of this work are key to understanding the intrinsic behavior of algorithms based on distance and density estimations, which remain the most efficient and reliable in anomaly detection applications.
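To illustrate the role of the neighborhood parameter studied here, the sketch below computes the classic kNN outlier score (distance to the k-th nearest neighbor, as in Ramaswamy et al. [17]) for several values of k. It is a minimal plain-NumPy sketch on a hypothetical toy dataset (a Gaussian cloud plus one planted global outlier), not the paper's experimental pipeline, which uses the pyod, PyNomaly and pysdoclust packages:

```python
import numpy as np

def knn_outlier_scores(X, k):
    """Score each point by the distance to its k-th nearest neighbor
    (the kNN outlier score of Ramaswamy et al.)."""
    # Pairwise Euclidean distances via broadcasting
    diff = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    # Distance to the k-th nearest neighbor (k >= 1)
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(100, 2))   # dense Gaussian cloud
outlier = np.array([[8.0, 8.0]])                # planted global outlier
X = np.vstack([inliers, outlier])

for k in (1, 5, 20):
    scores = knn_outlier_scores(X, k)
    # the planted global outlier keeps the top score across these k values
    assert scores.argmax() == len(X) - 1
```

For a global outlier this far from a single dense cluster, the ranking is insensitive to k; the paper's sensitivity analysis probes exactly the harder regimes (local outliers, multiple density layers, zonification) where such stability breaks down.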


Notes

  1. For simplicity, hereafter we refer to the elements of a dataset simply as data points.

  2. Packages: https://github.com/yzhao062/pyod [19], https://github.com/vc1492a/PyNomaly and https://github.com/CN-TU/pysdoclust/tree/main.

  3. We set a maximum value of 50 to ensure a lightweight total experimental time.

  4. We recall that zonification in our tests implies a larger number of distinct densities.

References

  1. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: ACM SIGMOD, pp. 93–104 (2000). https://doi.org/10.1145/335191.335388

  2. Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I., Houle, M.E.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30(4), 891–927 (2016). https://doi.org/10.1007/s10618-015-0444-8

  3. Ghosh, A.K.: On optimum choice of \(k\) in nearest neighbor classification. Comput. Stat. Data Anal. 50(11), 3113–3123 (2006). https://doi.org/10.1016/j.csda.2005.06.007

  4. Hall, P., Park, B.U., Samworth, R.J.: Choice of neighbor order in nearest-neighbor classification. Ann. Stat. 36(5), 2135–2152 (2008). https://doi.org/10.1214/07-AOS537

  5. Han, S., Hu, X., Huang, H., Jiang, M., Zhao, Y.: ADBench: anomaly detection benchmark. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) NeurIPS 2022. Curran Assoc., Inc. (2022)

  6. Iglesias, F.: Analysis of the neighborhood parameter on outlier detection algorithms—evaluation tests (2024). https://doi.org/10.48436/xvy1m-jwg83

  7. Iglesias, F., Zseby, T., Ferreira, D., Zimek, A.: MDCGen: multidimensional dataset generator for clustering. J. Classif. 36(3), 599–618 (2019)

  8. Iglesias Vázquez, F., Hartl, A., Zseby, T., Zimek, A.: Anomaly detection in streaming data: A comparison and evaluation study. Expert Syst. Appl. 233(C) (2023). https://doi.org/10.1016/j.eswa.2023.120994

  9. Iglesias Vázquez, F., Zseby, T., Zimek, A.: Outlier detection based on low density models. In: IEEE International Conference on Data Mining Workshops, pp. 970–979 (2018)

  10. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: LoOP: local outlier probabilities. In: ACM CIKM, pp. 1649–1652 (2009). https://doi.org/10.1145/1645953.1646195

  11. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Interpreting and unifying outlier scores. In: SIAM International Conference on Data Mining (SDM’11), pp. 13–24 (2011)

  12. Kriegel, H.P., Schubert, M., Zimek, A.: Angle-based outlier detection in high-dimensional data. In: ACM SIGKDD KDD, pp. 444–452 (2008)

  13. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422 (2008). https://doi.org/10.1109/ICDM.2008.17

  14. Loftsgaarden, D.O., Quesenberry, C.P.: A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36(3), 1049–1051 (1965). https://doi.org/10.1214/aoms/1177700079

  15. Nassif, A.B., Talib, M.A., Nasir, Q., Dakalbab, F.M.: Machine learning for anomaly detection: a systematic review. IEEE Access 9, 78658–78700 (2021). https://doi.org/10.1109/ACCESS.2021.3083060

  16. Pang, G., Shen, C., Cao, L., Hengel, A.V.D.: Deep learning for anomaly detection: a review. ACM Comp. Surveys 54(2) (2021). https://doi.org/10.1145/3439950

  17. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. SIGMOD Rec. 29(2), 427–438 (2000)

  18. Yang, J., Tan, X., Rahardja, S.: Outlier detection: How to select \(k\) for \(k\)-nearest-neighbors-based outlier detectors. Pattern Recogn. Lett. 174(C), 112–117 (2023). https://doi.org/10.1016/j.patrec.2023.08.020

  19. Zhao, Y., Nasrullah, Z., Li, Z.: PyOD: a python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20(96), 1–7 (2019). http://jmlr.org/papers/v20/19-011.html

  20. Zimek, A., Gaudet, M., Campello, R.J., Sander, J.: Subsampling for efficient and effective unsupervised outlier detection ensembles. In: ACM SIGKDD KDD, pp. 428–436 (2013)

Acknowledgments

This work has been partially supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033, and the JUNON “Ambition Research Development Centre-Val de Loire” (ARD CVL) program.

Author information

Correspondence to Félix Iglesias.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Iglesias, F., Martínez, C., Zseby, T. (2025). Impact of the Neighborhood Parameter on Outlier Detection Algorithms. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_8

  • DOI: https://doi.org/10.1007/978-3-031-75823-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75822-5

  • Online ISBN: 978-3-031-75823-2

  • eBook Packages: Computer Science (R0)
