Impact of the Neighborhood Parameter on Outlier Detection Algorithms

  • Conference paper
  • First Online:
Similarity Search and Applications (SISAP 2024)

Abstract

We study the impact and stability of the neighborhood parameter for a selection of popular outlier detection algorithms: kNN, LOF, ABOD, LoOP and SDO. We conduct a sensitivity analysis with data undergoing controlled changes related to: cardinality, dimensionality, global outlier ratio, local outlier ratio, layers of density, density differences between inliers and outliers, and zonification. Experiments reveal how each type of data variation affects the algorithms differently in terms of accuracy and runtime, and disclose the performance dependence on the neighborhood parameter. This serves not only to guide the selection of its value, but also to assess accuracy robustness against common data phenomena and the algorithms' tolerance to adjustment variations. kNN, ABOD and SDO stand out: kNN is the most accurate, ABOD the most suitable for global and local outliers at the same time, and SDO the most stable under parameterization. The findings of this work are key to understanding the intrinsic behavior of algorithms based on distance and density estimations, which remain the most efficient and reliable in anomaly detection applications.
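To illustrate the role of the neighborhood parameter studied here, the sketch below computes the classic kNN outlier score (distance to the k-th nearest neighbor, as in Ramaswamy et al. [17]) for several values of k. It is a minimal plain-NumPy sketch on a hypothetical toy dataset (a Gaussian cloud plus one planted global outlier), not the paper's experimental pipeline, which uses the pyod, PyNomaly and pysdoclust packages:

```python
import numpy as np

def knn_outlier_scores(X, k):
    """Score each point by the distance to its k-th nearest neighbor
    (the kNN outlier score of Ramaswamy et al.)."""
    # Pairwise Euclidean distances via broadcasting
    diff = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    # Distance to the k-th nearest neighbor (k >= 1)
    return np.sort(dists, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(100, 2))   # dense Gaussian cloud
outlier = np.array([[8.0, 8.0]])                # planted global outlier
X = np.vstack([inliers, outlier])

for k in (1, 5, 20):
    scores = knn_outlier_scores(X, k)
    # the planted global outlier keeps the top score across these k values
    assert scores.argmax() == len(X) - 1
```

For a global outlier this far from a single dense cluster, the ranking is insensitive to k; the paper's sensitivity analysis probes exactly the harder regimes (local outliers, multiple density layers, zonification) where such stability breaks down.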


Notes

  1. For simplicity, hereafter we refer to the elements of a dataset simply as data points.

  2. Packages: https://github.com/yzhao062/pyod [19], https://github.com/vc1492a/PyNomaly and https://github.com/CN-TU/pysdoclust/tree/main.

  3. We set a maximum value of 50 to ensure a lightweight total experimental time.

  4. We recall that zonification in our tests implies a larger number of distinct densities.

References

  1. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: ACM SIGMOD, pp. 93–104 (2000). https://doi.org/10.1145/335191.335388

  2. Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I., Houle, M.E.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30(4), 891–927 (2016). https://doi.org/10.1007/s10618-015-0444-8

  3. Ghosh, A.K.: On optimum choice of \(k\) in nearest neighbor classification. Comput. Stat. Data Anal. 50(11), 3113–3123 (2006). https://doi.org/10.1016/j.csda.2005.06.007

  4. Hall, P., Park, B.U., Samworth, R.J.: Choice of neighbor order in nearest-neighbor classification. Ann. Stat. 36(5), 2135–2152 (2008). https://doi.org/10.1214/07-AOS537

  5. Han, S., Hu, X., Huang, H., Jiang, M., Zhao, Y.: ADBench: anomaly detection benchmark. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) NeurIPS 2022. Curran Assoc., Inc. (2022)

  6. Iglesias, F.: Analysis of the neighborhood parameter on outlier detection algorithms—evaluation tests (2024). https://doi.org/10.48436/xvy1m-jwg83

  7. Iglesias, F., Zseby, T., Ferreira, D., Zimek, A.: MDCGen: multidimensional dataset generator for clustering. J. Classif. 36(3), 599–618 (2019)

  8. Iglesias Vázquez, F., Hartl, A., Zseby, T., Zimek, A.: Anomaly detection in streaming data: A comparison and evaluation study. Expert Syst. Appl. 233(C) (2023). https://doi.org/10.1016/j.eswa.2023.120994

  9. Iglesias Vázquez, F., Zseby, T., Zimek, A.: Outlier detection based on low density models. In: IEEE International Conference on Data Mining Workshops, pp. 970–979 (2018)

  10. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: LoOP: local outlier probabilities. In: ACM CIKM, pp. 1649–1652 (2009). https://doi.org/10.1145/1645953.1646195

  11. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Interpreting and unifying outlier scores. In: SIAM International Conference on Data Mining (SDM’11), pp. 13–24 (2011)

  12. Kriegel, H.P., Schubert, M., Zimek, A.: Angle-based outlier detection in high-dimensional data. In: ACM SIGKDD KDD, pp. 444–452 (2008)

  13. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422 (2008). https://doi.org/10.1109/ICDM.2008.17

  14. Loftsgaarden, D.O., Quesenberry, C.P.: A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 36(3), 1049–1051 (1965). https://doi.org/10.1214/aoms/1177700079

  15. Nassif, A.B., Talib, M.A., Nasir, Q., Dakalbab, F.M.: Machine learning for anomaly detection: a systematic review. IEEE Access 9, 78658–78700 (2021). https://doi.org/10.1109/ACCESS.2021.3083060

  16. Pang, G., Shen, C., Cao, L., Hengel, A.V.D.: Deep learning for anomaly detection: a review. ACM Comp. Surveys 54(2) (2021). https://doi.org/10.1145/3439950

  17. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. SIGMOD Rec. 29(2), 427–438 (2000)

  18. Yang, J., Tan, X., Rahardja, S.: Outlier detection: How to select \(k\) for \(k\)-nearest-neighbors-based outlier detectors. Pattern Recogn. Lett. 174(C), 112–117 (2023). https://doi.org/10.1016/j.patrec.2023.08.020

  19. Zhao, Y., Nasrullah, Z., Li, Z.: PyOD: a python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20(96), 1–7 (2019). http://jmlr.org/papers/v20/19-011.html

  20. Zimek, A., Gaudet, M., Campello, R.J., Sander, J.: Subsampling for efficient and effective unsupervised outlier detection ensembles. In: ACM SIGKDD KDD, pp. 428–436 (2013)

Acknowledgments

This work has been partially supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033, and the JUNON “Ambition Research Development Centre-Val de Loire” (ARD CVL) program.

Author information

Correspondence to Félix Iglesias.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Iglesias, F., Martínez, C., Zseby, T. (2025). Impact of the Neighborhood Parameter on Outlier Detection Algorithms. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_8

  • DOI: https://doi.org/10.1007/978-3-031-75823-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75822-5

  • Online ISBN: 978-3-031-75823-2

  • eBook Packages: Computer Science (R0)
