Abstract
We study the impact and stability of the neighborhood parameter for a selection of popular outlier detection algorithms: kNN, LOF, ABOD, LoOP and SDO. We conduct a sensitivity analysis with data undergoing controlled changes related to: cardinality, dimensionality, global outliers ratio, local outliers ratio, layers of density, density differences between inliers and outliers, and zonification. Experiments reveal how each type of data variation affects the algorithms differently in terms of accuracy and runtime, and disclose how performance depends on the neighborhood parameter. These results serve not only to guide the selection of the parameter value, but also to assess accuracy robustness against common data phenomena and the algorithms' tolerance to parameterization changes. kNN, ABOD and SDO stand out, with kNN being the most accurate, ABOD the most suitable for global and local outliers simultaneously, and SDO the most stable with respect to parameterization. The findings of this work are key to understanding the intrinsic behavior of algorithms based on distance and density estimations, which remain the most efficient and reliable in anomaly detection applications.
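The sensitivity analysis described above can be illustrated with a minimal sketch: sweeping the neighborhood parameter k for the classic kNN outlier detector (score = distance to the k-th nearest neighbor, as in Ramaswamy et al., 2000) and measuring how detection accuracy reacts. The synthetic data, the choice of scikit-learn, and the specific k values are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

# Toy data (illustrative only): Gaussian inliers plus far-away global outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, (200, 2))
outliers = rng.uniform(4.0, 6.0, (10, 2)) * rng.choice([-1, 1], (10, 2))
X = np.vstack([inliers, outliers])
y = np.r_[np.zeros(200), np.ones(10)]  # 1 = outlier

def knn_outlier_scores(X, k):
    """kNN outlier score: distance to the k-th nearest neighbor."""
    # Query k+1 neighbors because column 0 is each point's zero-distance self-match.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    return dist[:, -1]

# Sensitivity sweep over the neighborhood parameter.
for k in (1, 5, 20, 50):
    auc = roc_auc_score(y, knn_outlier_scores(X, k))
    print(f"k={k:2d}  ROC AUC={auc:.3f}")
```

On clearly separated data like this, accuracy stays high across a wide range of k; the paper's contribution is precisely to characterize where and how this stability breaks down for harder data variations (local outliers, density layers, zonification).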
Notes
- 1. For simplicity, hereafter we refer to the elements of a dataset simply as data points.
- 2.
- 3. We set a maximum value of 50 to ensure a lightweight total experimental time.
- 4. We recall that zonification in our tests implies a larger number of distinct densities.
Acknowledgments.
This work has been partially supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033, and the JUNON “Ambition Research Development Centre-Val de Loire” (ARD CVL) program.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Iglesias, F., Martínez, C., Zseby, T. (2025). Impact of the Neighborhood Parameter on Outlier Detection Algorithms. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_8
DOI: https://doi.org/10.1007/978-3-031-75823-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75822-5
Online ISBN: 978-3-031-75823-2