Abstract
Outlier (or anomaly) detection is an important branch of data analysis and a crucial task in many application domains. Data objects that are significantly dissimilar to, and inconsistent with, the rest of the data are referred to as outliers. In this paper, a new approach called LDBAD (Local Density-Based Abnormal Detector) is proposed to discover useful irregular patterns hidden in collected data sets. The method aims to find local abnormal data objects, which are characterized by three proposed measures: local distance, local density, and influenced outlierness degree. The performance of the proposed approach is evaluated on flow pattern experiments along a 180° sharp bend channel with and without a T-shaped spur dike. Flow velocity components are collected using a Vectrino 3D velocimeter. The analysis shows that the proposed outlier detection method is effective at finding outlier objects in these measurements. Moreover, several feed-forward neural network velocity prediction models are created to demonstrate the necessity and advantages of outlier detection in flow pattern experiments. The results show that the accuracy of these models increases when outliers are removed from the measurements.
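The abstract names three measures (local distance, local density, and an outlierness degree) but does not define them. As a rough illustration of the general idea behind local density-based detection, the sketch below scores each point by comparing its own k-nearest-neighbor density with that of its neighbors. It is a generic stand-in under stated assumptions, not the LDBAD definitions: the function name `knn_density_scores`, the parameter `k`, and the three formulas are all hypothetical choices made for this example.

```python
import numpy as np

def knn_density_scores(points, k=3):
    """Generic k-NN density-ratio outlier score (illustrative only, not LDBAD)."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Indices of the k nearest neighbors of each point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    # Assumed "local distance": mean distance to the k nearest neighbors.
    local_dist = np.take_along_axis(dist, nn_idx, axis=1).mean(axis=1)
    # Assumed "local density": inverse of the local distance.
    local_density = 1.0 / (local_dist + 1e-12)
    # Assumed outlierness: mean neighbor density divided by own density.
    # Values well above 1 mean the point is sparser than its neighborhood.
    score = local_density[nn_idx].mean(axis=1) / local_density
    return local_dist, local_density, score

# Usage: a 5 x 5 unit grid plus one isolated point far from the grid.
grid = [(x, y) for x in range(5) for y in range(5)]
data = grid + [(10.0, 10.0)]
_, _, scores = knn_density_scores(data, k=3)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

In this toy data set the isolated point receives by far the largest score, while grid points score close to 1. Turning scores into outlier labels needs a threshold, and the abstract does not say how LDBAD sets one.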
Acknowledgements
The data sets used in this research were obtained during the Master of Science thesis work of Maryam Akbari at Persian Gulf University of technology. The authors thank her for her efforts in data collection during the experimental work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mahmoodi, K., Ketabdari, M.J. & Vaghefi, M. Proposing a new local density estimation outlier detection algorithm: an empirical case study on flow pattern experiments. Pattern Anal Applic 24, 1859–1872 (2021). https://doi.org/10.1007/s10044-021-01019-2
DOI: https://doi.org/10.1007/s10044-021-01019-2