Density Biased Sampling with Locality Sensitive Hashing for Outlier Detection

Xuyun Zhang, Mahsa Salehi, Christopher Leckie, Yun Luo, Qiang He, Rui Zhou & Rao Kotagiri

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11234)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Outlier (or anomaly) detection is one of the major challenges in big data analytics, since unusual but insightful patterns are often hidden in massive data sets such as sensing data and social networks. Sampling techniques have been a focus of outlier detection research as a way to address scalability on big data. A recent study has shown that uniform random sampling combined with ensembles can boost outlier detection performance. However, uniform sampling assumes that all points are of equal importance, which usually fails to hold for outlier detection because some points are more sensitive to sampling than others. It is therefore both necessary and promising to utilise the density information of points to reflect their importance in sampling-based detection. In this paper, we formally investigate density biased sampling for outlier detection and propose a novel density biased sampling approach. To attain scalable density estimation, we use Locality Sensitive Hashing (LSH) to count the nearest neighbours of a point. Extensive experiments on both synthetic and real-world data sets show that our approach significantly outperforms existing outlier detection methods based on uniform sampling.
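
The two ideas named in the abstract, LSH-based neighbour counting and density biased sampling, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a p-stable (random projection) hash family, uses the mean bucket size over several hash tables as a rough density estimate, and draws a sample with weights proportional to a power of that density. All names and parameters (`lsh_density`, `density_biased_sample`, `num_tables`, `k`, `width`, `exponent`) are hypothetical, and the direction of the bias (here favouring sparse regions) is an assumption chosen only for illustration.

```python
import math
import random
from collections import Counter


def make_table(dim, k, width, rng):
    """Draw k random projections (a, b) defining one p-stable LSH table."""
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(k)]


def bucket_key(point, table, width):
    """Concatenate the k quantised projections floor((a.x + b) / width)."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / width)
        for a, b in table
    )


def lsh_density(points, num_tables=8, k=2, width=1.0, seed=0):
    """Estimate each point's density as its mean bucket size over all tables."""
    rng = random.Random(seed)
    dim = len(points[0])
    totals = [0] * len(points)
    for _ in range(num_tables):
        table = make_table(dim, k, width, rng)
        keys = [bucket_key(p, table, width) for p in points]
        sizes = Counter(keys)
        for i, key in enumerate(keys):
            totals[i] += sizes[key]  # counts the point itself, so always >= 1
    return [t / num_tables for t in totals]


def density_biased_sample(densities, m, exponent=-1.0, seed=0):
    """Pick m distinct indices with weight density ** exponent (assumed bias)."""
    rng = random.Random(seed)
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw u in [0, 1) per item and keep the m largest keys u ** (1 / weight).
    keyed = [(rng.random() ** (1.0 / (d ** exponent)), i)
             for i, d in enumerate(densities)]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:m]]
```

Calling `lsh_density(data)` on a list of equal-length numeric tuples returns one density estimate per point, and `density_biased_sample(densities, m)` returns `m` distinct indices into `data`. A negative `exponent` gives sparse points a larger weight, a positive one favours dense regions, and `exponent=0` reduces to uniform sampling.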

Notes

1. https://archive.ics.uci.edu/ml/datasets.html

Acknowledgments

This work was supported in part by the New Zealand Marsden Fund under Grant No. 17-UOA-248, the UoA FRDF under Grant No. 3714668, and the NJU Overseas Open fund under Grant No. KFKT2018A12.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Auckland, Auckland, New Zealand
Xuyun Zhang
Faculty of Information Technology, Monash University, Melbourne, Australia
Mahsa Salehi
Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia
Christopher Leckie & Rao Kotagiri
Faculty of Computer Science and Technology, Guizhou University, Guiyang, China
Yun Luo
School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
Qiang He & Rui Zhou

Corresponding author

Correspondence to Xuyun Zhang.

Editor information

Editors and Affiliations

Zayed University, Dubai, United Arab Emirates
Hakim Hacid
Poznan University of Economics, Poznan, Poland
Wojciech Cellary
University of Victoria, Footscray, VIC, Australia
Hua Wang
University of New South Wales, Sydney, NSW, Australia
Hye-Young Paik
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou

About this paper

Cite this paper

Zhang, X. et al. (2018). Density Biased Sampling with Locality Sensitive Hashing for Outlier Detection. In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. Lecture Notes in Computer Science, vol 11234. Springer, Cham. https://doi.org/10.1007/978-3-030-02925-8_19

DOI: https://doi.org/10.1007/978-3-030-02925-8_19
Published: 21 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02924-1
Online ISBN: 978-3-030-02925-8
eBook Packages: Computer Science, Computer Science (R0)
