Abstract
The density-based spatial clustering of applications with noise (DBSCAN) algorithm has been well studied in database domains for clustering multi-dimensional data to extract arbitrary shape clusters. Recently, with the growing interest in big data and increasing diversification of data, the typical size and volume of databases have increased and data have increasingly become high-dimensional. Therefore, a large number of speed-up techniques for DBSCAN algorithms including exact and approximate approaches have been proposed. The fastest DBSCAN algorithm is the cell-based algorithm, which divides the whole data set into small cells. In this paper, we propose a novel exact version cell-based DBSCAN algorithm using minimum bounding rectangle (MBR) criteria. The connecting cells step is the most time-consuming step of the cell-based algorithm. The proposed algorithm can process the connecting cells step at high speed by using MBR criteria. We implemented the proposed cell-based DBSCAN algorithm and show that it outperforms the conventional one in high dimensions.
T. Sakai—JSPS Research Fellow.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Borah, B., Bhattacharyya, D.K.: An improved sampling-based DBSCAN for large spatial databases. In: 2004 Proceedings of International Conference on Intelligent Sensing and Information Processing, pp. 92–96 (2004)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining KDD-1996, pp. 226–231 (1996)
Gan, J., Tao, Y.: DBSCAN revisited: mis-claim, un-fixability, and approximation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data SIGMOD 2015, pp. 519–530 (2015)
Gunawan, A.: A faster algorithm for DBSCAN. Master’s thesis, Technische University Eindhoven (2013)
Liria, A.L.C.: Algorithms for processing of spatial queries using R-trees. The closest pairs query and its application on spatial databases. Ph.D. thesis, Department of Languages and Computation, University of Almeria (2002)
Liu, B.: A fast density-based clustering algorithm for large databases. In: 2006 International Conference on Machine Learning and Cybernetics, pp. 996–1000 (2006)
Mahran, S., Mahar, K.: Using grid for accelerating density-based clustering. In: 2008 8th IEEE International Conference on Computer and Information Technology, pp. 35–40 (2008)
Mai, S.T., Assent, I., Storgaard, M.: AnyDBC: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 2016, pp. 1025–1034 (2016)
Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBscan and its applications. Data Min. Knowl. Discov. 2(2), 169–194 (1998)
Tsai, C.F., Wu, C.T.: GF-DBSCAN: a new efficient and effective data clustering technique for large databases. In: Proceedings of the 9th WSEAS International Conference on Multimedia Systems Signal Processing MUSP 2009, pp. 231–236 (2009)
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number JP16J05403 and JP26330139, and the MIC/SCOPE #162308002.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Sakai, T., Tamura, K., Kitakami, H. (2017). Cell-Based DBSCAN Algorithm Using Minimum Bounding Rectangle Criteria. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science(), vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-55705-2_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer ScienceComputer Science (R0)