Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2684464.2684477acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicdcnConference Proceedingsconference-collections
research-article

Parallelizing OPTICS for Commodity Clusters

Published: 04 January 2015 Publication History

Abstract

In this paper, we propose an algorithm, DOPTICS, a parallelized version of a popular density based cluster-ordering algorithm OPTICS. Parallelizing OPTICS is challenging because of its strong sequential data access behavior. To achieve high parallelism, a data parallel approach that exploits the underlying indexing structure is proposed. We implement the proposed algorithm for processor nodes in a commodity cluster as well as across cores in a processor. Moreover, the clusters obtained by our algorithm are exactly same as that of classical OPTICS unlike the only existing implementation of the parallel OPTICS. We demonstrate the performance of the proposed algorithm on a commodity cluster which is typically a combination of distributed and shared memory systems. Experimental results on several large real and synthetic data sets with varying dimensions are presented to show speed up and scalability achieved. The speed up obtained is remarkable and is found to scale well with increasing number of processing elements. Performance improvements of the proposed DOPTICS algorithm are due to algorithmic optimizations and parallelization strategy.

References

[1]
Ankerst, M. Breunig, M. M., Kriegel, H.-P., and Sander, J. 1999. OPTICS: Ordering Points To Identify the Clustering Structure, In Proceedings of the 1999 ACM SIGMOD International conference on Management of Data (SIGMOD '99). ACM, New York, NY, USA, 49--60.
[2]
Aoying, Z., Shuigeng, Z., Jing, C., Ye, F., and Yunfa, H. 2000. Approaches for scaling DBSCAN algorithm to large spatial databases, Journal of Computer Science and Technology, 15 (6), 509--526.
[3]
Arlia, D. 2001. Experiments in Parallel Clustering with DBSCAN. In Proceedings of the Euro Par '01 conference. 326--331.
[4]
Bertone, S., De Lucia, G., and Thomas, P. 2007. The recycling of gas and metals in galaxy formation, predictions of a dynamical feedback model, Monthly Notices of the Royal Astronomical Society, 379, 3, 1143--1154.
[5]
Brecheisen, S., Kriegel, H., and Pfeifle, M. 2006. Parallel density-based clustering of complex objects, Advances in Knowledge Discovery and Data Mining. 179--188.
[6]
Chen, M., Gao, X., and Li, H. 2010. Parallel DBSCAN with Priority R-tree, In Information Management and Engineering (ICIME), The 2nd IEEE International Conference on. 508--511.
[7]
Coppola, M. and Vanneschi, M. 2001. High-performance data mining with skeleton-based structured parallel programming, Parallel Computing - Parallel data-intensive algorithms and applications. 28, 793--813.
[8]
De Lucia, G. and Blaizot J. 2007. The hierarchical formation of the brightest cluster galaxies, Monthly Notices of the Royal AstronomicaSociety, 375, 1, 2--14.
[9]
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A Density-Based algorithm for discovering clusters in large spatial databases with noise. In Evangelos Simoudis, Jiawei Han, and Usama Fayyad, editors, Second International Conference on Knowledge Discovery and Data Mining, pages 226--231, Portland, Oregon, AAAI Press.
[10]
Fatta, G. Di and Pettinger, G, D. 2010. Dynamic Load Balancing in Parallel KD-Tree k-Means. 10th IEEE International Conference on Computer and Information Technology. 2478--2485.
[11]
Faloutsos, C. 1987. Analysis of object oriented spatial access methods. In ACM SIGMOD International conference on Management of Data. 426--439.
[12]
Fränti, P. and Virmajoki, O. 2006. Iterative shrinking method for clustering problems. Pattern Recognition. 39, 5, 761--778.
[13]
Fu, Y., Zhao, W., and Ma, H. 2011. Research on parallel DBSCAN algorithm design based on mapreduce, Advanced Materials Research, 301, 1133--1138.
[14]
Guttman, A. 1984. R-trees a dynamic index structure for spatial searching, In Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 47--57.
[15]
He, Y., Tan, H., Luo, W., Mao, H., Ma, D., Feng, S. et al. 2011. MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce, In Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, 473--480. Washington, DC, USA: IEEE Computer Society.
[16]
Hinneburg, A. and Keim, D. A. 1998. An efficient approach to clustering in large multimedia databases with noise. AAAI Press, 58--65.
[17]
Kaul, M., Yang, B., and Jensen, C.S. 2013. Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems, In Proceedings of International Conference on Mobile Data Management (IEEE MDM), Milan, Italy.
[18]
Kriegel, H.-P., Krooger, P., and Gotlibovich, I. 2003. Incremental OPTICS: Efficient Computation of Updates in a Hierarchical Cluster Ordering, In M. M. Yahiko Kambayashi (Ed.), Data Warehousing and Knowledge Discovery, vol. 2737, pp. 224--233, Springer Berlin Heidelberg.
[19]
Patwary, M. A., Palsetia, D., Agrawal, A. Liao, W.-K., Manne, F., and Choudhary, A. 2013. Scalable Parallel OPTICS Data Clustering Using Graph Algorithmic Techniques. In Proceedings of SC13: International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13). ACM, New York, NY, USA.
[20]
Patwary, M. A., Palsetia, D., Agrawal, A., Liao, W.-K., Manne, F., and Choudhary, A. 2012. A new scalable parallel DBSCAN algorithm using the disjoint-set data structure, In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 62, 1--11. Los Alamitos, CA, USA: IEEE Computer Society Press.
[21]
Shaoa, F., Caoa, Y., Gua, J. and Wangb, Y. 2010. A new real-time clustering algorithm.
[22]
Springel, V., White, S., Jenkins, A., Frenk, C., Yoshida, N., Gao, L., Navarro, J., Thacker, R., Croton, D., Helly, J., et al. 2005. Simulations of the formation, evolution and clustering of galaxies and quasars, Nature, 435, 7042, 629--636.
[23]
SUVnet-Trace data: http://wirelesslab.sjtu.edu.cn.
[24]
VampirTrace Library, http://www.tudresden.de/die_tu_dresden/zentrale_einrichtungen/zih/forschung/projekte/vampirtrace.
[25]
Xu, X., Jager, J., and Kriege, H.-P. 1999. A fast parallel clustering algorithm for large spatial databases. Data Mining and Knowledge Discovery, 3, 263--290.

Cited By

View all
  • (2024)A Survey and Experimental Review on Data Distribution Strategies for Parallel Spatial Clustering AlgorithmsJournal of Computer Science and Technology10.1007/s11390-024-2700-039:3(610-636)Online publication date: 1-May-2024
  • (2021)Anytime clustering of data streams while handling noise and concept driftJournal of Experimental & Theoretical Artificial Intelligence10.1080/0952813X.2021.188200134:3(399-429)Online publication date: 15-Mar-2021
  • (2020)Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data miningInternational Journal of Data Science and Analytics10.1007/s41060-020-00208-210:1(25-47)Online publication date: 3-Apr-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICDCN '15: Proceedings of the 16th International Conference on Distributed Computing and Networking
January 2015
360 pages
ISBN:9781450329286
DOI:10.1145/2684464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 January 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Clustering
  2. MPI
  3. OPTICS
  4. OpenMP
  5. Parallel computing
  6. R-trees
  7. clusters
  8. commodity clusters
  9. density-based clustering
  10. multicore

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

ICDCN '15

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Survey and Experimental Review on Data Distribution Strategies for Parallel Spatial Clustering AlgorithmsJournal of Computer Science and Technology10.1007/s11390-024-2700-039:3(610-636)Online publication date: 1-May-2024
  • (2021)Anytime clustering of data streams while handling noise and concept driftJournal of Experimental & Theoretical Artificial Intelligence10.1080/0952813X.2021.188200134:3(399-429)Online publication date: 15-Mar-2021
  • (2020)Grid-R-tree: a data structure for efficient neighborhood and nearest neighbor queries in data miningInternational Journal of Data Science and Analytics10.1007/s41060-020-00208-210:1(25-47)Online publication date: 3-Apr-2020
  • (2018)An efficient method for batch updates in OPTICS cluster orderingInternational Journal of Data Analysis Techniques and Strategies10.1504/IJDATS.2018.09063110:1(57-80)Online publication date: 1-Jan-2018
  • (2018)RP-DBSCANProceedings of the 2018 International Conference on Management of Data10.1145/3183713.3196887(1173-1187)Online publication date: 27-May-2018
  • (2018)Towards a New Approach for Empowering the MR-DBSCAN Clustering for Massive Data Using Quadtree2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)10.1109/HPCC/SmartCity/DSS.2018.00044(91-98)Online publication date: Jun-2018
  • (2017)Exact, Fast and Scalable Parallel DBSCAN for Commodity PlatformsProceedings of the 18th International Conference on Distributed Computing and Networking10.1145/3007748.3007773(1-10)Online publication date: 5-Jan-2017
  • (2015)Classifying G-Band Bright Points Based on Clustering Analysis2015 8th International Conference on Intelligent Networks and Intelligent Systems (ICINIS)10.1109/ICINIS.2015.14(50-53)Online publication date: Nov-2015

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media