Abstract
Clustering is the process of partitioning data into different clusters with the goal of minimizing the difference of objects within each cluster, where the commonly used evaluation function is defined as the sum of the squared distance from each point to the cluster center to which it belongs. Nevertheless, this general evaluation function is extremely vulnerable to outliers and noisy data, and it is sensitive to initial cluster centers. More seriously, this evaluation function cannot effectively represent the core of clustering results; even if the partition achieves the global optimum value according to the evaluation function, the clustering results may not be good. In this study, we propose a multi-start local search algorithm (MLS) with several techniques to tackle this problem. First, the center of each cluster is no longer its centroid, which reduces the dependence of the cluster algorithm on difference in size and shape of ideal clusters. Second, the number of adjacent points shared between clusters is defined as the new objective function. Third, two basic meta-operations, merge and split, are used to optimize the objective function and make the iterative process insensitive to the initial solution. The novelty of our approach is the selection criterion of the initial centers and the new objective function, which enables MLS to explore more promising search area. Experimental results demonstrate that MLS outperforms traditional centroid-based clustering algorithms in terms of both solution quality and computational efficiency, and it is quite competitive to other reference algorithms such as spectral, density, and geometric based clustering algorithms.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Data Availability
The datasets adopted in the current study are available at https://archive.ics.uci.edu/ml/datasets.php.
References
Abualigah LM, Khader AT, Hanandeh ES (2018) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071
Agrawal R, Gehrke J, Gunopulos D et al (2005) Automatic subspace clustering of high dimensional data. Data Min Knowl Disc 11(1):5–33
Aljalbout E, Golkov V, Siddiqui Y et al (2018) Clustering with deep learning: taxonomy and new methods. arXiv:180107648
Aljarah I, Mafarja M, Heidari AA et al (2020) Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowl Inf Syst 62(2):507–539
Aloise D, Deshpande A, Hansen P et al (2009) NP-Hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248
Alshammari M, Stavrakakis J, Takatsuka M (2021) Refining a k-nearest neighbor graph for a computationally efficient spectral clustering. Pattern Recog 114:107,869
Amini A, Wah TY, Saybani MR et al (2011) A study of density-grid based clustering algorithms on data streams. In: 2011 Eighth international conference on fuzzy systems and knowledge discovery (FSKD), IEEE, pp 1652–1656
Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM Sigmod Record 28(2):49–60
Ashraf FB, Matin A, Shafi MSR et al (2021) An improved k-means clustering algorithm for multi-dimensional multi-cluster data using meta-heuristics. In: 2021 24th International conference on computer and information technology (ICCIT), IEEE, pp 1-6
Bateni MH, Behnezhad S, Derakhshan M et al (2017) Affinity clustering: hierarchical clustering at scale. In: Proceedings of the 31st International conference on neural information processing systems, pp 6867–6877
Bendoly E (2003) Theory and support for process frameworks of knowledge discovery and data mining from ERP systems. Inf Manag 40(7):639–647
Brown D, Japa A, Shi Y (2019) A fast density-grid based clustering method. In: 2019 IEEE 9Th annual computing and communication workshop and conference (CCWC), IEEE, pp 0048–0054
Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21(4):457–477
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the K-means clustering algorithm. Expert Syst Appl 40(1):200–210
Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 3(2):224–227
Duwairi R, Abu-Rahmeh M (2015) A novel approach for initializing the spherical K-means clustering algorithm. Simul Model Pract Theory 54:49–63
Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the AAAI conference on artificial intelligence, pp 226–231
Fahy C, Yang S, Gongora M (2018) Ant colony stream clustering: a fast density clustering algorithm for dynamic data streams. IEEE Trans Cybern 49(6):2215–2228
Fan J (2019) Ope-hca: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput Applic 31(7):2095–2105
Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315 (5814):972–976
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):4–14
Glover F (2017) Pseudo-centroid clustering. Soft Comput 21(22):6571–6592
Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record 27(2):73–84
Guo X, Gao L, Liu X et al (2017) Improved deep embedded clustering with local structure preservation. In: The 26th International joint conference on artificial intelligence (IJCAI), pp 1753–1759
Hartigan JA, Wong MA (1979) Algorithm as 136: a K-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
Huang D, Wang CD, Lai JH (2017) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473
Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226
Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Res Biomol 22(12):2577–2637
Kanungo T, Mount DM, Netanyahu NS et al (2002) An efficient K-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Langan DA, Modestino JW, Zhang J (1998) Cluster validation for unsupervised stochastic model-based image segmentation. IEEE Trans Image Process 7(2):180–195
Li H, Liu X, Li T et al (2020) A novel density-based clustering algorithm using nearest neighbor graph. Pattern Recog 102:107,206
Likas A, Vlassis N, Verbeek JJ (2003) The global K-means clustering algorithm. Pattern Recogn 36(2):451–461
Liu R, Wang H, Yu X (2018) Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf Sci 450:200–226
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Meyerhenke H, Sanders P, Schulz C (2016) Partitioning (hierarchically clustered) complex networks via size-constrained graph clustering. J Heuristics 22(5):759–782
Peng X, Zhu H, Feng J et al (2019) Deep clustering with sample-assignment invariance prior. IEEE Trans Neural Netw Learn Syst 31(11):4857–4868
Pourbahrami S, Hashemzadeh M (2022) A geometric-based clustering method using natural neighbors. Inf Sci 610:694–706
Ran X, Zhou X, Lei M et al (2021) A novel K-means clustering algorithm with a noise algorithm for capturing urban hotspots. Appl Sci 11(23):11,202
Rappoport N, Shamir R (2018) Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 46(20):10,546–10,562
Rezaee MJ, Eshkevari M, Saberi M et al (2021) GBK-Means clustering algorithm: an improvement to the K-means algorithm based on the bargaining game. Knowl-Based Syst 213:106,672
Rezaei M, Fränti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28(8):2173–2186
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344 (6191):1492–1496
Ruspini EH, Bezdek JC, Keller JM (2019) Fuzzy clustering: a historical perspective. IEEE Comput Intell Mag 14(1):45–55
Sabin M, Gray R (1986) Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans Inf Theory 32(2):148–155
Sedluk MJ, Miller JW (2000) Cluster-based data compression system and method. US Patent 6,100:825
Sharma KK, Seal A (2020) Spectral embedded generalized mean based K-nearest neighbors clustering with S-distance. Expert Syst Appl 169(4):114,326
Sheikholeslami G, Chatterjee S, Zhang A (2000) Wavecluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J 8(3):289–304
Sheng Y, Wang M, Wu T et al (2019) Adaptive local learning regularized nonnegative matrix factorization for data clustering. Appl Intell 49(6):2151–2168
Shin KS, Jeong YS, Jeong MK (2012) A two-leveled symbiotic evolutionary algorithm for clustering problems. Appl Intell 36(4):788–799
Sieranoja S, Fränti P (2021) Adapting K-means for graph clustering. Knowl Inf Syst 8 (11):33–47
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(Dec):583–617
Sun G, Cong Y, Wang Q et al (2020) Lifelong spectral clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 5867–5874
Tao X, Wang R, Chang R et al (2019) Spectral clustering algorithm using density-sensitive distance measure with global and local consistencies. Knowl-Based Syst 170:26–42
Vassilvitskii S, Arthur D (2006) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035
Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280
Virmajoki O, Franti P, Kaukoranta T (2002) Iterative shrinking method for generating clustering. In: Proceedings of international conference on image processing, IEEE, pp 2–10
Wang H, Yang Y, Liu B (2019a) Gmc: graph-based multi-view clustering. IEEE Trans Knowl Data Eng 32(6):1116–1129
Wang H, Yang Y, Liu B et al (2019b) A study of graph-based system for multi-view clustering. Knowl-Based Syst 163:1009–1019
Wang TS, Lin HT, Wang P (2017) Weighted-spectral clustering algorithm for detecting community structures in complex networks. Artif Intell Rev 47(4):463–483
Wang W, Yang J, Muntz R et al (1997) Sting: a statistical information grid approach to spatial data mining. In: The VLDB journal, pp 186–195
Wang Y, Duan X, Liu X et al (2018) A spectral clustering method with semantic interpretation based on axiomatic fuzzy set theory. Appl Soft Comput 64:59–74
Wu B, Wilamowski BM (2016) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13(4):1620–1628
Xie H, Zhang L, Lim CP et al (2019) Improving K-means clustering with enhanced firefly algorithms. Appl Soft Comput 84:105,763
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Xu J, Wang G, Deng W (2016) Denpehc: density peak based efficient hierarchical clustering. Inf Sci 373:200–218
Xu Q, Zhang Q, Liu J et al (2020) Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst Appl 151:113,367
Xu X, Ding S, Shi Z (2018) An improved density peaks clustering algorithm with fast finding cluster centers. Knowl-Based Syst 158:65–74
Xu X, Ding S, Wang Y et al (2021) A fast density peaks clustering algorithm with sparse search. Inf Sci 554:61–83
Yin L, Li M, Chen H et al (2022) An improved hierarchical clustering algorithm based on the idea of population reproduction and fusion. Electronics 11(17):2735
Yoo HW, Jung SH, Jang DS et al (2002) Extraction of major object features using VQ clustering for content-based image retrieval. Pattern Recogn 35(5):1115–1126
Zhang C, Fu H, Hu Q et al (2018a) Generalized latent multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell 42(1):86–99
Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. ACM Sigmod Record 25(2):103–114
Zhang T, Ramakrishnan R, Livny M (1997) Birch: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182
Zhang W, Zang W (2018) A fuzzy density peaks clustering algorithm based on improved DNA genetic algorithm and K-nearest neighbors. In: International conference on intelligent science and big data engineering, Springer, pp 476–487
Zhang X, Xu Z (2015) Hesitant fuzzy agglomerative hierarchical clustering algorithms. Int J Syst Sci 46(3):562–576
Zhang Z, Liu L, Shen F et al (2018b) Binary multi-view clustering. IEEE Trans Pattern Anal Mach Intell 41(7):1774–1782
Zhu X, Zhang S, Li Y et al (2018) Low-rank sparse subspace for spectral clustering. IEEE Trans Knowl Data Eng 31(8):1532–1543
Zhu X, Zhu Y, Zheng W (2020) Spectral rotation for deep one-step clustering. Pattern Recogn 105:107,175
Acknowledgements
We are grateful to one of our team members Tuanyue Xiao, who has contributed substantially to the revisions, especially the additional experiments for validation, comparisons with other clustering algorithms, and analysis of computational results. This work was supported by the Special Project for Knowledge Innovation of Hubei Province under Grant No. 2022013301015175 and the National Natural Science Foundation of China under Grant No. 62202192.
Author information
Authors and Affiliations
Contributions
Xiaolu Liu contributed to conceptualization, validation, writing and editing, supervision, project administration. Wenhan Shao contributed to data analysis, investigation, and writing-original draft preparation; Jiaming Chen contributed with conceptualization, validation, investigation, visualization, writing original draft and editing. Jiaming Chen contributed to revisions, additional experiments for validation, drawing flowchart of the main algorithm, and data analysis. Zhipeng Lü contributed to conceptualization, validation, writing-draft review and editing, supervision, project administration, funding acquisition. Fred Glover contributed to conceptualization, methodology, writing-review and editing, project administration. Junwen Ding contributed to conceptualization, methodology, validation, writing-review and editing, formal analysis, supervision, and funding acquisition.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, X., Shao, W., Chen, J. et al. Multi-start local search algorithm based on a novel objective function for clustering analysis. Appl Intell 53, 20346–20364 (2023). https://doi.org/10.1007/s10489-023-04580-x
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04580-x