Multi-start local search algorithm based on a novel objective function for clustering analysis

Xiaolu Liu¹,
Wenhan Shao²,
Jiaming Chen¹,
Zhipeng Lü²,
Fred Glover³ &
…
Junwen Ding ORCID: orcid.org/0000-0002-0618-4969²

499 Accesses
1 Altmetric
Explore all metrics

Abstract

Clustering is the process of partitioning data into different clusters with the goal of minimizing the difference of objects within each cluster, where the commonly used evaluation function is defined as the sum of the squared distance from each point to the cluster center to which it belongs. Nevertheless, this general evaluation function is extremely vulnerable to outliers and noisy data, and it is sensitive to initial cluster centers. More seriously, this evaluation function cannot effectively represent the core of clustering results; even if the partition achieves the global optimum value according to the evaluation function, the clustering results may not be good. In this study, we propose a multi-start local search algorithm (MLS) with several techniques to tackle this problem. First, the center of each cluster is no longer its centroid, which reduces the dependence of the cluster algorithm on difference in size and shape of ideal clusters. Second, the number of adjacent points shared between clusters is defined as the new objective function. Third, two basic meta-operations, merge and split, are used to optimize the objective function and make the iterative process insensitive to the initial solution. The novelty of our approach is the selection criterion of the initial centers and the new objective function, which enables MLS to explore more promising search area. Experimental results demonstrate that MLS outperforms traditional centroid-based clustering algorithms in terms of both solution quality and computational efficiency, and it is quite competitive to other reference algorithms such as spectral, density, and geometric based clustering algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Clustering Algorithm Based on Fitness Probability Scores for Cluster Centers Optimization

An Automatic Multi-Objective Clustering Based on Hierarchical Method

Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering

Article 24 April 2019

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data Availability

The datasets adopted in the current study are available at https://archive.ics.uci.edu/ml/datasets.php.

Notes

https://archive.ics.uci.edu/ml/datasets.php.

References

Abualigah LM, Khader AT, Hanandeh ES (2018) Hybrid clustering analysis using improved krill herd algorithm. Appl Intell 48(11):4047–4071
Google Scholar
Agrawal R, Gehrke J, Gunopulos D et al (2005) Automatic subspace clustering of high dimensional data. Data Min Knowl Disc 11(1):5–33
MathSciNet Google Scholar
Aljalbout E, Golkov V, Siddiqui Y et al (2018) Clustering with deep learning: taxonomy and new methods. arXiv:180107648
Aljarah I, Mafarja M, Heidari AA et al (2020) Clustering analysis using a novel locality-informed grey wolf-inspired clustering approach. Knowl Inf Syst 62(2):507–539
Google Scholar
Aloise D, Deshpande A, Hansen P et al (2009) NP-Hardness of euclidean sum-of-squares clustering. Mach Learn 75(2):245–248
MATH Google Scholar
Alshammari M, Stavrakakis J, Takatsuka M (2021) Refining a k-nearest neighbor graph for a computationally efficient spectral clustering. Pattern Recog 114:107,869
Google Scholar
Amini A, Wah TY, Saybani MR et al (2011) A study of density-grid based clustering algorithms on data streams. In: 2011 Eighth international conference on fuzzy systems and knowledge discovery (FSKD), IEEE, pp 1652–1656
Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM Sigmod Record 28(2):49–60
Google Scholar
Ashraf FB, Matin A, Shafi MSR et al (2021) An improved k-means clustering algorithm for multi-dimensional multi-cluster data using meta-heuristics. In: 2021 24th International conference on computer and information technology (ICCIT), IEEE, pp 1-6
Bateni MH, Behnezhad S, Derakhshan M et al (2017) Affinity clustering: hierarchical clustering at scale. In: Proceedings of the 31st International conference on neural information processing systems, pp 6867–6877
Bendoly E (2003) Theory and support for process frameworks of knowledge discovery and data mining from ERP systems. Inf Manag 40(7):639–647
Google Scholar
Brown D, Japa A, Shi Y (2019) A fast density-grid based clustering method. In: 2019 IEEE 9Th annual computing and communication workshop and conference (CCWC), IEEE, pp 0048–0054
Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21(4):457–477
Google Scholar
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the K-means clustering algorithm. Expert Syst Appl 40(1):200–210
Google Scholar
Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 3(2):224–227
Google Scholar
Duwairi R, Abu-Rahmeh M (2015) A novel approach for initializing the spherical K-means clustering algorithm. Simul Model Pract Theory 54:49–63
Google Scholar
Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the AAAI conference on artificial intelligence, pp 226–231
Fahy C, Yang S, Gongora M (2018) Ant colony stream clustering: a fast density clustering algorithm for dynamic data streams. IEEE Trans Cybern 49(6):2215–2228
Google Scholar
Fan J (2019) Ope-hca: an optimal probabilistic estimation approach for hierarchical clustering algorithm. Neural Comput Applic 31(7):2095–2105
Google Scholar
Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759
MATH Google Scholar
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315 (5814):972–976
MathSciNet MATH Google Scholar
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):4–14
Google Scholar
Glover F (2017) Pseudo-centroid clustering. Soft Comput 21(22):6571–6592
Google Scholar
Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. ACM Sigmod Record 27(2):73–84
MATH Google Scholar
Guo X, Gao L, Liu X et al (2017) Improved deep embedded clustering with local structure preservation. In: The 26th International joint conference on artificial intelligence (IJCAI), pp 1753–1759
Hartigan JA, Wong MA (1979) Algorithm as 136: a K-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
MATH Google Scholar
Huang D, Wang CD, Lai JH (2017) Locally weighted ensemble clustering. IEEE Trans Cybern 48(5):1460–1473
Google Scholar
Huang D, Wang CD, Wu JS et al (2019) Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 32(6):1212–1226
Google Scholar
Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31(8):651–666
Google Scholar
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Res Biomol 22(12):2577–2637
Google Scholar
Kanungo T, Mount DM, Netanyahu NS et al (2002) An efficient K-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Google Scholar
Langan DA, Modestino JW, Zhang J (1998) Cluster validation for unsupervised stochastic model-based image segmentation. IEEE Trans Image Process 7(2):180–195
Google Scholar
Li H, Liu X, Li T et al (2020) A novel density-based clustering algorithm using nearest neighbor graph. Pattern Recog 102:107,206
Google Scholar
Likas A, Vlassis N, Verbeek JJ (2003) The global K-means clustering algorithm. Pattern Recogn 36(2):451–461
Google Scholar
Liu R, Wang H, Yu X (2018) Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf Sci 450:200–226
MathSciNet Google Scholar
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
MathSciNet MATH Google Scholar
Meyerhenke H, Sanders P, Schulz C (2016) Partitioning (hierarchically clustered) complex networks via size-constrained graph clustering. J Heuristics 22(5):759–782
MATH Google Scholar
Peng X, Zhu H, Feng J et al (2019) Deep clustering with sample-assignment invariance prior. IEEE Trans Neural Netw Learn Syst 31(11):4857–4868
MathSciNet Google Scholar
Pourbahrami S, Hashemzadeh M (2022) A geometric-based clustering method using natural neighbors. Inf Sci 610:694–706
Google Scholar
Ran X, Zhou X, Lei M et al (2021) A novel K-means clustering algorithm with a noise algorithm for capturing urban hotspots. Appl Sci 11(23):11,202
Google Scholar
Rappoport N, Shamir R (2018) Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res 46(20):10,546–10,562
Google Scholar
Rezaee MJ, Eshkevari M, Saberi M et al (2021) GBK-Means clustering algorithm: an improvement to the K-means algorithm based on the bargaining game. Knowl-Based Syst 213:106,672
Google Scholar
Rezaei M, Fränti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28(8):2173–2186
Google Scholar
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344 (6191):1492–1496
Google Scholar
Ruspini EH, Bezdek JC, Keller JM (2019) Fuzzy clustering: a historical perspective. IEEE Comput Intell Mag 14(1):45–55
Google Scholar
Sabin M, Gray R (1986) Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans Inf Theory 32(2):148–155
MathSciNet MATH Google Scholar
Sedluk MJ, Miller JW (2000) Cluster-based data compression system and method. US Patent 6,100:825
Google Scholar
Sharma KK, Seal A (2020) Spectral embedded generalized mean based K-nearest neighbors clustering with S-distance. Expert Syst Appl 169(4):114,326
Google Scholar
Sheikholeslami G, Chatterjee S, Zhang A (2000) Wavecluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J 8(3):289–304
Google Scholar
Sheng Y, Wang M, Wu T et al (2019) Adaptive local learning regularized nonnegative matrix factorization for data clustering. Appl Intell 49(6):2151–2168
Google Scholar
Shin KS, Jeong YS, Jeong MK (2012) A two-leveled symbiotic evolutionary algorithm for clustering problems. Appl Intell 36(4):788–799
Google Scholar
Sieranoja S, Fränti P (2021) Adapting K-means for graph clustering. Knowl Inf Syst 8 (11):33–47
Google Scholar
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(Dec):583–617
MathSciNet MATH Google Scholar
Sun G, Cong Y, Wang Q et al (2020) Lifelong spectral clustering. In: Proceedings of the AAAI conference on artificial intelligence, pp 5867–5874
Tao X, Wang R, Chang R et al (2019) Spectral clustering algorithm using density-sensitive distance measure with global and local consistencies. Knowl-Based Syst 170:26–42
Google Scholar
Vassilvitskii S, Arthur D (2006) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035
Veenman CJ, Reinders MJT, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280
Google Scholar
Virmajoki O, Franti P, Kaukoranta T (2002) Iterative shrinking method for generating clustering. In: Proceedings of international conference on image processing, IEEE, pp 2–10
Wang H, Yang Y, Liu B (2019a) Gmc: graph-based multi-view clustering. IEEE Trans Knowl Data Eng 32(6):1116–1129
Google Scholar
Wang H, Yang Y, Liu B et al (2019b) A study of graph-based system for multi-view clustering. Knowl-Based Syst 163:1009–1019
Google Scholar
Wang TS, Lin HT, Wang P (2017) Weighted-spectral clustering algorithm for detecting community structures in complex networks. Artif Intell Rev 47(4):463–483
Google Scholar
Wang W, Yang J, Muntz R et al (1997) Sting: a statistical information grid approach to spatial data mining. In: The VLDB journal, pp 186–195
Wang Y, Duan X, Liu X et al (2018) A spectral clustering method with semantic interpretation based on axiomatic fuzzy set theory. Appl Soft Comput 64:59–74
Google Scholar
Wu B, Wilamowski BM (2016) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13(4):1620–1628
Google Scholar
Xie H, Zhang L, Lim CP et al (2019) Improving K-means clustering with enhanced firefly algorithms. Appl Soft Comput 84:105,763
Google Scholar
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Xu J, Wang G, Deng W (2016) Denpehc: density peak based efficient hierarchical clustering. Inf Sci 373:200–218
Google Scholar
Xu Q, Zhang Q, Liu J et al (2020) Efficient synthetical clustering validity indexes for hierarchical clustering. Expert Syst Appl 151:113,367
Google Scholar
Xu X, Ding S, Shi Z (2018) An improved density peaks clustering algorithm with fast finding cluster centers. Knowl-Based Syst 158:65–74
Google Scholar
Xu X, Ding S, Wang Y et al (2021) A fast density peaks clustering algorithm with sparse search. Inf Sci 554:61–83
MathSciNet Google Scholar
Yin L, Li M, Chen H et al (2022) An improved hierarchical clustering algorithm based on the idea of population reproduction and fusion. Electronics 11(17):2735
Google Scholar
Yoo HW, Jung SH, Jang DS et al (2002) Extraction of major object features using VQ clustering for content-based image retrieval. Pattern Recogn 35(5):1115–1126
MATH Google Scholar
Zhang C, Fu H, Hu Q et al (2018a) Generalized latent multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell 42(1):86–99
Google Scholar
Zhang T, Ramakrishnan R, Livny M (1996) Birch: an efficient data clustering method for very large databases. ACM Sigmod Record 25(2):103–114
Google Scholar
Zhang T, Ramakrishnan R, Livny M (1997) Birch: a new data clustering algorithm and its applications. Data Min Knowl Disc 1(2):141–182
Google Scholar
Zhang W, Zang W (2018) A fuzzy density peaks clustering algorithm based on improved DNA genetic algorithm and K-nearest neighbors. In: International conference on intelligent science and big data engineering, Springer, pp 476–487
Zhang X, Xu Z (2015) Hesitant fuzzy agglomerative hierarchical clustering algorithms. Int J Syst Sci 46(3):562–576
MATH Google Scholar
Zhang Z, Liu L, Shen F et al (2018b) Binary multi-view clustering. IEEE Trans Pattern Anal Mach Intell 41(7):1774–1782
Google Scholar
Zhu X, Zhang S, Li Y et al (2018) Low-rank sparse subspace for spectral clustering. IEEE Trans Knowl Data Eng 31(8):1532–1543
Google Scholar
Zhu X, Zhu Y, Zheng W (2020) Spectral rotation for deep one-step clustering. Pattern Recogn 105:107,175
Google Scholar

Download references

Acknowledgements

We are grateful to one of our team members Tuanyue Xiao, who has contributed substantially to the revisions, especially the additional experiments for validation, comparisons with other clustering algorithms, and analysis of computational results. This work was supported by the Special Project for Knowledge Innovation of Hubei Province under Grant No. 2022013301015175 and the National Natural Science Foundation of China under Grant No. 62202192.

Author information

Authors and Affiliations

College of System Engineering, National University of Defense Technology, Deya Road, Changsha, 410000, Hunan, China
Xiaolu Liu & Jiaming Chen
School of Computer Science and Technology, Huazhong University of Science and Technology, Luoyu Road, Wuhan, 430074, Hubei, China
Wenhan Shao, Zhipeng Lü & Junwen Ding
College of Engineering & Applied Science, University of Colorado, Boulder, Colorado, 80309-0425, Colorado, USA
Fred Glover

Authors

Xiaolu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Wenhan Shao
View author publications
You can also search for this author in PubMed Google Scholar
Jiaming Chen
View author publications
You can also search for this author in PubMed Google Scholar
Zhipeng Lü
View author publications
You can also search for this author in PubMed Google Scholar
Fred Glover
View author publications
You can also search for this author in PubMed Google Scholar
Junwen Ding
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Xiaolu Liu contributed to conceptualization, validation, writing and editing, supervision, project administration. Wenhan Shao contributed to data analysis, investigation, and writing-original draft preparation; Jiaming Chen contributed with conceptualization, validation, investigation, visualization, writing original draft and editing. Jiaming Chen contributed to revisions, additional experiments for validation, drawing flowchart of the main algorithm, and data analysis. Zhipeng Lü contributed to conceptualization, validation, writing-draft review and editing, supervision, project administration, funding acquisition. Fred Glover contributed to conceptualization, methodology, writing-review and editing, project administration. Junwen Ding contributed to conceptualization, methodology, validation, writing-review and editing, formal analysis, supervision, and funding acquisition.

Corresponding author

Correspondence to Junwen Ding.

Ethics declarations

Ethics approval and consent to participate

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Liu, X., Shao, W., Chen, J. et al. Multi-start local search algorithm based on a novel objective function for clustering analysis. Appl Intell 53, 20346–20364 (2023). https://doi.org/10.1007/s10489-023-04580-x

Download citation

Accepted: 17 March 2023
Published: 06 April 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s10489-023-04580-x

Multi-start local search algorithm based on a novel objective function for clustering analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Clustering Algorithm Based on Fitness Probability Scores for Cluster Centers Optimization

An Automatic Multi-Objective Clustering Based on Hierarchical Method

Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering

Data Availability

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Conflict of Interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Multi-start local search algorithm based on a novel objective function for clustering analysis

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Clustering Algorithm Based on Fitness Probability Scores for Cluster Centers Optimization

An Automatic Multi-Objective Clustering Based on Hierarchical Method

Improving a Centroid-Based Clustering by Using Suitable Centroids from Another Clustering

Explore related subjects

Data Availability

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Conflict of Interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation