Abstract
This work proposes a novel data clustering algorithm based on the potential field model, with a hierarchical optimization mechanism on the algorithm. There are two stages in this algorithm. Firstly, we build an edge-weighted tree based on the mutual distances between all data points and their hypothetical potential values derived from the data distribution. Using the tree structure, the dataset can be divided into an appropriate number of initial sub-clusters, with the cluster centers close to the local minima of the potential field. Then the sub-clusters are further merged according to the well-designed merging criteria by analyzing their border potential values and the cluster average potential values. The proposed clustering algorithm follows a hierarchical clustering mechanism, and aims to optimize the initial sub-cluster results in the first stage. The algorithm takes advantage of the cluster merging criteria to merge the sub-clusters, so it can automatically stop the clustering process without designating the number of clusters in advance. The experimental results show that the proposed algorithm produces the most satisfactory clustering results in most cases compared with other existing methods, and can effectively identify the data clusters with arbitrary shape, size and density.
Similar content being viewed by others
References
Bahrololoum, A., Nezamabadi-pour, H., Saryazdi, S.: A data clustering approach based on universal gravity rule. Eng. Appl. Artif. Intel. 45, 415–428 (2015)
Chang, H., Yeung, D.-Y.: Robust path-based spectral clustering. Pattern Recogn. 41, 191–203 (2008)
Endo, Y., Iwata, H.: Dynamic clustering based on universal gravitation model. In: International Conference on Modeling Decisions for Artificial Intelligence, pp. 183–193 (2005)
Ester, M., Kriegel, H., Sander, J., Xiaowei, X.: A density-based algorithm for discovery clusters in large spatial databases with noise. In: International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
Filippone, M., Camastra, F., Masulli, F., Rovetta, S.: A survey of kernel and spectral methods for clustering. Pattern Recogn. 41, 176–190 (2008)
Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78, 553–569 (2012)
Gao, J., Zhao, L., Chen, Z., Li, P., Han, X., Hu, Y.: Icfs: An improved fast search and find of density peaks clustering algorithm. In: International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress, pp. 537–543 (2016)
Jain, A.K.: Data clustering: a user’s dilemma. In: International Conference on Pattern Recognition and Machine Intelligence, pp. 1–10 (2005)
Jain, A.K.: Dataclustering: 50 years beyond k-means. Pattern Recogn. Lett. 31, 651–666 (2010)
Kleinberg, J.: An impossibility theorem for clustering. In: Annural Conference on Neural Information Processing Systems, pp. 463–470 (2002)
Liu, X., Liu, Y., Xie, Q.: A potential-based clustering method by fast search and find of cluster centers. In: Proceedings of the Big Data Partitioning and Mining Workshop associated with 2017 IEEE International Conference on Big Knowledge (2017)
Lu Y., Wan, Y.: Clustering by sorting potential values (cspv): a novel potential-based clustering method. Pattern Recogn. 45, 3512–3522 (2012)
Lu, Y., Yi, W.: Pha: a fast potential-based hierarchical agglomerative clustering method. Pattern Recogn. 46, 1227–1239 (2013)
Omran, M.G.H., Engelbrecht, A.P., Salman, A.: An overview of clustering methods. Intelligent Data Analysis 11, 583–605 (2007)
Peng, L., Bo, Y., Chen, Y., Abraham, A.: Data gravitation based classification. Inform. Sci. 179, 809–819 (2009)
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344, 1492–1496 (2014)
Rostami, A., Lashkari, M.: Extended pso algorithm for improvement problems k-means clustering algorithm. International Journal of Managing Information Technology 6, 17–29 (2014)
Shang, S., Chen, L., Jensen, C.S., Wen, J.-R., Kalnis, P.: Seasearch trajectories by regions of interest. IEEE Trans. Knowl. Data Eng. 29, 1549–1562 (2017)
Shang, S., Guo, D., Liu, J., Zheng, K., Wen, J.: Finding regions of interest using location based social media. Neurocomputing 173, 118–123 (2016)
Shang, S., Liu, J., Zhao, K., Yang, M., Zheng, K., Wen, J.-R.: Dimension reduction with meta object-groups for efficient image retrieval. Neurocomputing 169, 50–54 (2015)
Shang, S., Zheng, K., Jensen, C.S., Yang, B., Kalnis, P., Li, G., Wen, J.: Discovery of path nearby clusters in spatial networks. IEEE Trans. Knowl. Data Eng. 27, 1505–1518 (2015)
Shi, S., Yang, G., Wang, D., Zheng, W.: Potential-based hierarchical clustering. In: International Conference on Pattern Recognition, pp. 272–275 (2002)
Tu, Q., Lu, J., Yuan, B., Tang, J.B., Yang, J.Y.: Density-based hierarchical clustering for streaming data. Pattern Recogn. Lett. 33, 641–645 (2012)
Wright, W.E.: Gravitational clustering. Pattern Recogn. 9, 151–166 (1977)
Xu, R., Wunsch, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16, 645–678 (2005)
Xu, X., Ester, M., Kriegel, H., Sander, J.: A distribution-based clustering algorithm for mining in large spatial databases. In: International Conference on Data Engineering, pp. 324–331 (1998)
Yamachi, H., Kambayashi, Y., Tsujimura, Y.: A clustering method based on potential field. In: The 10th Asia Pacific Industrial Engineering & Management System Conference, pp. 846–855 (2009)
Zhao, Q., Shi, Y., Liu, Q., Franti, P.: A grid-growing clustering algorithm for geo-spatial data. Pattern Recogn. Lett. 53, 77–84 (2015)
Zheng, K., Zheng, Y., Yuan, N.J., Shang, S., Zhou, X.: Online discovery of gathering patterns over trajectories. IEEE Trans. Knowl. Data Eng. 26, 1974–1988 (2014)
Zhu, J., Xie, Q., Zheng, K.: An improved early detection method of type-2 diabetes mellitus using multiple classifier system. Inform. Sci. 292, 1–14 (2015)
Zhu, X., Li, X., Zhang, S.: Block-row sparse multiview multilabel learning for image classification. IEEE Transactions on Cybernetics 46, 450–461 (2016)
Zhu, X., Li, X., Zhang, S., Ju, C., Wu, X.: Robust joint graph sparse coding for unsupervised spectral feature selection. IEEE Transactions on Neural Networks and Learning Systems 26, 1263–1275 (2016)
Zhu, X., Zhang, L., Zi, H.: A sparse embedding and least variance encoding approach to hashing. IEEE Trans. Image Process. 23, 3737–3750 (2014)
Author information
Authors and Affiliations
Corresponding author
Additional information
This article belongs to the Topical Collection: Special Issue on Deep Mining Big Social Data
Guest Editors: Xiaofeng Zhu, Gerard Sanroma, Jilian Zhang, and Brent C. Munsell
Part of the results in this work appeared in proceedings of the Big Data Partitioning and Mining Workshop associated with 2017 IEEE International Conference on Big Knowledge [11].
This research is partially supported by Natural Science Foundation of China (Grant No.61602353), National Social Science Foundation of China (Grant No.15BGL048) and the Fundamental Research Funds for the Central Universities (WUT:2017IVA053, WUT:2017IVB028 and WUT:2017II39GX).
Rights and permissions
About this article
Cite this article
Liu, X., Liu, Y., Xie, Q. et al. A potential-based clustering method with hierarchical optimization. World Wide Web 21, 1617–1635 (2018). https://doi.org/10.1007/s11280-017-0509-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-017-0509-2