Abstract
This work studies an algorithm for optimizing data clustering in feature space. Using the graph Laplacian and the extreme learning machine (ELM) mapping technique, we develop an optimal weight matrix W for feature mapping. The method explicitly maps the original data into an optimal feature space for clustering, which further increases the separability of the original data while keeping pattern points from the same cluster close together. Our method, which is easy to implement, obtains better clustering results than several popular clustering algorithms (k-means on the original data, the kernel clustering method, the spectral clustering method, and ELM k-means) on data sets that include three real UCI benchmarks: the IRIS data, the Wisconsin breast cancer database, and the Wine database.
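The two ingredients named in the abstract, an ELM-style feature mapping and a graph Laplacian, can be illustrated with a minimal sketch. It is not the authors' algorithm: the optimization of the weight matrix W is not described in the abstract, so the hidden-layer weights here are simply drawn at random, which corresponds roughly to the ELM k-means baseline. The sigmoid activation, the Gaussian similarity, and the names `elm_feature_map`, `graph_laplacian`, `kmeans`, `n_hidden`, and `sigma` are all assumptions made for illustration.

```python
import numpy as np


def elm_feature_map(X, n_hidden=50, seed=0):
    """Map data through a random single hidden layer (assumed sigmoid activation)."""
    rng = np.random.default_rng(seed)
    # Random, untrained input weights and biases; the paper optimizes W instead.
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))


def graph_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - S with an assumed Gaussian similarity S."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    D = np.diag(S.sum(axis=1))
    return D - S


def kmeans(H, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of H."""
    rng = np.random.default_rng(seed)
    centers = H[rng.choice(len(H), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            H[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic two-cluster data, used only to exercise the functions.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.3, (20, 4)), rng.normal(3.0, 0.3, (20, 4))])
H_demo = elm_feature_map(X_demo, n_hidden=30)
L_demo = graph_laplacian(H_demo)
labels_demo, _ = kmeans(H_demo, k=2)
```

In this sketch the Laplacian is only constructed, not used; in the paper it enters the criterion that selects W, which is beyond what the abstract specifies.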
Acknowledgments
This research was supported by the Natural Science Foundation of China under Grant No. 11171137, the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY13A010008, and the Scientific Research Fund of Zhejiang Provincial Education Department under Grant No. 2014.
Cite this article
Xie, L., Lu, C., Mei, Y. et al. An optimal method for data clustering. Neural Comput & Applic 27, 283–289 (2016). https://doi.org/10.1007/s00521-014-1818-3
DOI: https://doi.org/10.1007/s00521-014-1818-3