Abstract
Affinity propagation (AP) is a recently proposed clustering algorithm, which has been successful used in a lot of practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its efficiency, which has become the bottleneck when applying AP for large-scale problems. In the literature, most of the methods proposed to improve the efficiency of AP are based on implementing the message-passing on a sparse similarity matrix, while neither the decline in effectiveness nor the improvement in efficiency is theoretically analyzed. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Different from previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, then further reduced by sparseness according to k nearest neighbors. More importantly, we provide theoretical analysis, based on which the improvement of efficiency in our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP can achieve comparable clustering performances with the original AP algorithm, while the computational efficiency has been improved with a several-fold speed-up on small data sets and a dozens-of-fold on larger-scale data sets.
Similar content being viewed by others
References
Barbakh W, Wu Y, Fyfe C (2009) Review of clustering algorithms. Non-standard parameter adaptation for exploratory data analysis, studies in computational intelligence, vol 249. Springer, Berlin, pp 7–28
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications, 2nd edn. Springer series in statistics. Springer, Berlin
Chisci L, Mavino A, Perferi G, Sciandrone M, Anile C, Colicchio G, Fuggetta F (2010) Real time epileptic seizure prediction using AR models and support vector machines. IEEE Trans Biomed Eng 57(5):1124–1132
Drezner Z, Hamacher HW (2002) Facility location: applications and theory. Springer, Berlin
Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York
Dueck D (2009) Affinity propagation: clustering data by passing messages. Canadian thesis, University of Toronto
EPILEPSIAE (2008) The Freiburg EEG database. http://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database
Farinelli A, Denitto M, Bicego M (2011) Biclustering of expression microarray data using affinity propagation. In: Proceedings of the 6th international conference on pattern recognition in bioinformatics (PRIB’11). Springer, Berlin, pp 13–24
Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976
Frey BJ, Dueck D (2008) Response to comment on “clustering by passing messages between data points”. Science 319(5864):726
Fujiwara Y, Irie G, Kitahara T (2011) Fast algorithm for affinity propagation. In: Proceedings of the twenty-second international joint conference on artificial intelligence (IJCAI’11). IJCAI/AAAI, Barcelona, Spain, pp 2238–2243
Givoni IE (2012) Beyond affinity propagation: message passing algorithms for clustering. Canadian thesis, University of Toronto
Givoni IE, Frey BJ (2009) Semi-supervised affinity propagation with instance-level constraints. In: Dyk DAV, Welling M (eds) JMLR proceedings of the 12th international conference on artificial intelligence and statistics. JMLR.org, Clearwater, FL, USA, pp 161–168
Givoni IE, Chung C, Frey BJ (2011) Hierarchical affinity propagation. In: 27th Conference on uncertainty in artificial intelligence (UAI’11), Barcelona, Spain
Guan R, Shi X, Marchese M, Yang C, Liang Y (2011) Text clustering with seeds ffinity propagation. IEEE Trans Knowl Data Eng 23(4):627–637
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, Los Altos
Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
Jia C, Jiang Y, Yu J (2010) Affinity propagation on identifying communities in social and biological networks. Knowledge science, engineering and management, vol 6291., Lecture notes in computer scienceSpringer, Berlin, pp 597–602
Jia Y, Wang J, Zhang C, Hua XS (2008) Finding image exemplars using fast sparse affinity propagation. In: Proceedings of the 16th ACM international conference on multimedia (MM ’08). ACM, New York, USA, pp 639–642
Kariv O, Hakimi S (1979) An algorithmic approach to network location problems: the p-centers. SIAM J Appl Math 37(3):513–538
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Kazantseva A, Szpakowicz S (2011) Linear text segmentation using affinity propagation. Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 284–293
Kschischang FR, Frey BJ, Loeliger HA (2001) Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory 47(2):498–519
Lai D, Nardini C, Lu H (2011) Partitioning networks into communities by message passing. Phys Rev E 83(016):115
Leone M, Sumedha Weigt M (2007) Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics 23(20):2708–2715
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Sun L, Guo C (2014) Incremental affinity propagation clustering based on message passing. IEEE Trans Knowl Data Eng 26(11):2731–2744
Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In: Proceedings of the 26th annual international conference on machine learning (ICML ’09). ACM, pp 1073–1080
Xiao J, Wang J, Tan P, Quan L (2007) Joint affinity propagation for multiple view segmentation. In: IEEE 11th international conference on computer vision (ICCV ’07). IEEE, pp 1–7
Xu R, Wunsch D II (2005) Survey of clustering algorithms. IEEE Trans Neur Netw 16(3):645–678
Zhang K, Kwok JT (2010) Clustered nystrom method for large scale manifold learning and dimension reduction. IEEE Trans Neur Netw 21(10):1576–1587
Acknowledgments
This work was partly supported by the Natural Science Foundation of China under Grant (Nos. 71171030 and 71501023) and partly by Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 71421001).
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
See Table 7.
Rights and permissions
About this article
Cite this article
Sun, L., Guo, C., Liu, C. et al. Fast affinity propagation clustering based on incomplete similarity matrix. Knowl Inf Syst 51, 941–963 (2017). https://doi.org/10.1007/s10115-016-0996-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-016-0996-y