Abstract
The size of everyday data sets is outpacing the capability of computational hardware to analyze these data sets. Social networking and mobile computing alone are producing data sets that are growing by terabytes every day. Because these data often cannot be loaded into a computer’s working memory, most literal algorithms (algorithms that require access to the full data set) cannot be used. One type of pattern recognition and data mining method that is used to analyze databases is clustering; thus, clustering algorithms that can be used on large data sets are important and useful. We focus on a specific type of clustering: kernelized fuzzy c-means (KFCM). The literal KFCM algorithm has a memory requirement of O(n 2), where n is the number objects in the data set. Thus, even data sets that have nearly 1,000,000 objects require terabytes of working memory—infeasible for most computers. One way to attack this problem is by using incremental algorithms; these algorithms sequentially process chunks or samples of the data, combining the results from each chunk. Here we propose three new incremental KFCM algorithms: rseKFCM, spKFCM, and oKFCM. We assess the performance of these algorithms by, first, comparing their clustering results to that of the literal KFCM and, second, by showing that these algorithms can produce reasonable partitions of large data sets. In summary, the rseKFCM is the most efficient of the three, exhibiting significant speedup at low sampling rates. The oKFCM algorithm seems to produce the most accurate approximation of KFCM, but at a cost of low efficiency. Our recommendation is to use rseKFCM at the highest sample rate allowable for your computational and problem needs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Belabbas, M., Wolfe, P.: Spectral methods in machine learning and new strategies for very large datasets. Proc. National Academy of Sciences 106(2), 369–374 (2009)
Bezdek, J.: A convergence theorem for the fuzzy isodata clustering algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence 2, 1–8 (1980)
Bezdek, J.: Pattern Recognition With Fuzzy Objective Function Algorithms. Plenum, New York (1981)
Bezdek, J., Hathaway, R.: Convergence of alternating optmization. Nueral, Parallel, and Scientific Computations 11(4), 351–368 (2003)
Bezdek, J., Keller, J., Krishnapuram, R., Pal, N.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer, Norwell (1999)
Bo, W., Nevatia, R.: Cluster boosted tree classifier for multi-view, multi-pose object detection. In: Proc. ICCV (October 2007)
Cannon, R., Dave, J., Bezdek, J.: Efficient implementation of the fuzzy c-means algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence 8, 248–255 (1986)
Cheng, T., Goldgof, D., Hall, L.: Fast clustering with application to fuzzy rule generation. In: Proc. IEEE Int. Conf. Fuzzy Systems, Tokyo, Japan, pp. 2289–2295 (1995)
Chitta, R., Jin, R., Havens, T., Jain, A.: Approximate kernel k-means: Solution to large scale kernel clustering. In: Proc. ACM SIGKDD (2011)
Dhillon, I., Guan, Y., Kulis, B.: Kernel k-means, spectral clustering, and normalized cuts. In: Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery Data Mining, pp. 551–556 (August 2004)
Drineas, P., Mahoney, M.: On the nystrom method for appoximating a gram matrix for improved kernel-based learning. The J. of Machine Learning Research 6, 2153–2175 (2005)
Duda, R., Hart, P., Stork, D.: Pattern Classification, 2nd edn. Wiley-Interscience (October 2000)
Eschrich, S., Ke, J., Hall, L., Goldgof, D.: Fast accurate fuzzy clustering through data reduction. IEEE Trans. Fuzzy Systems 11, 262–269 (2003)
Frigui, H.: Simultaneous Clustering and Feature Discrimination with Applications. In: Advances in Fuzzy Clustering and Feature Discrimination with Applications, pp. 285–312. John Wiley and Sons (2007)
Hartigan, J.: Clustering Algorithms. Wiley, New York (1975)
Hathaway, R., Bezdek, J.: NERF c-MEANS: Non-euclidean relational fuzzy clustering. Pattern Recognition 27, 429–437 (1994)
Hathaway, R., Bezdek, J.: Extending fuzzy and probabilistic clustering to very large data sets. Computational Statistics and Data Analysis 51, 215–234 (2006)
Hathaway, R., Bezdek, J., Tucker, W.: An improved convergence theory for the fuzzy isodata clustering algorithms. In: Bezdek, J. (ed.) Analysis of Fuzzy Information, vol. 3, pp. 123–132. CRC Press, Boca Raton (1987)
Hathaway, R., Davenport, J., Bezdek, J.: Relational duals of the c-means clustering algorithms. Pattern Recognition 22(2), 205–212 (1989)
Hathaway, R., Huband, J., Bezdek, J.: A kernelized non-euclidean relational fuzzy c-means algorithm. In: Proc. IEEE Int. Conf. Fuzzy Systems, pp. 414–419 (2005)
Havens, T., Chitta, R., Jain, A., Jin, R.: Speedup of fuzzy and possibilistic c-means for large-scale clustering. In: Proc. IEEE Int. Conf. Fuzzy Systems, Taipei, Taiwan (2011)
Hore, P., Hall, L., Goldgof, D.: Single pass fuzzy c means. In: Proc. IEEE Int. Conf. Fuzzy Systems, London, England, pp. 1–7 (2007)
Hore, P., Hall, L., Goldgof, D., Gu, Y., Maudsley, A.: A scalable framework for segmenting magentic resonance images. J. Signal Process. Syst. 54(1-3), 183–203 (2009)
Huber, P.: Massive Data Sets Workshop: The Morning After. In: Massive Data Sets, pp. 169–184. National Academy Press (1997)
Hubert, L., Arabie, P.: Comparing partitions. J. Classification 2, 193–218 (1985)
Jain, A., Dubes, R.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988)
Jain, A., Murty, M., Flynn, P.: Data clustering: A review. ACM Computing Surveys 31(3), 264–323 (1999)
Johnson, S.: Hierarchical clustering schemes. Psychometrika 2, 241–254 (1967)
Khan, S., Situ, G., Decker, K., Schmidt, C.: Go Figure: Automated Gene Ontology annotation. Bioinf. 19(18), 2484–2485 (2003)
Kolen, J., Hutcheson, T.: Reducing the time complexity of the fuzzy c-means algorithm. IEEE Trans. Fuzzy Systems 10, 263–267 (2002)
Krishnapuram, R., Keller, J.: A possibilistic approach to clustering. IEEE Trans. on Fuzzy Sys. 1(2) (May 1993)
Kumar, S., Mohri, M., Talwalkar, A.: Sampling techniques for the nystrom method. In: Proc. Conf. Artificial Intelligence and Statistics, pp. 304–311 (2009)
Lloyd, S.: Least square quantization in pcm. Tech. rep., Bell Telephone Laboratories (1957)
Lloyd, S.: Least square quantization in pcm. IEEE Trans. Information Theory 28(2), 129–137 (1982)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. 5th Berkeley Symp. Math. Stat. and Prob., pp. 281–297. University of California Press (1967)
Pal, N., Bezdek, J.: Complexity reduction for “large image” processing. IEEE Trans. Systems, Man, and Cybernetics B (32), 598–611 (2002)
Provost, F., Jensen, D., Oates, T.: Efficient progressive sampling. In: Proc. KDDM, pp. 23–32 (1999)
Rand, W.: Objective criteria for the evaluation of clustering methods. J. Amer. Stat. Asooc. 66(336), 846–850 (1971)
Shankar, B.U., Pal, N.: FFCM: an effective approach for large data sets. In: Proc. Int. Conf. Fuzzy Logic, Neural Nets, and Soft Computing, Fukuoka, Japan, p. 332 (1994)
The UniProt Consotium: The universal protein resource (UniProt). Nucleic Acids Res. 35, D193–D197 (2007)
Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, San Diego (2009)
Tucker, W.: Counterexamples to the convergence theorem for fuzzy isodata clustering algorithms. In: Bezdek, J. (ed.) Analysis of Fuzzy Information, vol. 3, pp. 109–122. CRC Press, Boca Raton (1987)
Wu, Z., Xie, W., Yu, J.: Fuzzy c-means clustering algorithm based on kernel method. In: Proc. Int. Conf. Computational Intelligence and Multimedia Applications, pp. 49–54 (September 2003)
Xu, R., Wunsch II, D.: Clustering. IEEE Press, Psicataway (2009)
Zhang, R., Rudnicky, A.: A large scale clustering scheme for kernel k-means. In: Proc. Int. Conf. Pattern Recognition, pp. 289–292 (2002)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
Havens, T.C., Bezdek, J.C., Palaniswami, M. (2012). Incremental Kernel Fuzzy c-Means. In: Madani, K., Dourado Correia, A., Rosa, A., Filipe, J. (eds) Computational Intelligence. IJCCI 2010. Studies in Computational Intelligence, vol 399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27534-0_1
Download citation
DOI: https://doi.org/10.1007/978-3-642-27534-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27533-3
Online ISBN: 978-3-642-27534-0
eBook Packages: EngineeringEngineering (R0)