Abstract
Crisp index structures introduce the problem of having sharp decision boundaries which may not be found in the real life clustering problems. In real world, specifically in the CBIR context, each data may not be fully assigned to one cluster and it may partially belong to other clusters, as opposed to the crisp index structures which fully affect data to clusters according to their proximity in terms of distance in the high-dimensional vector space. Based on kernel-fuzzy C-means clustering (KFCM) mechanism, this paper presents a fast and efficient index structure to support high-dimensional indexing for both crisp and fuzzy data. The proposed index structure offers a number of advantages such as a compact and efficient fuzzy data clustering. The experimental study demonstrates the efficiency and effectiveness of our method.
Similar content being viewed by others
Notes
References
http://www.cscolumbiaedu/CAVE/software/softlib/coil100php (1996)
Aizerman MA, Braverman EA, Rozonoer L (1964) Theoretical foundations of the potential function method in pattern recognition learning. In: Pattern recognition learning. pp 821–837
Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun ACM 51(1):117–122. doi:10.1145/1327452.1327494
Barranco CD, Campaña JR, Medina JM (2008) A b+-tree based indexing technique for fuzzy numerical data. Fuzzy Sets Syst 159:1431–1449
Barton S, Gouet-Brunet V, Rukoz M (2011) Large scale disk-based metric indexing structure for approximate information retrieval by content. In: Proceedings of the 1st workshop on new trends in similarity search, NTSS ’11. ACM, New York, pp 2–7
Bayer R, McCreight E (1972) Organization and maintenance of large ordered indexes. Acta Informatica 3:173–189
Bertino E, Guglielmina C (1993) Path-index: an approach to the efficient execution of object-oriented queries. Data Knowl Eng 10(1):1–27
Bertino E, Kim W (2002) Indexing techniques for queries on nested objects. Knowl Data Eng 1:196–214
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer, Norwell
Bosc P, Galibourg M (1989) Indexing principles for a fuzzy databases. Inf Syst 6:493–499
Zadeh LA, Kacprzyk J (eds) (1992) Fuzzy logic for the management of uncertainty. Wiley, New York
Chen L, Özsu MT (2005) Robuste and fast similarity search for moving object trajectories. In: Proceedings of the 24th international conference on management of data. pp 491–502
Chen T, Nakazato M, Huang TS (2002) Speeding up the similarity search in multimedia database. In: Proceedings of the 2002 IEEE international conference on multimedia and expo, ICME 2002, 26–29 August 2002, vol II. Lausanne, Switzerland, pp 509–512
Cheng R, Kalashnikov DV, Prabhakar S (2004) Querying imprecise data in improving object environments. IEEE Trans Knowl Data Eng 16(9):1112–1127
Chiang JH, Hao PY (2003) A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Trans Fuzzy Syst 11:518–527
Comer D (1979) Ubiquitous b-tree. ACM Comput Surv 2:121–137
Cui J, Zhou S, Zhao S (2007) Pcr-tree: a compression-based index structure for similarity searching in high-dimensional image databases. In: Proceedings of the fourth international conference on fuzzy systems and knowledge discovery, vol 02. FSKD ’07, pp 395–400
Daoudi I, Ouatik SEA, Kharraz AE, Idrissi K, Aboutajdine D (2008) Vector approximation based indexing for high-dimensional multimedia databases. Eng Lett 16(2):210–218
Daoudi I, Idrissi K, Ouatik SE, Baskurt A, Aboutajdine D (2009) An efficient high-dimensional indexing method for content-based retrieval in large image databases. Image Commun 24(10):775–790
Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the 20th annual symposium on computational geometry. Brooklyn, New York, pp 253–262. doi:10.1145/997817.997857
Faradjian A (2002) Gadt: a probability space adt for representing and querying the physical world. In: Proceedings of the 18th international conference on data engineering
George R, Srikanth R, Petry FE, Buckles BP (1996) Uncertainty management issues in the object-oriented data model. IEEE Trans Fuzzy Syst 4(2):179–192
Graves D, Pedrycz W (2007) Performance of kernel-based fuzzy clustering. Electron Lett 43:1445–1446
Graves D, Pedrycz W (2010) Kernel-based fuzzy clustering and fuzzy clustering: a comparative experimental study. Fuzzy Sets Syst 161(4):522–543
Gunnemann S, Kremer H, Lenhard D, Seidlm T (2011) Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations. In: Proceedings of the 14th international conference on extending database technology. ACM, New York, pp 237–248
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD international conference on management of data, SIGMOD ’84. pp 47–57
Heisterkamp DR, Peng J (2005) Kernel vector approximation files for relevance feedback retrieval in large image databases. Multimed Tools Appl 26:175–189
Idrissi K, Lavoué G, Ricard J, Baskurt A (2004) Object of interest-based visual navigation, retrieval, and semantic content identification system. Comput Vis Image Underst 94(1–3):271–294. doi:10.1016/j.cviu.2003.10.014
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality, pp 604–613. doi:10.1145/276698.276876
Koyuncu M, Yazici A (2003) Ifood: an intelligent fuzzy object-oriented database architecture. IEEE Trans Knowl Data Eng 15(5):1137–1154
MehtaM, Agrawal R, Rissanen J (1996) SLIQ: a fast scalable classifier for data mining. In: Apers PMG, Bouzeghoub M, Gardarin G (eds) Proceedings of the 5th international conference on extending database technology: advances in database technology (EDBT ’96). Springer-Verlag, London, pp 18–32
Nello C, Colin C, John St (1998) Dynamically adapting kernels in support vector machines. In: Advances in neural information processing systems 11. MIT Press, pp 204–210
Pang L, Xiao K, Liang A, Guan H (2012) A improved clustering analysis method based on fuzzy c-means algorithm by adding pso algorithm. 7208:231–242. doi:10.1007/978-3-642-28942-2_21
Pedrycz W (2005) Knowledge-based clustering: from data to information granules. Wiley-Interscience
Petry FE (1996) Fuzzy databases: principles and applications. In: International series in intelligent technologies. Kluwer Academic Publishers, Dordrecht, p 226
Saeid N, Amin AN, Saeid H, Abdolreza S, Farhad S (2012) Particle swarm optimization of kernelbased fuzzy c-means for hyperspectral data clustering. J Appl Remote Sens 001; 6(1):063601–063601. doi:10.1117/1.JRS.6.063601
Salembier P, Sikora T (2002) Introduction to MPEG-7: multimedia content description interface, vol 9, no 3. Wiley, New York, pp 83–93
Scholkopf B, Smola A, Müller KR (1999) Kernel principal component analysis. In: Burges CJC, Smola AJ (eds) Advances in kernel methods-support vector learning. MIT Press, Cambridge, pp 327–352
Shakhnarovich G, Darrell T (2008) Nearest-neighbor methods in learning and vision: theory and practice. Pattern Anal Applic 11(2):221–222
Tao Y, Yi K, Sheng C, Kalnis P (2009) Quality and efficiency in high dimensional nearest neighbor search. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data, Providence
Urruty T, Belkouch F, Djeraba C (2005) Kpyr: an efficient indexing method. In: Proceedings of the 2005 IEEE international conference on multimedia and expo, ICME 2005, July 6–9. Amsterdam, The Netherlands, pp 1448–1451
Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: Proceedings of the 24th international conference on very large data bases. Morgan Kaufmann Publishers Inc., San Francisco, pp 194–205
Wu Xiaohong ZJ (2006) Fuzzy principal component analysis and its kernel-based model. J Electron (China) 24:772–775
Yazici A, George R, Aksoy D (1998) Design and implementation issues in the fuzzy object-oriented data model. Inf Sci Inf Comput Sci 108:241–260
Yazici A, Ince C, Koyuncu M (2008) Food index: a multidimensional index structure for similarity-based fuzzy object oriented database models. IEEE Trans Fuzzy Syst 16(4):942–957
Ye L, Hua Y (2010) The cmvai-file: an efficient approximation-based high-dimensional index structure. In: Multimedia information networking and security (MINES), MINES ’10. Nanjing, pp 710–713
Ye L, Hua Y (2010) The cmvai-file: an efficient approximation-based high-dimensional index structure. In: Proceedings of the 2010 international conference on multimedia information networking and security, MINES ’10. IEEE Computer Society, Washington, pp 710–713
Yih JM, Lin YH, Liu HC (2008) Fcm algorithm based on unsupervised mahalanobis distances with better initial values and separable criterion. pp 326–331. http://dl.acm.org/citation.cfm?id=1.504034.1504094
Zeyu L, Shiwei T, Jing X, Jun J (2001) Modified fcm clustering based on kernel mapping. Internet Soc Opt Eng 4554:242–245
Zhang D, Chen S (2002) Fuzzy clustering unsing kernel method. In: In the 2002 international conference on control and automation, 2002. ICCA, 2002, pp 162–163
Zhou S, Gan JQ (2004) Mercer kernel fuzzy c-means algorithm and prototypes of clusters. In: International data engineering and automated learning, vol 3177. pp 613–618
Acknowledgments
We are grateful to Daniel Graves and Alexandr Andoni, for providing us the source code of the KFCM and E2LSH needed to carry out this experiments.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Daoudi, I., Idrissi, K. A fast and efficient fuzzy approximation-based indexing for CBIR. Multimed Tools Appl 74, 4507–4533 (2015). https://doi.org/10.1007/s11042-013-1820-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-013-1820-2