Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2857218.2857233acmotherconferencesArticle/Chapter ViewAbstractPublication PagesmedesConference Proceedingsconference-collections
research-article

SimSearch: similarity search framework based on indexing techniques in metric spaces

Published: 25 October 2015 Publication History

Abstract

Similarity search in metric spaces refers to searching elements in data repositories that are similar to an element supplied by the user (query example). Similarity functions are used to determine which elements in the data repositories are similar to the query example and indexing mechanisms are used to improve the efficiency in the search. Classic indexation mechanisms such as LSH, M-Index, and M-Tree behave different according to the dimensionality in the metric space, volume of data repositories, and query strategies. In this paper, we describe SimSearch, a modular and flexible framework for similarity search in metric spaces, which allows to use, analyse, compare, and add several indexation mechanisms, search approaches, and query strategies. SimSearch allows doing queries given one or more example elements to obtain the set of elements more similar to the query examples, using query composition and Skyline. We show the variability of performance of several indexation mechanisms, including LSH-ML (our proposed variant of LSH), with experimental study in the domain of images represented by a feature vector in a high dimensionality metric space and Web Services represented by a vector with the values of Quality of Service (QoS) parameters.

References

[1]
C. Akgül, D. Rubin, S. Napel, C. Beaulieu, H. Greenspan, and B. Acar. Content-based image retrieval in radiology: Current status and future directions. J. of Digital Imaging, 24(2):208--222, 2011.
[2]
E. Al-Masri and Q. H. Mahmoud. Qos-based discovery and ranking of web services. In Proc. of the 16th Internat. Conf. on Computer Communications and Networks (ICCCN), pages 529--534. IEEE, 2007.
[3]
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117--122, 2008.
[4]
A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In SODA'14, pages 1018--1028, 2014.
[5]
F. Aurenhammer. Voronoi diagrams a survey of a fundamental geometric data structure. ACM Comput. Surv., 23(3):345--405, Sept. 1991.
[6]
M. Brut, D. Codreanu, S. Dumitrescu, A.-M. Manzat, and F. Sedes. A distributed architecture for flexible multimedia management and retrieval. In Database and Expert Systems Applications, volume 6861 of LNCS, pages 249--263. Springer Berlin, 2011.
[7]
S. Chafik, I. Daoudi, H. El Ouardi, M. El Yacoubi, and B. Dorizzi. Locality sensitive hashing for content based image retrieval: A comparative experimental study. In Fifth Internac. Conf. on Next Generation Networks and Services (NGNS), pages 38--43, 2014.
[8]
P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing metric spaces with m-tree. In Proc. Quinto convegno Nazionale SEBD. Citeseer, 1997.
[9]
P. Ciaccia, M. Patella, and P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In Proc. of the 23rd VLDB conference, Athens, Greece, page 426--435. Citeseer, 1997.
[10]
D. Comer. Ubiquitous b-tree. ACM Comput. Surv., 11(2):121--137, June 1979.
[11]
F. Dabek, E. Brunskill, M. Kaashoek, D. Karger, R. Morris, I. Stoica, and H. Balakrishnan. Building peer-to-peer systems with Chord, a distributed lookup service. In Proc. of the Eighth Workshop on Hot Topics in Operating Systems., page 81--86, 2001.
[12]
M. Danisch, J.-L. Guillaume, and B. Le Grand. Towards multi-ego-centered communities: a node similarity approach. International Journal of Web Based Communities, 9(3):299--322, 2013.
[13]
C. Doulkeridis, A. Vlachou, Y. Kotidis, and M. Vazirgiannis. Peer-to-peer similarity search in metric spaces. In Proc. of the 33rd internat. conf. on Very large data bases, page 986--997, 2007.
[14]
F. Falchi. A Content-Addressable Network for Similarity Search in Metric Spaces. PhD thesis, University of Pisa, Italy, 5 2007.
[15]
D. Fuhry, R. Jin, and D. Zhang. Efficient skyline computation in metric space. In Proc. of the 12th Internat. Conf. on Extending Database Technology: Advances in Database Tech., page 1042--1051, 2009.
[16]
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proc. of the 25th Internat. Conf. on Very Large Data Bases, 1999.
[17]
K. Grolinger, M. Capretz, A. Cunha, and S. Tazi. Integration of business process modeling and web services: a survey. Service Oriented Computing and Applications, 8(2):105--128, 2014.
[18]
A. Guttman. R-trees: A dynamic index structure for spatial searching. ACM, 1984.
[19]
M. J. Huiskes and M. S. Lew. The MIR Flickr Retrieval Evaluation. In Proc. of ACM Internat. Conf. on Multimedia Information Retrieval (MIR), 2008.
[20]
B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, and B. Loni. Div400: A social image retrieval result diversification dataset. In Proc. of the ACM Multimedia Systems Conf., pages 29--34, 2014.
[21]
M. S. Lew, N. Sebe, C. Djeraba, and R. Jain. Content-based multimedia information retrieval: State of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl., 2(1):1--19, Feb. 2006.
[22]
D. Novak, M. Batko, and P. Zezula. Large-scale similarity data management with distributed metric index. Inf. Process. Manage., 48(5):855--872, 2012.
[23]
D. Novak and P. Zezula. M-Chord: a scalable distributed similarity search structure. In Proc. of the 1st international conference on Scalable information systems. ACM, 2006.
[24]
I. F. Rajam and S. Valli. A Survey on Content Based Image Retrieval. Life Science Journal, 10(2):2475--2487, 2013.
[25]
S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. ACM Computer Comm. Review, 31(4):161--172, 2001.
[26]
M. Rukoz, M. Manouvrier, and G. Jomier. Δ-distance: A family of dissimilarity metrics between images represented by multi-level feature vectors. Information Retrieval, 9(6):633--655, 2006.
[27]
O. Sahin, A. Gulbeden, F. Emekçci, D. Agrawal, and A. El Abbadi. PRISM: indexing multi-dimensional data in P2P networks using reference vectors. In ACM Internat Conf on Multimedia, page 946--955, 2005.
[28]
J. Wang and C. Lin. Mapreduce based personalized locality sensitive hashing for similarity joins on large scale data. Computational Intelligence and Neuroscience, 2015:1--14, 201.
[29]
D. Zaragoza, Y. Cardinale, and M. Rukoz. Búsqueda por Similitud de Contenido basada en Técnicas de Indexacióon en Espacios Métrico. In Conferencia Nacional de Computación, Informática y Sistemas (CoNCISa), pages 16--23, 2013.
[30]
P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search - The Metric Space Approach, volume 32 of Advances in Database Systems. 2006.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
MEDES '15: Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems
October 2015
271 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

  • The French Chapter of ACM Special Interest Group on Applied Computing
  • IFSP: Federal Institute of São Paulo

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 October 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. indexing mechanisms
  2. query strategies
  3. similarly search

Qualifiers

  • Research-article

Conference

MEDES '15
Sponsor:
  • IFSP

Acceptance Rates

MEDES '15 Paper Acceptance Rate 13 of 64 submissions, 20%;
Overall Acceptance Rate 267 of 682 submissions, 39%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 50
    Total Downloads
  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)1
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media