research-article

Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces

Authors:

Michael E. Houle,

Vincent OriaAuthors Info & Claims

IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 12

Pages 3258 - 3273

https://doi.org/10.1109/TKDE.2015.2475740

Published: 01 December 2015 Publication History

Abstract

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining, and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as “flexible AggNN”, or FANN. In this work, we propose two approximation algorithms, one for the sum variant of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient.

References

[1]

R. Fagin, A. Lotem, and M. Naor, “Optimal aggregation algorithms for middleware,” in Proc. Symp. Principles Database Syst., 2001, pp. 102–113.

[2]

A. Marian, N. Bruno, and L. Gravano, “Evaluating top-k queries over web-accessible databases,” ACM Trans. Database Syst., vol. 29, no. 2, pp. 319–362, 2004.

Digital Library

[3]

T. Bernecker, T. Emrich, F. Graf, H.-P. Kriegel, P. Kröger, M. Renz, E. Schubert, and A. Zimek, “Subspace similarity search using the ideas of ranking and top-k retrieval,” in Proc. ICDE Workshop DBRank, 2010, pp. 4–9.

[4]

H. L. Razente, M. C. N. Barioni, A. J. M. Traina, and C. Traina Jr., “Aggregate similarity queries in relevance feedback methods for content-based image retrieval,” in Proc. Symp. Appl. Comput., 2008, pp. 869–874.

[5]

H. L. Razente, M. C. N. Barioni, A. J. M. Traina, C. Faloutsos, and C. Traina Jr., “A novel optimization approach to efficiently process aggregate similarity queries in metric access methods,” in Proc. Int. Conf. Inf. Knowl. Manage. , 2008, pp. 193–202.

[6]

D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, “Group nearest neighbor queries,” in Proc. Int. Conf. Data Eng., 2004, pp. 301 –312.

[7]

G. Beliakov, T. Calvo, and S. James, “Aggregation of preferences in recommender systems,” in Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds. New York, NY, USA : Springer, 2010.

[8]

M. L. Yiu, N. Mamoulis, and D. Papadias, “Aggregate nearest neighbor queries in road networks,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 820–833, Jun. 2005.

Digital Library

[9]

F. Li, B. Yao, and P. Kumar, “Group enclosing queries,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 10, pp. 1526–1540, Oct. 2011.

Digital Library

[10]

D. Papadias, Y. Tao, K. Mouratidis, and C. K. Hui, “Aggregate nearest neighbor queries in spatial databases,” ACM Trans. Database Syst., vol. 30, no. 2, pp. 529–576, 2005.

Digital Library

[11]

Y. Li, F. Li, K. Yi, B. Yao, and M. Wang, “Flexible aggregate similarity search,” in Proc. Int. Conf. Manage. Data, 2011, pp.1009–1020.

[12]

F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, “Fast nearest neighbor search in medical image databases, ” in Proc. Int. Conf. Very Large Data Bases, 1996, pp. 215 –226.

[13]

T. Seidl and H.-P. Kriegel, “Optimal multi-step k-nearest neighbor search,” in Proc. Int. Conf. Manage. Data, 1998, pp. 154–165.

[14]

M. Houle, X. Ma, M. Nett, and V. Oria, “Dimensional testing for multi-step similarity search,” in Proc. Int. Conf. Data Mining, 2012, pp. 299–308.

[15]

M. Houle, H. Kashima, and M. Nett, “Generalized expansion dimension,” in Proc. IEEE ICDM Workshop Practical Theories Exploratory Data Mining, 2012, pp. 587–594.

[16]

P. Ciaccia, M. Patella, and P. Zezula, “M-tree: an efficient access method for similarity search in metric spaces,” in Proc. Int. Conf. Very Large Data Bases, 1997, pp. 426–435.

[17]

D. R. Karger and M. Ruhl, “Finding nearest neighbors in growth-restricted metrics,” in Proc. Symp. Theory Comput., 2002, pp. 741–750.

[18]

R. Chandrasekaran and A. Tamir, “Open questions concerning weiszfeld’s algorithm for the fermat-weber location problem,” Math. Program., Series A, vol. 44, pp. 293 –295, 1989.

[19]

M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications. New York, NY, USA: Springer, 2000.

Digital Library

[20]

P. Kumar, J. S. B. Mitchell, and E. A. Yildirim, “ Approximate minimum enclosing balls in high dimensions using core-sets,” ACM J. Exp. Algorithmics, vol. 8, article 1.1, 2003.

[21]

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.

[22]

J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, “ The Amsterdam library of object images,” Int. J. Comput. Vis., vol. 61, no. 1, pp. 103–112, 2005.

Digital Library

[23]

N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. L. Saux, and H. Sahbi, “IKONA: Interactive generic and specific image retrieval,” in Proc. Int. Workshop Multimedia Content-Based Indexing Retrieval, 2001, pp. 25–29.

[24]

K. Rose and B. S. Manjunath. The Cortina data set [Online]. Available: http://www.scl.ece.ucsb.edu/datasets/index.htm, 2011.

[25]

Reuters Ltd. Reuters corpus, volume 2, multilingual corpus [Online]. Available: http://trec.nist.gov/data/reuters/reuters.html , 2005.

[26]

M. E. Houle and J. Sakuma, “Fast approximate similarity search in extremely high-dimensional data sets,” in Proc. Int. Conf. Data Eng., 2005, pp. 619–630.

Cited By

Houle M(2020)Local Intrinsic Dimensionality III: Density and SimilaritySimilarity Search and Applications10.1007/978-3-030-60936-8_19(248-260)Online publication date: 30-Sep-2020
https://dl.acm.org/doi/10.1007/978-3-030-60936-8_19
Zhang PLin HGao YLu D(2018)Aggregate keyword nearest neighbor queries on road networksGeoinformatica10.1007/s10707-017-0315-022:2(237-268)Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.1007/s10707-017-0315-0
Song YGu YYu G(2018)LIDH: An Efficient Filtering Method for Approximate k Nearest Neighbor Queries Based on Local Intrinsic DimensionWeb and Big Data10.1007/978-3-319-96890-2_22(268-276)Online publication date: 23-Jul-2018
https://dl.acm.org/doi/10.1007/978-3-319-96890-2_22
Show More Cited By

Index Terms

Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces
1. Human-centered computing
  1. Visualization

Index terms have been assigned to the content through auto-classification.

Recommendations

Flexible aggregate similarity search
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Aggregate similarity search, a.k.a. aggregate nearest neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves the most (or top-k) similar object to Q from a database P, ...
Flexible Aggregate Similarity Search in High-Dimensional Data Sets
SISAP 2015: Proceedings of the 8th International Conference on Similarity Search and Applications - Volume 9371

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor AggNN search. ...
Exact and approximate flexible aggregate similarity search

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering

IEEE Transactions on Knowledge and Data Engineering Volume 27, Issue 12

Dec. 2015

281 pages

ISSN:1041-4347

Issue’s Table of Contents

Copyright © 2015.

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 December 2015

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

5
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Houle M(2020)Local Intrinsic Dimensionality III: Density and SimilaritySimilarity Search and Applications10.1007/978-3-030-60936-8_19(248-260)Online publication date: 30-Sep-2020
https://dl.acm.org/doi/10.1007/978-3-030-60936-8_19
Zhang PLin HGao YLu D(2018)Aggregate keyword nearest neighbor queries on road networksGeoinformatica10.1007/s10707-017-0315-022:2(237-268)Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.1007/s10707-017-0315-0
Song YGu YYu G(2018)LIDH: An Efficient Filtering Method for Approximate k Nearest Neighbor Queries Based on Local Intrinsic DimensionWeb and Big Data10.1007/978-3-319-96890-2_22(268-276)Online publication date: 23-Jul-2018
https://dl.acm.org/doi/10.1007/978-3-319-96890-2_22
von Ritter LHoule MGünnemann S(2018)Intrinsic Degree: An Estimator of the Local Growth Rate in GraphsSimilarity Search and Applications10.1007/978-3-030-02224-2_15(195-208)Online publication date: 7-Oct-2018
https://dl.acm.org/doi/10.1007/978-3-030-02224-2_15
Casanova GEnglmeier EHoule MKröger PNett MSchubert EZimek A(2017)Dimensional testing for reverse k-nearest neighbor searchProceedings of the VLDB Endowment10.14778/3067421.306742610:7(769-780)Online publication date: 1-Mar-2017
https://dl.acm.org/doi/10.14778/3067421.3067426

View Options

View options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents