Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces

Published: 01 December 2015 Publication History

Abstract

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining, and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as “flexible AggNN”, or FANN. In this work, we propose two approximation algorithms, one for the sum variant of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient.

References

[1]
R. Fagin, A. Lotem, and M. Naor, “Optimal aggregation algorithms for middleware,” in Proc. Symp. Principles Database Syst., 2001, pp. 102–113.
[2]
A. Marian, N. Bruno, and L. Gravano, “Evaluating top-k queries over web-accessible databases,” ACM Trans. Database Syst., vol. 29, no. 2, pp. 319–362, 2004.
[3]
T. Bernecker, T. Emrich, F. Graf, H.-P. Kriegel, P. Kröger, M. Renz, E. Schubert, and A. Zimek, “Subspace similarity search using the ideas of ranking and top-k retrieval,” in Proc. ICDE Workshop DBRank, 2010, pp. 4–9.
[4]
H. L. Razente, M. C. N. Barioni, A. J. M. Traina, and C. Traina Jr., “Aggregate similarity queries in relevance feedback methods for content-based image retrieval,” in Proc. Symp. Appl. Comput., 2008, pp. 869–874.
[5]
H. L. Razente, M. C. N. Barioni, A. J. M. Traina, C. Faloutsos, and C. Traina Jr., “A novel optimization approach to efficiently process aggregate similarity queries in metric access methods,” in Proc. Int. Conf. Inf. Knowl. Manage. , 2008, pp. 193–202.
[6]
D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis, “Group nearest neighbor queries,” in Proc. Int. Conf. Data Eng., 2004, pp. 301 –312.
[7]
G. Beliakov, T. Calvo, and S. James, “Aggregation of preferences in recommender systems,” in Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds. New York, NY, USA : Springer, 2010.
[8]
M. L. Yiu, N. Mamoulis, and D. Papadias, “Aggregate nearest neighbor queries in road networks,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 820–833, Jun. 2005.
[9]
F. Li, B. Yao, and P. Kumar, “Group enclosing queries,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 10, pp. 1526–1540, Oct. 2011.
[10]
D. Papadias, Y. Tao, K. Mouratidis, and C. K. Hui, “Aggregate nearest neighbor queries in spatial databases,” ACM Trans. Database Syst., vol. 30, no. 2, pp. 529–576, 2005.
[11]
Y. Li, F. Li, K. Yi, B. Yao, and M. Wang, “Flexible aggregate similarity search,” in Proc. Int. Conf. Manage. Data, 2011, pp.1009–1020.
[12]
F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, “Fast nearest neighbor search in medical image databases, ” in Proc. Int. Conf. Very Large Data Bases, 1996, pp. 215 –226.
[13]
T. Seidl and H.-P. Kriegel, “Optimal multi-step k-nearest neighbor search,” in Proc. Int. Conf. Manage. Data, 1998, pp. 154–165.
[14]
M. Houle, X. Ma, M. Nett, and V. Oria, “Dimensional testing for multi-step similarity search,” in Proc. Int. Conf. Data Mining, 2012, pp. 299–308.
[15]
M. Houle, H. Kashima, and M. Nett, “Generalized expansion dimension,” in Proc. IEEE ICDM Workshop Practical Theories Exploratory Data Mining, 2012, pp. 587–594.
[16]
P. Ciaccia, M. Patella, and P. Zezula, “M-tree: an efficient access method for similarity search in metric spaces,” in Proc. Int. Conf. Very Large Data Bases, 1997, pp. 426–435.
[17]
D. R. Karger and M. Ruhl, “Finding nearest neighbors in growth-restricted metrics,” in Proc. Symp. Theory Comput., 2002, pp. 741–750.
[18]
R. Chandrasekaran and A. Tamir, “Open questions concerning weiszfeld’s algorithm for the fermat-weber location problem,” Math. Program., Series A, vol. 44, pp. 293 –295, 1989.
[19]
M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications. New York, NY, USA: Springer, 2000.
[20]
P. Kumar, J. S. B. Mitchell, and E. A. Yildirim, “ Approximate minimum enclosing balls in high dimensions using core-sets,” ACM J. Exp. Algorithmics, vol. 8, article 1.1, 2003.
[21]
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[22]
J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, “ The Amsterdam library of object images,” Int. J. Comput. Vis., vol. 61, no. 1, pp. 103–112, 2005.
[23]
N. Boujemaa, J. Fauqueur, M. Ferecatu, F. Fleuret, V. Gouet, B. L. Saux, and H. Sahbi, “IKONA: Interactive generic and specific image retrieval,” in Proc. Int. Workshop Multimedia Content-Based Indexing Retrieval, 2001, pp. 25–29.
[24]
K. Rose and B. S. Manjunath. The Cortina data set [Online]. Available: http://www.scl.ece.ucsb.edu/datasets/index.htm, 2011.
[25]
Reuters Ltd. Reuters corpus, volume 2, multilingual corpus [Online]. Available: http://trec.nist.gov/data/reuters/reuters.html , 2005.
[26]
M. E. Houle and J. Sakuma, “Fast approximate similarity search in extremely high-dimensional data sets,” in Proc. Int. Conf. Data Eng., 2005, pp. 619–630.

Cited By

View all

Index Terms

  1. Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image IEEE Transactions on Knowledge and Data Engineering
    IEEE Transactions on Knowledge and Data Engineering  Volume 27, Issue 12
    Dec. 2015
    281 pages

    Publisher

    IEEE Educational Activities Department

    United States

    Publication History

    Published: 01 December 2015

    Author Tags

    1. dimensional test
    2. Similarity search
    3. aggregation
    4. intrinsic dimensionality

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2020)Local Intrinsic Dimensionality III: Density and SimilaritySimilarity Search and Applications10.1007/978-3-030-60936-8_19(248-260)Online publication date: 30-Sep-2020
    • (2018)Aggregate keyword nearest neighbor queries on road networksGeoinformatica10.1007/s10707-017-0315-022:2(237-268)Online publication date: 1-Apr-2018
    • (2018)LIDH: An Efficient Filtering Method for Approximate k Nearest Neighbor Queries Based on Local Intrinsic DimensionWeb and Big Data10.1007/978-3-319-96890-2_22(268-276)Online publication date: 23-Jul-2018
    • (2018)Intrinsic Degree: An Estimator of the Local Growth Rate in GraphsSimilarity Search and Applications10.1007/978-3-030-02224-2_15(195-208)Online publication date: 7-Oct-2018
    • (2017)Dimensional testing for reverse k-nearest neighbor searchProceedings of the VLDB Endowment10.14778/3067421.306742610:7(769-780)Online publication date: 1-Mar-2017

    View Options

    View options

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media