Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1007/11687238_39guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Finding data broadness via generalized nearest neighbors

Published: 26 March 2006 Publication History

Abstract

A data object is broad if it is one of the k-Nearest Neighbors (k-NN) of many data objects. We introduce a new database primitive called Generalized Nearest Neighbor (GNN) to express data broadness. We also develop three strategies to answer GNN queries efficiently for large datasets of multidimensional objects. The R*-Tree based search algorithm generates candidate pages and ranks them based on their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), dynamically decides on the number of pages that needs to be fetched. We also propose three optimizations, Column Filter, Row Filter and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes the pages that are guaranteed to be non-broad. Row Filter prunes the pages whose removal do not change the broadness of any data point. Adaptive Filter prunes the search space dynamically along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either fastest or very close to the faster of FA and FO. FD is significantly faster than the existing methods adapted to the GNN problem.

References

[1]
Susanne Albers. Competitive Online Algorithms. Technical Report LS-96-2, brics, September 1996.
[2]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In International Conference on Management of Data (SIGMOD), pages 322-331, 1990.
[3]
S. Berchtold, B. Ertl, D.A. Keim, H.-P. Kriegel, and T. Seidl. Fast Nearest Neighbor Search in High-dimensional Space. In International Conference on Data Engineering (ICDE), pages 209-218, 1998.
[4]
K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When Is "Nearest Neighbor" Meaningful? In International Conference on Database Theory (ICDT), pages 217-235, 1999.
[5]
C. Böhm and F. Krebs. The k-Nearest Neighbour Join: Turbo Charging the KDD Process. Knowledge and Information Systems (KAIS), 6(6), 2004.
[6]
O. Çamoğlu, T. Kahveci, and A. K. Singh. Towards Index-based Similarity Search for Protein Structure Databases. Journal of Bioinformatics and Computational Biology (JBCB), 2(1):99-126, 2004.
[7]
Chee Yong Chan and Beng Chin Ooi. Efficient Scheduling of Page Access in Index-Based Join Processing. IEEE Transactions on Knowledge and Data Engineering (TKDE), 9(6):1005-1011, November/December 1997.
[8]
C. Ding and H. Peng. Minimum redundancy feature selection from microarray gene expression data. In Computational Systems Bioinformatics Conference (CSB), pages 523-528, 2003.
[9]
G.R. Hjaltason and H. Samet. Ranking in Spatial Databases. In Symposium on Spatial Databases, pages 83-95, Portland, Maine, August 1995.
[10]
X. Huang and A. Madan. CAP3: A DNA Sequence Assembly Program. Genome Research, 9(9):868-877, 1999.
[11]
Ibrahim Kamel and Christos Faloutsos. Hilbert R-tree: An Improved R-tree using Fractals. In International Conference on Very Large Databases (VLDB), pages 500-509, 1994.
[12]
F. Korn and S. Muthukrishnan. Influence sets based on reverse nearest neighbor queries. In International Conference on Management of Data (SIGMOD), pages 201-212, 2000.
[13]
F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas. Fast Nearest Neighbor Search in Medical Databases. In International Conference on Very Large Databases (VLDB), pages 215-226, India, 1996.
[14]
T. H. Merrett, Yahiko Kambayashi, and H. Yasuura. Scheduling of Page-Fetches in Join Operations. In International Conference on Very Large Databases (VLDB), pages 488-498, 1981.
[15]
N. Roussopoulos, S. Kelley, and F. Vincent. Nearest Neighbor Queries. In International Conference on Management of Data (SIGMOD), San Jose, CA, 1995.
[16]
Mario Lopez Scott Leutenegger and Jeffrey Edgington. STR: A Simple and Efficient Algorithm for R-Tree Packing. In International Conference on Data Engineering (ICDE), pages 497-506, 1997.
[17]
B. Seeger. An analysis of schedules for performing multi-page requests. Information Systems, 21(5):387-407, 1996.
[18]
T. Seidl and H.P. Kriegel. Optimal Multi-Step k-Nearest Neighbor Search. In International Conference on Management of Data (SIGMOD), 1998.
[19]
I. Stanoi, M. Riedewald, D. Agrawal, and A.E. Abbadi. Discovery of Influence Sets in Frequently Updated Databases. In International Conference on Very Large Databases (VLDB), pages 99-108, 2001.
[20]
Y. Tao, D. Papadias, and X. Lian. Reverse kNN Search in Arbitrary Dimensionality. In International Conference on Very Large Databases (VLDB), 2004.
[21]
C. Xia, H. Lu, B.C. Ooi, and J. Hu. GORDER: An Efficient Method for KNN Join Processing. In International Conference on Very Large Databases (VLDB), 2004.
[22]
C. Yang and K.-I. Lin. An Index Structure for Efficient Reverse Nearest Neighbor Queries. In International Conference on Data Engineering (ICDE), pages 485-492, 2001.

Cited By

View all
  • (2010)Towards evaluating GRASIM for ontology-based data matchingProceedings of the 2010 international conference on On the move to meaningful internet systems: Part II10.5555/1926129.1926164(1009-1017)Online publication date: 25-Oct-2010

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
EDBT'06: Proceedings of the 10th international conference on Advances in Database Technology
March 2006
1204 pages
ISBN:3540329609
  • Editors:
  • Yannis Ioannidis,
  • Marc H. Scholl,
  • Joachim W. Schmidt,
  • Florian Matthes,
  • Mike Hatzopoulos

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 26 March 2006

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2010)Towards evaluating GRASIM for ontology-based data matchingProceedings of the 2010 international conference on On the move to meaningful internet systems: Part II10.5555/1926129.1926164(1009-1017)Online publication date: 25-Oct-2010

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media