Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3557915.3560935acmconferencesArticle/Chapter ViewAbstractPublication PagesgisConference Proceedingsconference-collections
poster

Learned k-NN distance estimation

Published: 22 November 2022 Publication History

Abstract

Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life applications. In such analysis, the distances to k nearest neighbors are usually employed, thus its main bottleneck is derived from data retrieval. Much efforts have been made to improve the efficiency of these analyses. However, they still incur large costs, because they essentially need many data accesses. To avoid this issue, we propose a machine-learning technique that quickly and accurately estimates the k-NN distances (i.e., distances to the k nearest neighbors) of a given query. We train a fully connected neural network model and utilize pivots to achieve accurate estimation. Our model is designed to have useful advantages: it infers distances to the k-NNs at a time, its inference time is O(1) (no data accesses are incurred), but it keeps high accuracy. Our experimental results and case studies on real datasets demonstrate the efficiency and effectiveness of our solution.

References

[1]
Daichi Amagata, Yusuke Arai, Sumio Fujita, and Takahiro Hara. 2022. Learned k-NN Distance Estimation. arXiv:2208.14210 (2022).
[2]
Daichi Amagata and Takahiro Hara. 2021. Fast Density-Peaks Clustering: Multicore-based Parallelization Approach. In SIGMOD. 49--61.
[3]
Daichi Amagata, Makoto Onizuka, and Takahiro Hara. 2021. Fast and Exact Outlier Detection in Metric Spaces: A Proximity Graph-based Approach. In SIGMOD. 36--48.
[4]
Daichi Amagata, Makoto Onizuka, and Takahiro Hara. 2022. Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces. The VLDB Journal (2022), 1--25.
[5]
Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. 2016. Parameterized machine learning for high-energy physics. arXiv preprint arXiv:1601.07913 (2016).
[6]
Jon Louis Bentley. 1975. Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM 18, 9 (1975), 509--517.
[7]
Gérard Biau, Frédéric Chazal, David Cohen-Steiner, Luc Devroye, and Carlos Rodriguez. 2011. A Weighted k-Nearest Neighbor Density Estimate for Geometric Inference. Electronic Journal of Statistics 5 (2011), 204--237.
[8]
Ricardo JGB Campello, Davoud Moulavi, Arthur Zimek, and Jörg Sander. 2015. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM TKDD 10, 1 (2015), 1--51.
[9]
Tsz Nam Chan, Reynold Cheng, and Man Lung Yiu. 2020. QUAD: Quadratic-bound-based Kernel Density Visualization. In SIGMOD. 35--50.
[10]
Yuchen Ji, Daichi Amagata, Yuya Sasaki, and Takahiro Hara. 2022. A Performance Study of One-dimensional Learned Cardinality Estimation. In DOLAP. 86--90.
[11]
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: a highly efficient gradient boosting decision tree. In NIPS. 3149--3157.
[12]
Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In ICML. 807--814.
[13]
Attila Reiss and Didier Stricker. 2012. Introducing a new benchmarked dataset for activity monitoring. In ISWC. 108--109.
[14]
Gary M Weiss, Kenichi Yoneda, and Thaier Hayajneh. 2019. Smartphone and Smartwatch-based Biometrics using Activities of Daily Living. IEEE Access 7 (2019), 133190--133202.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems
November 2022
806 pages
ISBN:9781450395298
DOI:10.1145/3557915
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 November 2022

Check for updates

Author Tags

  1. k-NN query
  2. machine learning
  3. spatial data

Qualifiers

  • Poster

Funding Sources

  • JSPS
  • JST

Conference

SIGSPATIAL '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 220 of 1,116 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)33
  • Downloads (Last 6 weeks)3
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Spatial Query Optimization With LearningProceedings of the VLDB Endowment10.14778/3685800.368584617:12(4245-4248)Online publication date: 1-Aug-2024
  • (2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
  • (2024)Estimating Visited Stores Through Positive-Unlabeled LearningDatabase Systems for Advanced Applications10.1007/978-981-97-5575-2_28(377-389)Online publication date: 2-Sep-2024
  • (2024)An Efficient Framework for Approximate Nearest Neighbor Search on High-Dimensional Multi-metric DataSimilarity Search and Applications10.1007/978-3-031-75823-2_1(3-17)Online publication date: 25-Oct-2024
  • (2023)Approximate Reverse Top-k Spatial-Keyword Queries2023 24th IEEE International Conference on Mobile Data Management (MDM)10.1109/MDM58254.2023.00026(96-105)Online publication date: Jul-2023
  • (2022)Retrieving Top-N Weighted Spatial k-cliques2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10021071(4952-4961)Online publication date: 17-Dec-2022
  • (2022)Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020690(445-454)Online publication date: 17-Dec-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media