Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3581784.3607062acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Open access

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods

Published: 11 November 2023 Publication History

Abstract

The top-K problem is an essential part of many important applications in scientific computing, information retrieval, etc. As data volume grows rapidly, high-performance parallel top-K algorithms become critical. We propose two parallel top-K algorithms, AIR Top-K (Adaptive and Iteration-fused Radix Top-K) and GridSelect, for GPU. AIR Top-K employs an iteration-fused design to minimize CPU-GPU communication and device data access. Its adaptive strategy eliminates unnecessary device memory traffic automatically under various data distributions. GridSelect can process data on-the-fly. It adopts a shared queue and parallel two-step insertion to decrease the frequency of costly operations. We comprehensively compare 8 open-source GPU implementations and our methods for a wide range of problem sizes and data distributions. For batch sizes 1 and 100, respectively, AIR Top-K shows 1.98--21.48× and 8.01--574.78× speedup over previous radix top-K algorithm, and 1.44--7.34× and 1.38--31.91× speedup over state-of-the-art methods. GridSelect shows up to 882.29× speedup over its baseline.

Supplemental Material

MP4 File - SC23 paper presentation recording for "Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods"
SC23 paper presentation recording for "Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods", by Jingrong Zhang, Akira Naruse, Xipeng Li and Yong Wang.

References

[1]
Tolu Alabi, Jeffrey D Blanchard, Bradley Gordon, and Russel Steinbach. 2012. Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) 17 (2012), 4--1.
[2]
Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87 (2020), 101374.
[3]
Artem Babenko and Victor S. Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27--30, 2016. IEEE Computer Society, Las Vegas, 2055--2063.
[4]
Ricardo Barrientos, J.I. Gomez, Christian Tenllado, and Manuel Prieto Matias. 2010. Heap based k-nearest neighbor search on GPUs. Congreso Espanol de Informática (CEDI) 1, 7 (01 2010), 559--566.
[5]
Ricardo J Barrientos, Fabricio Millaguir, José L Sánchez, and Enrique Arias. 2017. GPU-based exhaustive algorithms processing kNN queries. The Journal of Super-computing 73, 10 (2017), 4611--4634.
[6]
Nathan Bell and Jared Hoberock. 2012. Chapter 26 - Thrust: A Productivity-Oriented Library for CUDA. In GPU Computing Gems Jade Edition, Wen mei W. Hwu (Ed.). Morgan Kaufmann, Boston, 359--371.
[7]
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM '19). Association for Computing Machinery, New York, NY, USA, 456--464.
[8]
Jack Choquette and Wish Gandhi. 2020. NVIDIA A100 GPU Performance & Innovation for GPU Computing. In 2020 IEEE Hot Chips 32 Symposium (HCS). IEEE Computer Society, IEEE, Palo Alto, CA, 1--43.
[9]
Ali Dashti, Ivan Komarov, and Roshan M D'Souza. 2013. Efficient computation of k-nearest neighbour graphs for large high-dimensional data sets on GPU clusters. Plos one 8, 9 (2013), e74113.
[10]
Meta Platforms, Inc. 2022. Faiss v1.7.3. Meta Platforms, Inc. Retrieved March 1, 2023 from https://github.com/facebookresearch/faiss
[11]
Anil Gaihre. 2022. Anil-Gaihre/DrTopKSC. https://github.com/Anil-Gaihre/DrTopKSC
[12]
Anil Gaihre, Da Zheng, Scott Weitze, Lingda Li, Shuaiwen Leon Song, Caiwen Ding, Xiaoye S. Li, and Hang Liu. 2021. Dr. Top-k: Delegate-Centric Top-k on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (St. Louis, Missouri) (SC '21). Association for Computing Machinery, New York, NY, USA, Article 39, 14 pages.
[13]
David E Graff, Eugene I Shakhnovich, and Connor W Coley. 2021. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chemical science 12, 22 (2021), 7866--7881.
[14]
Bonan Huang, Jinlan Gao, and Xiaoming Li. 2009. An empirically optimized radix sort for GPU. In 2009 IEEE international symposium on parallel and distributed processing with applications. IEEE, IEEE, Chengdu, 234--241.
[15]
Hervé Jégou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (Jan. 2011), 117--128. https://inria.hal.science/inria-00514462
[16]
Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2021. Billion-scale similarity search with gpus. IEEE Transactions on Big Data 7, 3 (2021), 535--547.
[17]
Ivan Komarov, Ali Dashti, and Roshan M D'Souza. 2014. Fast k-NNG construction with GPU-based quick multi-select. PloS one 9, 5 (2014), e92409.
[18]
Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. 2018. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, Vancouver, BC, Canada, 14 pages. https://openreview.net/forum?id=SkhQHMW0W
[19]
Abdelhamid Malki, Sidi-Mohamed Benslimane, Mimoun Malki, Mahmoud Barhamgi, and Djamal Benslimane. 2020. Top-k query optimization over data services. Future Generation Computer Systems 113 (2020), 1--12.
[20]
Duane Merrill. 2015. Cub. https://on-demand.gputechconf.com/gtc/2015/presentation/S5617-Duane-Merrill.pdf
[21]
Nvidia. 2020. NVIDIA A100 Tensor Core Gpu. https://www.nvidia.com/en-us/data-center/a100/
[22]
Nvidia. 2021. NVIDIA A10 Tensor Core Gpu. https://www.nvidia.com/en-us/data-center/products/a10-gpu/
[23]
Nvidia. 2022. CUDA 12.0 Release Notes. https://docs.nvidia.com/cuda/archive/12.0.0/cuda-toolkit-release-notes/index.html
[24]
NVIDIA. 2023. CUDA C++ Best Practices Guide. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/
[25]
Nvidia. 2023. NVIDIA H100 Tensor Core Gpu. https://www.nvidia.com/en-us/data-center/h100/
[26]
NVIDIA. 2023. NVIDIA Kernel Profiling Guide. https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html
[27]
Omar Obeya, Endrias Kahssay, Edward Fan, and Julian Shun. 2019. Theoretically-efficient and practical parallel in-place radix sorting. In The 31st ACM Symposium on Parallelism in Algorithms and Architectures. IEEE, Phoenix AZ USA, 213--224.
[28]
Rapidsai. 2022. Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, Graph and machine learning. https://github.com/rapidsai/raft
[29]
NVIDIA Research. 2022. cub::DeviceRadixSort. https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html
[30]
Tobias Ribizel. 2020. gpu selection. https://github.com/upsj/gpu_selection
[31]
Tobias Ribizel and Hartwig Anzt. 2020. Parallel selection on GPUs. Parallel Comput. 91 (2020), 102588.
[32]
Anil Shanbhag, Holger Pirk, and Samuel Madden. 2018. Efficient top-k query processing on massively parallel hardware. In Proceedings of the 2018 International Conference on Management of Data. Association for Computing Machinery, Houston TX USA, 1557--1570.
[33]
Xiaoxin Tang, Zhiyi Huang, David Eyers, Steven Mills, and Minyi Guo. 2015. Efficient selection algorithm for fast k-nn search on gpus. In 2015 IEEE International Parallel and Distributed Processing Symposium. IEEE, IEEE, Hyderabad, India, 397--406.
[34]
Polychronis Velentzas, Panagiotis Moutafis, and George Mavrommatis. 2021. An Improved GPU-Based Algorithmfor Processing the k Nearest Neighbor Query. In 24th Pan-Hellenic Conference on Informatics (Athens, Greece) (PCI 2020). Association for Computing Machinery, New York, NY, USA, 372--375.
[35]
Wikipedia. 2022. Selection algorithm. https://en.wikipedia.org/wiki/Selection_algorithm
[36]
Feng Xue, Xiangnan He, Xiang Wang, Jiandong Xu, Kai Liu, and Richang Hong. 2019. Deep item-based collaborative filtering for top-n recommendation. ACM Transactions on Information Systems (TOIS) 37, 3 (2019), 1--25.
[37]
Hamed Zamani, Susan Dumais, Nick Craswell, Paul Bennett, and Gord Lueck. 2020. Generating Clarifying Questions for Information Retrieval. In Proceedings of The Web Conference 2020 (Taipei, Taiwan) (WWW '20). Association for Computing Machinery, New York, NY, USA, 418--428.
[38]
Vasileios Zois, Vassilis J Tsotras, and Walid A Najjar. 2019. GPU accelerated top-k selection with efficient early stopping.

Cited By

View all
  • (2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
  • (2024)Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00289(3767-3781)Online publication date: 13-May-2024
  • (2024)Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUsThe Journal of Supercomputing10.1007/s11227-024-06031-x80:11(15122-15160)Online publication date: 29-Mar-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2023
1428 pages
ISBN:9798400701092
DOI:10.1145/3581784
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2023

Check for updates

Author Tags

  1. top-K
  2. K-selection
  3. radix select
  4. parallel algorithms
  5. GPU
  6. CUDA

Qualifiers

  • Research-article

Conference

SC '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,825
  • Downloads (Last 6 weeks)427
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)RadiK: Scalable and Optimized GPU-Parallel Radix Top-K SelectionProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656596(537-548)Online publication date: 30-May-2024
  • (2024)Neos: A NVMe-GPUs Direct Vector Service Buffer in User Space2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00289(3767-3781)Online publication date: 13-May-2024
  • (2024)Split-bucket partition (SBP): a novel execution model for top-K and selection algorithms on GPUsThe Journal of Supercomputing10.1007/s11227-024-06031-x80:11(15122-15160)Online publication date: 29-Mar-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media