Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1376616.1376670acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Relational joins on graphics processors

Published: 09 June 2008 Publication History

Abstract

We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.

References

[1]
A. Ailamaki, N. Govindaraju, S. Harizopoulos and D. Manocha. Query co-processing on commodity processors. VLDB, 2006.
[2]
AMD CTM, http://ati.amd.com/products/streamprocessor/.
[3]
S. Azadegan, A. R. Tripathi. Parallel join algorithms for SIMD models. ICPP (3) 1991: 125--133.
[4]
S. Azadegan, A. R. Tripathi. A parallel join algorithm for SIMD architectures. Journal of Systems and Software 39(3), 1997.
[5]
N. Bandi, C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004.
[6]
D. Blythe. The Direct3D 10 system. SIGGRAPH 2006.
[7]
C. Boyd. Mass market applications of massively parallel computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.
[8]
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. SIGGRAPH 2004.
[9]
G. E. Blelloch. Prefix sums and their applications. Technical report, CMU-CS-90-190, Nov 1990.
[10]
P. Boncz, S. Manegold and M. Kersten. Database architecture optimized for the new bottleneck: memory access. VLDB, 1999.
[11]
J. Cieslewicz, J. Berry, B. Hendrickson and K. A. Ross. Realizing parallelism in database operations: insights from a massively multithreaded architecture. DaMoN, 2006.
[12]
T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms, Second Edition. 2001.
[13]
D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. In Communications of the ACM, Vol. 35, No. 6, June 1992.
[14]
N. Govindaraju, J. Gray, R. Kumar and D. Manocha. GPUTeraSort: high performance graphics coprocessor sorting for large database management. SIGMOD, 2006.
[15]
N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. SIGMOD, 2004.
[16]
N. Govindaraju, N. Raghuvanshi and D. Manocha. Fast and approximate stream mining of quantiles and frequencies using graphics processors. SIGMOD, 2005.
[17]
N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: limitations and opportunities. CIDR, 2007.
[18]
M. Harris, J. Owens, S. Sengupta, Y. Zhang and A. Davidson. CUDPP: CUDA Data Parallel Primitives Library. http://www.gpgpu.org/developer/cudpp/, 2007.
[19]
B. He, N. Govindaraju, Q. Luo and B. Smith. Efficient gather and scatter operations on graphics processors. ACM/IEEE Supercomputing, 2007.
[20]
D. Horn. Stream reduction operations for GPGPU applications. In GPU Gems 2, Ed. Addison Wesley, 2005.
[21]
K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. VLDB, 1991.
[22]
A. Lamarca and R. Ladner. The influence of caches on the performance of sorting. SODA, 1997.
[23]
M. D. Lieberman, J. Sankaranarayanan, H. Samet. A fast similarity join algorithm using graphics processing units. ICDE, 2008.
[24]
B. Liu and E. Rundensteiner. Revisiting pipelined parallelism in multi-join query processing. VLDB, 2005.
[25]
H. Lu, K. Tan and M. Shan. Hash-based join algorithms for multiprocessor computers with shared memory. VLDB, 1990.
[26]
MonetDB. http://monetdb.cwi.nl/.
[27]
NVIDIA CUDA (Compute Unified Device Architecture), http://developer.nvidia.com/object/cuda.html.
[28]
OpenGL, http://www.opengl.org/.
[29]
OpenMP, http://www.openmp.org/.
[30]
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum (26), 2007.
[31]
J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. VLDB, 1999.
[32]
D. A. Schneider and D. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD, 1989.
[33]
S. Sengupta, M. Harris, Y. Zhang, J. D. Owens. Scan primitives for GPU computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.
[34]
A. Shatdal, C. Kant and J. F. Naughton. Cache conscious algorithms for relational query processing. VLDB, 1994.
[35]
M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran and S. Zdonik. C-Store: A column-oriented DBMS. VLDB, 2005.
[36]
C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration for spatial selections and joins. SIGMOD, 2003.
[37]
D. Tarditi, S. Puri and J. Oglesby. Accelerator: using data parallelism to program GPUs for general-purpose uses. ASPLOS, 2006.
[38]
A. Thall. Extended-precision floating-point numbers for GPU computation. SIGGRAPH, 2006.
[39]
M. Zagha and G. E. Blelloch. Radix sort for vector multiprocessors. ACM/IEEE Supercomputing, 1991.
[40]
M. Zukowski, S. Heman, N. Nes, P. Boncz. Super-Scalar RAM-CPU Cache Compression. ICDE, 2006.

Cited By

View all
  • (2024)AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKSAdıyaman Üniversitesi Mühendislik Bilimleri Dergisi10.54365/adyumbd.1508182Online publication date: 3-Oct-2024
  • (2024)INFINEL: An efficient GPU-based processing method for unpredictable large output graph queriesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638490(147-159)Online publication date: 2-Mar-2024
  • (2024)G-Learned Index: Enabling Efficient Learned Index on GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.338121435:6(950-967)Online publication date: 2-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graphics processors
  2. join
  3. parallel processing
  4. primitive
  5. relational database
  6. sort

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)59
  • Downloads (Last 6 weeks)4
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKSAdıyaman Üniversitesi Mühendislik Bilimleri Dergisi10.54365/adyumbd.1508182Online publication date: 3-Oct-2024
  • (2024)INFINEL: An efficient GPU-based processing method for unpredictable large output graph queriesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638490(147-159)Online publication date: 2-Mar-2024
  • (2024)G-Learned Index: Enabling Efficient Learned Index on GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.338121435:6(950-967)Online publication date: 2-Apr-2024
  • (2024)CPU and GPU Hash Joins on Skewed Data2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW61823.2024.00064(402-408)Online publication date: 13-May-2024
  • (2024)ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments2024 IEEE 17th International Conference on Cloud Computing (CLOUD)10.1109/CLOUD62652.2024.00015(42-53)Online publication date: 7-Jul-2024
  • (2024)$k$-Way In-Place Merge by CPU-GPU Cooperative Processing2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP)10.1109/ASAP61560.2024.00039(152-160)Online publication date: 24-Jul-2024
  • (2024)Data-centric workloads with MPI_SortJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.104833(104833)Online publication date: Jan-2024
  • (2024)Extension of Parallel Primitives and Their Applications to Large-Scale Data ProcessingDatabase and Expert Systems Applications10.1007/978-3-031-68312-1_19(248-253)Online publication date: 26-Aug-2024
  • (2023)A Case for Graphics-Driven Query ProcessingProceedings of the VLDB Endowment10.14778/3603581.360359016:10(2499-2511)Online publication date: 1-Jun-2023
  • (2023)Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation OffloadingProceedings of the VLDB Endowment10.14778/3583140.358316116:6(1480-1493)Online publication date: 20-Apr-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media