research-article

Relational joins on graphics processors

Authors:

Naga Govindaraju,

Pedro SanderAuthors Info & Claims

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 511 - 524

https://doi.org/10.1145/1376616.1376670

Published: 09 June 2008 Publication History

Abstract

We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.

References

[1]

A. Ailamaki, N. Govindaraju, S. Harizopoulos and D. Manocha. Query co-processing on commodity processors. VLDB, 2006.

Digital Library

[2]

AMD CTM, http://ati.amd.com/products/streamprocessor/.

[3]

S. Azadegan, A. R. Tripathi. Parallel join algorithms for SIMD models. ICPP (3) 1991: 125--133.

[4]

S. Azadegan, A. R. Tripathi. A parallel join algorithm for SIMD architectures. Journal of Systems and Software 39(3), 1997.

Digital Library

[5]

N. Bandi, C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004.

Digital Library

[6]

D. Blythe. The Direct3D 10 system. SIGGRAPH 2006.

Digital Library

[7]

C. Boyd. Mass market applications of massively parallel computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.

[8]

I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. SIGGRAPH 2004.

Digital Library

[9]

G. E. Blelloch. Prefix sums and their applications. Technical report, CMU-CS-90-190, Nov 1990.

[10]

P. Boncz, S. Manegold and M. Kersten. Database architecture optimized for the new bottleneck: memory access. VLDB, 1999.

Digital Library

[11]

J. Cieslewicz, J. Berry, B. Hendrickson and K. A. Ross. Realizing parallelism in database operations: insights from a massively multithreaded architecture. DaMoN, 2006.

Digital Library

[12]

T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms, Second Edition. 2001.

Digital Library

[13]

D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. In Communications of the ACM, Vol. 35, No. 6, June 1992.

Digital Library

[14]

N. Govindaraju, J. Gray, R. Kumar and D. Manocha. GPUTeraSort: high performance graphics coprocessor sorting for large database management. SIGMOD, 2006.

Digital Library

[15]

N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. SIGMOD, 2004.

Digital Library

[16]

N. Govindaraju, N. Raghuvanshi and D. Manocha. Fast and approximate stream mining of quantiles and frequencies using graphics processors. SIGMOD, 2005.

Digital Library

[17]

N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: limitations and opportunities. CIDR, 2007.

[18]

M. Harris, J. Owens, S. Sengupta, Y. Zhang and A. Davidson. CUDPP: CUDA Data Parallel Primitives Library. http://www.gpgpu.org/developer/cudpp/, 2007.

[19]

B. He, N. Govindaraju, Q. Luo and B. Smith. Efficient gather and scatter operations on graphics processors. ACM/IEEE Supercomputing, 2007.

Digital Library

[20]

D. Horn. Stream reduction operations for GPGPU applications. In GPU Gems 2, Ed. Addison Wesley, 2005.

[21]

K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. VLDB, 1991.

Digital Library

[22]

A. Lamarca and R. Ladner. The influence of caches on the performance of sorting. SODA, 1997.

Digital Library

[23]

M. D. Lieberman, J. Sankaranarayanan, H. Samet. A fast similarity join algorithm using graphics processing units. ICDE, 2008.

Digital Library

[24]

B. Liu and E. Rundensteiner. Revisiting pipelined parallelism in multi-join query processing. VLDB, 2005.

Digital Library

[25]

H. Lu, K. Tan and M. Shan. Hash-based join algorithms for multiprocessor computers with shared memory. VLDB, 1990.

Digital Library

[26]

MonetDB. http://monetdb.cwi.nl/.

[27]

NVIDIA CUDA (Compute Unified Device Architecture), http://developer.nvidia.com/object/cuda.html.

[28]

OpenGL, http://www.opengl.org/.

[29]

OpenMP, http://www.openmp.org/.

[30]

J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum (26), 2007.

[31]

J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. VLDB, 1999.

Digital Library

[32]

D. A. Schneider and D. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD, 1989.

Digital Library

[33]

S. Sengupta, M. Harris, Y. Zhang, J. D. Owens. Scan primitives for GPU computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.

Digital Library

[34]

A. Shatdal, C. Kant and J. F. Naughton. Cache conscious algorithms for relational query processing. VLDB, 1994.

Digital Library

[35]

M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran and S. Zdonik. C-Store: A column-oriented DBMS. VLDB, 2005.

Digital Library

[36]

C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration for spatial selections and joins. SIGMOD, 2003.

Digital Library

[37]

D. Tarditi, S. Puri and J. Oglesby. Accelerator: using data parallelism to program GPUs for general-purpose uses. ASPLOS, 2006.

Digital Library

[38]

A. Thall. Extended-precision floating-point numbers for GPU computation. SIGGRAPH, 2006.

Digital Library

[39]

M. Zagha and G. E. Blelloch. Radix sort for vector multiprocessors. ACM/IEEE Supercomputing, 1991.

Digital Library

[40]

M. Zukowski, S. Heman, N. Nes, P. Boncz. Super-Scalar RAM-CPU Cache Compression. ICDE, 2006.

Digital Library

Cited By

Turan MTenekeci EGüner K(2024)AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKSAdıyaman Üniversitesi Mühendislik Bilimleri Dergisi10.54365/adyumbd.1508182Online publication date: 3-Oct-2024
https://doi.org/10.54365/adyumbd.1508182
Park SOh SKim MLee IChabbi MSteuwer M(2024)INFINEL: An efficient GPU-based processing method for unpredictable large output graph queriesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638490(147-159)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638490
Liu JZhang FLu LQi CGuo XDeng DLi GZhang HZhai JZhang HChen YPan ADu X(2024)G-Learned Index: Enabling Efficient Learned Index on GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.338121435:6(950-967)Online publication date: 2-Apr-2024
https://dl.acm.org/doi/10.1109/TPDS.2024.3381214
Show More Cited By

Index Terms

Relational joins on graphics processors
1. Information systems
  1. Data management systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Efficient gather and scatter operations on graphics processors
SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Gather and scatter are two fundamental data-parallel operations, where a large number of data items are read (gathered) from or are written (scattered) to given locations. In this paper, we study these two operations on graphics processing units (GPUs).
...
Relational query coprocessing on graphics processors

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs ...
A performance study of general-purpose applications on graphics processors using CUDA

Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

June 2008

1396 pages

ISBN:9781605581026

DOI:10.1145/1376616

General Chairs:
Laks V. S. Lakshmanan
University of British Columbia, Canada
,
Raymond T. Ng
University of British Columbia, Canada
,
Dennis Shasha
New York University, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2008

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGMOD/PODS '08

Sponsor:

SIGMOD/PODS '08: SIGMOD/PODS '08 - International Conference on Management of Data

June 9 - 12, 2008

Vancouver, Canada

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

237
Total Citations
View Citations
1,943
Total Downloads

Downloads (Last 12 months)59
Downloads (Last 6 weeks)4

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Turan MTenekeci EGüner K(2024)AN ANALYSIS OF APACHE SPARK AND GPU PERFORMANCES ON DATABASE SQL QUERIES FOR DISTRIBUTED NETWORKSAdıyaman Üniversitesi Mühendislik Bilimleri Dergisi10.54365/adyumbd.1508182Online publication date: 3-Oct-2024
https://doi.org/10.54365/adyumbd.1508182
Park SOh SKim MLee IChabbi MSteuwer M(2024)INFINEL: An efficient GPU-based processing method for unpredictable large output graph queriesProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638490(147-159)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638490
Liu JZhang FLu LQi CGuo XDeng DLi GZhang HZhai JZhang HChen YPan ADu X(2024)G-Learned Index: Enabling Efficient Learned Index on GPUIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.338121435:6(950-967)Online publication date: 2-Apr-2024
https://dl.acm.org/doi/10.1109/TPDS.2024.3381214
Cai YChen S(2024)CPU and GPU Hash Joins on Skewed Data2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW61823.2024.00064(402-408)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDEW61823.2024.00064
Ruiz-Rohena KRodriguez-Martínez M(2024)ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments2024 IEEE 17th International Conference on Cloud Computing (CLOUD)10.1109/CLOUD62652.2024.00015(42-53)Online publication date: 7-Jul-2024
https://doi.org/10.1109/CLOUD62652.2024.00015
Miura SChang QMiyazaki J(2024)$k$-Way In-Place Merge by CPU-GPU Cooperative Processing2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP)10.1109/ASAP61560.2024.00039(152-160)Online publication date: 24-Jul-2024
https://doi.org/10.1109/ASAP61560.2024.00039
Zulian PBen Bader SFourestey GKrause RRossinelli D(2024)Data-centric workloads with MPI_SortJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.104833(104833)Online publication date: Jan-2024
https://doi.org/10.1016/j.jpdc.2023.104833
Nakano MChang QMiyazaki J(2024)Extension of Parallel Primitives and Their Applications to Large-Scale Data ProcessingDatabase and Expert Systems Applications10.1007/978-3-031-68312-1_19(248-253)Online publication date: 26-Aug-2024
https://dl.acm.org/doi/10.1007/978-3-031-68312-1_19
Doraiswamy HKalagi VRamachandra KHaritsa J(2023)A Case for Graphics-Driven Query ProcessingProceedings of the VLDB Endowment10.14778/3603581.360359016:10(2499-2511)Online publication date: 1-Jun-2023
https://dl.acm.org/doi/10.14778/3603581.3603590
Lee KJo IAhn JLee HLee HSul WJung H(2023)Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation OffloadingProceedings of the VLDB Endowment10.14778/3583140.358316116:6(1480-1493)Online publication date: 20-Apr-2023
https://dl.acm.org/doi/10.14778/3583140.3583161
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents