Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3035918.3035946acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

FPGA-based Data Partitioning

Published: 09 May 2017 Publication History

Abstract

Implementing parallel operators in multi-core machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and with coherent access to the same memory as the CPU residing on the other socket. Such an architecture reduces data transfer overheads between the CPU and the FPGA, enabling hybrid operator execution where the partitioning happens on the FPGA and the build and probe phases of a join happen on the CPU. Our experiments demonstrate that FPGA-based partitioning is significantly faster and more robust than CPU-based partitioning. The results open interesting options as FPGAs are gradually integrated tighter with the CPU.

References

[1]
I. Absalyamov, R. Halstead, P. Budhkar, W. Najjar, et al. Fpga-accelerated group-by aggregation using synchronizing caches. In ACM DaMoN, 2016.
[2]
A. Appleby. https://github.com/aappleby/smhasher. January 2016.
[3]
C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. Multi-core, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, 2013.
[4]
C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware. In IEEE ICDE, 2013.
[5]
R. Barber, G. Lohman, I. Pandis, et al. Memory-efficient hash joins. PVLDB, 8(4):353--364, 2014.
[6]
C. Barthels, S. Loesing, G. Alonso, and D. Kossmann. Rack-scale in-memory join processing using rdma. In ACM SIGMOD, 2015.
[7]
C. Barthels, I. Müller, T. Schneider, G. Alonso, and T. Hoefler. Distributed join algorithms on thousands of cores. PVLDB, 10(5), 2017.
[8]
S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In ACM SIGMOD, 2011.
[9]
J. Casper and K. Olukotun. Hardware acceleration of database operations. In ACM FPGA, 2014.
[10]
R. Chen and V. K. Prasanna. Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. IEEE FCCM, 2016.
[11]
R. J. Halstead, I. Absalyamov, W. A. Najjar, and V. J. Tsotras. FPGA-based Multithreading for In-Memory Hash Joins. In CIDR, 2015.
[12]
R. J. Halstead, B. Sukhwani, H. Min, M. Thoennes, et al. Accelerating join operation for relational databases with FPGAs. In IEEE FCCM, 2013.
[13]
J. He, S. Zhang, and B. He. In-cache query co-processing on coupled cpu-gpu architectures. PVLDB, 8(4):329--340, 2014.
[14]
Z. Istvan, D. Sidler, and G. Alonso. Runtime parameterizable regular expression operators for databases. 2016.
[15]
Z. Istvan, L. Woods, and G. Alonso. Histograms as a side effect of data movement for big data. In ACM SIGMOD, 2014.
[16]
S. Jha, B. He, M. Lu, X. Cheng, and H. P. Huynh. Improving main memory hash joins on intel xeon phi processors: An experimental approach. PVLDB, 8(6):642--653, 2015.
[17]
T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. Gpu join processing revisited. In ACM DaMoN, 2012.
[18]
K. Kara and G. Alonso. Fast and robust hashing for database operators. In IEEE FPL, 2016.
[19]
C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, et al. Sort vs. hash revisited: fast join implementation on modern multi-core cpus. PVLDB, 2(2):1378--1389, 2009.
[20]
H. Lang, V. Leis, M.-C. Albutiu, T. Neumann, and A. Kemper. Massively parallel numa-aware hash joins. In In Memory Data Management and Analysis. 2015.
[21]
S. Manegold, P. Boncz, and M. Kersten. Optimizing main-memory join on modern hardware. IEEE TKDE, 14(4):709--730, 2002.
[22]
N. Mirzadeh, O. Kocberber, B. Falsafi, and B. Grot. Sort vs. hash join revisited for near-memory execution. In ASBD, 2015.
[23]
R. Mueller, J. Teubner, and G. Alonso. Glacier: a query-to-hardware compiler. In ACM SIGMOD, 2010.
[24]
N. Oliver et al. A reconfigurable computing system based on a cache-coherent fabric. In IEEE ReConFig, 2011.
[25]
H. Pirk, S. Manegold, and M. Kersten. Waste not... efficient co-processing of relational data. In IEEE ICDE, 2014.
[26]
O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD vectorization for in-memory databases. In ACM SIGMOD, 2015.
[27]
O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In ACM SIGMOD, 2014.
[28]
A. Putnam, A. M. Caulfield, E. S. Chung, et al. A reconfigurable fabric for accelerating large-scale datacenter services. In ACM/IEEE ISCA, 2014.
[29]
S. Richter, V. Alvarez, and J. Dittrich. A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3):96--107, 2015.
[30]
N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, et al. Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort. In ACM SIGMOD, 2010.
[31]
S. Schuh, X. Chen, and J. Dittrich. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In ACM SIGMOD, 2016.
[32]
F. M. Schuhknecht, P. Khanchandani, and J. Dittrich. On the surprising difficulty of simple things: the case of radix partitioning. PVLDB, 8(9):934--937, 2015.
[33]
D. Sidler, Z. Istvan, M. Owaida, and G. Alonso. Accelerating pattern matching queries in hybrid cpu-fpga architectures. In ACM SIGMOD, 2017.
[34]
J. Stuecheli, B. Blaner, C. Johns, and M. Siegel. Capi: A coherent accelerator processor interface. IBM Journal of Research and Development, 59(1):7--1, 2015.
[35]
B. Sukhwani, H. Min, M. Thoennes, et al. Database analytics acceleration using fpgas. In ACM PACT, 2012.
[36]
T. Ueda, M. Ito, and M. Ohara. A Dynamically Reconfigurable Equi-Joiner on FPGA. IBM Tehnical Report RT0969, 2015.
[37]
Z. Wang, B. He, and W. Zhang. A study of data partitioning on OpenCL-based FPGAs. In IEEE FPL, 2015.
[38]
J. Wassenberg and P. Sanders. Engineering a multi-core radix sort. In Euro-Par, 2011.
[39]
S. Werner, S. Groppe, V. Linnemann, and T. Pionteck. Hardware-accelerated join processing in large Semantic Web databases with FPGAs. In IEEE HPCS, 2013.
[40]
L. Woods, G. Alonso, and J. Teubner. Parallel computation of skyline queries. In IEEE FCCM, 2013.
[41]
L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In ACM SIGARCH, 2013.

Cited By

View all
  • (2024)F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00287(3739-3752)Online publication date: 13-May-2024
  • (2023)A Study of Early Aggregation in Database Query Processing on FPGAsProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573194(55-65)Online publication date: 12-Feb-2023
  • (2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.326278234:6(1757-1772)Online publication date: 1-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data partitioning
  2. fpga
  3. hardware acceleration
  4. hash join

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)47
  • Downloads (Last 6 weeks)4
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00287(3739-3752)Online publication date: 13-May-2024
  • (2023)A Study of Early Aggregation in Database Query Processing on FPGAsProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573194(55-65)Online publication date: 12-Feb-2023
  • (2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.326278234:6(1757-1772)Online publication date: 1-Jun-2023
  • (2023)Relational Fabric: Transparent Data Transformation2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00297(3688-3698)Online publication date: Apr-2023
  • (2022)Data Partitioning and Asynchronous Processing to Improve the Embedded Software Performance on Multicore ProcessorsРазделение данных и асинхронная обработка для повышения производительности встроенного программного обеспечения на многокодных процессорахInformatics and AutomationИнформатика и автоматизация10.15622/ia.21.2.221:2(243-274)Online publication date: 17-Feb-2022
  • (2022)Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and OptimizationsACM Transactions on Reconfigurable Technology and Systems10.1145/354765816:1(1-32)Online publication date: 22-Dec-2022
  • (2022)Improving In-Memory Database Operations with Acceleration DIMM (AxDIMM)Proceedings of the 18th International Workshop on Data Management on New Hardware10.1145/3533737.3535093(1-9)Online publication date: 12-Jun-2022
  • (2022)Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast InterconnectsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517911(1017-1032)Online publication date: 10-Jun-2022
  • (2022)Enzian: an open, general, CPU/FPGA platform for systems software researchProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507742(434-451)Online publication date: 28-Feb-2022
  • (2022)Exploiting HBM on FPGAs for Data ProcessingACM Transactions on Reconfigurable Technology and Systems10.1145/349123815:4(1-27)Online publication date: 9-Dec-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media