research-article

FPGA-based Data Partitioning

Authors:

Gustavo AlonsoAuthors Info & Claims

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 433 - 445

https://doi.org/10.1145/3035918.3035946

Published: 09 May 2017 Publication History

Abstract

Implementing parallel operators in multi-core machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and with coherent access to the same memory as the CPU residing on the other socket. Such an architecture reduces data transfer overheads between the CPU and the FPGA, enabling hybrid operator execution where the partitioning happens on the FPGA and the build and probe phases of a join happen on the CPU. Our experiments demonstrate that FPGA-based partitioning is significantly faster and more robust than CPU-based partitioning. The results open interesting options as FPGAs are gradually integrated tighter with the CPU.

References

[1]

I. Absalyamov, R. Halstead, P. Budhkar, W. Najjar, et al. Fpga-accelerated group-by aggregation using synchronizing caches. In ACM DaMoN, 2016.

Digital Library

[2]

A. Appleby. https://github.com/aappleby/smhasher. January 2016.

[3]

C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. Multi-core, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, 2013.

Digital Library

[4]

C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware. In IEEE ICDE, 2013.

Digital Library

[5]

R. Barber, G. Lohman, I. Pandis, et al. Memory-efficient hash joins. PVLDB, 8(4):353--364, 2014.

Digital Library

[6]

C. Barthels, S. Loesing, G. Alonso, and D. Kossmann. Rack-scale in-memory join processing using rdma. In ACM SIGMOD, 2015.

Digital Library

[7]

C. Barthels, I. Müller, T. Schneider, G. Alonso, and T. Hoefler. Distributed join algorithms on thousands of cores. PVLDB, 10(5), 2017.

Digital Library

[8]

S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In ACM SIGMOD, 2011.

Digital Library

[9]

J. Casper and K. Olukotun. Hardware acceleration of database operations. In ACM FPGA, 2014.

Digital Library

[10]

R. Chen and V. K. Prasanna. Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. IEEE FCCM, 2016.

[11]

R. J. Halstead, I. Absalyamov, W. A. Najjar, and V. J. Tsotras. FPGA-based Multithreading for In-Memory Hash Joins. In CIDR, 2015.

[12]

R. J. Halstead, B. Sukhwani, H. Min, M. Thoennes, et al. Accelerating join operation for relational databases with FPGAs. In IEEE FCCM, 2013.

Digital Library

[13]

J. He, S. Zhang, and B. He. In-cache query co-processing on coupled cpu-gpu architectures. PVLDB, 8(4):329--340, 2014.

Digital Library

[14]

Z. Istvan, D. Sidler, and G. Alonso. Runtime parameterizable regular expression operators for databases. 2016.

[15]

Z. Istvan, L. Woods, and G. Alonso. Histograms as a side effect of data movement for big data. In ACM SIGMOD, 2014.

Digital Library

[16]

S. Jha, B. He, M. Lu, X. Cheng, and H. P. Huynh. Improving main memory hash joins on intel xeon phi processors: An experimental approach. PVLDB, 8(6):642--653, 2015.

Digital Library

[17]

T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. Gpu join processing revisited. In ACM DaMoN, 2012.

Digital Library

[18]

K. Kara and G. Alonso. Fast and robust hashing for database operators. In IEEE FPL, 2016.

[19]

C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, et al. Sort vs. hash revisited: fast join implementation on modern multi-core cpus. PVLDB, 2(2):1378--1389, 2009.

Digital Library

[20]

H. Lang, V. Leis, M.-C. Albutiu, T. Neumann, and A. Kemper. Massively parallel numa-aware hash joins. In In Memory Data Management and Analysis. 2015.

[21]

S. Manegold, P. Boncz, and M. Kersten. Optimizing main-memory join on modern hardware. IEEE TKDE, 14(4):709--730, 2002.

Digital Library

[22]

N. Mirzadeh, O. Kocberber, B. Falsafi, and B. Grot. Sort vs. hash join revisited for near-memory execution. In ASBD, 2015.

[23]

R. Mueller, J. Teubner, and G. Alonso. Glacier: a query-to-hardware compiler. In ACM SIGMOD, 2010.

Digital Library

[24]

N. Oliver et al. A reconfigurable computing system based on a cache-coherent fabric. In IEEE ReConFig, 2011.

Digital Library

[25]

H. Pirk, S. Manegold, and M. Kersten. Waste not... efficient co-processing of relational data. In IEEE ICDE, 2014.

[26]

O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD vectorization for in-memory databases. In ACM SIGMOD, 2015.

Digital Library

[27]

O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In ACM SIGMOD, 2014.

Digital Library

[28]

A. Putnam, A. M. Caulfield, E. S. Chung, et al. A reconfigurable fabric for accelerating large-scale datacenter services. In ACM/IEEE ISCA, 2014.

Digital Library

[29]

S. Richter, V. Alvarez, and J. Dittrich. A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3):96--107, 2015.

Digital Library

[30]

N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, et al. Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort. In ACM SIGMOD, 2010.

Digital Library

[31]

S. Schuh, X. Chen, and J. Dittrich. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In ACM SIGMOD, 2016.

Digital Library

[32]

F. M. Schuhknecht, P. Khanchandani, and J. Dittrich. On the surprising difficulty of simple things: the case of radix partitioning. PVLDB, 8(9):934--937, 2015.

Digital Library

[33]

D. Sidler, Z. Istvan, M. Owaida, and G. Alonso. Accelerating pattern matching queries in hybrid cpu-fpga architectures. In ACM SIGMOD, 2017.

Digital Library

[34]

J. Stuecheli, B. Blaner, C. Johns, and M. Siegel. Capi: A coherent accelerator processor interface. IBM Journal of Research and Development, 59(1):7--1, 2015.

Digital Library

[35]

B. Sukhwani, H. Min, M. Thoennes, et al. Database analytics acceleration using fpgas. In ACM PACT, 2012.

Digital Library

[36]

T. Ueda, M. Ito, and M. Ohara. A Dynamically Reconfigurable Equi-Joiner on FPGA. IBM Tehnical Report RT0969, 2015.

[37]

Z. Wang, B. He, and W. Zhang. A study of data partitioning on OpenCL-based FPGAs. In IEEE FPL, 2015.

[38]

J. Wassenberg and P. Sanders. Engineering a multi-core radix sort. In Euro-Par, 2011.

Digital Library

[39]

S. Werner, S. Groppe, V. Linnemann, and T. Pionteck. Hardware-accelerated join processing in large Semantic Web databases with FPGAs. In IEEE HPCS, 2013.

[40]

L. Woods, G. Alonso, and J. Teubner. Parallel computation of skyline queries. In IEEE FCCM, 2013.

Digital Library

[41]

L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In ACM SIGARCH, 2013.

Digital Library

Cited By

Zhou YZhang FLin THuang YLong SZhai JDu X(2024)F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00287(3739-3752)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00287
Moghaddamfar MMay NFärber CLehner WKumar AIenne PZhang Z(2023)A Study of Early Aggregation in Database Query Processing on FPGAsProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573194(55-65)Online publication date: 12-Feb-2023
https://dl.acm.org/doi/10.1145/3543622.3573194
Liu ZHan RZhang YZhang YTang XDeng GZhong TDementiev RLu YQue M(2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.326278234:6(1757-1772)Online publication date: 1-Jun-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3262782
Show More Cited By

Index Terms

FPGA-based Data Partitioning
1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Join algorithms

Recommendations

SAccO

This paper presents SAccO (Scalable Accelerator platform Osnabrück), a novel framework for implementing data-intensive applications using scalable and portable reconfigurable hardware accelerators. Instead of using expensive "reconfigurable ...
Hardware accelerated FPGA placement

A key advantage of field-programmable gate arrays (FPGAs) over full-custom and semi-custom devices is that they provide relatively quick implementation from concept to physical realization. However, as modern FPGAs reach close to one million logic ...
Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA
Special Issue on Deep learning on FPGAs

Convolutional Neural Networks-- (CNNs) based algorithms have been successful in solving image recognition problems, showing very large accuracy improvement. In recent years, deconvolution layers are widely used as key components in the state-of-the-art ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

May 2017

1810 pages

ISBN:9781450341974

DOI:10.1145/3035918

General Chairs:
Rada Chirkova
North Carolina State University, USA
,
Jun Yang
Duke University, USA
,
Program Chair:
Dan Suciu
University of Washington, USA

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGMOD/PODS'17

Sponsor:

SIGMOD

SIGMOD/PODS'17: International Conference on Management of Data

May 14 - 19, 2017

Illinois, Chicago, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

37
Total Citations
View Citations
1,199
Total Downloads

Downloads (Last 12 months)47
Downloads (Last 6 weeks)4

Reflects downloads up to 12 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhou YZhang FLin THuang YLong SZhai JDu X(2024)F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00287(3739-3752)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00287
Moghaddamfar MMay NFärber CLehner WKumar AIenne PZhang Z(2023)A Study of Early Aggregation in Database Query Processing on FPGAsProceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays10.1145/3543622.3573194(55-65)Online publication date: 12-Feb-2023
https://dl.acm.org/doi/10.1145/3543622.3573194
Liu ZHan RZhang YZhang YTang XDeng GZhong TDementiev RLu YQue M(2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.326278234:6(1757-1772)Online publication date: 1-Jun-2023
https://dl.acm.org/doi/10.1109/TPDS.2023.3262782
Papon THyoung Mun JRoozkhosh SHoornaert DSanaullah ADrepper UMancuso RAthanassoulis M(2023)Relational Fabric: Transparent Data Transformation2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00297(3688-3698)Online publication date: Apr-2023
https://doi.org/10.1109/ICDE55515.2023.00297
Bui PLe MHoang BNgoc NPham H(2022)Data Partitioning and Asynchronous Processing to Improve the Embedded Software Performance on Multicore ProcessorsРазделение данных и асинхронная обработка для повышения производительности встроенного программного обеспечения на многокодных процессорахInformatics and AutomationИнформатика и автоматизация10.15622/ia.21.2.221:2(243-274)Online publication date: 17-Feb-2022
https://doi.org/10.15622/ia.21.2.2
Iskandar VGhany MGöhringer D(2022)Near-memory Computing on FPGAs with 3D-stacked Memories: Applications, Architectures, and OptimizationsACM Transactions on Reconfigurable Technology and Systems10.1145/354765816:1(1-32)Online publication date: 22-Dec-2022
https://dl.acm.org/doi/10.1145/3547658
Lee DSo JAHN MLee JKim JCho JOliver RThummala VJV RUpadhya SKhan MKim J(2022)Improving In-Memory Database Operations with Acceleration DIMM (AxDIMM)Proceedings of the 18th International Workshop on Data Management on New Hardware10.1145/3533737.3535093(1-9)Online publication date: 12-Jun-2022
https://dl.acm.org/doi/10.1145/3533737.3535093
Lutz CBreß SZeuch SRabl TMarkl VIves ZBonifati AEl Abbadi A(2022)Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast InterconnectsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517911(1017-1032)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3517911
Cock DRamdas ASchwyn DGiardino MTurowski AHe ZHossle NKorolija DLicciardello MMartsenko KAchermann RAlonso GRoscoe TFalsafi BFerdman MLu SWenisch T(2022)Enzian: an open, general, CPU/FPGA platform for systems software researchProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507742(434-451)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507742
Shi RKara KHagleitner CDiamantopoulos DSyrivelis DAlonso G(2022)Exploiting HBM on FPGAs for Data ProcessingACM Transactions on Reconfigurable Technology and Systems10.1145/349123815:4(1-27)Online publication date: 9-Dec-2022
https://dl.acm.org/doi/10.1145/3491238
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents