research-article

Rethinking SIMD Vectorization for In-Memory Databases

Authors:

Orestis Polychroniou,

Kenneth A. RossAuthors Info & Claims

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

Pages 1493 - 1508

https://doi.org/10.1145/2723372.2747645

Published: 27 May 2015 Publication History

Abstract

Analytical databases are continuously adapting to the underlying hardware in order to saturate all sources of parallelism. At the same time, hardware evolves in multiple directions to explore different trade-offs. The MIC architecture, one such example, strays from the mainstream CPU design by packing a larger number of simpler cores per chip, relying on SIMD instructions to fill the performance gap. Databases have been attempting to utilize the SIMD capabilities of CPUs. However, mainstream CPUs have only recently adopted wider SIMD registers and more advanced instructions, since they do not rely primarily on SIMD for efficiency. In this paper, we present novel vectorized designs and implementations of database operators, based on advanced SIMD operations, such as gathers and scatters. We study selections, hash tables, and partitioning; and combine them to build sorting and joins. Our evaluation on the MIC-based Xeon Phi co-processor as well as the latest mainstream CPUs shows that our vectorization designs are up to an order of magnitude faster than the state-of-the-art scalar and vector approaches. Also, we highlight the impact of efficient vectorization on the algorithmic design of in-memory database operators, as well as the architectural design and power efficiency of hardware, by making simple cores comparably fast to complex cores. This work is applicable to CPUs and co-processors with advanced SIMD capabilities, using either many simple cores or fewer complex cores.

References

[1]

M.-C. Albutiu et al. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10):1064--1075, June 2012.

Digital Library

[2]

P. Bakkum et al. Accelerating SQL database operations on a GPU with CUDA. In GPGPU, pages 94--103, 2010.

Digital Library

[3]

C. Balkesen et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373, 2013.

Digital Library

[4]

C. Balkesen et al. Multicore, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, Sept. 2013.

Digital Library

[5]

S. Blanas et al. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, pages 37--48, 2011.

Digital Library

[6]

P. Boncz et al. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.

[7]

J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008.

Digital Library

[8]

J. Cieslewicz et al. Automatic contention detection and amelioration for data-intensive operations. In SIGMOD, pages 483--494, 2010.

Digital Library

[9]

D. J. DeWitt et al. Materialization strategies in a column-oriented DBMS. In ICDE, pages 466--475, 2007.

[10]

J. Hofmann et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips. CoRR, arXiv:1401.7494, 2014.

Digital Library

[11]

H. Inoue et al. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007.

[12]

S. Jha et al. Improving main memory hash joins on Intel Xeon Phi processors: An experimental approach. PVLDB, 8(6):642--653, Feb. 2015.

Digital Library

[13]

T. Kaldewey et al. GPU join processing revisited. In DaMoN, 2012.

Digital Library

[14]

C. Kim et al. Sort vs. hash revisited: fast join implementation on modern multicore CPUs. In VLDB, pages 1378--1389, 2009.

Digital Library

[15]

C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010.

Digital Library

[16]

K. Krikellas et al. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.

[17]

S. Larsen et al. Exploiting superword level parallelism with multimedia instruction sets. In PLDI, pages 145--156, 2000.

Digital Library

[18]

V. Leis et al. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD, pages 743--754, 2014.

Digital Library

[19]

S. Manegold et al. Optimizing database architecture for the new bottleneck: Memory access. J. VLDB, 9(3):231--246, Dec. 2000.

Digital Library

[20]

S. Manegold et al. What happens during a join? dissecting CPU and memory optimization effects. In VLDB, pages 339--350, 2000.

Digital Library

[21]

S. Manegold et al. Cache-conscious radix-decluster projections. In VLDB, pages 684--695, 2004.

Digital Library

[22]

T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, June 2011.

Digital Library

[23]

R. Pagh et al. Cuckoo hashing. J. Algorithms, 51(2):122--144, May 2004.

Digital Library

[24]

H. Pirk et al. Accelerating foreign-key joins using asymmetric memory channels. In ADMS, 2011.

[25]

O. Polychroniou et al. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN, 2013.

Digital Library

[26]

O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, pages 755--766, 2014.

Digital Library

[27]

O. Polychroniou et al. Vectorized Bloom filters for advanced SIMD processors. In DaMoN, 2014.

Digital Library

[28]

V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013.

Digital Library

[29]

K. A. Ross. Selection conditions in main memory. ACM Trans. Database Systems, 29(1):132--161, Mar. 2004.

Digital Library

[30]

K. A. Ross. Efficient hash probes on modern processors. In ICDE, pages 1297--1301, 2007.

[31]

N. Satish et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010.

Digital Library

[32]

B. Schlegel et al. Scalable frequent itemset mining on many-core processors. In DaMoN, 2013.

Digital Library

[33]

L. Seiler et al. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graphics, 27(3), Aug. 2008.

Digital Library

[34]

J. Shin. Introducing control flow into vectorized code. In PACT, pages 280--291, 2007.

Digital Library

[35]

L. Sidirourgos et al. Column imprints: A secondary index structure. In SIGMOD, pages 893--904, 2013.

Digital Library

[36]

E. A. Sitaridi et al. Optimizing select conditions on GPUs. In DaMoN, 2013.

Digital Library

[37]

M. Stonebraker et al. C-store: A column-oriented DBMS. In VLDB, pages 553--564, 2005.

Digital Library

[38]

J. Wassenberg et al. Engineering a multi core radix sort. In EuroPar, pages 160--169, 2011.

Digital Library

[39]

T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009.

Digital Library

[40]

Y. Ye et al. Scalable aggregation on multicore processors. In DaMoN, 2011.

Digital Library

[41]

J. Zhou et al. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002.

Digital Library

[42]

M. Zukowski et al. Architecture-conscious hashing. In DaMoN, 2006.

Digital Library

Cited By

Schulze RSchreiber TYatsishin IDahimene RMilovidov A(2024)ClickHouse - Lightning Fast Analytics for EveryoneProceedings of the VLDB Endowment10.14778/3685800.368580217:12(3731-3744)Online publication date: 1-Aug-2024
https://dl.acm.org/doi/10.14778/3685800.3685802
Chen YChen S(2024)Cabin: A Compressed Adaptive Binned Scan IndexProceedings of the ACM on Management of Data10.1145/36393122:1(1-26)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639312
Habich DPietrzyk JBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3654694
Show More Cited By

Index Terms

Rethinking SIMD Vectorization for In-Memory Databases
1. Information systems
  1. Data management systems
    1. Database management system engines

Recommendations

Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Vectorizing unstructured mesh computations for many-core architectures

Achieving optimal performance on the latest multi-core and many-core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

May 2015

2110 pages

ISBN:9781450327589

DOI:10.1145/2723372

General Chair:
Timos Sellis
RMIT University, Australia
,
Program Chairs:
Susan B. Davidson
University of Pennsylvania, USA
,
Zack Ives
University of Pennsylvania, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2015

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGMOD/PODS'15

Sponsor:

SIGMOD

SIGMOD/PODS'15: International Conference on Management of Data

May 31 - June 4, 2015

Victoria, Melbourne, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

130
Total Citations
View Citations
2,064
Total Downloads

Downloads (Last 12 months)268
Downloads (Last 6 weeks)25

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Schulze RSchreiber TYatsishin IDahimene RMilovidov A(2024)ClickHouse - Lightning Fast Analytics for EveryoneProceedings of the VLDB Endowment10.14778/3685800.368580217:12(3731-3744)Online publication date: 1-Aug-2024
https://dl.acm.org/doi/10.14778/3685800.3685802
Chen YChen S(2024)Cabin: A Compressed Adaptive Binned Scan IndexProceedings of the ACM on Management of Data10.1145/36393122:1(1-26)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639312
Habich DPietrzyk JBarcelo PSanchez-Pi NMeliou ASudarshan S(2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
https://dl.acm.org/doi/10.1145/3626246.3654694
Cai YChen S(2024)CPU and GPU Hash Joins on Skewed Data2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW61823.2024.00064(402-408)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDEW61823.2024.00064
Böther MBenson LKlimovic ARabl T(2023)Analyzing Vectorized Hash Tables across CPU ArchitecturesProceedings of the VLDB Endowment10.14778/3611479.361148516:11(2755-2768)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611485
Afroozeh ABoncz P(2023)The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar CodeProceedings of the VLDB Endowment10.14778/3598581.359858716:9(2132-2144)Online publication date: 1-May-2023
https://dl.acm.org/doi/10.14778/3598581.3598587
Li YLu JChandramouli B(2023)Selection Pushdown in Column Stores using Bit Manipulation InstructionsProceedings of the ACM on Management of Data10.1145/35893231:2(1-26)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589323
Rayhan YAref WRenz MNascimento M(2023)SIMD-ified R-tree Query Processing and OptimizationProceedings of the 31st ACM International Conference on Advances in Geographic Information Systems10.1145/3589132.3625610(1-10)Online publication date: 13-Nov-2023
https://dl.acm.org/doi/10.1145/3589132.3625610
Mohammed HSummers CKaushik SWu EDas SPandis ISelçuk Candan KAmer-Yahia S(2023)SmokedDuck Demonstration: SQLStepperCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589731(183-186)Online publication date: 4-Jun-2023
https://dl.acm.org/doi/10.1145/3555041.3589731
Liu ZHan RZhang YZhang YTang XDeng GZhong TDementiev RLu YQue M(2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.3262782(1-16)Online publication date: 2023
https://doi.org/10.1109/TPDS.2023.3262782
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents