Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2723372.2747645acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Rethinking SIMD Vectorization for In-Memory Databases

Published: 27 May 2015 Publication History

Abstract

Analytical databases are continuously adapting to the underlying hardware in order to saturate all sources of parallelism. At the same time, hardware evolves in multiple directions to explore different trade-offs. The MIC architecture, one such example, strays from the mainstream CPU design by packing a larger number of simpler cores per chip, relying on SIMD instructions to fill the performance gap. Databases have been attempting to utilize the SIMD capabilities of CPUs. However, mainstream CPUs have only recently adopted wider SIMD registers and more advanced instructions, since they do not rely primarily on SIMD for efficiency. In this paper, we present novel vectorized designs and implementations of database operators, based on advanced SIMD operations, such as gathers and scatters. We study selections, hash tables, and partitioning; and combine them to build sorting and joins. Our evaluation on the MIC-based Xeon Phi co-processor as well as the latest mainstream CPUs shows that our vectorization designs are up to an order of magnitude faster than the state-of-the-art scalar and vector approaches. Also, we highlight the impact of efficient vectorization on the algorithmic design of in-memory database operators, as well as the architectural design and power efficiency of hardware, by making simple cores comparably fast to complex cores. This work is applicable to CPUs and co-processors with advanced SIMD capabilities, using either many simple cores or fewer complex cores.

References

[1]
M.-C. Albutiu et al. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10):1064--1075, June 2012.
[2]
P. Bakkum et al. Accelerating SQL database operations on a GPU with CUDA. In GPGPU, pages 94--103, 2010.
[3]
C. Balkesen et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373, 2013.
[4]
C. Balkesen et al. Multicore, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, Sept. 2013.
[5]
S. Blanas et al. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, pages 37--48, 2011.
[6]
P. Boncz et al. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.
[7]
J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008.
[8]
J. Cieslewicz et al. Automatic contention detection and amelioration for data-intensive operations. In SIGMOD, pages 483--494, 2010.
[9]
D. J. DeWitt et al. Materialization strategies in a column-oriented DBMS. In ICDE, pages 466--475, 2007.
[10]
J. Hofmann et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips. CoRR, arXiv:1401.7494, 2014.
[11]
H. Inoue et al. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007.
[12]
S. Jha et al. Improving main memory hash joins on Intel Xeon Phi processors: An experimental approach. PVLDB, 8(6):642--653, Feb. 2015.
[13]
T. Kaldewey et al. GPU join processing revisited. In DaMoN, 2012.
[14]
C. Kim et al. Sort vs. hash revisited: fast join implementation on modern multicore CPUs. In VLDB, pages 1378--1389, 2009.
[15]
C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010.
[16]
K. Krikellas et al. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.
[17]
S. Larsen et al. Exploiting superword level parallelism with multimedia instruction sets. In PLDI, pages 145--156, 2000.
[18]
V. Leis et al. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD, pages 743--754, 2014.
[19]
S. Manegold et al. Optimizing database architecture for the new bottleneck: Memory access. J. VLDB, 9(3):231--246, Dec. 2000.
[20]
S. Manegold et al. What happens during a join? dissecting CPU and memory optimization effects. In VLDB, pages 339--350, 2000.
[21]
S. Manegold et al. Cache-conscious radix-decluster projections. In VLDB, pages 684--695, 2004.
[22]
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, June 2011.
[23]
R. Pagh et al. Cuckoo hashing. J. Algorithms, 51(2):122--144, May 2004.
[24]
H. Pirk et al. Accelerating foreign-key joins using asymmetric memory channels. In ADMS, 2011.
[25]
O. Polychroniou et al. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN, 2013.
[26]
O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, pages 755--766, 2014.
[27]
O. Polychroniou et al. Vectorized Bloom filters for advanced SIMD processors. In DaMoN, 2014.
[28]
V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013.
[29]
K. A. Ross. Selection conditions in main memory. ACM Trans. Database Systems, 29(1):132--161, Mar. 2004.
[30]
K. A. Ross. Efficient hash probes on modern processors. In ICDE, pages 1297--1301, 2007.
[31]
N. Satish et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010.
[32]
B. Schlegel et al. Scalable frequent itemset mining on many-core processors. In DaMoN, 2013.
[33]
L. Seiler et al. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graphics, 27(3), Aug. 2008.
[34]
J. Shin. Introducing control flow into vectorized code. In PACT, pages 280--291, 2007.
[35]
L. Sidirourgos et al. Column imprints: A secondary index structure. In SIGMOD, pages 893--904, 2013.
[36]
E. A. Sitaridi et al. Optimizing select conditions on GPUs. In DaMoN, 2013.
[37]
M. Stonebraker et al. C-store: A column-oriented DBMS. In VLDB, pages 553--564, 2005.
[38]
J. Wassenberg et al. Engineering a multi core radix sort. In EuroPar, pages 160--169, 2011.
[39]
T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009.
[40]
Y. Ye et al. Scalable aggregation on multicore processors. In DaMoN, 2011.
[41]
J. Zhou et al. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002.
[42]
M. Zukowski et al. Architecture-conscious hashing. In DaMoN, 2006.

Cited By

View all
  • (2024)ClickHouse - Lightning Fast Analytics for EveryoneProceedings of the VLDB Endowment10.14778/3685800.368580217:12(3731-3744)Online publication date: 1-Aug-2024
  • (2024)Cabin: A Compressed Adaptive Binned Scan IndexProceedings of the ACM on Management of Data10.1145/36393122:1(1-26)Online publication date: 26-Mar-2024
  • (2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
  • Show More Cited By

Index Terms

  1. Rethinking SIMD Vectorization for In-Memory Databases

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. simd
    2. vectorization
    3. xeon phi

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SIGMOD/PODS'15
    Sponsor:
    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Victoria, Melbourne, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)268
    • Downloads (Last 6 weeks)25
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)ClickHouse - Lightning Fast Analytics for EveryoneProceedings of the VLDB Endowment10.14778/3685800.368580217:12(3731-3744)Online publication date: 1-Aug-2024
    • (2024)Cabin: A Compressed Adaptive Binned Scan IndexProceedings of the ACM on Management of Data10.1145/36393122:1(1-26)Online publication date: 26-Mar-2024
    • (2024)SIMDified Data Processing - Foundations, Abstraction, and Advanced TechniquesCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654694(613-621)Online publication date: 9-Jun-2024
    • (2024)CPU and GPU Hash Joins on Skewed Data2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW61823.2024.00064(402-408)Online publication date: 13-May-2024
    • (2023)Analyzing Vectorized Hash Tables across CPU ArchitecturesProceedings of the VLDB Endowment10.14778/3611479.361148516:11(2755-2768)Online publication date: 24-Aug-2023
    • (2023)The FastLanes Compression Layout: Decoding > 100 Billion Integers per Second with Scalar CodeProceedings of the VLDB Endowment10.14778/3598581.359858716:9(2132-2144)Online publication date: 1-May-2023
    • (2023)Selection Pushdown in Column Stores using Bit Manipulation InstructionsProceedings of the ACM on Management of Data10.1145/35893231:2(1-26)Online publication date: 20-Jun-2023
    • (2023)SIMD-ified R-tree Query Processing and OptimizationProceedings of the 31st ACM International Conference on Advances in Geographic Information Systems10.1145/3589132.3625610(1-10)Online publication date: 13-Nov-2023
    • (2023)SmokedDuck Demonstration: SQLStepperCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589731(183-186)Online publication date: 4-Jun-2023
    • (2023)Exploring Fine-Grained In-Memory Database Performance for Modern CPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.3262782(1-16)Online publication date: 2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media