Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3168808acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Poker: permutation-based SIMD execution of intensive tree search by path encoding

Published: 24 February 2018 Publication History

Abstract

We propose POKER, a permutation-based vectorization approach for vectorizing multiple queries over B+-trees. Our key insight is to combine vector loads and path-encoding-based permutations to alleviate memory latency while keeping the number of key comparisons needed for a query to a minimum. Implemented as a C++ template library, POKER represents a general-purpose solution for vectorizing the queries over indexing trees on multi-core processors equipped with SIMD units. For a set of five representative benchmarks evaluated with 24 configurations each, POKER outperforms the state-of-the-art by 2.11x with one single thread and 2.28x with eight threads on an Intel Broadwell processor that supports 256-bit AVX2, on average.

References

[1]
2016. Intel® 64 and IA-32 Architectures Optimization Reference Manual. (June 2016).
[2]
Andrew Anderson, Avinash Malik, and David Gregg. 2015. Automatic Vectorization of Interleaved Data Revisited. ACM Trans. Archit. Code Optim. (2015), 50:1–50:25.
[3]
Amittai F. Aviram. 2008. Interactive B+ Tree. http://www.amittai.com/ prose/bplustree.html, (2008).
[4]
Timo Bingmann. 2013. Stx B+ tree C++ template classes. http:// panthema.net/2007/stx- btree, (2013).
[5]
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S. H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC ’09. 44–54.
[6]
S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn, Liang Wang, and K. Skadron. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads. In IISWC ’10. 1–11.
[7]
Linchuan Chen, Peng Jiang, and Gagan Agrawal. Exploiting Recent SIMD Architectural Advances for Irregular Applications. In CGO ’16. 47–58.
[8]
Mayank Daga and Mark Nutter. Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU. In SCC ’12. 240–247.
[9]
Franz Franchetti and Markus Püschel. Generating SIMD Vectorized Permutations. In CC ’08. 116–131.
[10]
Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. General Transformations for GPU Execution of Tree Traversals. In SC ’13. 10:1–10:12.
[11]
T. Hayes, O. Palomar, O. Unsal, A. Cristal, and M. Valero. Future Vector Microprocessor Extensions for Data Aggregations. In ISCA ’16. 418–430.
[12]
Peng Jiang and Gagan Agrawal. Efficient SIMD and MIMD Parallelization of Hash-based Aggregation by Conflict Mitigation. In ICS ’17. 24:1–24:11.
[13]
Youngjoon Jo, Michael Goldfarb, and Milind Kulkarni. Automatic Vectorization of Tree Traversals. In PACT ’13. 363–374.
[14]
Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, and Pradeep Dubey. FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs. In SIGMOD ’10. 339–350.
[15]
D. E. Knuth. 1971. Optimum binary search trees. Acta Inform 1, 1 (1971), 14–25.
[16]
Jianqiao Liu, Nikhil Hegde, and Milind Kulkarni. Hybrid CPU-GPU scheduling and execution of tree traversals. In ICS ’16. 2:1–2:12.
[17]
Daniel S. McFarlin, Volodymyr Arbatov, Franz Franchetti, and Markus Püschel. Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets. In ICS ’11. 265–274.
[18]
Aravind Natarajan and Neeraj Mittal. Fast Concurrent Lock-free Binary Search Trees. In PPoPP ’14. 317–328.
[19]
United Nations. 1997. Principles and Recommendations for Population and Housing Censuses. (1997).
[20]
Jakob Nielsen. 1993. Usability Engineering.
[21]
Dorit Nuzman, Ira Rosen, and Ayal Zaks. Auto-vectorization of Interleaved Data for SIMD. In PLDI ’06. 132–143.
[22]
Bin Ren, Youngjoon Jo, Sriram Krishnamoorthy, Kunal Agrawal, and Milind Kulkarni. Efficient Execution of Recursive Programs on Commodity Vector Hardware. In PLDI ’15. 509–520.
[23]
Bin Ren, Tomi Poutanen, Todd Mytkowicz, Wolfram Schulte, Gagan Agrawal, and James R. Larus. SIMD Parallelization of Applications That Traverse Irregular Data Structures. In CGO ’13. 1–10.
[24]
Minsoo Rhu and Mattan Erez. Maximizing SIMD Resource Utilization in GPGPUs with SIMD Lane Permutation. In ISCA ’13. 356–367.
[25]
Robert Sedgewick and Kevin Wayne. 2011. Algorithms, 4th Edition.
[26]
Amirhesam Shahvarani and Hans-Arno Jacobsen. A Hybrid B+-tree As Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms. In SIGMOD ’16. 1523–1538.
[27]
Karthik Sonti. 2016. Building Event-Driven Batch Analytics on AWS. (2016).
[28]
Nigel Stephens. 2016. ARMv8-A Next-Generation Vector Architecture for HPC. (2016).
[29]
H. R. Strong, G. Markowsky, and A. K. Chandra. 1979. Search Within a Page. J. ACM 26, 3 (1979), 457–482.
[30]
Ravi Rajwar Konrad Lai Thomas Legler Benjamin Schlegel Tomas Karnagel, Roman Dementiev and Wolfgang Lehner. Improving in-memory database index performance with Intel® Transactional Synchronization Extensions. In HPCA ’14. 476–487.
[31]
Xin Wang, Weihua Zhang, Zhaoguo Wang, Ziyun Wei, Haibo Chen, and Wenyun Zhao. Eunomia: Scaling Concurrent Search Trees Under Contention Using HTM. In PPoPP ’17. 385–399.
[32]
Thomas Willhalm, Nicolae Popovici, Yazan Boshmaf, Hasso Plattner, Alexander Zeier, and Jan Schaffner. 2009. SIMD-scan: Ultra Fast Inmemory Table Scan Using On-chip Vector Processing Units. PVLDB 2, 1 (2009), 385–394.
[33]
Jingling Xue. 2000. Loop Tiling for Parallelism.
[34]
Steffen Zeuch, Johann-Christoph Freytag, and Frank Huber. Adapting Tree Structures for Processing with SIMD Instructions. In EDBT ’14. 97–108.
[35]
Feng Zhang, Peng Di, Hao Zhou, Xiangke Liao, and Jingling Xue. RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities. In ICPP ’16. 562–571.
[36]
Hao Zhou and Jingling Xue. Exploiting Mixed SIMD Parallelism by Reducing Data Reorganization Overhead. In CGO ’16. 59–69.
[37]
Hao Zhou and Jingling Xue. 2016. A Compiler Approach for Exploiting Partial SIMD Parallelism. ACM Trans. Archit. Code Optim. (2016), 11:1– 11:26.

Cited By

View all

Index Terms

  1. Poker: permutation-based SIMD execution of intensive tree search by path encoding

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization
    February 2018
    377 pages
    ISBN:9781450356176
    DOI:10.1145/3179541
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication Notes

    Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

    Publication History

    Published: 24 February 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. Intensive Tree Search
    2. Permute
    3. SIMD

    Qualifiers

    • Research-article

    Conference

    CGO '18
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media