research-article

FAST: fast architecture sensitive tree search on modern CPUs and GPUs

Authors:

Jatin Chhugani,

Nadathur Satish,

Anthony D. Nguyen,

Scott A. Brandt,

Pradeep Dubey

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Pages 339 - 350

https://doi.org/10.1145/1807167.1807206

Published: 06 June 2010

Abstract

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal.

In this paper, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware. FAST eliminates the impact of memory latency and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys, and it naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
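The central idea of the abstract is a binary search tree whose nodes are regrouped into blocks sized to the hardware. A small sketch makes it concrete. The Python below is not the authors' implementation. It assumes integer keys and a tree holding exactly 4**L - 1 keys, and it models only the innermost (SIMD-sized) level of blocking: each block stores a depth-2 subtree of three keys, the amount one 4-wide vector compare could test at once. The names `build_blocked_tree`, `blocked_search` and `CHILD_FROM_MASK` are hypothetical, and scalar comparisons stand in for the single vector compare a real CPU version would issue.

```python
from bisect import bisect_left

KEYS_PER_BLOCK = 3  # a depth-2 binary subtree: root, left child, right child
FANOUT = 4          # each block has four child blocks

# 3-bit comparison mask -> child slot 0..3.
# bit0: q > root, bit1: q > left, bit2: q > right.
# With left < root < right only masks 0, 2, 3 and 7 can occur.
CHILD_FROM_MASK = (0, 0, 1, 2, 0, 0, 0, 3)


def build_blocked_tree(sorted_keys):
    """Lay out exactly 4**L - 1 sorted keys as 3-key blocks stored in breadth-first order."""
    n = len(sorted_keys)
    levels = 0
    while 4 ** levels - 1 < n:
        levels += 1
    if 4 ** levels - 1 != n:
        raise ValueError("this sketch requires exactly 4**L - 1 keys")
    tree = [None] * n

    def fill(block, lo, hi):
        if lo >= hi:  # empty range: below the last block level
            return
        mid = lo + (hi - lo) // 2             # root of this binary subtree
        lmid = lo + (mid - lo) // 2           # root of its left half
        rmid = mid + 1 + (hi - mid - 1) // 2  # root of its right half
        base = KEYS_PER_BLOCK * block
        tree[base:base + KEYS_PER_BLOCK] = (sorted_keys[mid], sorted_keys[lmid], sorted_keys[rmid])
        first_child = FANOUT * block + 1
        quarters = ((lo, lmid), (lmid + 1, mid), (mid + 1, rmid), (rmid + 1, hi))
        for i, (a, b) in enumerate(quarters):
            fill(first_child + i, a, b)

    fill(0, 0, n)
    return tree, levels


def blocked_search(tree, levels, q):
    """Return how many keys are strictly less than q (what bisect_left returns)."""
    block = rank = 0
    for _ in range(levels):
        base = KEYS_PER_BLOCK * block
        root, left, right = tree[base:base + KEYS_PER_BLOCK]
        # A SIMD version would compare q against all three keys in one instruction.
        mask = (q > root) | ((q > left) << 1) | ((q > right) << 2)
        child = CHILD_FROM_MASK[mask]
        rank = rank * FANOUT + child        # two more bits of the binary search path
        block = block * FANOUT + 1 + child  # breadth-first index of the child block
    return rank


if __name__ == "__main__":
    keys = list(range(0, 126, 2))  # 63 == 4**3 - 1 sorted keys
    tree, levels = build_blocked_tree(keys)
    assert all(blocked_search(tree, levels, q) == bisect_left(keys, q) for q in range(-2, 130))
    print(blocked_search(tree, levels, 51))  # 26 keys (0, 2, ..., 50) are below 51
```

Turning the three comparison results into a bit mask and looking the child up in a small table means the per-block step needs no data-dependent branch, which is what makes a vectorized version attractive. In the same spirit, cache-line and page-level blocking would nest further levels of grouping around these SIMD-sized blocks.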

Cited By

Schulze R, Schreiber T, Yatsishin I, Dahimene R, Milovidov A (2024). ClickHouse - Lightning Fast Analytics for Everyone. Proceedings of the VLDB Endowment, 17(12):3731-3744. Online publication date: 1-Aug-2024.
https://dl.acm.org/doi/10.14778/3685800.3685802
Gao C, Ballijepalli S, Wang J (2024). Revisiting B-tree Compression: An Experimental Study. Proceedings of the ACM on Management of Data, 2(3):1-25. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654972
Zhang S, Qi J, Yao X, Brinkmann A (2024). Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction. Proceedings of the ACM on Management of Data, 2(3):1-26. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654948

Index Terms

FAST: fast architecture sensitive tree search on modern CPUs and GPUs
1. Information systems
  1. Data management systems
    1. Database management system engines

Information

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

June 2010

1286 pages

ISBN: 9781450300322

DOI: 10.1145/1807167

General Chair: Ahmed Elmagarmid, Purdue University, USA

Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '10

SIGMOD/PODS '10: International Conference on Management of Data

June 6 - 10, 2010

Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 219
Total Downloads: 3,086
Downloads (Last 12 months): 165
Downloads (Last 6 weeks): 12

Reflects downloads up to 10 Nov 2024
