Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1142351.1142385acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Cache-oblivious string B-trees

Published: 26 June 2006 Publication History

Abstract

B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally
when keys are long or of variable length,
when keys are compressed, even when using front compression, the standard B-tree compression scheme,
for range queries, and
with respect to memory effects such as disk prefetching.
This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways:
The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally.
It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner.
It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block.
It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.

References

[1]
A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116--1127, Sept. 1988.]]
[2]
S. Alstrup, M. A. Bender, E. D. Demaine, M. Farach-Colton, T. Rauhe, and M. Thorup. Efficient tree layout in a multilevel memory hierarchy. arXiv:cs.DS/0211010, 2004. http://www.arXiv.org/pdf/cs.DS/0211010.]]
[3]
A. Amir, M. Farach, and Y. Matias. Efficient randomized dictionary matching algorithms (extended abstract). In Proc. 3rd Symp. on Combinatorial Pattern Matching (CPM), pp. 262--275, Tucson, Arizona, Apr. 1992.]]
[4]
R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Inf., 1(3):173--189, Feb. 1972.]]
[5]
R. Bayer and K. Unterauer. Prefix B-trees. ACM Trans. Database Syst., 2(1):11--26, 1977.]]
[6]
M. A. Bender, R. Cole, E. Demaine, M. Farach-Colton, and J. Zito. Two simplified algorithms for maintaining order in a list. In Proc. 10th European Symp. on Algorithms (ESA), pp. 152--164, Rome, Italy, Sept. 2002.]]
[7]
M. A. Bender, R. Cole, E. D. Demaine, and M. Farach-Colton. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Proc. 10th Annual European Symp. on Algorithms (ESA), pp. 139--151, Rome, Italy, Sept. 2002.]]
[8]
M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proc. 41st Annual Symp. on Foundations of Computer Science (FOCS), pp. 399--409, Redondo Beach, California, 2000.]]
[9]
M. A. Bender, E. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. SIAM J. Comput., 35(2):341--358, 2005.]]
[10]
M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cache-oblivious dynamic dictionary. In Proc. 13th Annual Symp. on Discrete Mathematics (SODA), pp. 29--38, San Francisco, California, Jan. 2002.]]
[11]
M. A. Bender, Z. Duan, J. Iacono, and J. Wu. A locality-preserving cacheoblivious dynamic dictionary. J. Algorithms, 3(2):115--136, 2004.]]
[12]
G. S. Brodal and R. Fagerberg. Cache-oblivious string dictionaries. In Proc. 17th Annual Symposium on Discrete Algorithm (SODA), pp. 581--590, Miami, Florida, Jan. 2006.]]
[13]
G. S. Brodal, R. Fagerberg, and R. Jacob. Cache oblivious search trees via binary trees of small height. In Proc. 13th Annual Symposium on Discrete Algorithms (SODA), pp. 39--48, San Francisco, California, Jan. 2002.]]
[14]
H. K. Chang. Compressed indexing method. IBM Technical Disclosure Bulletin, 11(11):1400--1401, 1969.]]
[15]
W. A. Clark IV, K. A. Salmond, and T. A. Stafford. Method and means for generating compressed keys. US Patent 3,593,309, 3 Jan. 1969.]]
[16]
D. Comer. The ubiquitous B-tree. ACM Comput. Surv., 11(2):121--137, June 1979.]]
[17]
E. D. Demaine, J. Iacono, and S. Langerman. Worst-case optimal tree layout in a memory hierarchy. Manuscript. arXiv:DS/0410048, 2004. http://john2.poly.edu/papers/manu04/paper.pdf.]]
[18]
P. F. Dietz and D. D. Sleator. Two algorithms for maintaining order in a list. In Proc. 19th Annual Symposium on Theory of Computing (STOC), pp. 365--372, 1987.]]
[19]
P. Ferragina and R. Grossi. The String B-Tree: A new data structure for string search in external memory and its applications. J. ACM, 46(2):236--280, Mar. 1999.]]
[20]
M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Science, 47:424--436, 1994.]]
[21]
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pp. 285--297, New York, New York, Oct. 1999.]]
[22]
J. Gil and A. Itai. How to pack trees. J. Algorithms, 32(2):108--132, 1999.]]
[23]
A. Itai, A. G. Konheim, and M. Rodeh. A sparse table implementation of priority queues. In Proc. 8th Internationl Colloquium on Automata, Languages, and Programming (ICALP), vol. 115, pp. 417--431, Acre (Akko), Israel, July 1981.]]
[24]
R. M. Karp and M. O. Rabin. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249--260, Mar. 1987.]]
[25]
I. Katriel. Implicit data structures based on local reorganizations. Master's thesis, Technion, Israel Inst. of Tech., Haifa, May 2002.]]
[26]
D. E. Knuth. Sorting and Searching, vol. 3 of The Art of Computer Programming. Addison-Wesley, Reading, Massachusetts, 1973.]]
[27]
K. Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching, pp. 198--199. Springer-Verlag, Berlin, 1984.]]
[28]
M. Naor. String matching with preprocessing of text and pattern. In Proc. 18th International Colloquium on Automata, Languages and Programming (ICALP), pp. 739--750, 1991.]]
[29]
H. Prokop. Cache-oblivious algorithms. Master's thesis, Department of Electrical Engineering and Computer Science, Massachusetts Inst. of Tech., June 1999.]]
[30]
Sleepycat Software. The Berkeley Database. http://www.sleepycat.com, 2005.]]
[31]
K. A. Smith and M. I. Seltzer. File system aging - increasing the relevance of file system benchmarks. In Proc. 1997 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pp. 203--213, Seattle, Washington, 1997.]]
[32]
R. E. Wagner. Indexing design considerations. IBM Syst. J., 12(4):351--367, 1973.]]

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2006
382 pages
ISBN:1595933182
DOI:10.1145/1142351
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache oblivious string B-tree
  2. locality preserving front compression
  3. packed-memory array
  4. range query
  5. rebalance

Qualifiers

  • Article

Conference

SIGMOD/PODS06

Acceptance Rates

PODS '06 Paper Acceptance Rate 35 of 185 submissions, 19%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Nearly Optimal List Labeling2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS61266.2024.00132(2253-2274)Online publication date: 27-Oct-2024
  • (2024)CoCo-trieInformation Systems10.1016/j.is.2023.102316120:COnline publication date: 1-Feb-2024
  • (2024)Predecessor on the Ultra-Wide Word RAMAlgorithmica10.1007/s00453-023-01193-186:5(1578-1599)Online publication date: 10-Jan-2024
  • (2022) Online List Labeling: Breaking the log 2 n Barrier 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS54457.2022.00096(980-990)Online publication date: Oct-2022
  • (2022)B2-Tree: Page-Based String Indexing in Concurrent EnvironmentsDatenbank-Spektrum10.1007/s13222-022-00409-y22:1(11-22)Online publication date: 21-Feb-2022
  • (2022)Compressed String Dictionaries via Data-Aware Subtrie CompactionString Processing and Information Retrieval10.1007/978-3-031-20643-6_17(233-249)Online publication date: 8-Nov-2022
  • (2021)External-memory Dictionaries in the Affine and PDAM ModelsACM Transactions on Parallel Computing10.1145/34706358:3(1-20)Online publication date: 20-Sep-2021
  • (2021)Adaptive learning of compressible stringsTheoretical Computer Science10.1016/j.tcs.2021.10.003896:C(46-52)Online publication date: 6-Dec-2021
  • (2019)Small Refinements to the DAM Can Have Big Consequences for Data-Structure DesignThe 31st ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3323165.3323210(265-274)Online publication date: 17-Jun-2019
  • (2019)Packed Memory Arrays - Rewired2019 IEEE 35th International Conference on Data Engineering (ICDE)10.1109/ICDE.2019.00079(830-841)Online publication date: Apr-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media