Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3514221.3526121acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Adaptive Hybrid Indexes

Published: 11 June 2022 Publication History

Abstract

While index structures are crucial components in high-performance query processing systems, they occupy a large fraction of the available memory. Recently-proposed compact indexes reduce this space overhead and thus speed up queries by allowing the database to keep larger working sets in memory. These compact indexes, however, are slower than performance-optimized in-memory indexes because they adopt encodings that trade performance for memory efficiency. Applying different encodings within a single index might allow optimizing both dimensions at the same time - however, it is not clear which encodings should be applied to which index parts at build-time.
To take advantage of multiple encodings in one index structure, we present a new framework forming the basis of workload-adaptive hybrid indexes which moves encoding decisions to run-time instead. By sampling incoming queries adaptively, it tracks accesses to index parts and keeps fine-grained statistics which are used for space- and performance-optimized encoding migrations. We evaluated our framework using B+-trees and tries, and examine the adaptation process and space/performance trade-off for real-world and synthetic workloads. For skewed workloads, our framework can reduce the space by up to 82% while retaining more than 90% of the original performance.

Supplementary Material

MP4 File (SIGMOD22-moddm280.mp4)
Database systems leverage index structures to improve query performance. However, indexes can also significantly increase a database's memory footprint, making them an ideal compression target. Nevertheless, there is a trade-off between space and performance as more compressed encodings often result in decreased query performance. We, therefore, propose a framework called Adaptive Hybrid Indexes (AHI) that allows optimizing indexes adaptively at run-time at a finer granularity by considering the workload properties. Instead of representing all index parts using one encoding, AHI can implement different encodings which are adaptively applied depending on the current workload. For example, frequently accessed parts could be stored in performance-optimized encodings, while rarely accessed parts could be represented in a more compressed fashion. Our experiments show that this approach can reduce index sizes significantly while retaining most of the performance under skewed workloads.

References

[1]
AWS EC2 Instances. https://aws.amazon.com/en/ec2/instance-types/high-memory [accessed 2021-03-01].
[2]
LevelDB. https://github.com/google/leveldb [accessed 2021-03-01].
[3]
RocksDB. https://rocksdb.org/ [accessed 2021-03-01].
[4]
S2 Geometry Library. https://s2geometry.io/ [accessed 2021-03-01].
[5]
Tape is dead, Disk is tape, Flash is disk, RAM locality is king. http://research.microsoft.com/en-us/um/people/gray/talks/Flash_is_Good.ppt [accessed 2021-03-01].
[6]
C++ HopscotchMap. https://github.com/Tessil/hopscotch-map [accessed 2021-03-01].
[7]
Rachit Agarwal, Anurag Khandelwal, and Ion Stoica. 2015. Succinct: Enabling Queries on Compressed Data. In NSDI. USENIX Association, 337--350.
[8]
Adnan Alhomssi and Viktor Leis. 2021. Contention and Space Management in B-Trees. In CIDR. 26--37.
[9]
Christoph Anneser, Andreas Kipf, Harald Lang, Thomas Neumann, and Alfons Kemper. 2020. The Case for Hybrid Succinct Data Structures. In EDBT. 391--394.
[10]
Nikolas Askitis and Ranjan Sinha. 2007. HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings. In ACSC. 97--105.
[11]
Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Methods: The RUM Conjecture. In EDBT. 461--466.
[12]
David Benoit, Erik D. Demaine, J. Ian Munro, Rajeev Raman, Venkatesh Raman, and S. Srinivasa Rao. 2005. Representing Trees of Higher Degree. Algorithmica 43, 4 (Nov. 2005), 275--292.
[13]
Matthias Böhm, Benjamin Schlegel, Peter Benjamin Volk, Ulrike Fischer, Dirk Habich, and Wolfgang Lehner. 2011. Efficient In-Memory Indexing with Generalized Prefix Trees. In BTW. 227--246.
[14]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David H. C. Du. 2020. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook. In USENIX. 209--223.
[15]
Kun-Ta Chuang, Jiun-Long Huang, and Ming-Syan Chen. 2008. Mining top-k frequent patterns in the presence of the memory constraint. The VLDB Journal 17, 5 (Aug. 2008), 1321--1344. https://doi.org/10.1007/s00778-007-0078--6
[16]
Edith Cohen, Nadav Grossaug, and Haim Kaplan. 2006. Processing Top k Queries from Samples. In CoNEXT.
[17]
Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In SoCC. 143--154. https://doi.org/10.1145/1807128.1807152
[18]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, et al. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD. 969--984. https://doi.org/10.1145/3318464.3389711
[19]
Philipp Fent, Michael Jungmair, Andreas Kipf, and Thomas Neumann. 2020. START-Self-Tuning Adaptive Radix Tree. In ICDEW. IEEE, 147--153. https://doi.org/10.1109/ICDEW49219.2020.00015
[20]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proc. VLDB Endow. 13, 8 (2020), 1162--1175. https://doi.org/10.14778/3389133.3389135
[21]
Florian Funke, Alfons Kemper, and Thomas Neumann. 2012. Compacting Transactional Data in Hybrid OLTP&OLAP Databases. VLDB 5, 11 (2012). https://doi.org/10.14778/2350229.2350258
[22]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In SIGMOD. ACM, 1189--1206. https://doi.org/10.1145/3299869.3319860
[23]
Roberto Grossi and Giuseppe Ottaviano. 2014. Fast Compressed Tries through Path Decompositions. ACM J. Exp. Algorithmics 19, 1 (2014). https://doi.org/10.1145/2656332
[24]
Roberto Grossi and Jeffrey Scott Vitter. 2005. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. 35, 2 (2005), 378--407. https://doi.org/10.1137/S0097539702402354
[25]
Anurag Khandelwal, Rachit Agarwal, and Ion Stoica. 2016. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In NSDI. USENIX Association, 485--500.
[26]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems (Dec. 2019). http://arxiv.org/abs/1911.13014
[27]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. In aiDM. 1--5.
[28]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD. ACM, 489--504. https://doi.org/10.1145/3183713.3196909
[29]
Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD. ACM, 311--326.
[30]
Viktor Leis, Michael Haubenschild, Alfons Kemper, and Thomas Neumann. 2018. LeanStore: In-Memory Data Management beyond Main Memory. In ICDE. IEEE, 185--196. https://doi.org/10.1109/ICDE.2018.00026
[31]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In ICDE, Vol. 13. 38--49.
[32]
Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of Practical Synchronization. In DaMoN. 1--8.
[33]
Justin J Levandoski, Per-Åke Larson, and Radu Stoica. 2013. Identifying Hot and Cold Data in Main-Memory Databases. In ICDE. IEEE, 26--37.
[34]
Xiaozhou Li, David G Andersen, Michael Kaminsky, and Michael J Freedman. 2014. Algorithmic Improvements for Fast Concurrent Cuckoo Hashing. In EuroSys. 1--14.
[35]
Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-Value Storage. In EuroSys. 183--196.
[36]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow. 14, 1 (2020), 1--13. https://doi.org/10.14778/3421424.3421425
[37]
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient Computation of Frequent and Top-k Elements in Data Streams. In ICDT. Springer, 398--412.
[38]
Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias. 2006. Continuous Monitoring of Top-k Queries over Sliding Windows. In SIGMOD. 635--646.
[39]
Gonzalo Navarro. 2016. Compact Data Structures - A Practical Approach. Cambridge University Press.
[40]
Thomas Neumann and Michael J Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR.
[41]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C Mowry, Matthew Perron, Ian Quah, et al . 2017. Self- Driving Database Management Systems. In CIDR.
[42]
Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems. In SIGMOD. 61--72.
[43]
Andrea Pietracaprina, Matteo Riondato, Eli Upfal, and Fabio Vandin. 2010. Mining top-K frequent itemsets through progressive sampling. Data Mining and Knowledge Discovery 21, 2 (2010), 310--326.
[44]
Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. 2007. Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. ACM Trans. Algorithms 3, 4 (2007), 43.
[45]
Wolf Rödiger, Sam Idicula, Alfons Kemper, and Thomas Neumann. 2016. Flow-Join: Adaptive Skew Handling for Distributed Joins over High-Speed Networks. In ICDE. IEEE, 1194--1205.
[46]
Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, and Tim Kraska. 2021. Bounding the Last Mile: Efficient Learned String Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).
[47]
Mihail Stoian, Andreas Kipf, Ryan Marcus, and Tim Kraska. 2021. PLEX: Towards Practical Learned Indexing. 3rd International Workshop on Applied AI for Database Systems and Applications (2021).
[48]
Michael Stonebraker, Lawrence A Rowe, and Michael Hirohama. 1990. The implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering 2, 1 (1990), 125--142.
[49]
Jeffrey S Vitter. 1985. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37--57.
[50]
Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. In SIGMOD. 473--488.
[51]
Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. VLDB 14, 8 (2021), 1276--1288.
[52]
Qiumin Xu, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. 2015. Performance Analysis of NVMe SSDs and their Implication on Real World Databases. In SYSTOR. 1--11.
[53]
Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD. ACM, 1567--1581.
[54]
Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. TKDE 27, 7 (2015), 1920--1948.
[55]
Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD. 323--336.
[56]
Huanchen Zhang, Xiaoxuan Liu, David G Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2020. Order-Preserving Key Compression for In-Memory Search Trees. In SIGMOD. 1601--1615.

Cited By

View all
  • (2024)Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel ApproachesInformation10.3390/info1508042915:8(429)Online publication date: 24-Jul-2024
  • (2024)DEX: Scalable Range Indexing on Disaggregated MemoryProceedings of the VLDB Endowment10.14778/3675034.367505017:10(2603-2616)Online publication date: 1-Jun-2024
  • (2023)SALI: A Scalable Adaptive Learned Index Framework based on Probability ModelsProceedings of the ACM on Management of Data10.1145/36267521:4(1-25)Online publication date: 12-Dec-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. adaptive index
  2. hybrid index
  3. space-efficient index

Qualifiers

  • Research-article

Funding Sources

  • German Research Foundation

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)165
  • Downloads (Last 6 weeks)21
Reflects downloads up to 24 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel ApproachesInformation10.3390/info1508042915:8(429)Online publication date: 24-Jul-2024
  • (2024)DEX: Scalable Range Indexing on Disaggregated MemoryProceedings of the VLDB Endowment10.14778/3675034.367505017:10(2603-2616)Online publication date: 1-Jun-2024
  • (2023)SALI: A Scalable Adaptive Learned Index Framework based on Probability ModelsProceedings of the ACM on Management of Data10.1145/36267521:4(1-25)Online publication date: 12-Dec-2023
  • (2023)DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated ManagementProceedings of the 52nd International Conference on Parallel Processing10.1145/3605573.3605590(62-71)Online publication date: 7-Aug-2023
  • (2023)Morphtree: a polymorphic main-memory learned index for dynamic workloadsThe VLDB Journal10.1007/s00778-023-00823-y33:4(1065-1084)Online publication date: 1-Dec-2023

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media