Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

When Tree Meets Hash: Reducing Random Reads for Index Structures on Persistent Memories

Published: 30 May 2023 Publication History

Abstract

Indexing structures are widely used in modern data-processing applications to support high-performance queries, and there are a variety of recent designs specifically optimized for the newly available persistent memory (PM). The primary focus of previous PM indexes is on reducing the expensive PM writes for persisting data. However, we find that in tree-based PM indexes, because of the smaller performance gap between writes and random reads on real PM devices, the read-intensive tree traversal phase dominates the overall latency. This observation calls for further optimizations on existing indexing structures for PM.
In this paper, we propose Extendible Radix Tree (ERT), an efficient indexing structure for PM that significantly reduces tree heights to minimize random reads, while still maintaining fast in-node search speed. The key idea is to use extendible hashing for each node in a radix tree. This design allows us to have a relatively large fanout of the radix tree to keep the tree height small, and also to realize constant-time lookups within a node. Using extendible hashing also allows for incremental node modification without excessive writes during inserts and updates. Range queries are efficiently and robustly handled by enforcing partial ordering among the keys in the hash table of each node without introducing more hash collisions. Our experiments on both synthetic and real-world data sets demonstrate that ERT achieves up to 2.65×, 4.41×, and 2.43× speedups for search, insert, and range queries over the respectively state-of-the-art PM index.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023
PDF File
Read me
ZIP File
Source Code

References

[1]
Amazon. 2018. Amazon sales rank data for print and kindle books. https://www.kaggle.com/ucffool/amazon-sales-rank-data-for-print-and-kindle-books.
[2]
Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the 2015 International Conference on Management of Data (SIGMOD). ACM, Melbourne, Victoria, Australia, 1383--1394. https://doi.org/10.1145/2723372.2742797
[3]
Joy Arulraj, Justin J. Levandoski, Umar Farooq Minhas, and Per-Åke Larson. 2018. BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory. Proceedings of the VLDB Endowment, Vol. 11, 5 (2018), 553--565. https://doi.org/10.1145/3187009.3164147
[4]
Joy Arulraj, Matthew Perron, and Andrew Pavlo. 2016. Write-Behind Logging. Proceedings of the VLDB Endowment, Vol. 10, 4 (2016), 337--348. https://doi.org/10.14778/3025111.3025116
[5]
Kumud Bhandari, Dhruva R. Chakrabarti, and Hans-Juergen Boehm. 2016. Makalu: Fast Recoverable Allocation of Non-volatile Memory. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, Amsterdam, The Netherlands, 677--694. https://doi.org/10.1145/2983990.2984019
[6]
Robert Binna, Eva Zangerle, Martin Pichl, Gü nther Specht, and Viktor Leis. 2018. HOT: A Height Optimized Trie Index for Main-Memory Database Systems. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD). ACM, Houston, TX, USA, 521--534. https://doi.org/10.1145/3183713.3196896
[7]
Wentao Cai, Haosen Wen, H. Alan Beadle, Chris Kjellqvist, Mohammad Hedayati, and Michael L. Scott. 2020. Understanding and Optimizing Persistent Memory Allocation. In Proceedings of the 2020 ACM SIGPLAN International Symposium on Memory Management (ISMM). ACM, virtual [London, UK], 60--73. https://doi.org/10.1145/3381898.3397212
[8]
Shimin Chen and Qin Jin. 2015. Persistent B-Trees in Non-Volatile Main Memory. Proceedings of the VLDB Endowment, Vol. 8, 7 (2015), 786--797. https://doi.org/10.14778/2752939.2752947
[9]
Nachshon Cohen, David T. Aksun, and James R. Larus. 2018. Object-oriented Recovery for Non-volatile Memory. Proceedings of the ACM on Programming Languages, Vol. 2, OOPSLA (2018), 153:1--153:22. https://doi.org/10.1145/3276523
[10]
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the 2010 ACM Symposium on Cloud Computing (SoCC). ACM, Indianapolis, Indiana, USA, 143--154. https://doi.org/10.1145/1807128.1807152
[11]
Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. Evolution of Development Priorities in Key-Value Stores Serving Large-scale Applications: The RocksDB Experience. In Proceedings of the 2021 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Virtual Event, 33--49.
[12]
Emmanuel Esposito, Thomas Mueller Graf, and Sebastiano Vigna. 2020. RecSplit: Minimal Perfect Hashing via Recursive Splitting. In Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX). SIAM, Salt Lake City, UT, USA, 175--185. https://doi.org/10.1137/1.9781611976007.14
[13]
Ronald Fagin, Jü rg Nievergelt, Nicholas Pippenger, and H. Raymond Strong. 1979. Extendible Hashing - A Fast Access Method for Dynamic Files. ACM Transactions on Database Systems, Vol. 4, 3 (1979), 315--344. https://doi.org/10.1145/320083.320092
[14]
Keir Fraser and Tim Harris. 2007. Concurrent Programming Without Locks. ACM Transactions on Computer Systems, Vol. 25, 2 (2007), 5. https://doi.org/10.1145/1233307.1233309
[15]
Michael L. Fredman, Já nos Komló s, and Endre Szemeré di. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, Vol. 31, 3 (1984), 538--544. https://doi.org/10.1145/828.1884
[16]
Frank T. Hady, Annie P. Foong, Bryan Veal, and Dan Williams. 2017. Platform Storage Performance With 3D XPoint Technology. Proc. IEEE, Vol. 105, 9 (2017), 1822--1833. https://doi.org/10.1109/JPROC.2017.2731776
[17]
Xiangpeng Hao, Lucas Lersch, Tianzheng Wang, and Ismail Oukid. 2020. PiBench Online: Interactive Benchmarking of Persistent Memory Indexes. Proceedings of the VLDB Endowment, Vol. 13, 12 (2020), 2817--2820. https://doi.org/10.14778/3415478.3415483
[18]
Deukyeon Hwang, Wook-Hee Kim, Youjip Won, and Beomseok Nam. 2018. Endurable Transient Inconsistency in Byte-Addressable Persistent B-Tree. In Proceedings of the 2018 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Oakland, CA, USA, 187--200.
[19]
Intel. 2021. Intel Optane DC Persistent Memory Module. https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html.
[20]
Wook-Hee Kim, Jinwoong Kim, Woongki Baek, Beomseok Nam, and Youjip Won. 2016. NVWAL: Exploiting NVRAM in Write-Ahead Logging. In Proceedings of the 2016 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, Atlanta, GA, USA, 385--398. https://doi.org/10.1145/2872362.2872392
[21]
Michael Kö rber, Nikolaus Glombiewski, and Bernhard Seeger. 2021. Index-Accelerated Pattern Matching in Event Stores. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD). ACM, Virtual Event, China, 1023--1036. https://doi.org/10.1145/3448016.3457245
[22]
Se Kwon Lee, K. Hyun Lim, Hyunsub Song, Beomseok Nam, and Sam H. Noh. 2017. WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems. In Proceedings of the 2017 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Santa Clara, CA, USA, 257--270.
[23]
Philip L. Lehman and S. Bing Yao. 1981. Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems, Vol. 6, 4 (1981), 650--670. https://doi.org/10.1145/319628.319663
[24]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society, Brisbane, Australia, 38--49. https://doi.org/10.1109/ICDE.2013.6544812
[25]
Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. In Proceedings of the 2019 ACM Symposium on Operating Systems Principles (SOSP). ACM, Huntsville, ON, Canada, 447--461. https://doi.org/10.1145/3341301.3359628
[26]
Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for New Hardware Platforms. In Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society, Brisbane, Australia, 302--313. https://doi.org/10.1109/ICDE.2013.6544834
[27]
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. 2014. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. In Proceedings of the 2014 USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, Seattle, WA, USA, 429--444.
[28]
Witold Litwin. 1980. Linear Hashing: A New Tool for File and Table Addressing. In Proceedings of the 6th International Conference on Very Large Data Bases (VLDB). IEEE Computer Society, Montreal, Quebec, Canada, 212--223.
[29]
Jihang Liu, Shimin Chen, and Lujun Wang. 2020. LB-Trees: Optimizing Persistent Index Performance on 3DXPoint Memory. Proceedings of the VLDB Endowment, Vol. 13, 7 (2020), 1078--1090. https://doi.org/10.14778/3384345.3384355
[30]
Baotong Lu, Xiangpeng Hao, Tianzheng Wang, and Eric Lo. 2020. Dash: Scalable Hashing on Persistent Memory. Proceedings of the VLDB Endowment, Vol. 13, 8 (2020), 1147--1161. https://doi.org/10.14778/3389133.3389134
[31]
Zhihan Lv, Xiaoming Li, Haibin Lv, and Wenqun Xiu. 2020. BIM Big Data Storage in WebVRGIS. IEEE Transactions on Industrial Informatics, Vol. 16, 4 (2020), 2566--2573. https://doi.org/10.1109/TII.2019.2916689
[32]
Shaonan Ma, Kang Chen, Shimin Chen, Mengxing Liu, Jianglang Zhu, Hongbo Kang, and Yongwei Wu. 2021. ROART: Range-query Optimized Persistent ART. In Proceedings of the 2021 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Virtual Event, 1--16.
[33]
Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-Value Storage. In Proceedings of the 2012 European Conference on Computer Systems (EuroSys). ACM, Bern, Switzerland, 183--196. https://doi.org/10.1145/2168836.2168855
[34]
Donald R. Morrison. 1968. PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. Journal of the ACM, Vol. 15, 4 (1968), 514--534. https://doi.org/10.1145/321479.321481
[35]
Moohyeon Nam, Hokeun Cha, Young-ri Choi, Sam H. Noh, and Beomseok Nam. 2019. Write-Optimized Dynamic Hashing for Persistent Memory. In Proceedings of the 2019 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Boston, MA, 31--44.
[36]
Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, and Gré goire Gomes. 2017. Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems. Proceedings of the VLDB Endowment, Vol. 10, 11 (2017), 1166--1177. https://doi.org/10.14778/3137628.3137629
[37]
Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, and Wolfgang Lehner. 2016. FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD). ACM, San Francisco, CA, USA, 371--386. https://doi.org/10.1145/2882903.2915251
[38]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. Journal of Algorithms, Vol. 51, 2 (2004), 122--144. https://doi.org/10.1016/j.jalgor.2003.12.002
[39]
Dulloor Subramanya Rao, Sanjay Kumar, Anil S. Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson. 2014. System Software for Persistent Memory. In Proceedings of the 2014 European Conference on Computer Systems (EuroSys). ACM, Amsterdam, The Netherlands, 15:1--15:15. https://doi.org/10.1145/2592798.2592814
[40]
Simone Raoux, Geoffrey W. Burr, Matthew J. Breitwisch, Charles T. Rettner, Yi-Chou Chen, Robert M. Shelby, Martin Salinga, Daniel Krebs, Shih-Hung Chen, Hsiang-Lan Lung, and Chung Hon Lam. 2008. Phase-Change Random Access Memory: A Scalable Technology. IBM Journal of Research and Development, Vol. 52, 4--5 (2008), 465--480. https://doi.org/10.1147/rd.524.0465
[41]
Peter Van Sandt, Yannis Chronis, and Jignesh M. Patel. 2019. Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search?. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD). ACM, Amsterdam, The Netherlands, 36--53. https://doi.org/10.1145/3299869.3300075
[42]
Steve Scargall. 2020. Persistent Memory Architecture. In Programming Persistent Memory. Springer, Santa Clara, CA, USA, 11--30.
[43]
David Schwalb, Tim Berning, Martin Faust, Markus Dreseler, and Hasso Plattner. 2015. nvm malloc: Memory Allocation for NVRAM. In International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS). ACM, Kohala Coast, Hawaii, USA, 61--72. http://www.adms-conf.org/2015/adms15_schwalb.pdf
[44]
Jason Sewall, Jatin Chhugani, Changkyu Kim, Nadathur Satish, and Pradeep Dubey. 2011. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B Trees on Many-Core Processors. Proceedings of the VLDB Endowment, Vol. 4, 11 (2011), 795--806.
[45]
Spyros Blanas. 2020. From FLOPS to IOPS: The New Bottlenecks of Scientific Computing. https://www.sigarch.org/from-flops-to-iops-the-new-bottlenecks-of-scientific-computing/.
[46]
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-Memory Databases. In Proceedings of the 2013 ACM Symposium on Operating Systems Principles (SOSP). ACM, Farmington, PA, USA, 18--32. https://doi.org/10.1145/2517349.2522713
[47]
Zixuan Wang, Xiao Liu, Jian Yang, Theodore Michailidis, Steven Swanson, and Jishen Zhao. 2020. Characterizing and Modeling Non-Volatile Memory Systems. In Proceedings of the 2020 Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, Athens, Greece, 496--508. https://doi.org/10.1109/MICRO50266.2020.00049
[48]
Wikimedia. 2022. Wikimedia Downloads. https://dumps.wikimedia.org/.
[49]
Fei Xia, Dejun Jiang, Jin Xiong, and Ninghui Sun. 2017. HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems. In Proceedings of the 2017 USENIX Annual Technical Conference (ATC). USENIX Association, Santa Clara, CA, USA, 349--362.
[50]
Yi Xu, Joseph Izraelevitz, and Steven Swanson. 2021. Clobber-NVM: Log Less, Re-execute More. In Proceedings of the 2021 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, Virtual Event, USA, 346--359. https://doi.org/10.1145/3445814.3446730
[51]
Jian Yang, Juno Kim, Morteza Hoseinzadeh, Joseph Izraelevitz, and Steven Swanson. 2020. An Empirical Guide to the Behavior and Use of Scalable Persistent Memory. In Proceedings of the 2020 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Santa Clara, CA, USA, 169--182.
[52]
Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: Reducing Consistency Cost for NVM-Based Single Level Systems. In Proceedings of the 2015 USENIX Conference on File and Storage Technologies (FAST). USENIX Association, Santa Clara, CA, USA, 167--181.
[53]
Vinson Young, Prashant J. Nair, and Moinuddin K. Qureshi. 2015. DEUCE: Write-Efficient Encryption for Non-Volatile Memories. In Proceedings of the 2015 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, Istanbul, Turkey, 33--44. https://doi.org/10.1145/2694344.2694387
[54]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 2012 USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, San Jose, CA, USA, 15--28.
[55]
Bowen Zhang, Shengan Zheng, Zhenlin Qi, and Linpeng Huang. 2022. NBTree: A Lock-free PM-friendly Persistent B-Tree for eADR-enabled PM Systems. Proceedings of the VLDB Endowment, Vol. 15, 6 (2022), 1187--1200.
[56]
Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD). ACM, San Francisco, CA, USA, 1567--1581. https://doi.org/10.1145/2882903.2915222
[57]
Wenhui Zhang, Xingsheng Zhao, Song Jiang, and Hong Jiang. 2021. ChameleonDB: A Key-Value Store for Optane Persistent Memory. In Proceedings of the 2021 European Conference on Computer Systems (EuroSys). ACM, Online Event, United Kingdom, 194--209. https://doi.org/10.1145/3447786.3456237
[58]
Xinjing Zhou, Lidan Shou, Ke Chen, Wei Hu, and Gang Chen. 2019. DPTree: Differential Indexing for Persistent Memory. Proceedings of the VLDB Endowment, Vol. 13, 4 (2019), 421--434. https://doi.org/10.14778/3372716.3372717
[59]
Pengfei Zuo, Yu Hua, and Jie Wu. 2018. Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory. In Proceedings of the 2018 USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Carlsbad, CA, USA, 461--476.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Badges

Author Tags

  1. Radix Tree
  2. extendible hashing
  3. index
  4. persistent memory

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)999
  • Downloads (Last 6 weeks)45
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LITS: An Optimized Learned Index for StringsProceedings of the VLDB Endowment10.14778/3681954.368201017:11(3415-3427)Online publication date: 30-Aug-2024
  • (2024)On Reducing Space Amplification with Multi-Column Compaction in Apache IoTDBProceedings of the VLDB Endowment10.14778/3681954.368197717:11(2974-2986)Online publication date: 30-Aug-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 6-Aug-2024
  • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 6-Aug-2024
  • (2024)Visualization-Aware Time Series Min-Max Caching with Error Bound GuaranteesProceedings of the VLDB Endowment10.14778/3659437.365946017:8(2091-2103)Online publication date: 31-May-2024
  • (2024)Sorting on Byte-Addressable Storage: The Resurgence of Tree StructureProceedings of the VLDB Endowment10.14778/3648160.364818517:6(1487-1500)Online publication date: 3-May-2024
  • (2024)Performance-Based Pricing for Federated Learning via AuctionProceedings of the VLDB Endowment10.14778/3648160.364816917:6(1269-1282)Online publication date: 3-May-2024
  • (2024)Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation RulesACM Transactions on Intelligent Systems and Technology10.1145/3675401Online publication date: 29-Jun-2024
  • (2024)Databases in Edge and Fog Environments: A SurveyACM Computing Surveys10.1145/366600156:11(1-40)Online publication date: 8-Jul-2024
  • (2024)RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36549702:3(1-27)Online publication date: 30-May-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media