Rethinking Hash Tables: Challenges and Opportunities with Compute Express Link (CXL)

Authors:

Guihai Chen

ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024

Pages 23 - 27

https://doi.org/10.1145/3674399.3674418

Published: 30 July 2024

Abstract

Hash tables can efficiently determine whether an element exists in a given set and are widely used in computer networks, the Internet of Things (IoT), data centers, and stream data mining. As massive volumes of data are continuously generated, the memory consumption of hash tables keeps growing. The emerging Compute Express Link (CXL) technology can significantly expand memory capacity, so porting hash tables from DRAM to CXL memory can relieve the pressure they place on DRAM space. However, the port is not trivial. This paper analyzes the challenges of porting hash tables to CXL memory and identifies opportunities to address them.
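
To make the membership query described in the abstract concrete, the following is a minimal sketch of an open-addressing hash set with linear probing. It is not taken from the paper: the class name `ProbeCountingHashSet` and its `probes` counter are hypothetical, chosen only for illustration. The counter records how many table slots each operation reads, which is a rough stand-in for the number of memory accesses a lookup would issue wherever the table happens to be placed.

```python
class ProbeCountingHashSet:
    """Open-addressing hash set with linear probing (illustrative only)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._slots = [self._EMPTY] * capacity
        self._size = 0
        self.probes = 0  # total slot reads, a stand-in for memory accesses

    def _find(self, key):
        """Return the index holding key, or the empty slot where it would go."""
        capacity = len(self._slots)
        index = hash(key) % capacity
        for _ in range(capacity):
            self.probes += 1
            slot = self._slots[index]
            if slot is self._EMPTY or slot == key:
                return index
            index = (index + 1) % capacity  # linear probing: try the next slot
        raise RuntimeError("table is full")  # not reached while load <= 0.5

    def add(self, key):
        # Grow before the load factor would exceed 0.5 to keep probe runs short.
        if (self._size + 1) * 2 > len(self._slots):
            self._resize(len(self._slots) * 2)
        index = self._find(key)
        if self._slots[index] is self._EMPTY:
            self._slots[index] = key
            self._size += 1

    def __contains__(self, key):
        return self._slots[self._find(key)] is not self._EMPTY

    def __len__(self):
        return self._size

    def _resize(self, new_capacity):
        old_keys = [s for s in self._slots if s is not self._EMPTY]
        self._slots = [self._EMPTY] * new_capacity
        self._size = 0
        for key in old_keys:
            self._slots[self._find(key)] = key
            self._size += 1


table = ProbeCountingHashSet()
for name in ["sensor-17", "sensor-42", "sensor-99"]:
    table.add(name)

print("sensor-17" in table)  # True
print("sensor-00" in table)  # False
print("slot reads so far:", table.probes)
```

The sketch omits deletion and concurrency on purpose. Its only point is that a membership query reduces to a short sequence of slot reads, so the cost of each read is the quantity that matters when the slots are moved to a different memory tier.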

Information

Published In

ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024

July 2024

261 pages

ISBN: 9798400710117

DOI: 10.1145/3674399

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University
Jiangsu High-level Innovation and Entrepreneurship (Shuangchuang) Program
National Natural Science Foundation of China

Conference

ACM-TURC '24

ACM-TURC '24: ACM Turing Award Celebration Conference 2024

July 5 - 7, 2024

Changsha, China
