research-article

Open access

A Parallel Hash Table for Streaming Applications

Authors:

Magnus Östgren,

Ioannis SourdisAuthors Info & Claims

PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques

Pages 297 - 308

https://doi.org/10.1145/3656019.3676951

Published: 13 October 2024 Publication History

All formats PDF

Abstract

Hash Tables are important data structures for a wide range of data intensive applications in various domains. They offer compact storage for sparse data, but their performance has difficulties to scale with the rapidly increasing volumes of data as they typically offer a single access port. Building a hash table with multiple parallel ports either has an excessive cost in memory resources, i.e., requiring redundant copies of its contents, and/or exhibits a worst case performance of just a single port memory due to bank conflicts. This work introduces a new multi-port hash table design, called Multi Hash Table, which does not require content replication to offer conflict free parallelism. Multi Hash Table avoids conflicts among its parallel banks by (i) supporting different dynamic mappings of its hash table address to index to the banks, and by (ii) caching (and aggregating) accesses to frequently used entries. The Multi Hash Table is used for reconfigurable single sliding window stream aggregation, increasing processing throughput by 7.5 ×.

References

[1]

amaranth lang. 2023. amaranth. https://github.com/amaranth-lang/amaranth.

[2]

AMD. 2023. Alveo U280 Data Center Accelerator Card. https://www.xilinx.com/products/boards-and-kits/alveo/u280.html.

[3]

AMD. 2023. Vitis HLS. https://www.xilinx.com/products/design-tools/vitis/vitis-hls.html.

[4]

Henrique CM Andrade, Buğra Gedik, and Deepak S Turaga. 2014. Fundamentals of stream processing: application design, systems, and analytics. Cambridge University Press.

Digital Library

[5]

Arvind Arasu, Mitch Cherniack, Eduardo Galvez, David Maier, Anurag S Maskey, Esther Ryvkina, Michael Stonebraker, and Richard Tibbetts. 2004. Linear road: a stream data management benchmark. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. 480–491.

Digital Library

[6]

Saman Ashkiani, Martin Farach-Colton, and John D. Owens. 2018. A Dynamic Hash Table for the GPU. In IEEE International Parallel and Distributed Processing Symposium, IPDPS. 419–429.

[7]

Tiziano De Matteis 2019. GASSER: An Auto-Tunable System for General Sliding-Window Streaming Operators on GPUs. IEEE Access 7 (2019), 48753–48769.

[8]

Tomás Fukac, Jirí Matousek, Jan Korenek, and Lukás Kekely. 2021. Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks. In International Conference on Field-Programmable Technology, (FPT). 1–9.

[9]

B. Gedik. 2014. Generic windowing support for extensible stream processing systems. Softw., Pract. Exper. 44, 9 (2014), 1105–1128.

Digital Library

[10]

Prajith Ramakrishnan Geethakumari, Vincenzo Gulisano, Bo Joel Svensson, Pedro Trancoso, and Ioannis Sourdis. 2017. Single window stream aggregation using reconfigurable hardware. In 2017 International Conference on Field Programmable Technology (ICFPT). 112–119. https://doi.org/10.1109/FPT.2017.8280128

[11]

Prajith Ramakrishnan Geethakumari, Vincenzo Gulisano, Pedro Trancoso, and Ioannis Sourdis. 2019. Time-SWAD: A Dataflow Engine for Time-Based Single Window Stream Aggregation. In Int’ernationa’l Conf. on Field-Programmable Technology (FPT). IEEE, 72–80.

[12]

Prajith Ramakrishnan Geethakumari and IoannisSourdis. 2021. A Specialized Memory Hierarchy for Stream Aggregation. In 2021 31st International Conference on Field-Programmable Logic and Applications (FPL). 204–210. https://doi.org/10.1109/FPL53798.2021.00041

[13]

Prajith Ramakrishnan Geethakumari and Ioannis Sourdis. 2021. StreamZip: Compressed Sliding-Windows for Stream Aggregation. In 2021 International Conference on Field-Programmable Technology (ICFPT). 1–9. https://doi.org/10.1109/ICFPT52863.2021.9609952

[14]

Prajith Ramakrishnan Geethakumari and Ioannis Sourdis. 2023. Stream Aggregation with Compressed Sliding Windows. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 16, 3 (2023), 37:1–37:28.

[15]

John R. Gilbert, Steve Reinhardt, and Viral B. Shah. 2006. High-Performance Graph Algorithms from Parallel Sparse Matrices (PARA). In 8th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing. 260–269.

[16]

Allan Gottlieb, Ralph Grishman, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, and Marc Snir. 1983. The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer. IEEE Trans. Comput. C-32, 2 (1983), 175–189. https://doi.org/10.1109/TC.1983.1676201

Digital Library

[17]

Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Min. Knowl. Dis., 1(1) (Jan. 1997), 29–53.

[18]

Vincenzo Gulisano, Zbigniew Jerzak, Roman Katerinenko, Martin Strohbach, and Holger Ziekow. 2017. The DEBS 2017 Grand Challenge. In ACM Int. Conf. on Distributed Event-based Systems (DEBS). 271–273.

[19]

Vincenzo Gulisano, Zbigniew Jerzak, Spyros Voulgaris, and Holger Ziekow. 2016. The DEBS 2016 Grand Challenge. In ACM DEBS. ACM, 289–292.

[20]

Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, and Philippas Tsigas. 2015. Deterministic Real-time Analytics of Geospatial Data Streams Through ScaleGate Objects. In ACM DEBS. ACM, 316–317.

[21]

Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In 4th Int’l Conf. on Learning Representations, ICLR.

[22]

Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In 28th International Conference on Neural Information Processing Systems - Volume 1 (NIPS). 1135–1143.

[23]

Zsolt István, Gustavo Alonso, Michaela Blott, and Kees Vissers. 2015. A hash table for line-rate data processing. ACM TRETS 8, 2 (2015), 13.

[24]

Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Using Accelerated Flash Storage for External Graph Analytics. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 411–424.

[25]

A. Kirsch, M. Mitzenmacher, and G. Varghese. 2010. Hash-Based Techniques for High-Speed Packet Processing. In Algorithms for Next Generation Networks.

[26]

Donald E. Knuth. 1998. The art of computer programming, volume 3: (2nd ed.) sorting and searching. Addison Wesley Longman Publishing Co., Inc., USA.

[27]

Charles Eric LaForest and J. Gregory Steffan. 2010. Efficient Multi-Ported Memories for FPGAs. In Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA). 41–50.

Digital Library

[28]

Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A Tucker. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD 34, 1 (2005), 39–44.

Digital Library

[29]

Ben Lin, Michael B. Healy, Rustam Miftakhutdinov, Philip G. Emma, and Yale Patt. 2018. Duplicon Cache: Mitigating Off-Chip Memory Bank and Bank Group Conflicts Via Data Duplication. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 285–297. https://doi.org/10.1109/MICRO.2018.00031

Digital Library

[30]

Rene Mueller, Jens Teubner, and Gustavo Alonso. 2009. Streams on wires: a query compiler for FPGAs. VLDB 2, 1 (2009), 229–240.

Digital Library

[31]

M. Naumov, L. Chien, P. Vandermersch, and U. Kapasi. 2010. Cusparse library. In GPU Technology Conference (GTC).

[32]

Michael Offel, Andreas Ley, and Sven Hager. 2023. HashCache: High-Performance State Tracking for Resilient FPGA-Based Packet Processing. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). 364–364. https://doi.org/10.1109/FPL60245.2023.00069

[33]

Yasin Oge, Masato Yoshimi, Takefumi Miyoshi, Hideyuki Kawashima, Hidetsugu Irie, and Tsutomu Yoshinaga. 2013. An efficient and scalable implementation of sliding-window aggregate operator on FPGA. In CANDAR. IEEE, 112–121.

[34]

Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. Journal of Algorithms 51, 2 (2004), 122–144.

Digital Library

[35]

Salvatore Pontarelli, Pedro Reviriego, and Juan Antonio Maestro. 2016. Parallel d-Pipeline: A Cuckoo Hashing Implementation for Increased Throughput. IEEE Trans. Comput. 65, 1 (2016), 326–331. https://doi.org/10.1109/TC.2015.2417524

Digital Library

[36]

M.O. Rabin and V.V. Vazirani. 1989. Maximum Matchings in General Graphs Through Randomization. In Journal of Algorithms, Vol. 10. 557–567.

[37]

Charles Reiss, John Wilkes, and Joseph L. Hellerstein. 2011. Google cluster-usage traces: format + schema. Technical Report. Google Inc., Mountain View, CA, USA. Revised 2014-11-17 for version 2.1. Posted at https://github.com/google/cluster-data.

[38]

André Seznec. 1993. A Case for Two-Way Skewed-Associative Caches. In 20th Annual International Symposium on Computer Architecture (ISCA). 169–178.

Digital Library

[39]

I. Sourdis, D. Pnevmatikatos, S. Wong, and S. Vassiliadis. 2005. A reconfigurable perfect-hashing scheme for packet inspection. In Int’l Conf. on Field Programmable Logic and Applications (FPL).644–647.

[40]

Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning Structured Sparsity in Deep Neural Networks. In Advances in Neural Information Processing Systems.

[41]

Xilinx. 2023. XUP Vitis Network Example. https://github.com/Xilinx/xup_vitis_network_example.

[42]

Ichitaro Yamazaki and Xiaoye S. Li. 2010. On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver. In Proceedings of the 9th International Conference on High Performance Computing for Computational Science (VECPAR). 421–434.

Digital Library

[43]

Yang Yang, Sanmukh R. Kuppannagari, and Viktor K. Prasanna. 2020. A High Throughput Parallel Hash Table Accelerator on HBM-enabled FPGAs. In 2020 International Conference on Field-Programmable Technology (ICFPT). 148–153.

[44]

Yang Yang, Sanmukh R. Kuppannagari, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2020. FASTHash: FPGA-Based High Throughput Parallel Hash Table. In International Conference in High Performance Computing. 3–22.

Index Terms

A Parallel Hash Table for Streaming Applications
1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators

Recommendations

Hash Table in Massively Parallel Systems
IPPS '92: Proceedings of the 6th International Parallel Processing Symposium

The authors look at the performance and new collision resolution strategies for hash tables in massively parallel systems. The results show that using a hash table with linear probing yields O(logN) time performance for handling M accesses by N ...
Performance Analysis of Non-cryptographic Hash Functions for Real-Time Storage and Lookup of URLs
FIT '13: Proceedings of the 2013 11th International Conference on Frontiers of Information Technology

In this work, the performance of various non-cryptographic hash functions has been investigated to identify their random nature when employed in the implementation of hash tables for real-time storage and lookup of uniform resource locators. The ...
Phase-concurrent hash tables for determinism
SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures

We present a deterministic phase-concurrent hash table in which operations of the same type are allowed to proceed concurrently, but operations of different types are not. Phase-concurrency guarantees that all concurrent operations commute, giving a ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques

October 2024

375 pages

ISBN:9798400706318

DOI:10.1145/3656019

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2024

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Swedish Foundation for Strategic Research

Conference

PACT '24

Sponsor:

SIGARCH

PACT '24: International Conference on Parallel Architectures and Compilation Techniques

October 14 - 16, 2024

CA, Long Beach, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
95
Total Downloads

Downloads (Last 12 months)95
Downloads (Last 6 weeks)95

Reflects downloads up to 24 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents