Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3656019.3676951acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article
Open access

A Parallel Hash Table for Streaming Applications

Published: 13 October 2024 Publication History

Abstract

Hash Tables are important data structures for a wide range of data intensive applications in various domains. They offer compact storage for sparse data, but their performance has difficulties to scale with the rapidly increasing volumes of data as they typically offer a single access port. Building a hash table with multiple parallel ports either has an excessive cost in memory resources, i.e., requiring redundant copies of its contents, and/or exhibits a worst case performance of just a single port memory due to bank conflicts. This work introduces a new multi-port hash table design, called Multi Hash Table, which does not require content replication to offer conflict free parallelism. Multi Hash Table avoids conflicts among its parallel banks by (i) supporting different dynamic mappings of its hash table address to index to the banks, and by (ii) caching (and aggregating) accesses to frequently used entries. The Multi Hash Table is used for reconfigurable single sliding window stream aggregation, increasing processing throughput by 7.5 ×.

References

[1]
amaranth lang. 2023. amaranth. https://github.com/amaranth-lang/amaranth.
[2]
AMD. 2023. Alveo U280 Data Center Accelerator Card. https://www.xilinx.com/products/boards-and-kits/alveo/u280.html.
[3]
AMD. 2023. Vitis HLS. https://www.xilinx.com/products/design-tools/vitis/vitis-hls.html.
[4]
Henrique CM Andrade, Buğra Gedik, and Deepak S Turaga. 2014. Fundamentals of stream processing: application design, systems, and analytics. Cambridge University Press.
[5]
Arvind Arasu, Mitch Cherniack, Eduardo Galvez, David Maier, Anurag S Maskey, Esther Ryvkina, Michael Stonebraker, and Richard Tibbetts. 2004. Linear road: a stream data management benchmark. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. 480–491.
[6]
Saman Ashkiani, Martin Farach-Colton, and John D. Owens. 2018. A Dynamic Hash Table for the GPU. In IEEE International Parallel and Distributed Processing Symposium, IPDPS. 419–429.
[7]
Tiziano De Matteis 2019. GASSER: An Auto-Tunable System for General Sliding-Window Streaming Operators on GPUs. IEEE Access 7 (2019), 48753–48769.
[8]
Tomás Fukac, Jirí Matousek, Jan Korenek, and Lukás Kekely. 2021. Increasing Memory Efficiency of Hash-Based Pattern Matching for High-Speed Networks. In International Conference on Field-Programmable Technology, (FPT). 1–9.
[9]
B. Gedik. 2014. Generic windowing support for extensible stream processing systems. Softw., Pract. Exper. 44, 9 (2014), 1105–1128.
[10]
Prajith Ramakrishnan Geethakumari, Vincenzo Gulisano, Bo Joel Svensson, Pedro Trancoso, and Ioannis Sourdis. 2017. Single window stream aggregation using reconfigurable hardware. In 2017 International Conference on Field Programmable Technology (ICFPT). 112–119. https://doi.org/10.1109/FPT.2017.8280128
[11]
Prajith Ramakrishnan Geethakumari, Vincenzo Gulisano, Pedro Trancoso, and Ioannis Sourdis. 2019. Time-SWAD: A Dataflow Engine for Time-Based Single Window Stream Aggregation. In Int’ernationa’l Conf. on Field-Programmable Technology (FPT). IEEE, 72–80.
[12]
Prajith Ramakrishnan Geethakumari and IoannisSourdis. 2021. A Specialized Memory Hierarchy for Stream Aggregation. In 2021 31st International Conference on Field-Programmable Logic and Applications (FPL). 204–210. https://doi.org/10.1109/FPL53798.2021.00041
[13]
Prajith Ramakrishnan Geethakumari and Ioannis Sourdis. 2021. StreamZip: Compressed Sliding-Windows for Stream Aggregation. In 2021 International Conference on Field-Programmable Technology (ICFPT). 1–9. https://doi.org/10.1109/ICFPT52863.2021.9609952
[14]
Prajith Ramakrishnan Geethakumari and Ioannis Sourdis. 2023. Stream Aggregation with Compressed Sliding Windows. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 16, 3 (2023), 37:1–37:28.
[15]
John R. Gilbert, Steve Reinhardt, and Viral B. Shah. 2006. High-Performance Graph Algorithms from Parallel Sparse Matrices (PARA). In 8th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing. 260–269.
[16]
Allan Gottlieb, Ralph Grishman, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, and Marc Snir. 1983. The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer. IEEE Trans. Comput. C-32, 2 (1983), 175–189. https://doi.org/10.1109/TC.1983.1676201
[17]
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Min. Knowl. Dis., 1(1) (Jan. 1997), 29–53.
[18]
Vincenzo Gulisano, Zbigniew Jerzak, Roman Katerinenko, Martin Strohbach, and Holger Ziekow. 2017. The DEBS 2017 Grand Challenge. In ACM Int. Conf. on Distributed Event-based Systems (DEBS). 271–273.
[19]
Vincenzo Gulisano, Zbigniew Jerzak, Spyros Voulgaris, and Holger Ziekow. 2016. The DEBS 2016 Grand Challenge. In ACM DEBS. ACM, 289–292.
[20]
Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, and Philippas Tsigas. 2015. Deterministic Real-time Analytics of Geospatial Data Streams Through ScaleGate Objects. In ACM DEBS. ACM, 316–317.
[21]
Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In 4th Int’l Conf. on Learning Representations, ICLR.
[22]
Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In 28th International Conference on Neural Information Processing Systems - Volume 1 (NIPS). 1135–1143.
[23]
Zsolt István, Gustavo Alonso, Michaela Blott, and Kees Vissers. 2015. A hash table for line-rate data processing. ACM TRETS 8, 2 (2015), 13.
[24]
Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Using Accelerated Flash Storage for External Graph Analytics. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 411–424.
[25]
A. Kirsch, M. Mitzenmacher, and G. Varghese. 2010. Hash-Based Techniques for High-Speed Packet Processing. In Algorithms for Next Generation Networks.
[26]
Donald E. Knuth. 1998. The art of computer programming, volume 3: (2nd ed.) sorting and searching. Addison Wesley Longman Publishing Co., Inc., USA.
[27]
Charles Eric LaForest and J. Gregory Steffan. 2010. Efficient Multi-Ported Memories for FPGAs. In Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA). 41–50.
[28]
Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A Tucker. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD 34, 1 (2005), 39–44.
[29]
Ben Lin, Michael B. Healy, Rustam Miftakhutdinov, Philip G. Emma, and Yale Patt. 2018. Duplicon Cache: Mitigating Off-Chip Memory Bank and Bank Group Conflicts Via Data Duplication. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 285–297. https://doi.org/10.1109/MICRO.2018.00031
[30]
Rene Mueller, Jens Teubner, and Gustavo Alonso. 2009. Streams on wires: a query compiler for FPGAs. VLDB 2, 1 (2009), 229–240.
[31]
M. Naumov, L. Chien, P. Vandermersch, and U. Kapasi. 2010. Cusparse library. In GPU Technology Conference (GTC).
[32]
Michael Offel, Andreas Ley, and Sven Hager. 2023. HashCache: High-Performance State Tracking for Resilient FPGA-Based Packet Processing. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). 364–364. https://doi.org/10.1109/FPL60245.2023.00069
[33]
Yasin Oge, Masato Yoshimi, Takefumi Miyoshi, Hideyuki Kawashima, Hidetsugu Irie, and Tsutomu Yoshinaga. 2013. An efficient and scalable implementation of sliding-window aggregate operator on FPGA. In CANDAR. IEEE, 112–121.
[34]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. Journal of Algorithms 51, 2 (2004), 122–144.
[35]
Salvatore Pontarelli, Pedro Reviriego, and Juan Antonio Maestro. 2016. Parallel d-Pipeline: A Cuckoo Hashing Implementation for Increased Throughput. IEEE Trans. Comput. 65, 1 (2016), 326–331. https://doi.org/10.1109/TC.2015.2417524
[36]
M.O. Rabin and V.V. Vazirani. 1989. Maximum Matchings in General Graphs Through Randomization. In Journal of Algorithms, Vol. 10. 557–567.
[37]
Charles Reiss, John Wilkes, and Joseph L. Hellerstein. 2011. Google cluster-usage traces: format + schema. Technical Report. Google Inc., Mountain View, CA, USA. Revised 2014-11-17 for version 2.1. Posted at https://github.com/google/cluster-data.
[38]
André Seznec. 1993. A Case for Two-Way Skewed-Associative Caches. In 20th Annual International Symposium on Computer Architecture (ISCA). 169–178.
[39]
I. Sourdis, D. Pnevmatikatos, S. Wong, and S. Vassiliadis. 2005. A reconfigurable perfect-hashing scheme for packet inspection. In Int’l Conf. on Field Programmable Logic and Applications (FPL).644–647.
[40]
Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning Structured Sparsity in Deep Neural Networks. In Advances in Neural Information Processing Systems.
[41]
Xilinx. 2023. XUP Vitis Network Example. https://github.com/Xilinx/xup_vitis_network_example.
[42]
Ichitaro Yamazaki and Xiaoye S. Li. 2010. On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver. In Proceedings of the 9th International Conference on High Performance Computing for Computational Science (VECPAR). 421–434.
[43]
Yang Yang, Sanmukh R. Kuppannagari, and Viktor K. Prasanna. 2020. A High Throughput Parallel Hash Table Accelerator on HBM-enabled FPGAs. In 2020 International Conference on Field-Programmable Technology (ICFPT). 148–153.
[44]
Yang Yang, Sanmukh R. Kuppannagari, Ajitesh Srivastava, Rajgopal Kannan, and Viktor K. Prasanna. 2020. FASTHash: FPGA-Based High Throughput Parallel Hash Table. In International Conference in High Performance Computing. 3–22.

Index Terms

  1. A Parallel Hash Table for Streaming Applications

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques
    October 2024
    375 pages
    ISBN:9798400706318
    DOI:10.1145/3656019
    This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 October 2024

    Check for updates

    Author Tags

    1. FPGA
    2. hash table
    3. high throughput
    4. stream aggregation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Swedish Foundation for Strategic Research

    Conference

    PACT '24
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 95
      Total Downloads
    • Downloads (Last 12 months)95
    • Downloads (Last 6 weeks)95
    Reflects downloads up to 24 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media