Abstract
Distributed algorithms for graph searching require a high-performance CPU-efficient hash table that supports find-or-put. This operation either inserts data or indicates that it has already been added before. This paper focuses on the design and evaluation of such a hash table, targeting supercomputers. The latency of find-or-put is minimized by using one-sided RDMA operations. These operations are overlapped as much as possible to reduce waiting times for roundtrips. In contrast to existing work, we use linear probing and argue that this requires less roundtrips. The hash table is implemented in UPC. A peak-throughput of 114.9 million op/s is reached on an Infiniband cluster. With a load-factor of 0.9, find-or-put can be performed in \(4.5\,\upmu \mathrm{s}\) on average. The hash table performance remains very high, even under high loads.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Chapman, B., Curtis, T., Pophale, S., Poole, S., Kuehn, J., Koelbel, C., Smith, L.: Introducing OpenSHMEM: SHMEM for the PGAS community. In: Fourth Conference on Partitioned Global Address Space Programming Model. ACM (2010)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT press, Cambridge (2009)
Dragojevi, A., Narayanan, D., Hodson, O., Castro, M.: FaRM: Fast remote memory. In: 11th USENIX Conference on Networked Systems Design and Implementation, NSDI, vol. 14 (2014)
El-Ghazawi, T., Smith, L.: UPC: Unified Parallel C. In: ACM/IEEE Conference on Supercomputing. ACM (2006)
Farreras, M., Almasi, G., Cascaval, C., Cortes, T.: Scalable RDMA performance in PGAS languages. In: Parallel and Distributed Processing, pp. 1–12. IEEE (2009)
Herlihy, M.P., Shavit, N.N., Tzafrir, M.: Hopscotch hashing. In: Taubenfeld, G. (ed.) DISC 2008. LNCS, vol. 5218, pp. 350–364. Springer, Heidelberg (2008)
InfiniBand Trade Association: Accessed 9 May 2015. http://www.infinibandta.org
The Distributed ASCI Supercomputer 5 (2015). http://www.cs.vu.nl/das5
Kalia, A., Kaminsky, M., Andersen, D.G.: Using RDMA efficiently for key-value services. In: ACM Conference on SIGCOMM, pp. 295–306. ACM (2014)
Mitchell, C., Geng, Y., Li, J.: Using one-sided RDMA reads to build a fast, CPU-efficient key-value store. In: USENIX Annual Technical Conference, pp. 103–114 (2013)
Pagh, R., Rodler, F.F.: Cuckoo hashing. J. Algorithms 51(2), 122–144 (2004)
Laarman, A., van de Pol, J., Weber, M.: Boosting multi-core reachability performance with shared hash tables. In: Conference on Formal Methods in Computer-Aided Design, FMCAD, pp. 247–256 (2010)
Ross, K.A.: Efficient hash probes on modern processors. In: IEEE 23rd International Conference on Data Engineering, pp. 1297–1301. IEEE (2007)
Rumble, S.M., Ongaro, D., Stutsman, R., Rosenblum, M., Ousterhout, J.K.: Its time for low latency. In: HotOS (2011)
Szepesi, T., Wong, B., Cassell, B., Brecht, T.: Designing a low-latency cuckoo hash table for write-intensive workloads using RDMA. In: First International Workshop on Rack-scale Computing (2014)
van Dijk, T., van de Pol, J.: Sylvan: Multi-core decision diagrams. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 677–691. Springer, Heidelberg (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Oortwijn, W., van Dijk, T., van de Pol, J. (2016). A Distributed Hash Table for Shared Memory. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science(), vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-32152-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32151-6
Online ISBN: 978-3-319-32152-3
eBook Packages: Computer ScienceComputer Science (R0)