
When cycles are cheap, some tables can be huge

Published: 13 May 2013
DOI: 10.5555/2490483.2490495

Abstract

The goal of this paper is to raise a new question: what would change in operating systems and networks if it were feasible to have a (type of) lookup table that supported billions, or hundreds of billions, of entries using only a few bits per entry? We do so by showing that the progress of Moore's law, which continues to deliver more and more transistors per chip, makes it possible to apply formerly ludicrous amounts of brute-force parallel computation to find space-saving opportunities.
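To make that scale concrete, here is a rough back-of-the-envelope calculation (ours, not the paper's; the 4-bit and 16-byte figures are illustrative assumptions):

```python
# Rough illustration (assumed figures, not from the paper): memory footprint of
# 100 billion entries at 4 bits per entry, versus a conventional hash table
# that stores an 8-byte key and an 8-byte value per entry (overhead ignored).
entries = 100_000_000_000            # 100 billion entries
bits_per_entry = 4                   # assumed "few bits per entry"
compact_gb = entries * bits_per_entry / 8 / 1e9
conventional_gb = entries * (8 + 8) / 1e9
print(f"compact table:           ~{compact_gb:.0f} GB")       # ~50 GB
print(f"conventional hash table: ~{conventional_gb:.0f} GB")  # ~1600 GB
```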
We make two primary observations. First, some applications can tolerate getting an incorrect answer from the table when they query for a key that is not in the table; for these applications, we can discard the keys entirely, using storage space only for the values. Second, for some applications the value is not arbitrary: if the range of output values is small, we can instead view the problem as one of set separation. These two observations allow us to shrink the size of the mapping by brute-force searching for a "perfect mapping" from inputs to outputs that (1) does not store the input keys and (2) avoids collisions (and thus the related storage). Our preliminary results show that we can reduce memory consumption by an order of magnitude compared to traditional hash tables while providing competitive or better lookup performance.
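As a hedged sketch of the set-separation idea (our illustration under assumed group sizes and hash choices, not the authors' data structure): keys are partitioned into small groups, and for each group we brute-force search for a hash seed under which every key in the group already hashes to its correct small-range value. Only the seed is stored per group, so neither keys nor per-entry values are kept, and lookups within a group cannot collide.

```python
# Minimal sketch (assumption: not the authors' implementation) of building a
# "perfect mapping" by brute force.  For each small group of keys we search for
# a seed s such that hash(key, s) mod R equals the key's value for every key in
# the group.  Only the seed is stored -- no keys, no per-entry values -- so in a
# full design a short seed shared by a group of keys amortizes to a few bits
# per entry, and a lookup costs a single hash.
import hashlib

R = 4  # size of the small output-value range (e.g., 2 bits of information per entry)

def h(key: bytes, seed: int) -> int:
    """Hash a key under a given seed into the small output range [0, R)."""
    digest = hashlib.blake2b(key, digest_size=8, key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % R

def find_seed(group: dict[bytes, int], max_tries: int = 1 << 20) -> int:
    """Brute-force a seed that maps every key in `group` to its desired value."""
    for seed in range(max_tries):
        if all(h(k, seed) == v for k, v in group.items()):
            return seed
    raise RuntimeError("no seed found; use smaller groups or more tries")

# Build: store one seed per group.  Lookup: hash the key with its group's seed.
group = {b"alpha": 1, b"beta": 3, b"gamma": 0}
seed = find_seed(group)
assert all(h(k, seed) == v for k, v in group.items())
print("seed", seed, "->", {k: h(k, seed) for k in group})
```

Note that a key that was never inserted still hashes to some value in the range; that is exactly the wrong-answer-on-miss behavior the first observation says some applications can tolerate.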


Cited By

  • SlimDB. Proceedings of the VLDB Endowment, 10(13):2037-2048. Online publication date: 1-Sep-2017. DOI: 10.14778/3151106.3151108
  • Scaling Up Clustered Network Appliances with ScaleBricks. ACM SIGCOMM Computer Communication Review, 45(4):241-254. Online publication date: 17-Aug-2015. DOI: 10.1145/2829988.2787503
  • Scaling Up Clustered Network Appliances with ScaleBricks. Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 241-254. Online publication date: 17-Aug-2015. DOI: 10.1145/2785956.2787503


Information & Contributors

Information

Published In

HotOS'13: Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
May 2013
27 pages

Sponsors

  • VMware
  • Google Inc.
  • Microsoft Research

Publisher

USENIX Association

United States


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

