
When cycles are cheap, some tables can be huge

Published: 13 May 2013
DOI: 10.5555/2490483.2490495

Abstract

The goal of this paper is to raise a new question: what would change in operating systems and networks if it were feasible to have a (type of) lookup table that supported billions, or hundreds of billions, of entries using only a few bits per entry? We do so by showing that the progress of Moore's law, which continues to deliver more and more transistors per chip, makes it possible to apply formerly ludicrous amounts of brute-force parallel computation to find space-saving opportunities.
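To make that scale concrete, here is a rough back-of-the-envelope calculation (ours, not the paper's; the 4-bit and 16-byte figures are illustrative assumptions):

```python
# Rough illustration (assumed figures, not from the paper): memory footprint of
# 100 billion entries at 4 bits per entry, versus a conventional hash table
# that stores an 8-byte key and an 8-byte value per entry (overhead ignored).
entries = 100_000_000_000            # 100 billion entries
bits_per_entry = 4                   # assumed "few bits per entry"
compact_gb = entries * bits_per_entry / 8 / 1e9
conventional_gb = entries * (8 + 8) / 1e9
print(f"compact table:           ~{compact_gb:.0f} GB")       # ~50 GB
print(f"conventional hash table: ~{conventional_gb:.0f} GB")  # ~1600 GB
```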
We make two primary observations. First, some applications can tolerate getting an incorrect answer from the table when they query for a key that is not in the table; for these applications, we can discard the keys entirely, using storage space only for the values. Second, for some applications the value is not arbitrary: if the range of output values is small, we can instead view the problem as one of set separation. These two observations allow us to shrink the size of the mapping by brute-force searching for a "perfect mapping" from inputs to outputs that (1) does not store the input keys and (2) avoids collisions (and thus the related storage). Our preliminary results show that we can reduce memory consumption by an order of magnitude compared to traditional hash tables while providing competitive or better lookup performance.
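As a hedged sketch of the set-separation idea (our illustration under assumed group sizes and hash choices, not the authors' data structure): keys are partitioned into small groups, and for each group we brute-force search for a hash seed under which every key in the group already hashes to its correct small-range value. Only the seed is stored per group, so neither keys nor per-entry values are kept, and lookups within a group cannot collide.

```python
# Minimal sketch (assumption: not the authors' implementation) of building a
# "perfect mapping" by brute force.  For each small group of keys we search for
# a seed s such that hash(key, s) mod R equals the key's value for every key in
# the group.  Only the seed is stored -- no keys, no per-entry values -- so in a
# full design a short seed shared by a group of keys amortizes to a few bits
# per entry, and a lookup costs a single hash.
import hashlib

R = 4  # size of the small output-value range (e.g., 2 bits of information per entry)

def h(key: bytes, seed: int) -> int:
    """Hash a key under a given seed into the small output range [0, R)."""
    digest = hashlib.blake2b(key, digest_size=8, key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % R

def find_seed(group: dict[bytes, int], max_tries: int = 1 << 20) -> int:
    """Brute-force a seed that maps every key in `group` to its desired value."""
    for seed in range(max_tries):
        if all(h(k, seed) == v for k, v in group.items()):
            return seed
    raise RuntimeError("no seed found; use smaller groups or more tries")

# Build: store one seed per group.  Lookup: hash the key with its group's seed.
group = {b"alpha": 1, b"beta": 3, b"gamma": 0}
seed = find_seed(group)
assert all(h(k, seed) == v for k, v in group.items())
print("seed", seed, "->", {k: h(k, seed) for k in group})
```

Note that a key that was never inserted still hashes to some value in the range; that is exactly the wrong-answer-on-miss behavior the first observation says some applications can tolerate.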


Cited By

  • SlimDB. Proceedings of the VLDB Endowment, 10(13):2037-2048. Online publication date: 1-Sep-2017. DOI: 10.14778/3151106.3151108
  • Scaling Up Clustered Network Appliances with ScaleBricks. ACM SIGCOMM Computer Communication Review, 45(4):241-254. Online publication date: 17-Aug-2015. DOI: 10.1145/2829988.2787503
  • Scaling Up Clustered Network Appliances with ScaleBricks. Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, pages 241-254. Online publication date: 17-Aug-2015. DOI: 10.1145/2785956.2787503


Information & Contributors

Information

Published In

HotOS'13: Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
May 2013
27 pages

Sponsors

  • VMware
  • Google Inc.
  • Microsoft Research

Publisher

USENIX Association

United States


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

