Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Free access
Just Accepted

Tiny Pointers

Online AM: 21 October 2024 Publication History

Abstract

This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional \(\log n\)-bit pointers can be replaced with \(o(\log n)\)-bit tiny pointers at the cost of only a constant-factor time overhead and a small probability of failure. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an item in an array filled to load factor \(1-\delta\), then the optimal tiny-pointer size is \(\Theta(\log\log\log n+\log\delta^{-1})\) bits in the fixed-size case, and \(\Theta(\log\delta^{-1})\) expected bits in the variable-size case.
Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest.
Using tiny pointers, we apply tiny pointers to five classic data-structure problems. We show that:
A data structure storing \(n\) \(v\)-bit values for \(n\) keys with constant-factor time modifications/queries can be implemented to take space \(nv+O(n\log^{(r)}n)\) bits, for any constant \(r\gt0\), as long as the user stores a tiny pointer of expected size \(O(1)\) with each key—here, \(\log^{(r)}n\) is the \(r\)-th iterated logarithm.
Any binary search tree can be made succinct, meaning that it achieves \((1+o(1))\) times the optimal space, with constant-factor time overhead, and can even be made to be within \(O(n)\) bits of optimal if we allow for \(O(\log^{*}n)\)-time modifications—this holds even for rotation-based trees such as the splay tree and the red-black tree.
Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-factor time overhead and \((1+o(1))\)-factor space overhead.
Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-factor time overhead and with an additional space consumption of \(\log^{(r)}n+O(\log j)\) bits per \(j\)-bit value for an arbitrary constant \(r\gt0\) of our choice.
Given an external-memory array \(A\) of size \((1+\varepsilon)n\) containing a dynamic set of up to \(n\) key-value pairs, it is possible to maintain an internal-memory stash of size \(O(n\log\varepsilon^{-1})\) bits so that the location of any key-value pair in \(A\) can be computed in constant time (and with no IOs).
In each case tiny pointers allow for us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.

References

[1]
Abseil. 2024. Google's Abseil C++ Library. https://abseil.io/. Accessed: 2024-07-18.
[2]
George Adel’son-Vel’skii and Evgenii Landis. 1962. An algorithm for organization of information. In Doklady Akademii Nauk, Vol. 146. Russian Academy of Sciences, 263–266.
[3]
Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. 2001. Optimal static range reporting in one dimension. In Proceedings on 33rd Annual ACM Symposium on Theory of Computing, July 6-8, 2001, Heraklion, Crete, Greece, Jeffrey Scott Vitter, Paul G. Spirakis, and Mihalis Yannakakis (Eds.). ACM, 476–482. https://doi.org/10.1145/380752.380842
[4]
AMDManual. 2024. AMD64 Architecture Programmer's Manual Volume 2: System Programming. https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf. Accessed: 2024-07-18.
[5]
Yuriy Arbitman, Moni Naor, and Gil Segev. 2009. De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results. In Automata, Languages and Programming, 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 5555), Susanne Albers, Alberto Marchetti-Spaccamela, Yossi Matias, Sotiris E. Nikoletseas, and Wolfgang Thomas (Eds.). Springer, 107–118. https://doi.org/10.1007/978-3-642-02927-1_11
[6]
Yuriy Arbitman, Moni Naor, and Gil Segev. 2010. Backyard Cuckoo Hashing: Constant Worst-Case Operations with a Succinct Representation. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA. IEEE Computer Society, 787–796. https://doi.org/10.1109/FOCS.2010.80
[7]
Vladimir L’vovich Arlazarov, Yefim A Dinitz, MA Kronrod, and IgorAleksandrovich Faradzhev. 1970. On economical construction of the transitive closure of an oriented graph. In Doklady Akademii Nauk, Vol. 194. Russian Academy of Sciences, 487–488.
[8]
Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. 1999. Balanced Allocations. SIAM J. Comput. 29, 1 (Sept. 1999), 180–200. https://doi.org/10.1137/S0097539795288490
[9]
Michael A. Bender, Abhishek Bhattacharjee, Alex Conway, Martin Farach-Colton, Rob Johnson, Sudarsun Kannan, William Kuszmaul, Nirjhar Mukherjee, Donald E. Porter, Guido Tagliavini, Janet Vorobyeva, and Evan West. 2021. Paging and the Address-Translation Problem. In SPAA ’21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event, USA, 6-8 July, 2021, Kunal Agrawal and Yossi Azar (Eds.). ACM, 105–117. https://doi.org/10.1145/3409964.3461814
[10]
Michael A. Bender, Alex Conway, Martin Farach-Colton, William Kuszmaul, and Guido Tagliavini. 2023. Iceberg Hashing: Optimizing Many Hash-Table Criteria at Once. J. ACM 70, 6 (2023), 40:1–40:51. https://doi.org/10.1145/3625817
[11]
Michael A. Bender, Martin Farach-Colton, Mayank Goswami, Rob Johnson, Samuel McCauley, and Shikha Singh. 2018. Bloom Filters, Adaptivity, and the Dictionary Problem. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, Mikkel Thorup (Ed.). IEEE Computer Society, 182–193. https://doi.org/10.1109/FOCS.2018.00026
[12]
Michael A. Bender, Martin Farach-Colton, John Kuszmaul, William Kuszmaul, and Mingmou Liu. 2022. On the optimal time/space tradeoff for hash tables. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, Stefano Leonardi and Anupam Gupta (Eds.). ACM, 1284–1297. https://doi.org/10.1145/3519935.3519969
[13]
Michael A. Bender, Martin Farach-Colton, John Kuszmaul, William Kuszmaul, and Mingmou Liu. 2022. On the optimal time/space tradeoff for hash tables. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, Stefano Leonardi and Anupam Gupta (Eds.). ACM, 1284–1297. https://doi.org/10.1145/3519935.3519969
[14]
Ioana Oriana Bercea and Guy Even. 2020. A Space-Efficient Dynamic Dictionary for Multisets with Constant Time Operations. CoRR abs/2005.02143 (2020). arXiv:2005.02143 https://arxiv.org/abs/2005.02143
[15]
Ioana O. Bercea and Guy Even. 2023. Dynamic Dictionaries for Multisets and Counting Filters with Constant Time Operations. Algorithmica 85, 6 (2023), 1786–1804. https://doi.org/10.1007/S00453-022-01057-0
[16]
c++ std::map. 2023. cpppreference std::map. https://en.cppreference.com/w/cpp/container/map. Accessed: 2024-07-18.
[17]
c++ std::set. 2024. cpppreference std::set. https://en.cppreference.com/w/cpp/container/set. Accessed: 2024-07-18.
[18]
c++ std::unordered_map. 2024. cpppreference std::unordered_map. https://en.cppreference.com/w/cpp/container/unordered_map. Accessed: 2024-07-18.
[19]
c++ stl_set.h. 2024. gcc-mirror/gcc libstdc++-v3 stl_set.h. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_set.h. Accessed: 2024-07-18.
[20]
c++ sty_map.h. 2024. gcc-mirror/gcc libstdc++-v3 stl_map.h. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_map.h. Accessed: 2024-07-18.
[21]
c++ unordered_map.h. 2024. gcc-mirror/gcc libstdc++-v3 unordered_map.h. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_map.h. Accessed: 2024-07-18.
[22]
c++ unordered_set. 2024. cpppreference std::unordered_set. https://en.cppreference.com/w/cpp/container/unordered_set. Accessed: 2024-07-18.
[23]
c++ unordered_set.h. 2024. gcc-mirror/gcc libstdc++-v3 unordered_set.h. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/unordered_set.h. Accessed: 2024-07-18.
[24]
Joshimar Cordova and Gonzalo Navarro. 2016. Simple and efficient fully-functional succinct trees. Theor. Comput. Sci. 656 (2016), 135–145. https://doi.org/10.1016/J.TCS.2016.04.031
[25]
Pooya Davoodi, Rajeev Raman, and Srinivasa Rao Satti. 2017. On Succinct Representations of Binary Trees. Math. Comput. Sci. 11, 2, 177–189. https://doi.org/10.1007/S11786-017-0294-4
[26]
Erik D. Demaine, Friedhelm Meyer auf der Heide, Rasmus Pagh, and Mihai Puatracscu. 2006. De Dictionariis Dynamicis Pauco Spatio Utentibus (lat. On Dynamic Dictionaries Using Little Space). In LATIN 2006: Theoretical Informatics, 7th Latin American Symposium, Valdivia, Chile, March 20-24, 2006, Proceedings (Lecture Notes in Computer Science, Vol. 3887), José R. Correa, Alejandro Hevia, and Marcos A. Kiwi (Eds.). Springer, 349–361. https://doi.org/10.1007/11682462_34
[27]
Martin Dietzfelbinger and Rasmus Pagh. 2008. Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract). In Automata, Languages and Programming, 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7-11, 2008, Proceedings, Part I: Track A: Algorithms, Automata, Complexity, and Games (Lecture Notes in Computer Science, Vol. 5125), Luca Aceto, Ivan Damgård, Leslie Ann Goldberg, Magnús M. Halldórsson, Anna Ingólfsdóttir, and Igor Walukiewicz (Eds.). Springer, 385–396. https://doi.org/10.1007/978-3-540-70575-8_32
[28]
Martin Dietzfelbinger and Michael Rink. 2009. Applications of a Splitting Trick. In Automata, Languages and Programming, 36th International Colloquium, ICALP 2009, Rhodes, Greece, July 5-12, 2009, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 5555), Susanne Albers, Alberto Marchetti-Spaccamela, Yossi Matias, Sotiris E. Nikoletseas, and Wolfgang Thomas (Eds.). Springer, 354–365. https://doi.org/10.1007/978-3-642-02927-1_30
[29]
Martin Dietzfelbinger and Stefan Walzer. 2019. Constant-Time Retrieval with O(log m) Extra Bits. In 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, March 13-16, 2019, Berlin, Germany (LIPIcs, Vol. 126), Rolf Niedermeier and Christophe Paul (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 24:1–24:16. https://doi.org/10.4230/LIPICS.STACS.2019.24
[30]
Martin Dietzfelbinger and Christoph Weidling. 2007. Balanced allocation and dictionaries with tightly packed constant size bins. Theor. Comput. Sci. 380, 1-2, 47–68. https://doi.org/10.1016/J.TCS.2007.02.054
[31]
Martin Dietzfelbinger and Philipp Woelfel. 2003. Almost random graphs with simple hash functions. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, June 9-11, 2003, San Diego, CA, USA, Lawrence L. Larmore and Michel X. Goemans (Eds.). ACM, 629–638. https://doi.org/10.1145/780542.780634
[32]
F14 [n. d.]. Facebook's F14 Hash Table. https://engineering.fb.com/2019/04/25/developer-tools/f14/. Accessed: 2024-07-18.
[33]
Arash Farzan and J. Ian Munro. 2011. Succinct representation of dynamic trees. Theor. Comput. Sci. 412, 24 (2011), 2668–2678. https://doi.org/10.1016/J.TCS.2010.10.030
[34]
Dimitris Fotakis, Rasmus Pagh, Peter Sanders, and Paul G. Spirakis. 2005. Space Efficient Hash Tables with Worst Case Constant Access Time. Theory Comput. Syst. 38, 2 (2005), 229–248. https://doi.org/10.1007/S00224-004-1195-X
[35]
Gianni Franceschini and Roberto Grossi. 2003. Optimal Worst-Case Operations for Implicit Cache-Oblivious Search Trees. In Algorithms and Data Structures, 8th International Workshop, WADS 2003, Ottawa, Ontario, Canada, July 30 - August 1, 2003, Proceedings (Lecture Notes in Computer Science, Vol. 2748), Frank K. H. A. Dehne, Jörg-Rüdiger Sack, and Michiel H. M. Smid (Eds.). Springer, 114–126. https://doi.org/10.1007/978-3-540-45078-8_11
[36]
Gaston H. Gonnet and Per-Åke Larson. 1988. External hashing with limited internal storage. J. ACM 35, 1 (1988), 161–184. https://doi.org/10.1145/42267.42274
[37]
Krishnan Gosakan, Jaehyun Han, William Kuszmaul, Ibrahim N. Mubarek, Nirjhar Mukherjee, Karthik Sriram, Guido Tagliavini, Evan West, Michael A. Bender, Abhishek Bhattacharjee, Alex Conway, Martin Farach-Colton, Jayneel Gandhi, Rob Johnson, Sudarsun Kannan, and Donald E. Porter. 2023. Mosaic Pages: Big TLB Reach with Small Pages. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, ASPLOS 2023, Vancouver, BC, Canada, March 25-29, 2023, Tor M. Aamodt, Natalie D. Enright Jerger, and Michael M. Swift (Eds.). ACM, 433–448. https://doi.org/10.1145/3582016.3582021
[38]
Leonidas J. Guibas and Robert Sedgewick. 1978. A Dichromatic Framework for Balanced Trees. In 19th Annual Symposium on Foundations of Computer Science, Ann Arbor, Michigan, USA, 16-18 October 1978. IEEE Computer Society, 8–21. https://doi.org/10.1109/SFCS.1978.3
[39]
Takao Gunji and E Goto. 1980. Studies on hashing part-1: A comparison of hashing algorithms with key deletion. J. Information Processing 3, 1 (1980), 1–12.
[40]
Intel. 2024. Intel®64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A, 3B, 3C, and 3D: System Programming Guide. https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-3a-3b-3c-and-3d-system-programming-guide.html. Accessed: 2024-07-18.
[41]
Don E. Knuth. 1963. Notes on “Open” Addressing.
[42]
Donald E. Knuth. 1973. The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley.
[43]
Per-Åke Larson. 1983. Analysis of Uniform Hashing. J. ACM 30, 4 (1983), 805–819. https://doi.org/10.1145/2157.322407
[44]
Per-Åke Larson. 1988. Linear Hashing with Separators - A Dynamic Hashing Scheme Achieving One-Access Retrieval. ACM Trans. Database Syst. 13, 3 (1988), 366–388. https://doi.org/10.1145/44498.44500
[45]
Per-Åke Larson and Ajay Kajla. 1984. File Organization: Implementation of a Method Guaranteeing Retrieval in One Access. Commun. ACM 27, 7 (1984), 670–677. https://doi.org/10.1145/358105.358193
[46]
Mingmou Liu, Yitong Yin, and Huacheng Yu. 2020. Succinct Filters for Sets of Unknown Sizes. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference) (LIPIcs, Vol. 168), Artur Czumaj, Anuj Dawar, and Emanuela Merelli (Eds.). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 79:1–79:19. https://doi.org/10.4230/LIPICS.ICALP.2020.79
[47]
M. Molloy and B. Reed. 2013. Graph Colouring and the Probabilistic Method. Springer Berlin Heidelberg. https://books.google.com/books?id=gU3xCAAAQBAJ
[48]
J. Ian Munro, Venkatesh Raman, and Adam J. Storm. 2001. Representing dynamic binary trees succinctly. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA, S. Rao Kosaraju (Ed.). ACM/SIAM, 529–536. http://dl.acm.org/citation.cfm?id=365411.365526
[49]
Gonzalo Navarro and Kunihiko Sadakane. 2014. Fully Functional Static and Dynamic Succinct Trees. ACM Trans. Algorithms 10, 3 (2014), 16:1–16:39. https://doi.org/10.1145/2601073
[50]
Anna Pagh and Rasmus Pagh. 2008. Uniform Hashing in Constant Time and Optimal Space. SIAM J. Comput. 38, 1 (2008), 85–96. https://doi.org/10.1137/060658400
[51]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. J. Algorithms 51, 2 (2004), 122–144. https://doi.org/10.1016/J.JALGOR.2003.12.002
[52]
Mihai Pătraşcu. 2008. Succincter. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA. IEEE Computer Society, 305–313. https://doi.org/10.1109/FOCS.2008.83
[53]
W. Wesley Peterson. 1957. Addressing for Random-Access Storage. IBM J. Res. Dev. 1, 2 (1957), 130–146. https://doi.org/10.1147/RD.12.0130
[54]
Martin Raab and Angelika Steger. 1998. ”Balls into Bins” - A Simple and Tight Analysis. In Randomization and Approximation Techniques in Computer Science, Second International Workshop, RANDOM’98, Barcelona, Spain, October 8-10, 1998, Proceedings (Lecture Notes in Computer Science, Vol. 1518), Michael Luby, José D. P. Rolim, and Maria J. Serna (Eds.). Springer, 159–170. https://doi.org/10.1007/3-540-49543-6_13
[55]
Rajeev Raman and S. Srinivasa Rao. 2003. Succinct Dynamic Dictionaries and Trees. In Automata, Languages and Programming, 30th International Colloquium, ICALP 2003, Eindhoven, The Netherlands, June 30 - July 4, 2003. Proceedings (Lecture Notes in Computer Science, Vol. 2719), Jos C. M. Baeten, Jan Karel Lenstra, Joachim Parrow, and Gerhard J. Woeginger (Eds.). Springer, 357–368. https://doi.org/10.1007/3-540-45061-0_30
[56]
Peter Sanders. 2018. Hashing with Linear Probing and Referential Integrity. CoRR abs/1808.04602 (2018). arXiv:1808.04602 http://arxiv.org/abs/1808.04602
[57]
Raimund Seidel and Cecilia R. Aragon. 1996. Randomized Search Trees. Algorithmica 16, 4/5 (1996), 464–497. https://doi.org/10.1007/BF01940876
[58]
Daniel Sleator and Robert Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM 32, 3 (1985), 652–686. https://doi.org/10.1145/3828.3835
[59]
Berthold Vöcking. 2003. How asymmetry helps load balancing. J. ACM 50, 4 (2003), 568–589. https://doi.org/10.1145/792538.792546
[60]
Philipp Woelfel. 2006. Asymmetric balanced allocation with simple hash functions. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006. ACM Press, 424–433. http://dl.acm.org/citation.cfm?id=1109557.1109605

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Algorithms
ACM Transactions on Algorithms Just Accepted
EISSN:1549-6333
Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 21 October 2024
Accepted: 23 September 2024
Revised: 07 August 2024
Received: 23 July 2023

Check for updates

Author Tags

  1. pointers
  2. space-efficient
  3. balanced allocation
  4. balls and bins
  5. hashing
  6. load balancing
  7. randomized algorithms
  8. retrieval

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 55
    Total Downloads
  • Downloads (Last 12 months)55
  • Downloads (Last 6 weeks)55
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media