Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Optimizing hash-array mapped tries for fast and lean immutable JVM collections

Published: 23 October 2015 Publication History

Abstract

The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations, however due to their tree-based nature their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7x) and equality checking (3–25.4x).

Supplementary Material

Auxiliary Archive (p783-steindorfer-s.zip)
Artifact that was submitted to the OOPSLA'15 Artifact Evaluation. Contains the source code of the benchmarks and the results from our experiments.

References

[1]
A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[2]
P. Bagwell. Fast And Space Efficient Trie Searches. Technical Report LAMP-REPORT-2000-001, Ecole polytechnique fédérale de Lausanne, 2000.
[3]
P. Bagwell. Ideal Hash Trees. Technical Report LAMPREPORT-2001-001, Ecole polytechnique fédérale de Lausanne, 2001.
[4]
P. Bagwell. Fast Functional Lists, Hash-Lists, Deques, and Variable Length Arrays. Technical Report LAMP-REPORT- 2002-003, Ecole polytechnique fédérale de Lausanne, 2002.
[5]
P. Bagwell and T. Rompf. RRB-Trees: Efficient Immutable Vectors. Technical Report EPFL-REPORT-169879, Ecole polytechnique fédérale de Lausanne, 2011.
[6]
A. Biboudis, N. Palladinos, G. Fourtounis, and Y. Smaragdakis. Streams a la carte: Extensible Pipelines with Object Algebras. In ECOOP ’15: Proceedings of the 29th European conference on Object-Oriented Programming. Schloss Dagstuhl, 2015.
[7]
T. J. Biggerstaff. A perspective of generative reuse. Annals of Software Engineering, 5(1):169–226, 1998. ISSN 1022-7091.
[8]
K. D. Cooper, T. J. Harvey, and K. Kennedy. A Simple, Fast Dominance Algorithm. Technical Report TR-06-33870, Rice University, 2006.
[9]
K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, 2000.
[10]
R. De La Briandais. File Searching Using Variable Length Keys. In Papers presented at the the March 3-5, 1959, western joint computer conference, pages 295–298. ACM Press, 1959.
[11]
J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. In Proceedings of the 18th annual ACM symposium on Theory of computing. ACM, 1986.
[12]
J. Ebert, D. Bildhauer, H. Schwarz, and V. Riediger. Using difference information to reuse software cases. Softwaretechnik-Trends, 2007.
[13]
E. Fredkin. Trie Memory. Communications of the ACM, 3(9): 490–499, 1960.
[14]
J. Gil and Y. Shimron. Smaller Footprint for Java Collections. In ECOOP ’12: Proceedings of the 26th European conference on Object-Oriented Programming. Springer-Verlag, 2012.
[15]
E. Hajiyev, M. Verbaere, and O. de Moor. Codequest: Scalable source code queries with datalog. In ECOOP ’06: Proceedings of the 20th European Conference on Object-Oriented Programming. Springer-Verlag, 2006.
[16]
M. Hills and P. Klint. PHP AiR: Analyzing PHP systems with Rascal. In Proceedings of IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering. IEEE, 2014.
[17]
A. Igarashi and M. Viroli. On Variance-Based Subtyping for Parametric Types. In ECOOP ’02: Proceedings of the 16th European conference on Object-Oriented Programming. Springer-Verlag, 2002.
[18]
P. Klint, T. van der Storm, and J. Vinju. Rascal: A Domain Specific Language for Source Code Analysis and Manipulation. In Proceedings of Ninth IEEE International Working Conference on Source Code Analysis and Manipulation. IEEE, 2009.
[19]
R. E. Ladner, R. Fortna, and B.-H. Nguyen. A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation. In Experimental Algorithmics. Springer-Verlag, 2002.
[20]
D. McIlroy. Mass-Produced Software Components. In P. Naur and B. Randell, editors, Proceedings of NATO Software Engineering Conference, pages 138–155. NATO Scientific Affairs Division, 1968.
[21]
N. Mitchell and G. Sevitsky. The causes of bloat, the limits of health. In OOPSLA ’07: Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications. ACM, 2007.
[22]
C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999.
[23]
R. Olsson and S. Nilsson. TRASH A dynamic LC-trie and hash data structure. In High Performance Switching and Routing, 2007. HPSR ’07. Workshop on. IEEE, 2007.
[24]
A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In PPoPP ’12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming. ACM, 2012.
[25]
P. Rademaker. Binary relational querying for structural source code analysis. Universiteit Utrecht. Master’s thesis, 2008.
[26]
I. Ramakrishnan, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. Efficient Tabling Mechanisms for Logic Programs. In Proceedings of the 12th International Conference on Logic Programming. Elsevier, 1995.
[27]
N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. Communications of the ACM, 29(7): 669–679, 1986.
[28]
N. Shavit and D. Touitou. Software transactional memory. ACM Press, 1995.
[29]
M. J. Steindorfer and J. J. Vinju. Code Specialization for Memory Efficient Hash Tries (Short Paper). In GPCE ’14: Proceedings of Generative Programming Concepts & Experiences. ACM, 2014.
[30]
N. Stucki, T. Rompf, V. Ureche, and P. Bagwell. RRB Vector: A Practical General Purpose Immutable Sequence. In ICFP ’15: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. ACM, 2015.
[31]
V. Ureche, C. Talau, and M. Odersky. Miniboxing: improving the speed to code size tradeoff in parametric polymorphism translations. In OOPSLA ’13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. ACM, 2013.
[32]
A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, and H. Mössenböck. An object storage model for the truffle language implementation framework. In PPPJ ’14: Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform. ACM, 2014.

Cited By

View all
  • (2024)Refinery: Graph Solver as a ServiceProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings10.1145/3639478.3640045(64-68)Online publication date: 14-Apr-2024
  • (2018)Efficient Lock-Free Removing and Compaction for the Cache-Trie Data StructureEuro-Par 2018: Parallel Processing10.1007/978-3-319-96983-1_41(575-589)Online publication date: 27-Aug-2018
  • (2017)Persistence for the masses: RRB-vectors in a systems languageProceedings of the ACM on Programming Languages10.1145/31102601:ICFP(1-28)Online publication date: 29-Aug-2017
  • Show More Cited By

Index Terms

  1. Optimizing hash-array mapped tries for fast and lean immutable JVM collections

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 50, Issue 10
    OOPSLA '15
    October 2015
    953 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2858965
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
    • cover image ACM Conferences
      OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
      October 2015
      953 pages
      ISBN:9781450336895
      DOI:10.1145/2814270
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 October 2015
    Published in SIGPLAN Volume 50, Issue 10

    Check for updates

    Author Tags

    1. Hash trie
    2. Java Virtual Machine
    3. cache locality
    4. immutability
    5. performance
    6. persistent data structure

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)23
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 23 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Refinery: Graph Solver as a ServiceProceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings10.1145/3639478.3640045(64-68)Online publication date: 14-Apr-2024
    • (2018)Efficient Lock-Free Removing and Compaction for the Cache-Trie Data StructureEuro-Par 2018: Parallel Processing10.1007/978-3-319-96983-1_41(575-589)Online publication date: 27-Aug-2018
    • (2017)Persistence for the masses: RRB-vectors in a systems languageProceedings of the ACM on Programming Languages10.1145/31102601:ICFP(1-28)Online publication date: 29-Aug-2017
    • (2024)Reducing Write Barrier Overheads for Orthogonal PersistenceProceedings of the 17th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3687997.3695646(210-223)Online publication date: 17-Oct-2024
    • (2022)Aggregate update problem for multi-clocked dataflow languagesProceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO53902.2022.9741275(79-91)Online publication date: 2-Apr-2022
    • (2021)Runtime and compiler support for HAMTsProceedings of the 17th ACM SIGPLAN International Symposium on Dynamic Languages10.1145/3486602.3486931(48-59)Online publication date: 19-Oct-2021
    • (2018)To-many or to-one? all-in-one! efficient purely functional multi-maps with type-heterogeneous hash-triesACM SIGPLAN Notices10.1145/3296979.319242053:4(283-295)Online publication date: 11-Jun-2018
    • (2018)C♭: a new modular approach to implementing efficient and tunable collectionsProceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3276954.3276956(57-71)Online publication date: 24-Oct-2018
    • (2018)Cache-triesACM SIGPLAN Notices10.1145/3200691.317849853:1(137-151)Online publication date: 10-Feb-2018
    • (2018)To-many or to-one? all-in-one! efficient purely functional multi-maps with type-heterogeneous hash-triesProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192420(283-295)Online publication date: 11-Jun-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media