Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3310232.3310242acmotherconferencesArticle/Chapter ViewAbstractPublication PagesiflConference Proceedingsconference-collections
research-article

Extended Memory Reuse: An Optimisation for Reducing Memory Allocations

Published: 05 September 2018 Publication History

Abstract

In this paper we present an optimisation for reference counting based garbage collection. The optimisation aims at reducing the total number of calls to the heap manager while preserving the key benefits of reference counting, i.e. the opportunities for in-place updates as well as memory deallocation without global garbage collection. The key idea is to carefully extend the lifetime of variables so that memory deallocations followed by memory allocations of the same size can be replaced by a direct memory reuse. Such memory reuse turns out particularly useful in the context of innermost loops of compute-intensive applications. It leads to a runtime behaviour that performs pointer swaps between buffers in the same way it would be implemented manually in languages that require explicit memory management, e.g. C.
We have implemented the proposed optimisation in the context of the Single-Assignment C compiler tool chain. The paper provides an algorithmic description of our optimisation and an evaluation of its effectiveness over a collection of benchmarks including a subset of the Rodinia benchmarks and the NAS Parallel Benchmarks. We show that for several benchmarks with allocations within loops our optimisation reduces the amount of allocations by a few orders of magnitude. We also observe no negative impact on the overall memory footprint nor on the overall runtime. Instead, for some sequential executions we find mild improvement, and on GPU devices we observe speedups of up to a factor of 4x.

References

[1]
David H Bailey, Eric Barszcz, John T Barton, David S Browning, Robert L Carter, Leonardo Dagum, Rod A Fatoohi, Paul O Frederickson, Thomas A Lasinski, Rob S Schreiber, et al. 1991. The NAS Parallel Benchmarks. The International Journal of Supercomputing Applications 5, 3 (1991), 63--73.
[2]
Robert Bernecky, Stephan Herhut, Sven-Bodo Scholz, Kai Trojahner, Clemens Grelck, and Alex Shafarenko. 2007. Index Vector Elimination: Making Index Vectors Affordable. In Implementation and Application of Functional Languages, 18th International Symposium (IFL'06), Budapest, Hungary, Revised Selected Papers (Lecture Notes in Computer Science), Zoltan Horváth, Viktória Zsók, and Andrew Butterfield (Eds.), Vol. 4449. Springer, 19--36.
[3]
Robert Bernecky and Sven-Bodo Scholz. 2015. Abstract Expressionism for Parallel Performance. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ACM, 54--59.
[4]
David C. Cann. 1989. Compilation Techniques for High-performance Applicative Computation. Ph.D. Dissertation. Fort Collins, CO, USA. AAI9007070.
[5]
David C. Cann and Paraskevas Evripidou. 1995. Advanced array optimizations for high performance functional languages. IEEE Transactions on Parallel and Distributed Systems 6, 3 (March 1995), 229--239.
[6]
Shuai Che, M. Boyer, Jiayuan Meng, D. Tarjan, J. W. Sheaffer, Sang-Ha Lee, and K. Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In IEEE International Symposium on Workload Characterization (IISWC '09). 44--54.
[7]
David L. Detlefs, Paul A. Martin, Mark Moir, and Guy L. Steele Jr. 2002. Lockfree reference counting. Distributed Computing 15, 4 (01 Dec 2002), 255--271.
[8]
Steven M. Fitzgerald and Rodney R. Oldehoeft. 1996. Update-in-place Analysis for True Multidimensional Arrays. Sci. Program. 5, 2 (July 1996), 147--160.
[9]
Edinburgh Centre for Robotics. 2014. Robotarium Cluster. The cluster is part of the EPSRC Centre for Doctoral Training in Robotics and Autonomous Systems (RAS) in Edinburgh grant (EP/L016834/1) funded by The Engineering and Physical Sciences Research Council (EPSRC) (UK).
[10]
Python Software Foundation. 2018. Python 3.7 Language Documentation. https://docs.python.org/3/c-api/index.html Online, accessed 14 August 2018.
[11]
Clemens Grelck. 2012. Single Assignment C (SaC): High Productivity Meets High Performance. In 4th Central European Functional Programming Summer School (CEFP'11), Budapest, Hungary (Lecture Notes in Computer Science), V. Zsók, Z. Horváth, and R. Plasmeijer (Eds.), Vol. 7241. Springer, 207--278.
[12]
Clemens Grelck and Kai Trojahner. 2004. Implicit Memory Management for SaC. In Implementation and Application of Functional Languages, 16th International Workshop, IFL'04, Clemens Grelck and Frank Huch (Eds.). University of Kiel, Institute of Computer Science and Applied Mathematics, 335--348. Technical Report 0408.
[13]
Jing Guo, Robert Bernecky, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2014. Polyhedral Methods for Improving Parallel Update-in-Place. In Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Sanjay Rajopadhye and Sven Verdoolaege (Eds.). Vienna, Austria.
[14]
Jing Guo, Jeyarajan Thiyagalingam, and Sven-Bodo Scholz. 2011. Breaking the Gpu Programming Barrier with the Auto-parallelising Sac Compiler. In 6th Workshop on Declarative Aspects of Multicore Programming (DAMP'11), Austin, USA. ACM Press, 15--24.
[15]
Jurriaan Hage and Stefan Holdermans. 2008. Heap recycling for lazy languages. In Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, PEPM 2008, San Francisco, California, USA, January 7--8, 2008. 189--197.
[16]
G. W. Hamilton and S. B. Jones. 1991. Compile-Time Garbage Collection by Necessity Analysis. In Functional Programming, Glasgow 1990, Simon L. Peyton Jones, Graham Hutton, and Carsten Kehler Holst (Eds.). Springer London, London, 66--70.
[17]
Paul Hudak and Adrienne Bloss. 1985. The Aggregate Update Problem in Functional Programming Systems. In Proceedings of the 12th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '85). ACM, New York, NY, USA, 300--314.
[18]
Apple Inc. 2018. Swift Language Documentation. https://docs.swift.org/swift-book/ Online, accessed 14 August 2018.
[19]
Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P.Johnson, Stephen R. Beard, and David I. August. 2011. Automatic CPU-GPU Communication Management and Optimization. SIGPLAN Not. 46, 6 (June 2011), 142--151.
[20]
Andreas Kågedal and Saumya Debray. 1997. A Practical Approach to Structure Reuse of Arrays in Single Assignment Languages. In Proceedings of the 14th International Conference on Logic Programming. MIT Press, 18--32.
[21]
Akash Lal and G. Ramalingam. 2010. Reference Count Analysis with Shallow Aliasing. Inform. Process. Lett. 111, 2 (Dec. 2010), 57--63.
[22]
Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. 2003. Inserting Safe Memory Reuse Commands into ML-Like Programs. In Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA, June 11-13, 2003, Proceedings. 171--188.
[23]
Frank H. McMahon. 1986. The Livermore Fortran Kernels: A computer test of the numerical performance range. Technical Report UCRL-53745. Lawrence Livermore National Lab., CA, USA.
[24]
NVIDIA Corporation 2018. CUDA C Programming Guide (9.0.176 ed.). NVIDIA Corporation. https://docs.nvidia.com/cuda/archive/9.0/ Online, accessed 12 Aug. 2018.
[25]
Young Park and Benjamin Goldberg. 1995. Static analysis for optimizing reference counting. Inform. Process. Lett. 55, 4 (1995), 229--234.
[26]
SaC Development Team 2016. SaC EBNF Grammar. SaC Development Team. Available online at http://www.sac-home.org/doku.php?id=docs:syntax.
[27]
Kazuki Sakamoto and Tomohiko Furumoto. 2012. Life Before Automatic Reference Counting. Apress, Berkeley, CA, 1--29.
[28]
Sven-Bodo Scholz. 1997. An Overview of Sc Sac -- a Functional Language for Numerical Applications. In Programming Languages and Fundamentals of Programming, Technical Report 9717, R. Berghammer and F. Simon (Eds.). Institut für Informatik und Praktische Mathematik, Universität Kiel.
[29]
Sven-Bodo Scholz. 2003. Single Assignment C: efficient support for high-level array operations in a functional setting. Journal of Functional Programming 13, 6 (2003), 1005--1059.
[30]
Rifat Shahriyar, Stephen M. Blackburn, and Daniel Frampton. 2012. Down for the Count? Getting Reference Counting Back in the Ring. SIGPLAN Not. 47, 11 (June 2012), 73--84.
[31]
H. Sundell. 2005. Wait-free reference counting and memory management. In 19th IEEE International Parallel and Distributed Processing Symposium. 10 pp.--.
[32]
Hans-Nikolai Vießmann, Sven-Bodo Scholz, Artjoms Šinkarovs, Brian Bainbridge, Brian Hamilton, and Simon Flower. 2015. Making Fortran Legacy Code More Functional. 27th Symposium on Implementation and Application of Functional Languages (IFL '15) (2015).

Cited By

View all
  • (2021)The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architecturesFuture Generation Computer Systems10.1016/j.future.2021.07.021125:C(743-757)Online publication date: 1-Dec-2021
  • (2020)Effective Host-GPU Memory Management Through Code GenerationProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462199(138-149)Online publication date: 2-Sep-2020
  • (2020)A Shared Memory Cache Layer across Multiple Executors in Apache Spark2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378179(477-482)Online publication date: 10-Dec-2020

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
IFL '18: Proceedings of the 30th Symposium on Implementation and Application of Functional Languages
September 2018
143 pages
ISBN:9781450371438
DOI:10.1145/3310232
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 September 2018

Check for updates

Author Tags

  1. compiler optimisation
  2. memory management
  3. reference counting

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

Conference

IFL 2018

Acceptance Rates

Overall Acceptance Rate 19 of 36 submissions, 53%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architecturesFuture Generation Computer Systems10.1016/j.future.2021.07.021125:C(743-757)Online publication date: 1-Dec-2021
  • (2020)Effective Host-GPU Memory Management Through Code GenerationProceedings of the 32nd Symposium on Implementation and Application of Functional Languages10.1145/3462172.3462199(138-149)Online publication date: 2-Sep-2020
  • (2020)A Shared Memory Cache Layer across Multiple Executors in Apache Spark2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378179(477-482)Online publication date: 10-Dec-2020

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media