Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3373376.3378525acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article
Open access

Learning-based Memory Allocation for C++ Server Workloads

Published: 13 March 2020 Publication History

Abstract

Modern C++ servers have memory footprints that vary widely over time, causing persistent heap fragmentation of up to 2x from long-lived objects allocated during peak memory usage. This fragmentation is exacerbated by the use of huge (2MB) pages, a requirement for high performance on large heap sizes. Reducing fragmentation automatically is challenging because C++ memory managers cannot move objects.
This paper presents a new approach to huge page fragmentation. It combines modern machine learning techniques with a novel memory manager (LLAMA) that manages the heap based on object lifetimes and huge pages (divided into blocks and lines). A neural network-based language model predicts lifetime classes using symbolized calling contexts. The model learns context-sensitive per-allocation site lifetimes from previous runs, generalizes over different binary versions, and extrapolates from samples to unobserved calling contexts. Instead of size classes, LLAMA's heap is organized by lifetime classes that are dynamically adjusted based on observed behavior at a block granularity.
LLAMA reduces memory fragmentation by up to 78% while only using huge pages on several production servers. We address ML-specific questions such as tolerating mispredictions and amortizing expensive predictions across application execution. Although our results focus on memory allocation, the questions we identify apply to other system-level problems with strict latency and resource requirements where machine learning could be applied.

References

[1]
Mart'in Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-scale Machine Learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265--283. http://dl.acm.org/citation.cfm?id=3026877.3026899
[2]
Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle. 2015. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-scale, Unbounded, Out-of-order Data Processing. Proc. VLDB Endow., Vol. 8, 12 (Aug. 2015), 1792--1803. https://doi.org/10.14778/2824032.2824076
[3]
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to Represent Programs with Graphs. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings .
[4]
David A. Barrett and Benjamin G. Zorn. 1993. Using Lifetime Predictors to Improve Memory Allocation Performance. In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI '93). ACM, New York, NY, USA, 187--196. https://doi.org/10.1145/155090.155108
[5]
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IX). ACM, New York, NY, USA, 117--128. https://doi.org/10.1145/378993.379232
[6]
Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2002. Reconsidering Custom Memory Allocation. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA '02). ACM, New York, NY, USA, 1--12. https://doi.org/10.1145/582419.582421
[7]
Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. 2016. Site Reliability Engineering: How Google Runs Production Systems .O'Reilly Media, Inc.
[8]
Stephen Blackburn, Richard E. Jones, Kathryn S. McKinley, and J. Eliot B. Moss. 2002. Beltway: Getting Around Garbage Collection Gridlock. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) . 153--164.
[9]
Stephen M. Blackburn, Perry Cheng, and Kathryn S. McKinley. 2004. Myths and realities: the performance impact of garbage collection. In Proceedings of the International Conference on Measurements and Modeling of Computer Systems, SIGMETRICS 2004. 25--36. https://doi.org/10.1145/1005686.1005693
[10]
Stephen M. Blackburn and Kathryn S. McKinley. 2008. Immix: A Mark-region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08). 22--32. https://doi.org/10.1145/1375581.1375586
[11]
Stephen M. Blackburn, Sharad Singhai, Matthew Hertz, Kathryn S. McKinely, and J. Eliot B. Moss. 2001. Pretenuring for Java. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA '01). ACM, New York, NY, USA, 342--352. https://doi.org/10.1145/504282.504307
[12]
Michael D. Bond, Graham Z. Baker, and Samuel Z. Guyer. 2010. Breadcrumbs: efficient context sensitivity for dynamic bug detection analyses. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2010, Toronto, Ontario, Canada . 13--24. https://doi.org/10.1145/1806596.1806599
[13]
Rodrigo Bruno, Duarte Patricio, José Sim ao, Luis Veiga, and Paulo Ferreira. 2019. Runtime Object Lifetime Profiler for Latency Sensitive Big Data Applications. In Proceedings of the Fourteenth EuroSys Conference 2019 (EuroSys '19). ACM, New York, NY, USA, Article 28, bibinfonumpages16 pages. https://doi.org/10.1145/3302424.3303988
[14]
Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert Henry, Robert Bradshaw, and Nathan. 2010. FlumeJava: Easy, Efficient Data-Parallel Pipelines. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) . 363--375.
[15]
Pohua P. Chang, Scott A. Mahlke, William Y. Chen, and Wen-Mei W. Hwu. 1992. Profile-guided automatic inline expansion for C programs. Software: Practice and Experience, Vol. 22, 5 (1992), 349--369. https://doi.org/10.1002/spe.4380220502
[16]
Daniel Clifford, Hannes Payer, Michael Stanton, and Ben L. Titzer. 2015. Memento Mori: Dynamic Allocation-site-based Optimizations. In Proceedings of the 2015 ACM SIGPLAN International Symposium on Memory Management . 105--117.
[17]
David A. Cohn and Satinder P. Singh. 1997. Predicting Lifetimes in Dynamically Allocated Memory. In Advances in Neural Information Processing Systems 9, M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.). MIT Press, 939--945. http://papers.nips.cc/paper/1240-predicting-lifetimes-in-dynamically-allocated-memory.pdf
[18]
David Detlefs, Christine H. Flood, Steve Heller, and Tony Printezis. 2004. Garbage-first garbage collection. In ACM International Symposium on Memory Management (ISMM). 37--48. https://doi.org/10.1145/1029873.1029879
[19]
Jason Evans. 2006. A scalable concurrent malloc (3) implementation for FreeBSD. In Proc. of the bsdcan conference, ottawa, canada .
[20]
Kunihiko Fukushima. 1980. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, Vol. 36, 4 (1980), 193--202.
[21]
Sanjay Ghemawat and Paul Menage. 2009. Tcmalloc: Thread-caching malloc. http://goog-perftools.sourceforge.net/doc/tcmalloc.html
[22]
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. 2017. Google Vizier: A Service for Black-Box Optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 1487--1495.
[23]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning .MIT Press. http://www.deeplearningbook.org.
[24]
Google. 2020 a. C
[25]
Arena Allocation Guide. https://developers.google.com/protocol-buffers/docs/reference/arenas
[26]
Google. 2020 b. pprof. https://github.com/google/pprof
[27]
Google. 2020 c. TCMalloc. https://github.com/google/tcmalloc
[28]
Swapnil Haria, Mark D. Hill, and Michael M. Swift. 2018. Devirtualizing Memory in Heterogeneous Systems. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18). ACM, New York, NY, USA, 637--650. https://doi.org/10.1145/3173162.3173194
[29]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780.
[30]
Jipeng Huang and Michael D. Bond. 2013. Efficient context sensitivity for dynamic analyses via calling context uptrees and customized memory management. In ACM Conference on Object-Oriented Programming Languages and Systems (OOPSLA). 53--72.
[31]
Maria Jump, Stephen M. Blackburn, and Kathryn S. McKinley. 2004. Dynamic object sampling for pretenuring. In Proceedings of the 4th International Symposium on Memory Management, ISMM 2004, Vancouver, BC, Canada, October 24--25, 2004 . 152--162. https://doi.org/10.1145/1029873.1029892
[32]
Svilen Kanev, Juan Pablo Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, and David Brooks. 2015. Profiling a Warehouse-scale Computer. In Proceedings of the 42Nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 158--169. https://doi.org/10.1145/2749469.2750392
[33]
Svilen Kanev, Sam Likun Xi, Gu-Yeon Wei, and David Brooks. 2017. Mallacc: Accelerating Memory Allocation. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 33--45. https://doi.org/10.1145/3037697.3037736
[34]
Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adriá n Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, and Osman S. Unsal. 2015. Redundant memory mappings for fast access to large memories. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, Portland, OR, USA, June 13--17, 2015. 66--78. https://doi.org/10.1145/2749469.2749471
[35]
Sang-Hoon Kim, Sejun Kwon, Jin-Soo Kim, and Jinkyu Jeong. 2015. Controlling physical memory fragmentation in mobile systems. In Proceedings of the 2015 ACM SIGPLAN International Symposium on Memory Management, ISMM 2015, Portland, OR, USA, June 13--14, 2015. 1--14. https://doi.org/10.1145/2754169.2754179
[36]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings . http://arxiv.org/abs/1412.6980
[37]
Bradley C. Kuszmaul. 2015. SuperMalloc: a super fast multithreaded malloc for 64-bit machines. In Proceedings of the 2015 ACM SIGPLAN International Symposium on Memory Management, ISMM 2015, Portland, OR, USA. 41--55. https://doi.org/10.1145/2754169.2754178
[38]
Youngjin Kwon, Hangchen Yu, Simon Peter, Christopher J. Rossbach, and Emmett Witchel. 2016. Coordinated and Efficient Huge Page Management with Ingens. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 705--721. http://dl.acm.org/citation.cfm?id=3026877.3026931
[39]
Doug Lea and Wolfram Gloger. 1996. A memory allocator.
[40]
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation, Vol. 1, 4 (1989), 541--551.
[41]
David Lo, Liqun Cheng, Rama Govindaraju, Luiz André Barroso, and Christos Kozyrakis. 2014. Towards energy proportionality for large-scale latency-critical workloads. In ACM International Conference on Computer Architecture (ISCA). 301--312. https://doi.org/10.1109/ISCA.2014.6853237
[42]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[43]
Todd Mytkowicz, Devin Coughlin, and Amer Diwan. 2009. Inferred Call Path Profiling. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '09). ACM, New York, NY, USA, 175--190. https://doi.org/10.1145/1640089.1640102
[44]
Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang, Jianfei Hu, and Guoqing Xu. 2015. FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15). ACM, New York, NY, USA, 675--690. https://doi.org/10.1145/2694344.2694345
[45]
Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, and Jordan Soyke. 2017. Tensorflow-serving: Flexible, high-performance ml serving. arXiv preprint arXiv:1712.06139 (2017).
[46]
Ashish Panwar, Aravinda Prasad, and K. Gopinath. 2018. Making Huge Pages Actually Useful. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18). ACM, New York, NY, USA, 679--692. https://doi.org/10.1145/3173162.3173203
[47]
Bobby Powers, David Tench, Emery D. Berger, and Andrew McGregor. 2019. Mesh: Compacting memory management for C/C
[48]
applications. In ACM Conference on Programming Language Design and Implementation(PLDI). 333--346. https://doi.org/10.1145/3314221.3314582
[49]
Chuck Rossi. 2017. Rapid release at massive scale. https://engineering.fb.com/web/rapid-release-at-massive-scale/.
[50]
Rifat Shahriyar, Stephen M. Blackburn, Xi Yang, and Kathryn S. McKinley. 2013. Taking off the gloves with reference counting Immix. In ACM Conference on Object-Oriented Programming Languages and Systems (OOPSLA). 93--110. https://doi.org/10.1145/2509136.2509527
[51]
Darko Stefanovic, Kathryn S. McKinley, and J. Eliot B. Moss. 1999. Age-Based Garbage Collection. In ACM SIGPLAN Conference on Object-Oriented Programming Languages and Systems (OOPSLA). 370--381. https://doi.org/10.1145/320385.320425
[52]
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the Inception Architecture for Computer Vision. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) .
[53]
David M. Ungar. 1984. Generation Scavenging: A Non-Disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Pittsburgh, Pennsylvania, USA, April 23--25, 1984 . 157--167. https://doi.org/10.1145/800020.808261

Cited By

View all
  • (2024)On the Feasibility and Benefits of Extensive EvaluationProceedings of the ACM on Management of Data10.1145/36771372:4(1-24)Online publication date: 30-Sep-2024
  • (2024)IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement LearningProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658659(69-82)Online publication date: 3-Jun-2024
  • (2024)Characterizing a Memory Allocator at Warehouse ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651350(192-206)Online publication date: 27-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN:9781450371025
DOI:10.1145/3373376
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 March 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. lifetime prediction
  2. lstms
  3. machine learning
  4. memory management
  5. profile-guided optimization

Qualifiers

  • Research-article

Conference

ASPLOS '20

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,200
  • Downloads (Last 6 weeks)138
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)On the Feasibility and Benefits of Extensive EvaluationProceedings of the ACM on Management of Data10.1145/36771372:4(1-24)Online publication date: 30-Sep-2024
  • (2024)IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement LearningProceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing10.1145/3625549.3658659(69-82)Online publication date: 3-Jun-2024
  • (2024)Characterizing a Memory Allocator at Warehouse ScaleProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3620666.3651350(192-206)Online publication date: 27-Apr-2024
  • (2024)The Importance of Generalizability in Machine Learning for SystemsIEEE Computer Architecture Letters10.1109/LCA.2024.338444923:1(95-98)Online publication date: Jan-2024
  • (2024)Enabling Large Dynamic Neural Network Training with Learning-based Memory Management2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA57654.2024.00066(788-802)Online publication date: 2-Mar-2024
  • (2024)Storage Technology Trends and DevelopmentData Storage Architectures and Technologies10.1007/978-981-97-3534-1_14(379-428)Online publication date: 28-Aug-2024
  • (2023)CoopProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668293(49870-49882)Online publication date: 10-Dec-2023
  • (2023)GL-CacheProceedings of the 21st USENIX Conference on File and Storage Technologies10.5555/3585938.3585946(115-133)Online publication date: 21-Feb-2023
  • (2023)Is Machine Learning Necessary for Cloud Resource Usage Forecasting?Proceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624790(544-554)Online publication date: 30-Oct-2023
  • (2023)Beyond RSS: Towards Intelligent Dynamic Memory Management (Work in Progress)Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes10.1145/3617651.3622989(158-164)Online publication date: 19-Oct-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media