research-article
Open access

M3: end-to-end memory management in elastic system software stacks

Published: 21 April 2021

Abstract

This paper proposes M3, an end-to-end system that dynamically distributes memory resources among competing applications to maximize their overall performance. Today's data center workloads can adapt to a wide range of memory sizes, and they are built on complex software stacks.
M3 consists of a set of mechanisms and policies allowing the layers of the system stack to make coordinated decisions. Applications continuously adapt to current resource availability, and resources are distributed to competing applications according to their needs. Experiments show that compared to the best possible static configurations, M3 achieves up to 3.05x speed-up.

Cited By

  • (2022) Optimal heap limits for reducing browser memory use. Proceedings of the ACM on Programming Languages 6:OOPSLA2 (986-1006). DOI: 10.1145/3563323. Online publication date: 31-Oct-2022

Published In

EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
April 2021
631 pages
ISBN: 9781450383349
DOI: 10.1145/3447786
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • NSERC
  • VMware
  • Huawei

Conference

EuroSys '21
EuroSys '21: Sixteenth European Conference on Computer Systems
April 26 - 28, 2021
Online Event, United Kingdom

Acceptance Rates

EuroSys '21 Paper Acceptance Rate 38 of 181 submissions, 21%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%
