DOI: 10.1145/3357223.3362730
Research article · Public Access

Pufferfish: Container-driven Elastic Memory Management for Data-intensive Applications

Published: 20 November 2019

Abstract

Data-intensive applications often suffer from significant memory pressure, resulting in excessive garbage collection (GC) and out-of-memory (OOM) errors that harm system performance and reliability. In this paper, we demonstrate how lightweight virtualization via OS containers opens up opportunities to address memory pressure and realize memory elasticity: 1) tasks running in a container can be given a large heap size to avoid OOM errors, and 2) tasks that are under memory pressure and incur significant swapping activity can be temporarily "suspended" by depriving the hosting containers of resources, and "resumed" when resources become available. We propose and develop Pufferfish, an elastic memory manager that leverages containers to flexibly allocate memory for tasks. The memory elasticity achieved by Pufferfish can be exploited by a cluster scheduler to improve cluster utilization and task parallelism. We implement Pufferfish on the cluster scheduler Apache YARN. Experiments with Spark and MapReduce on real-world traces show that Pufferfish avoids OOM errors, improves cluster memory utilization by 2.7x, and improves the median job runtime by 5.5x compared to a memory over-provisioning solution.
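The suspend/resume idea from the abstract can be sketched as a simple policy loop: a container under heavy swapping is deprived of memory (its pages become reclaimable for other tasks, while the JVM inside keeps its large heap and is never OOM-killed), and gets a full allotment back once cluster memory frees up. Everything below — the `Container` record, the thresholds, and `rebalance` — is a hypothetical illustration, not Pufferfish's actual API, which operates on container (cgroup) resource limits via the cluster scheduler.

```python
# Illustrative sketch of container-driven memory elasticity.
# All names and thresholds are hypothetical, not Pufferfish's real interface.
from dataclasses import dataclass

@dataclass
class Container:
    name: str
    memory_mb: int        # current container memory limit
    swap_rate: float      # observed swap activity (pages/sec)
    suspended: bool = False

SWAP_THRESHOLD = 100.0    # swap activity above this signals memory pressure
MIN_FOOTPRINT_MB = 64     # a suspended container keeps only a tiny reservation
FULL_ALLOTMENT_MB = 1024  # memory returned to a container on resume

def rebalance(containers, cluster_free_mb):
    """Suspend containers that swap heavily; resume them when memory frees up."""
    for c in containers:
        if not c.suspended and c.swap_rate > SWAP_THRESHOLD:
            # "Suspend": shrink the container so its memory can be
            # reclaimed for other tasks instead of thrashing.
            cluster_free_mb += c.memory_mb - MIN_FOOTPRINT_MB
            c.memory_mb = MIN_FOOTPRINT_MB
            c.suspended = True
    for c in containers:
        if (c.suspended and c.swap_rate <= SWAP_THRESHOLD
                and cluster_free_mb >= FULL_ALLOTMENT_MB):
            # "Resume": restore a full memory allotment once available
            # and pressure has subsided.
            cluster_free_mb -= FULL_ALLOTMENT_MB - MIN_FOOTPRINT_MB
            c.memory_mb = FULL_ALLOTMENT_MB
            c.suspended = False
    return cluster_free_mb
```

In a real deployment the "suspend" step would translate into lowering the container's cgroup memory (and CPU) limits through the container runtime, which is the lightweight-virtualization lever the paper exploits.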




Published In

SoCC '19: Proceedings of the ACM Symposium on Cloud Computing
November 2019, 503 pages
ISBN: 9781450369732
DOI: 10.1145/3357223
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. cloud computing
  2. cluster scheduling
  3. containerization
  4. memory management

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoCC '19: ACM Symposium on Cloud Computing
November 20-23, 2019
Santa Cruz, CA, USA

Acceptance Rates

SoCC '19 paper acceptance rate: 39 of 157 submissions, 25%.
Overall acceptance rate: 169 of 722 submissions, 23%.

Article Metrics

  • Downloads (last 12 months): 166
  • Downloads (last 6 weeks): 17
Reflects downloads up to 19 Nov 2024

Cited By

  • (2024) Emma: Elastic Multi-Resource Management for Realtime Stream Processing. IEEE INFOCOM 2024, pp. 1581-1590. DOI: 10.1109/INFOCOM52122.2024.10621313
  • (2023) Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '23), pp. 169-180. DOI: 10.1145/3588195.3592998
  • (2023) Adapt Burstable Containers to Variable CPU Resources. IEEE Transactions on Computers, 72(3):614-626. DOI: 10.1109/TC.2022.3174480
  • (2023) Container Restart Reduction Technique in Kubernetes Using Memory Oversubscription. 2023 IEEE 20th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 606-607. DOI: 10.1109/MASS58611.2023.00081
  • (2023) Latency-Oriented Elastic Memory Management at Task-Granularity for Stateful Streaming Processing. IEEE INFOCOM 2023, pp. 1-10. DOI: 10.1109/INFOCOM53939.2023.10228963
  • (2022) Improving Concurrent GC for Latency Critical Services in Multi-tenant Systems. Proceedings of the 23rd ACM/IFIP International Middleware Conference, pp. 43-55. DOI: 10.1145/3528535.3531515
  • (2022) Holmes. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC '22), pp. 110-121. DOI: 10.1145/3502181.3531464
  • (2021) Memory at your service. Proceedings of the 22nd International Middleware Conference, pp. 185-197. DOI: 10.1145/3464298.3493394
  • (2021) FlashByte: Improving Memory Efficiency with Lightweight Native Storage. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 61-70. DOI: 10.1109/CCGrid51090.2021.00016
  • (2021) Adaptive Online Estimation of Thrashing-Avoiding Memory Reservations for Long-Lived Containers. Collaborative Computing: Networking, Applications and Worksharing, pp. 620-639. DOI: 10.1007/978-3-030-67537-0_37
