research-article

Public Access

Interruptible tasks: treating memory pressure as interrupts for highly scalable data-parallel programs

Authors:

Shan LuAuthors Info & Claims

SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles

Pages 394 - 409

https://doi.org/10.1145/2815400.2815407

Published: 04 October 2015 Publication History

Abstract

Real-world data-parallel programs commonly suffer from great memory pressure, especially when they are executed to process large datasets. Memory problems lead to excessive GC effort and out-of-memory errors, significantly hurting system performance and scalability. This paper proposes a systematic approach that can help data-parallel tasks survive memory pressure, improving their performance and scalability without needing any manual effort to tune system parameters. Our approach advocates interruptible task (ITask), a new type of data-parallel tasks that can be interrupted upon memory pressure---with part or all of their used memory reclaimed---and resumed when the pressure goes away.

To support ITasks, we propose a novel programming model and a runtime system, and have instantiated them on two state-of-the-art platforms Hadoop and Hyracks. A thorough evaluation demonstrates the effectiveness of ITask: it has helped real-world Hadoop programs survive 13 out-of-memory problems reported on StackOverflow; a second set of experiments with 5 already well-tuned programs in Hyracks on datasets of different sizes shows that the ITask-based versions are 1.5--3x faster and scale to 3--24x larger datasets than their regular counterparts.

Supplementary Material

MP4 File (p394.mp4)

Download
2086.63 MB

References

[1]

AsterixDB. https://asterixdb.ics.uci.edu/.

[2]

Cascading. http://www.cascading.org.

[3]

Hadoop YARN. http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/.

[4]

Hyracks. http://hyracks.org/.

[5]

Out of memory error due to appending values to string-builder. http://stackoverflow.com/questions/12831076/.

[6]

Out of memory error due to large spill buffer. http://stackoverflow.com/questions/8464048/.

[7]

Out of memory error in a web parser. http://stackoverflow.com/questions/17707883/.

[8]

Out of memory error in building inverted index. http://stackoverflow.com/questions/17980491/.

[9]

Out of memory error in computing frequencies of attribute values. http://stackoverflow.com/questions/23042829/.

[10]

Out of memory error in customer review processing. http://stackoverflow.com/questions/20247185/.

[11]

Out of memory error in efficient sharded positional indexer. http://www.cs.cmu.edu/~lezhao/TA/2010/HW2/.

[12]

Out of memory error in hash join using distributed-cache. http://stackoverflow.com/questions/15316539/.

[13]

Out of memory error in map-side aggregation. http://stackoverflow.com/questions/16684712/.

[14]

Out of memory error in processing a text file as a record. http://stackoverflow.com/questions/12466527/.

[15]

Out of memory error in word cooccurrence matrix stripes builder. http://stackoverflow.com/questions/12831076/.

[16]

The performance comparison between in-mapper combiner and regular combiner. http://stackoverflow.com/questions/10925840/.

[17]

Reducer hange at the merge step. http://stackoverflow.com/questions/15541900/.

[18]

Tuning Spark. http://spark.apache.org/docs/latest/tuning.html.

[19]

Ahmad, F., Chakradhar, S. T., Raghunathan, A., and Vijaykumar, T. N. Shufflewatcher: Shuffle-aware scheduling in multi-tenant mapreduce clusters. In USENIX ATC (2014), pp. 1--12.

Digital Library

[20]

Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V. R., Bu, Y., Carey, M. J., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y., Li, C., Li, G., Ok, J. M., Onose, N., Pirzadeh, P., Tsotras, V. J., Vernica, R., Wen, J., and Westmann, T. Asterixdb: A scalable, open source BDMS. PVLDB 7, 14 (2014), 1905--1916.

Digital Library

[21]

Hadoop: Open-source implementation of MapReduce. http://hadoop.apache.org.

[22]

Behm, A., Borkar, V. R., Carey, M. J., Grover, R., Li, C., Onose, N., Vernica, R., Deutsch, A., Papakonstantinou, Y., and Tsotras, V. J. ASTERIX: Towards a scalable, semistructured data platform for evolving-world models. Distributed and Parallel Databases 29 (2011), 185--216.

Digital Library

[23]

Bond, M. D., and McKinley, K. S. Leak pruning. In ASPLOS (2009), pp. 277--288.

Digital Library

[24]

Borkar, V. R., Carey, M. J., Grover, R., Onose, N., and Vernica, R. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE (2011), pp. 1151--1162.

Digital Library

[25]

Borkar, V. R., Carey, M. J., and Li, C. Inside "Big Data Management": Ogres, Onions, or Parfaits? In EDBT (2012), pp. 3--14.

Digital Library

[26]

Bu, Y., Borkar, V., Xu, G., and Carey, M. J. A bloat-aware design for big data applications. In ISMM (2013), pp. 119--130.

Digital Library

[27]

Bu, Y., Borkar, V. R., Jia, J., Carey, M. J., and Condie, T. Pregelix: Big(ger) graph analytics on a dataflow engine. PVLDB 8, 2 (2014), 161--172.

Digital Library

[28]

Chaiken, R., Jenkins, B., Larson, P., Ramsey, B., Shakib, D., Weaver, S., and Zhou, J. SCOPE: easy and efficient parallel processing of massive data sets. PVLDB 1, 2 (2008), 1265--1276.

Digital Library

[29]

Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R. R., Bradshaw, R., and Weizenbaum, N. FlumeJava: Easy, efficient data-parallel pipelines. In PLDI (2010), pp. 363--375.

Digital Library

[30]

Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2 (2008), 4:1--4:26.

Digital Library

[31]

Chu, C. T., Kim, S. K., Lin, Y. A., Yu, Y., Bradski, G. R., Ng, A. Y., and Olukotun, K. Map-reduce for machine learning on multicore. In NIPS (2006), pp. 281--288.

Digital Library

[32]

Condie, T., Conway, N., Alvaro, P., Hellerstein, J. M., Elmeleegy, K., and Sears, R. MapReduce online. In NSDI (2010), pp. 313--328.

Digital Library

[33]

Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. In OSDI (2004), pp. 137--150.

Digital Library

[34]

Gog, I., Giceva, J., Schwarzkopf, M., Vaswani, K., Vytiniotis, D., Ramalingam, G., Costa, M., Murray, D. G., Hand, S., and Isard, M. Broom: Sweeping out garbage collection from big data systems. In HotOS (2015).

Digital Library

[35]

Guo, Z., Fan, X., Chen, R., Zhang, J., Zhou, H., McDirmid, S., Liu, C., Lin, W., Zhou, J., and Zhou, L. Spotting code optimizations in data-parallel pipelines through periscope. In OSDI (2012), pp. 121--133.

Digital Library

[36]

Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F. B., and Babu, S. Starfish: A self-tuning system for big data analytics. In CIDR (2011), pp. 261--272.

[37]

Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R., Shenker, S., and Stoica, I. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI (2011), pp. 295--308.

Digital Library

[38]

Isard, M., Budiu, M., Yu, Y., Birrell, A., and Fetterly, D. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys (2007), pp. 59--72.

Digital Library

[39]

Kryo. https://github.com/EsotericSoftware/kryo.

[40]

Kwon, Y., Ren, K., Balazinska, M., and Howe, B. Managing skew in hadoop. IEEE Data Eng. Bull. 36, 1 (2013), 24--33.

[41]

Kyrola, A., Blelloch, G., and Guestrin, C. GraphChi: Large-Scale Graph Computation on Just a PC. In OSDI (2012), pp. 31--46.

Digital Library

[42]

Liu, J., Ravi, N., Chakradhar, S., and Kandemir, M. Panacea: Towards holistic optimization of MapReduce applications. In CGO (2012), pp. 33--43.

Digital Library

[43]

Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J. M. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB 5, 8 (2012), 716--727.

Digital Library

[44]

Mitchell, N., Schonberg, E., and Sevitsky, G. Four trends leading to java runtime bloat. IEEE Software 27, 1 (2010), 56--63.

Digital Library

[45]

Mitchell, N., and Sevitsky, G. The causes of bloat, the limits of health. In OOPSLA (2007), pp. 245--260.

Digital Library

[46]

Murray, D. G., Isard, M., and Yu, Y. Steno: Automatic optimization of declarative queries. In PLDI (2011), pp. 121--131.

Digital Library

[47]

Nguyen, K., Wang, K., Bu, Y., Fang, L., Hu, J., and Xu, G. Facade: A compiler and runtime for (almost) object-bounded big data applications. In ASPLOS (2015), pp. 675--690.

Digital Library

[48]

Olston, C., Reed, B., Srivastava, U., Kumar, R., and Tomkins, A. Pig latin: A not-so-foreign language for data processing. In SIGMOD (2008), pp. 1099--1110.

Digital Library

[49]

Pike, R., Dorward, S., Griesemer, R., and Quinlan, S. Interpreting the data: Parallel analysis with sawzall. Scientific Programming 13, 4 (2005), 277--298.

Digital Library

[50]

Tang, Y., Gao, Q., and Qin, F. LeakSurvivor: Towards safely tolerating memory leaks for garbage-collected languages. In USENIX ATC (2008), pp. 307--320.

Digital Library

[51]

Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., and Murthy, R. Hive - A warehousing solution over a map-reduce framework. PVLDB 2, 2 (2009), 1626--1629.

Digital Library

[52]

The TPC Benchmark(TM)H (TPC-H). http://www.tpc.org/tpch.

[53]

Xu, G., Mitchell, N., Arnold, M., Rountev, A., and Sevitsky, G. Software bloat analysis: Finding, removing, and preventing performance problems in modern large-scale object-oriented applications. In FoSER (2010), pp. 421--426.

Digital Library

[54]

Xu, G. H., Mitchell, N., Arnold, M., Rountev, A., Schonberg, E., and Sevitsky, G. Scalable runtime bloat detection using abstract dynamic slicing. TOSEM 23, 3 (2014), 23.

Digital Library

[55]

Yahoo! Webscope Program. http://webscope.sandbox.yahoo.com/.

[56]

Yang, H.-C., Dasdan, A., Hsiao, R.-L., and Parker, D. S. Map-reduce-merge: Simplified relational data processing on large clusters. In SIGMOD (2007), pp. 1029--1040.

Digital Library

[57]

Yu, Y., Gunda, P. K., and Isard, M. Distributed aggregation for data-parallel computing: Interfaces and implementations. In SOSP (2009), pp. 247--260.

Digital Library

[58]

Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, U., Gunda, P. K., and Currey, J. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI (2008), pp. 1--14.

Digital Library

[59]

Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., and Stoica, I. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI (2012), pp. 15--28.

Digital Library

[60]

Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. Spark: Cluster computing with working sets. In HotCloud (2010).

Digital Library

[61]

Zhang, J., Zhou, H., Chen, R., Fan, X., Guo, Z., Lin, H., Li, J. Y., Lin, W., Zhou, J., and Zhou, L. Optimizing data shuffling in data-parallel computation by understanding user-defined functions. In NSDI (2012), pp. 22--22.

Digital Library

[62]

Zhou, J., Larson, P.-Å., and Chaiken, R. Incorporating partitioning and parallel plans into the SCOPE optimizer. In ICDE (2010), pp. 1060--1071.

Cited By

Chen LLiu SWang CMa HQiao YWang ZWu CLu YFeng XCui HLu SXu HGavrilovska ATerry D(2024)A tale of two pathsProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691943(77-95)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.5555/3691938.3691943
Eymer JDexter PRaskind JLiu Y(2024)A Runtime System for Interruptible Query Processing: When Incremental Computing Meets Fine-Grained ParallelismProceedings of the ACM on Programming Languages10.1145/36897728:OOPSLA2(1729-1756)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689772
Hu YHuang GHuang PDruschel PKaufmann AMace JFlinn JSeltzer M(2023)Pushing Performance Isolation Boundaries into Application with pBoxProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613159(247-263)Online publication date: 23-Oct-2023
https://dl.acm.org/doi/10.1145/3600006.3613159
Show More Cited By

Index Terms

Interruptible tasks: treating memory pressure as interrupts for highly scalable data-parallel programs

Recommendations

HeMem: Scalable Tiered Memory Management for Big Data Applications and Real NVM
SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles

High-capacity non-volatile memory (NVM) is a new main memory tier. Tiered DRAM+NVM servers increase total memory capacity by up to 8x, but can diminish memory bandwidth by up to 7x and inflate latency by up to 63% if not managed well. We study existing ...
MIND: In-Network Memory Management for Disaggregated Data Centers
SOSP '21: Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles

Memory disaggregation promises transparent elasticity, high resource utilization and hardware heterogeneity in data centers by physically separating memory and compute into network-attached resource "blades". However, existing designs achieve ...
Cross-checking semantic correctness: the case of finding file system bugs
SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles

Today, systems software is too complex to be bug-free. To find bugs in systems software, developers often rely on code checkers, like Linux's Sparse. However, the capability of existing tools used in commodity, large-scale systems is limited to finding ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles

October 2015

499 pages

ISBN:9781450338349

DOI:10.1145/2815400

General Chair:
Ethan Miller
UC Santa Cruz
,
Program Chair:
Steven Hand
Google

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SSRC: Storage Systems Research Center, UC Santa Cruz
SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 October 2015

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Research-article

Funding Sources

Conference

SOSP '15

Sponsor:

SSRC
SIGOPS

SOSP '15: ACM SIGOPS 25th Symposium on Operating Systems Principles

October 4 - 7, 2015

California, Monterey

Acceptance Rates

SOSP '15 Paper Acceptance Rate 30 of 181 submissions, 17%;

Overall Acceptance Rate 174 of 961 submissions, 18%

Upcoming Conference

SOSP '25

Sponsor:
sigops

ACM SIGOPS 31st Symposium on Operating Systems Principles

October 13 - 16, 2025

Seoul , Republic of Korea

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

42
Total Citations
View Citations
1,586
Total Downloads

Downloads (Last 12 months)121
Downloads (Last 6 weeks)18

Reflects downloads up to 14 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Chen LLiu SWang CMa HQiao YWang ZWu CLu YFeng XCui HLu SXu HGavrilovska ATerry D(2024)A tale of two pathsProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691943(77-95)Online publication date: 10-Jul-2024
https://dl.acm.org/doi/10.5555/3691938.3691943
Eymer JDexter PRaskind JLiu Y(2024)A Runtime System for Interruptible Query Processing: When Incremental Computing Meets Fine-Grained ParallelismProceedings of the ACM on Programming Languages10.1145/36897728:OOPSLA2(1729-1756)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689772
Hu YHuang GHuang PDruschel PKaufmann AMace JFlinn JSeltzer M(2023)Pushing Performance Isolation Boundaries into Application with pBoxProceedings of the 29th Symposium on Operating Systems Principles10.1145/3600006.3613159(247-263)Online publication date: 23-Oct-2023
https://dl.acm.org/doi/10.1145/3600006.3613159
Zhao JPi AZhou XChang SXu CBellavista PZhang KGherbi ABagchi SPatiño MDi Modica GGascon-Samson J(2022)Improving Concurrent GC for Latency Critical Services in Multi-tenant SystemsProceedings of the 23rd ACM/IFIP International Middleware Conference10.1145/3528535.3531515(43-55)Online publication date: 7-Nov-2022
https://dl.acm.org/doi/10.1145/3528535.3531515
Li ZWu MCriswell JWilliams DXia Y(2022)Transparent and lightweight object placement for managed workloads atop hybrid memoriesProceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments10.1145/3516807.3516822(72-80)Online publication date: 25-Feb-2022
https://dl.acm.org/doi/10.1145/3516807.3516822
Kweun MKim GOh BJung SUm TLee W(2022)PokéMem: Taming Wild Memory Consumers in Apache Spark2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00015(59-69)Online publication date: May-2022
https://doi.org/10.1109/IPDPS53621.2022.00015
Fördős VBarbosa Rodrigues A(2022)Lesser Evil: Embracing Failure to Protect Overall System AvailabilityDistributed Applications and Interoperable Systems10.1007/978-3-031-16092-9_5(57-73)Online publication date: 6-Sep-2022
https://doi.org/10.1007/978-3-031-16092-9_5
Suo KSon JCheng DChen WBaidya S(2021)Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing2021 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/Cluster48925.2021.00018(433-443)Online publication date: Sep-2021
https://doi.org/10.1109/Cluster48925.2021.00018
Zhao JPi AWang SZhou X(2021)FlashByte: Improving Memory Efficiency with Lightweight Native Storage2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid)10.1109/CCGrid51090.2021.00016(61-70)Online publication date: May-2021
https://doi.org/10.1109/CCGrid51090.2021.00016
Wu MZhao ZYang YLi HChen HZang BGuan HLi SLu CZhang TGavrilovska AZadok E(2020)PlatinumProceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference10.5555/3489146.3489157(159-172)Online publication date: 15-Jul-2020
https://dl.acm.org/doi/10.5555/3489146.3489157
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten