A Case Against Tiny Tasks in Iterative Analytics

Authors:

Subramanya R. Dulloor,

Amitabha Roy

HotOS '17: Proceedings of the 16th Workshop on Hot Topics in Operating Systems

Pages 144 - 149

https://doi.org/10.1145/3102980.3103004

Published: 07 May 2017

Abstract

Big data systems such as Spark split an iterative parallel program into tiny tasks, and the other aspects of their design follow from this basic principle. Unfortunately, despite immense engineering effort, tiny tasks have unavoidably large overheads. We use logistic regression, a common machine learning primitive, to compare the performance of Spark against a series of designs that converge to a hand-coded parallel MPI-based implementation. We conclude that Spark leaves orders of magnitude of performance on the table because it insists on setting the granularity of a task to a single iteration. We counter a common argument for the tiny-task approach, namely better resilience to faults, by demonstrating that optimum job checkpoint intervals are far longer than the duration of the tiny tasks favored in Spark's design. We propose an alternative approach that relies on an auto-parallelizing compiler tightly integrated with the MPI runtime, illustrating the opposite end of the spectrum, where task granularities are as large as possible.
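
The checkpoint-interval argument can be made concrete with a rough calculation. The sketch below uses Young's first-order approximation, in which the optimal interval is sqrt(2 * C * M) for a checkpoint cost C and a mean time between failures M. This is an assumption made for illustration: the abstract does not say which model the authors use, and the checkpoint cost, failure rate, and task duration are hypothetical values, not measurements from the paper.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds.
    mtbf_s: mean time between failures for the whole job, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical inputs, chosen only to illustrate the orders of magnitude.
checkpoint_cost_s = 10.0     # assumed: 10 s to write a checkpoint
mtbf_s = 24 * 3600.0         # assumed: one failure per day across the job
task_duration_s = 0.5        # assumed: a sub-second "tiny" task

interval_s = young_interval(checkpoint_cost_s, mtbf_s)
ratio = interval_s / task_duration_s

print(f"optimal checkpoint interval: about {interval_s:.0f} s")
print(f"that is roughly {ratio:.0f} tiny tasks between checkpoints")
```

Under these assumed inputs the interval comes out in the range of tens of minutes, which is the qualitative point of the abstract: recovery granularity does not need to match a sub-second task. Different failure rates or checkpoint costs would shift the number but not the square-root relationship.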

Information & Contributors

Information

Published In

HotOS '17: Proceedings of the 16th Workshop on Hot Topics in Operating Systems

May 2017

185 pages

ISBN:9781450350686

DOI:10.1145/3102980

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HotOS '17

Sponsor:

SIGOPS

HotOS '17: Workshop on Hot Topics in Operating Systems

May 7 - 10, 2017

Whistler, BC, Canada

Cited By

Bora S, Walker B, Fidler M (2023). The Tiny-Tasks Granularity Trade-Off: Balancing Overhead Versus Performance in Parallel Systems. IEEE Transactions on Parallel and Distributed Systems 34:4 (1128-1144). Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TPDS.2022.3233712

Shan Y, Kesidis G, Jain A, Urgaonkar B, Khamse-Ashari J, Lambadaris I (2020). Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Cloud. Proceedings of the 2020 6th International Workshop on Container Technologies and Container Clouds (7-12). Online publication date: 7-Dec-2020.
https://dl.acm.org/doi/10.1145/3429885.3429962

Qu H, Mashayekhi O, Shah C, Levis P, Oliveira R, Felber P, Hu Y (2018). Decoupling the control plane from program control flow for flexibility and performance in cloud computing. Proceedings of the Thirteenth EuroSys Conference (1-13). Online publication date: 23-Apr-2018.
https://dl.acm.org/doi/10.1145/3190508.3190516

Uta A, Varbanescu A, Musaafir A, Lemaire C, Iosup A (2018). Exploring HPC and Big Data Convergence: A Graph Processing Study on Intel Knights Landing. 2018 IEEE International Conference on Cluster Computing (CLUSTER) (66-77). Online publication date: Sep-2018.
https://doi.org/10.1109/CLUSTER.2018.00019
