research-article

Design and evaluation of the GeMTC framework for GPU-enabled many-task computing

Authors:

Scott J. Krieder,

Justin M. Wozniak,

Timothy Armstrong,

Daniel S. Katz,

Benjamin Grimmer,

Ioan Raicu

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing

Pages 153 - 164

https://doi.org/10.1145/2600212.2600228

Published: 23 June 2014

Abstract

We present the design and first performance and usability evaluation of GeMTC, a novel execution model and runtime system that enables accelerators to be programmed with many concurrent and independent tasks of potentially short or variable duration. With GeMTC, a broad class of such "many-task" applications can leverage the increasing number of accelerated and hybrid high-end computing systems. GeMTC overcomes the obstacles to using GPUs in a many-task manner by scheduling and launching independent tasks on hardware designed for SIMD-style vector processing. We demonstrate the use of a high-level MTC programming model (the Swift parallel dataflow language) to run tasks on many accelerators and thus provide a high-productivity programming model for the growing number of supercomputers that are accelerator-enabled. While still in an experimental stage, GeMTC can already support tasks of fine (subsecond) granularity and execute concurrent heterogeneous tasks on 86,000 independent GPU warps spanning 2.7M GPU threads on the Blue Waters supercomputer.
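
As a quick sanity check on the scale quoted in the abstract, the warp and thread counts can be related with a few lines of arithmetic. This is only an illustrative sketch: it assumes 32 threads per warp (the warp size on NVIDIA GPUs), and the helper name `total_threads` is hypothetical rather than part of GeMTC.

```python
# Assumption: 32 threads per warp, the warp size on NVIDIA GPUs.
THREADS_PER_WARP = 32


def total_threads(warps: int, threads_per_warp: int = THREADS_PER_WARP) -> int:
    """Return the number of GPU threads spanned by the given number of warps."""
    return warps * threads_per_warp


# 86,000 independent warps is the figure reported in the abstract.
warps = 86_000
threads = total_threads(warps)
print(f"{warps:,} warps x {THREADS_PER_WARP} threads/warp = {threads:,} threads")
```

Under that assumption the product is about 2.75 million threads, which is consistent with the roughly 2.7M threads reported in the abstract.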

Index Terms

Design and evaluation of the GeMTC framework for GPU-enabled many-task computing
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        1. Concurrent programming languages

Published In

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing

June 2014

334 pages

ISBN: 9781450327497

DOI: 10.1145/2600212

General Chairs: Beth Plale (Indiana University, USA), Matei Ripeanu (University of British Columbia, CA)

Program Chairs: Franck Cappello (Argonne National Lab and INRIA, USA), Dongyan Xu (Purdue University, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HPDC'14: The 23rd International Symposium on High-Performance Parallel and Distributed Computing

June 23 - 27, 2014

Vancouver, BC, Canada

Acceptance Rates

HPDC '14 Paper Acceptance Rate: 21 of 130 submissions, 16%

Overall Acceptance Rate: 166 of 966 submissions, 17%

Article Metrics

Total Citations: 29
Total Downloads: 357
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 5

Reflects downloads up to 12 Nov 2024
