research-article

Open access

Specifying and testing GPU workgroup progress models

Authors:

Alastair F. DonaldsonAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 5, Issue OOPSLA

Article No.: 131, Pages 1 - 30

https://doi.org/10.1145/3485508

Published: 15 October 2021 Publication History

PDF eReader

Abstract

As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications (e.g. Vulkan and Metal) say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model.

This work is a collection of tools and experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a set of constraints that describe concurrent programs that require forward progress to terminate. This allows us to synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, we can determine the expected status of each litmus test -- i.e. whether it is guaranteed to eventually terminate -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, as was hypothesized by prior work.

Supplementary Material

Auxiliary Presentation Video (oopsla21main-p143-p-video.mp4)

Download
211.54 MB

References

[1]

Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. 2015. GPU Concurrency: Weak Behaviours and Programming Assumptions. In Architectural Support for Programming Languages and Operating Systems, ASPLOS. ACM. https://doi.org/10.1145/2694344.2694391

Abstract

Supplementary Material

References

Cited By

Index Terms

Recommendations

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Badges

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations