research-article

Public Access

Verification of producer-consumer synchronization in GPU programs

Authors:

Alex AikenAuthors Info & Claims

ACM SIGPLAN Notices, Volume 50, Issue 6

Pages 88 - 98

https://doi.org/10.1145/2813885.2737962

Published: 03 June 2015 Publication History

Abstract

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for named barriers and define what it means for a warp-specialized kernel to be correct. We give algorithms for verifying the correctness of warp-specialized kernels and prove that they are both sound and complete for the most common class of warp-specialized programs. We also present WEFT, a verification tool for checking warp-specialized code. Using WEFT, we discover several non-trivial bugs in production warp-specialized kernels.

References

[1]

Parallel thread execution ISA version 4.1. http://docs.nvidia. com/cuda/parallel-thread-execution/index.html, 2014.

[2]

A. Aiken and D. Gay. Barrier inference. In POPL, pages 342–354, 1998.

Digital Library

[3]

J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson. GPU concurrency: Weak behaviours and programming assumptions. In ASPLOS, pages 577– 591, 2015.

Digital Library

[4]

E. Bardsley and A. F. Donaldson. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In NFM, pages 230–245, 2014.

Digital Library

[5]

E. Bardsley, A. Betts, N. Chong, P. Collingbourne, P. Deligiannis, A. F. Donaldson, J. Ketema, D. Liew, and S. Qadeer. Engineering a static verification tool for GPU kernels. In CAV, pages 226–242, 2014.

Digital Library

[6]

M. Bauer, H. Cook, and B. Khailany. CudaDMA: optimizing GPU memory bandwidth via warp specialization. SC ’11, 2011.

Digital Library

[7]

M. Bauer, S. Treichler, and A. Aiken. Singe: Leveraging warp specialization for high performance on GPUs. PPoPP ’14, 2014.

Digital Library

[8]

A. Betts, N. Chong, A. F. Donaldson, S. Qadeer, and P. Thomson. GPUVerify: a verifier for GPU kernels. In OOPSLA, pages 113–132, 2012.

Digital Library

[9]

J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical simulations of turbulent combustion using S3D. Computational Science and Discovery, page 015001, 2009.

[10]

W. Chiang, G. Gopalakrishnan, G. Li, and Z. Rakamaric. Formal analysis of GPU programs with atomics via conflict-directed delaybounding. In NFM, pages 213–228, 2013.

[11]

N. Chong, A. F. Donaldson, P. H. J. Kelly, J. Ketema, and S. Qadeer. Barrier invariants: a shared state abstraction for the analysis of datadependent GPU kernels. In OOPSLA, pages 605–622, 2013.

Digital Library

[12]

N. Chong, A. F. Donaldson, and J. Ketema. A sound and complete abstraction for reasoning about parallel prefix sums. In POPL, pages 397–410, 2014.

Digital Library

[13]

P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic crosschecking of floating-point and SIMD code. In EuroSys, pages 315–328, 2011.

Digital Library

[14]

S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. Gpus and the future of parallel computing. IEEE Micro, 31(5):7–17, 2011.

Digital Library

[15]

Khronos. The OpenCL Specification, Version 1.0. The Khronos OpenCL Working Group, December 2008.

[16]

L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978.

Digital Library

[17]

D.-K. Le, W.-N. Chin, and Y. M. Teo. Verification of static and dynamic barrier synchronization using bounded permissions. In ICFEM, pages 231–248, 2013.

[18]

A. Leung, M. Gupta, Y. Agarwal, R. Gupta, R. Jhala, and S. Lerner. Verifying GPU kernels by test amplification. In PLDI, pages 383–394, 2012.

Digital Library

[19]

G. Li and G. Gopalakrishnan. Scalable smt-based verification of GPU kernel functions. In FSE, pages 187–196, 2010.

Digital Library

[20]

G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In PPOPP, pages 215–224, 2012.

Digital Library

[21]

P. Li, G. Li, and G. Gopalakrishnan. Parametric flows: automated behavior equivalencing for symbolic analysis of races in CUDA programs. In SC, page 29, 2012.

Digital Library

[22]

NVIDIA. CUDA programming guide 6.0. http://developer. download.nvidia.com/compute/DevZone/docs/html/C/doc/ CUDA_C_Programming_Guide.pdf, August 2014.

[23]

Oracle. Class phaser. https://docs.oracle.com/javase/7/ docs/api/java/util/concurrent/Phaser.html, 2014.

[24]

S. G. Parker, J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich. OptiX: A general purpose ray tracing engine. ACM Transactions on Graphics, August 2010.

Digital Library

[25]

G. Ramalingam. Context-sensitive synchronization-sensitive analysis is undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–430, 2000.

Digital Library

[26]

Top500. Top 500 supercomputers. http://www.top500.org, 2014.

Cited By

Luz HSouza PSouza S(2024)Structural testing for CUDA programming modelConcurrency and Computation: Practice and Experience10.1002/cpe.810536:14Online publication date: 9-Apr-2024
https://doi.org/10.1002/cpe.8105
Huisman MSeceleanu C(2022)Verification and Validation of Concurrent and Distributed Heterogeneous Systems (Track Summary)Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles10.1007/978-3-031-19849-6_24(417-421)Online publication date: 17-Oct-2022
https://doi.org/10.1007/978-3-031-19849-6_24
Alghamdi AEassa FKhamakhem MAL-Ghamdi AAlfakeeh AAlshahrani AAlarood A(2020)Parallel Hybrid Testing Techniques for the Dual-Programming Models-Based ProgramsSymmetry10.3390/sym1209155512:9(1555)Online publication date: 20-Sep-2020
https://doi.org/10.3390/sym12091555
Show More Cited By

Index Terms

Verification of producer-consumer synchronization in GPU programs

Recommendations

The Design and Implementation of a Verification Technique for GPU Kernels

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, ...
Verification of producer-consumer synchronization in GPU programs
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized ...
GPUVerify: a verifier for GPU kernels
OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 50, Issue 6

PLDI '15

June 2015

630 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2813885

Editor:
Andy Gill
University of Kansas, Lawrence, KS

Issue’s Table of Contents

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015
630 pages
ISBN:9781450334686
DOI:10.1145/2737924
General Chair:
David Grove
IBM Research, USA
,
Program Chair:
Steve Blackburn
Australian National University, Australia

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2015

Published in SIGPLAN Volume 50, Issue 6

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
1,122
Total Downloads

Downloads (Last 12 months)252
Downloads (Last 6 weeks)24

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Luz HSouza PSouza S(2024)Structural testing for CUDA programming modelConcurrency and Computation: Practice and Experience10.1002/cpe.810536:14Online publication date: 9-Apr-2024
https://doi.org/10.1002/cpe.8105
Huisman MSeceleanu C(2022)Verification and Validation of Concurrent and Distributed Heterogeneous Systems (Track Summary)Leveraging Applications of Formal Methods, Verification and Validation. Verification Principles10.1007/978-3-031-19849-6_24(417-421)Online publication date: 17-Oct-2022
https://doi.org/10.1007/978-3-031-19849-6_24
Alghamdi AEassa FKhamakhem MAL-Ghamdi AAlfakeeh AAlshahrani AAlarood A(2020)Parallel Hybrid Testing Techniques for the Dual-Programming Models-Based ProgramsSymmetry10.3390/sym1209155512:9(1555)Online publication date: 20-Sep-2020
https://doi.org/10.3390/sym12091555
Eassa FAlghamdi AHaridi SKhemakhem MAl-Ghamdi AAlsolami E(2020)ACC_TEST: Hybrid Testing Approach for OpenACC-Based ProgramsIEEE Access10.1109/ACCESS.2020.29910098(80358-80368)Online publication date: 2020
https://doi.org/10.1109/ACCESS.2020.2991009
Alghamdi AEassa F(2019)OpenACC Errors Classification and Static Detection TechniquesIEEE Access10.1109/ACCESS.2019.29354987(113235-113253)Online publication date: 2019
https://doi.org/10.1109/ACCESS.2019.2935498
Cogumbreiro THu RMartins FYoshida N(2018)Dynamic Deadlock Verification for General Barrier SynchronisationACM Transactions on Programming Languages and Systems10.1145/322906041:1(1-38)Online publication date: 11-Dec-2018
https://doi.org/10.1145/3229060
Mustafa DAlkhasawneh RObeidat FShatnawi A(2024)MIMD Programs Execution Support on SIMD Machines: A Holistic SurveyIEEE Access10.1109/ACCESS.2024.337299012(34354-34377)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3372990
Costa GGalletta LDegano PBasin DBodei C(2020)Natural Projection as Partial Model CheckingJournal of Automated Reasoning10.1007/s10817-020-09568-7Online publication date: 13-Aug-2020
https://doi.org/10.1007/s10817-020-09568-7
van den Haak LWijs Avan den Brand MHuisman M(2020)Formal Methods for GPGPU Programming: Is the Demand Met?Integrated Formal Methods10.1007/978-3-030-63461-2_9(160-177)Online publication date: 13-Nov-2020
https://doi.org/10.1007/978-3-030-63461-2_9
Cogumbreiro THu RMartins FYoshida N(2018)Dynamic Deadlock Verification for General Barrier SynchronisationACM Transactions on Programming Languages and Systems10.1145/322906041:1(1-38)Online publication date: 11-Dec-2018
https://dl.acm.org/doi/10.1145/3229060
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents