Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3297858.3304043acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article
Public Access

A Formal Analysis of the NVIDIA PTX Memory Consistency Model

Published: 04 April 2019 Publication History

Abstract

This paper presents the first formal analysis of the official memory consistency model for the NVIDIA PTX virtual ISA. Like other GPU memory models, the PTX memory model is weakly ordered but provides scoped synchronization primitives that enable GPU program threads to communicate through memory. However, unlike some competing GPU memory models, PTX does not require data race freedom, and this results in PTX using a fundamentally different (and more complicated) set of rules in its memory model. As such, PTX has a clear need for a rigorous and reliable memory model testing and analysis infrastructure. We break our formal analysis of the PTX memory model into multiple steps that collectively demonstrate its rigor and validity. First, we adapt the English language specification from the public PTX documentation into a formal axiomatic model. Second, we derive an up-to-date presentation of an OpenCL-like scoped C++ model and develop a mapping from the synchronization primitives of that scoped C++ model onto PTX. Third, we use the Alloy relational modeling tool to empirically test the correctness of the mapping. Finally, we compile the model and mapping into Coq and build a full machine-checked proof that the mapping is sound for programs of any size. Our analysis demonstrates that in spite of issues in previous generations, the new NVIDIA PTX memory model is suitable as a sound compilation target for GPU programming languages such as CUDA.

References

[1]
2017. CppMem: Interactive C/C++ Memory Model. http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem.
[2]
2017. diy, Release Seven.http://diy.inria.fr/.
[3]
2017. Lem, a Tool for Lightweight Executable Mathematics.http://www.cl.cam.ac.uk/~pes20/lem.
[4]
2017. The Coq Proof Assistant. https://coq.inria.fr.
[5]
2017. The HOL Interactive Theorem Prover. https://hol-theorem-prover.org.
[6]
Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. 2015. GPU Concurrency: Weak Behaviours and Programming Assumptions. In 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[7]
Jade Alglave and Luc Maranget. 2016. Towards a Formalization of the HSA Memory Model in the cat Language. Technical Report. HSA Foundation Specification Version 1.1. URL: http://www.hsafoundation.com/?ddownload=5381.
[8]
Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. 2010. Fences in Weak Memory Models. In 22nd International Conference on Computer Aided Verification (CAV).
[9]
Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory. ACM Transanctions on Programming Languages and Systems (TOPLAS) 36, 2 (July 2014), 7:1--7:74.
[10]
Konstantine Arkoudas, Sarfraz Khurshid, Darko Marinov, and Martin Rinard. 2003. Integrating Model Checking and Theorem Proving for Relational Reasoning. International Conference on Relational Methods in Computer Science (RelMiCS).
[11]
ARM. 2011. Cortex-A9 MPCore?, Programmer Advice Notice, Read-after-Read Hazards. Technical Report. URL: http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf.
[12]
ARM Holdings. 2016. ARM Architecture Reference Manuals. Technical Report. URL: http://infocenter.arm.com/help/topic/com.arm.doc.set.architecture.
[13]
Mark Batty, Alastair F. Donaldson, and John Wickerson. 2016. Over-hauling SC Atomics in C11 and OpenCL. In 43rd Annual Symposiumon Principles of Programming Languages (POPL).
[14]
Mark Batty, Kayvan Memarian, Scott Owens, Susmit Sarkar, and Peter Sewell. 2012. Clarifying and Compiling C/C++ Concurrency: from C++11 to POWER.39th Symposium on Principles of Programming Languages (POPL) (2012).
[15]
Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ Concurrency. 38th Symposium on Principles of Programming Languages (POPL)(2011).
[16]
Jasmin Christian Blanchette and Tobias Nipkow. 2010. Nitpick: A Counterexample Generator for Higher-Order Logic Based on a Relational Model Finder. 23rd International Conference on Interactive Theorem Proving (ITP).
[17]
Jasmin Christian Blanchette, Tjark Weber, Mark Batty, Scott Owens,and Susmit Sarkar. 2011. Nitpicking C++ Concurrency. 13th International Symposium on Principles and Practice of Declarative Programming (PPDP).
[18]
Hans-J. Boehm and Brian Demsky. 2014. Outlawing Ghosts: Avoiding Out-of-thin-air Results. In Workshop on Memory Systems Performance and Correctness (MSPC).
[19]
Hans-J. Boehm, Olivier Giroux, Viktor Vafeiadis, and with in-put from Will Deacon, Doug Lea, Daniel Lustig, Paul McKenney and others {sic}. 2018. P0668R3: Revising the C++ memory model. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r3.html.ISO/JTC1/SC22/WG21(2018).
[20]
Adam Chlipala. 2013.Certified Programming with Dependent Types: a Pragmatic Introduction to the Coq Proof Assistant. MIT Press.
[21]
Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis, Luc Maranget, Kathryn E Gray, Ali Sezgin, Mark Batty, and Peter Sewell. 2017. Mixed-size Concurrency: ARM, POWER, C/C++11, and SC. (2017).
[22]
HSA Foundation. 2017. Heterogeneous System Architecture. http://www.hsafoundation.com/standards.
[23]
Aboubakr Achraf El Ghazi and Mana Taghdiri. 2011. Relational Reasoning via SMT Solving. International Symposium on Formal Methods (FM).
[24]
Khronos Group. 2017. OpenCL 2.2.https://www.khronos.org/opencl.
[25]
Mark Harris. 2017. Unified Memory for CUDA Beginners.NVIDIA Developer Blog (2017). https://devblogs.nvidia.com/unified-memory-cuda-beginners.
[26]
Mark Harris and Kyrylo Perelygin. 2017. Cooperative Groups: Flexible CUDA Thread Programming. NVIDIA Developer Blog (2017). https://devblogs.nvidia.com/cooperative-groups.
[27]
Derek R. Hower, Blake A. Hechtman, Bradford M. Beckmann, Benedict R. Gaster, Mark D. Hill, Steven K. Reinhardt, and David A. Wood. 2014. Heterogeneous-race-free Memory Models. 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[28]
International Organization for Standardization (ISO). 2011. Information technology -- Programming languages -- C++, ISO/IEC 14882:2011. Technical Report.
[29]
International Organization for Standardization (ISO). 2011. Information technology -- Programming languages -- C, ISO/IEC 9899: 2011. Technical Report.
[30]
Daniel Jackson. 2002. Alloy: A Lightweight Object Modelling Notation. In ACM Transactions on Software Engineering and Methodology(TOSEM), Vol. 11. URL: http://alloy.mit.edu.
[31]
Khronos Group. 2015. The OpenCL Specification, Version 2.2. Technical Report. URL: https://www.khronos.org/registry/cl/specs/opencl-2.2.pdf.
[32]
Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing Sequential Consistency in C/C++11. (2017).
[33]
Daniel Lustig, Michael Pellauer, and Margaret Martonosi. 2014. PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models. 47th Annual International Symposium on Microarchitecture (MICRO) (2014).
[34]
Daniel Lustig, Sameer Sahasrabuddhe, and Olivier Giroux. 2017. Supplemental material. https://github.com/nvlabs/ptxmemorymodel.
[35]
Daniel Lustig, Andrew Wright, Alexandros Papakonstantinou, andOlivier Giroux. 2017. Automated Synthesis of Comprehensive Memory Model Litmus Test Suites. 22nd International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS) (2017).
[36]
Yatin A. Manerkar, Caroline Trippel, Daniel Lustig, Michael Pellauer,and Margaret Martonosi. 2016. Counterexamples and Proof Loop-hole for the C/C++ to POWER and ARMv7 Trailing-sync Compiler Mappings. arXiv1611.01507v2 (Nov 2016).
[37]
Baoluo Meng, Andrew Reynolds, Cesare Tinelli, and Clark Barrett. 2017. Relational Constraint Solving in SMT. 26th International Conference on Automated Deduction (CADE).
[38]
Mariano M. Moscato, Carlos G. Lopez Pombo, and Marcelo F. Frias. 2014. Dynamite: a Tool for the Verification of Alloy Models based on PVS. ACM Transactions on Software Engineering and Methodology (TOSEM) 23. Issue 2.
[39]
NVIDIA. 2017. CUDA C Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide.
[40]
NVIDIA. 2017. Parallel Thread Execution ISA Version 6.0. http://docs.nvidia.com/cuda/parallel-thread-execution/index.html.
[41]
NVIDIA. 2018. NVIDIA Tesla V100 GPU Architecture: The World's MostAdvanced Data Center GPU. Technical Report.
[42]
Oracle. 2017. Java Language and Virtual Machine Specifications. https://docs.oracle.com/javase/specs/.
[43]
Peizhao Ou and Brian Demsky. 2018. Towards Understanding the Costs of Avoiding Out-of-Thin-Air Results. Object-Oriented Programming, Systems, Languages & Applications Conference (OOPSLA) (2018).
[44]
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86Memory Model: x86-TSO. In 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs).
[45]
Gustavo Petri, Jan Vitek, and Suresh Jagannathan. 2015. Cookingthe Books: Formalizing JMM Implementation Recipes. 29th European Conference on Object-Oriented Programming (ECOOP)(2015).
[46]
Power.org. 2013. Power ISA? Version 2.07. Technical Report. URL:https://www.power.org/wp-content/uploads/2013/05/PowerISA_V2.07_PUBLIC.pdf.
[47]
Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell. 2018. Simplifying ARM Concurrency: Multicopy-Atomic Axiomatic and Operational Models for ARMv8. 45th Symposium on Principles of Programming Languages (POPL).
[48]
Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. 2011. Understanding POWER Multiprocessors. In 32nd Conference on Programming Language Design and Implementation (PLDI).
[49]
Matthew D. Sinclair, Johnathan Alsop, and Sarita V. Adve. 2015. Efficient GPU Synchronization without Scopes: Saying No to Complex Consistency Models. In 48th Annual International Symposium on Microarchitecture (MICRO).
[50]
Matthew D. Sinclair, Johnathan Alsop, and Sarita V. Adve. 2017. Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems. 44th Annual International Symposium on Computer Architecture (ISCA) (2017).
[51]
Tyler Sorensen and Alastair F. Donaldson. 2016. Exposing Errors Related to Weak Memory in GPU Applications. In 37th Conference on Programming Language Design and Implementation (PLDI).
[52]
Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2011. A Primeron Memory Consistency and Cache Coherence. Morgan & Claypool Publishers.
[53]
SPARC International. 1993. The SPARC Architecture Manual, Version 9. Technical Report.
[54]
Emina Torlak and Daniel Jackson. 2007. Kodkod: A Relational Model Finder. In 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS).
[55]
Caroline Trippel, Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. 2017. TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA. 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)(2017).
[56]
Mattias Ulbrich, Ulrich Geilmann, Aboubakr Achraf El Ghazi, and Mana Taghdiri. 2012. A Proof Assistant for Alloy Specifications. International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS).
[57]
Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. 2015. Common Compiler Optimisations Are Invalid in the C11 Memory Model and What We Can Do About It. In 42nd Symposium on Principles of Programming Languages (POPL).
[58]
John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constantinides. 2017. Automatically Comparing Memory Consistency Models. 44th Symposium on Principles of Programming Languages(POPL) (2017).

Cited By

View all
  • (2024)GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00011(1-16)Online publication date: 29-Jun-2024
  • (2024)Energy-Efficient CNN Inferencing on GPUs with Dynamic Frequency ScalingInnovations in Data Analytics10.1007/978-981-97-3466-5_28(375-389)Online publication date: 21-Jul-2024
  • (2023)Compound Memory ModelsProceedings of the ACM on Programming Languages10.1145/35912677:PLDI(1145-1168)Online publication date: 6-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019
1126 pages
ISBN:9781450362405
DOI:10.1145/3297858
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 April 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. GPUs
  2. SAT solving
  3. memory consistency models
  4. model finding
  5. theorem proving

Qualifiers

  • Research-article

Funding Sources

Conference

ASPLOS '19

Acceptance Rates

ASPLOS '19 Paper Acceptance Rate 74 of 351 submissions, 21%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,218
  • Downloads (Last 6 weeks)177
Reflects downloads up to 01 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00011(1-16)Online publication date: 29-Jun-2024
  • (2024)Energy-Efficient CNN Inferencing on GPUs with Dynamic Frequency ScalingInnovations in Data Analytics10.1007/978-981-97-3466-5_28(375-389)Online publication date: 21-Jul-2024
  • (2023)Compound Memory ModelsProceedings of the ACM on Programming Languages10.1145/35912677:PLDI(1145-1168)Online publication date: 6-Jun-2023
  • (2023)MC Mutants: Evaluating and Improving Testing for Memory Consistency SpecificationsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575750(473-488)Online publication date: 27-Jan-2023
  • (2023)Scoped Buffered Persistency Model for GPUsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575749(688-701)Online publication date: 27-Jan-2023
  • (2023)Taking Back Control in an Intermediate Representation for GPU ComputingProceedings of the ACM on Programming Languages10.1145/35712537:POPL(1740-1769)Online publication date: 11-Jan-2023
  • (2023)Improving the Scalability of GPU Synchronization PrimitivesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.321850834:1(275-290)Online publication date: 1-Jan-2023
  • (2022)A GPU Multiversion B-TreeProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569681(481-493)Online publication date: 8-Oct-2022
  • (2022)GPM: leveraging persistent memory from a GPUProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507758(142-156)Online publication date: 28-Feb-2022
  • (2022)Mixed-proxy extensions for the NVIDIA PTX memory consistency modelProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3533045(1058-1070)Online publication date: 18-Jun-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media