research-article

Exploring Memory Persistency Models for GPUs

Authors:

Mohammad Alshboul,

Huiyang Zhou

PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

Pages 310 - 322

https://doi.org/10.1109/PACT.2019.00032

Published: 26 November 2024

Abstract

Given its high integration density, high speed, byte addressability, and low standby power, non-volatile (persistent) memory is expected to supplement or replace DRAM as main memory. Through persistency programming models, which define the durability ordering of stores, and durable transaction constructs, programmers can build recoverable data structures (RDS) that allow programs to recover to a consistent state after a failure. While persistency models have been well studied for CPUs, they have been neglected for graphics processing units (GPUs). Considering the importance of GPUs as a dominant accelerator for high-performance computing, we investigate persistency models for GPUs.

GPU applications differ substantially from CPU applications; hence, in this paper we adapt, re-architect, and optimize CPU persistency models for GPUs. We design a pragma-based compiler scheme to express persistency models for GPUs. We identify that the thread hierarchy in GPUs offers intuitive scopes to form epochs and durable transactions. We find that undo logging produces significant performance overheads. We propose to use idempotency analysis to reduce both logging frequency and the size of logs. Through both real-system and simulation evaluations, we show low overheads of our proposed architecture support.
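To make the relationship between undo logging and idempotency concrete, the following is a minimal sketch, not the paper's implementation. It models persistent memory as a Python dictionary and a region as an ordered list of updates. The names `is_idempotent`, `run_region`, and `recover` are hypothetical, and the idempotency test is a deliberately conservative assumption: a region counts as idempotent only if it never writes a location it also reads. The sketch ignores caches and the ordering between persisting a log entry and persisting the data, which a real persistency model must enforce.

```python
def is_idempotent(reads, writes):
    """Conservative check (an assumption of this sketch): the region is
    idempotent if no location is both read and written by it."""
    return not (set(reads) & set(writes))


def run_region(memory, region, crash_after=None):
    """Apply (dst, fn, srcs) updates in order to `memory`.

    An undo log entry (dst, old_value) is recorded before each store,
    but only when the region is not idempotent. If `crash_after` is set,
    execution stops before the update with that index, simulating a failure.
    Returns (log, completed).
    """
    reads = [s for _, _, srcs in region for s in srcs]
    writes = [dst for dst, _, _ in region]
    use_log = not is_idempotent(reads, writes)
    log = []
    for i, (dst, fn, srcs) in enumerate(region):
        if crash_after is not None and i == crash_after:
            return log, False
        if use_log:
            log.append((dst, memory[dst]))  # save the old value first
        memory[dst] = fn(*(memory[s] for s in srcs))
    return log, True


def recover(memory, region, log):
    """Roll back logged stores (newest first), then re-execute the region."""
    for dst, old in reversed(log):
        memory[dst] = old
    run_region(memory, region)


# Idempotent region: inputs a, b are never overwritten, so no log is kept.
mem = {"a": 1, "b": 2, "c": 0, "d": 0}
region = [("c", lambda a, b: a + b, ("a", "b")),
          ("d", lambda a: a * 2, ("a",))]
log, done = run_region(mem, region, crash_after=1)
recover(mem, region, log)

# Non-idempotent region: a is read and then overwritten, so logging is needed.
mem2 = {"a": 1, "b": 0}
region2 = [("a", lambda a: a + 1, ("a",)),
           ("b", lambda a: a * 2, ("a",))]
log2, done2 = run_region(mem2, region2, crash_after=1)
recover(mem2, region2, log2)
```

In the first region, re-execution alone restores a consistent state because the inputs are intact. In the second, re-executing without first rolling back `a` would increment it twice, which is why the undo log is required there.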

Published In

PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

September 2019

521 pages

ISBN: 9781728136134

Sponsors

IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\DATC: IEEE Computer Society

Publisher

IEEE Press

Qualifiers

Research-article
Research
Refereed limited

Conference

PACT '19

PACT '19: International Conference on Parallel Architectures and Compilation Techniques

September 23 - 26, 2019

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
