DOI: 10.1109/PACT.2019.00032

Exploring Memory Persistency Models for GPUs

Published: 26 November 2024

Abstract

Given its high integration density, high speed, byte addressability, and low standby power, non-volatile (persistent) memory is expected to supplement or replace DRAM as main memory. Through persistency programming models (which define the durability ordering of stores) and durable transaction constructs, the programmer can build recoverable data structures (RDS), which allow programs to recover to a consistent state after a failure. While persistency models have been well studied for CPUs, they have been neglected for graphics processing units (GPUs). Considering the importance of GPUs as a dominant accelerator for high-performance computing, we investigate persistency models for GPUs.
GPU applications differ substantially from CPU applications; hence, in this paper we adapt, re-architect, and optimize CPU persistency models for GPUs. We design a pragma-based compiler scheme to express persistency models for GPUs. We identify that the thread hierarchy in GPUs offers intuitive scopes to form epochs and durable transactions. We find that undo logging produces significant performance overheads. We propose to use idempotency analysis to reduce both the logging frequency and the size of logs. Through both real-system and simulation evaluations, we show that our proposed architecture support incurs low overheads.
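
As a rough illustration of why idempotency can reduce undo logging, the sketch below simulates a region of stores in plain Python. It is not the paper's compiler scheme or hardware support: the function names (`run_with_undo_log`, `is_idempotent`), the dictionary standing in for persistent memory, and the conservative rule "a region is idempotent if it never writes a location it reads" are all assumptions made for this example.

```python
def run_with_undo_log(memory, writes, crash_after=None):
    """Apply (addr, value) writes to `memory`, logging old values first.

    If `crash_after` is set, a failure is simulated just before the write
    with that index, and the undo log is replayed in reverse to restore
    the pre-region state. Returns the number of log entries created.
    """
    log = []
    for i, (addr, value) in enumerate(writes):
        if crash_after is not None and i == crash_after:
            # Simulated failure: roll back every logged store, newest first.
            for logged_addr, old_value in reversed(log):
                memory[logged_addr] = old_value
            return len(log)
        log.append((addr, memory[addr]))  # one log entry per store
        memory[addr] = value
    return len(log)


def is_idempotent(reads, writes):
    """Conservative check (an assumption of this sketch): the region never
    overwrites a location it reads, so re-executing it from the start
    yields the same result and no undo log is needed for it."""
    written = {addr for addr, _ in writes}
    return not (set(reads) & written)


memory = {"a": 1, "b": 2, "out": 0}

# A region that overwrites its own inputs must be logged to be recoverable.
entries = run_with_undo_log(memory, [("a", 10), ("b", 20)], crash_after=1)

# A region that only reads a, b and writes out can simply be re-executed.
skip_logging = is_idempotent(reads=["a", "b"], writes=[("out", 3)])
```

In this toy model, every store in a non-idempotent region costs a log entry, whereas a region that passes the check can be recovered by re-execution alone, which is the intuition behind reducing both how often logging happens and how large the logs grow.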

Published In

PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
September 2019
521 pages
ISBN: 9781728136134

Publisher

IEEE Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PACT '19

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
