research-article
Open access

Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory

Published: 29 May 2019

Abstract

Future main memory will likely include Non-Volatile Memory. Non-Volatile Main Memory (NVMM) provides an opportunity to rethink checkpointing strategies for providing failure safety to applications. While there are many checkpointing and logging schemes in the literature, their use must be revisited as they incur high execution time overheads as well as a large number of additional writes to NVMM, which may significantly impact write endurance.
In this article, we propose a novel recompute-based failure safety approach and demonstrate its applicability to loop-based code. Rather than keeping a fully consistent logging state, we only log enough state to enable recomputation. Upon a failure, our approach recovers to a consistent state by determining which parts of the computation were not completed and recomputing them. Effectively, our approach removes the need to keep checkpoints or logs, thus reducing execution time overheads and improving NVMM write endurance at the expense of more complex recovery. We compare our new approach against logging and checkpointing on five scientific workloads, including tiled matrix multiplication, on a computer system model that was built on gem5 and supports Intel PMEM instruction extensions. For tiled matrix multiplication, our recompute approach incurs an execution time overhead of only 5%, in contrast to 8% overhead with logging and 207% overhead with checkpointing. Furthermore, recompute only adds 7% additional NVMM writes, compared to 111% with logging and 330% with checkpointing. We also conduct experiments on real hardware, allowing us to run our workloads to completion while varying the number of threads used for computation. These experiments substantiate our simulation-based observations and provide a sensitivity study and performance comparison between the Recompute Scheme and Naive Checkpointing.
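The recovery idea described above (persist just enough state to tell which parts of a loop nest finished, then recompute the rest after a failure) can be illustrated with a toy sketch. This is not the article's implementation: the per-tile `done` marker, the `SimulatedCrash` exception, and all function names are hypothetical. Ordinary Python lists and a dict stand in for NVMM, and no real persist instructions are issued.

```python
class SimulatedCrash(Exception):
    """Stands in for a power failure in this toy model."""


def tiles(n, tile):
    """Enumerate the top-left corners of the output tiles in a fixed order."""
    return [(ti, tj) for ti in range(0, n, tile) for tj in range(0, n, tile)]


def compute_tile(A, B, C, ti, tj, tile, n, fail_midway=False):
    """Compute one output tile of C = A * B.

    Each element is overwritten rather than accumulated, so running this
    again on a partially written tile yields the same final values.
    """
    for i in range(ti, min(ti + tile, n)):
        for j in range(tj, min(tj + tile, n)):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
        if fail_midway:
            # Crash after the first row: the tile is left partially written.
            raise SimulatedCrash()


def run(A, B, C, done, tile, crash_at=None):
    """Normal execution and recovery share this loop.

    `done` is the only bookkeeping kept (assumed to live in NVMM); tiles
    already marked are skipped, all others are computed from scratch.
    """
    n = len(A)
    for idx, (ti, tj) in enumerate(tiles(n, tile)):
        if done.get((ti, tj)):
            continue
        compute_tile(A, B, C, ti, tj, tile, n, fail_midway=(idx == crash_at))
        done[(ti, tj)] = True  # marker written only after the tile is complete


def demo():
    n, tile = 4, 2
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i * j + 1 for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    done = {}
    try:
        run(A, B, C, done, tile, crash_at=2)  # fail inside the third tile
    except SimulatedCrash:
        pass
    completed_before_recovery = len(done)
    run(A, B, C, done, tile)  # recovery: recompute only the unmarked tiles
    expected = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    return completed_before_recovery, C == expected
```

In this sketch the only extra write per tile is the completion marker, which loosely mirrors the trade-off the abstract describes: less state written during normal execution in exchange for a recovery path that must redo unfinished work. A real system would also have to order the marker write after the tile's data has reached NVMM, which the toy model does not represent.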

Published In

ACM Transactions on Architecture and Code Optimization, Volume 16, Issue 2
June 2019
317 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3325131
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 May 2019
Accepted: 01 February 2019
Revised: 01 February 2019
Received: 01 December 2018
Published in TACO Volume 16, Issue 2

Author Tags

  1. Memory systems
  2. computer architecture
  3. emerging memory technologies

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing. ACM Transactions on Architecture and Code Optimization 20:4 (1-25). DOI: 10.1145/3629524. Online publication date: 20-Oct-2023
  • (2023) PreFlush: Lightweight Hardware Prediction Mechanism for Cache Line Flush and Writeback. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (74-85). DOI: 10.1109/PACT58117.2023.00015. Online publication date: 21-Oct-2023
  • (2023) Reconciling Selective Logging and Hardware Persistent Memory Transaction. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (664-676). DOI: 10.1109/HPCA56546.2023.10071088. Online publication date: Feb-2023
  • (2022) Pop-Crypt: Identification and Management of Popular Words for Enhancing Lifetime of Encrypted Nonvolatile Main Memories. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:9 (1219-1229). DOI: 10.1109/TVLSI.2022.3183793. Online publication date: Sep-2022
  • (2021) Clobber-NVM: log less, re-execute more. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (346-359). DOI: 10.1145/3445814.3446730. Online publication date: 19-Apr-2021
  • (2021) Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems. IEEE Transactions on Computers 70:3 (332-346). DOI: 10.1109/TC.2020.2988251. Online publication date: 9-Feb-2021
  • (2021) BBB: Simplifying Persistent Programming using Battery-Backed Buffers. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (111-124). DOI: 10.1109/HPCA51647.2021.00019. Online publication date: Feb-2021
  • (2020) WET. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference (1-6). DOI: 10.5555/3437539.3437610. Online publication date: 20-Jul-2020
  • (2020) Sage. Proceedings of the VLDB Endowment 13:9 (1598-1613). DOI: 10.14778/3397230.3397251. Online publication date: 1-May-2020
  • (2019) Compiler-support for Critical Data Persistence in NVM. ACM Transactions on Architecture and Code Optimization 16:4 (1-25). DOI: 10.1145/3371236. Online publication date: 26-Dec-2019
