research-article

Cooperative NV-NUMA: prolonging non-volatile memory lifetime through bandwidth sharing

Authors: Mohammad Reza Jokar, Frederic T. Chong

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

Pages 67 - 78

https://doi.org/10.1145/3240302.3240308

Published: 01 October 2018

Abstract

Resistive memory technologies, such as ReRAM and PCM, are potentially promising replacements for DRAM technology. Their limited endurance (and thus short lifetime), however, is a major obstacle to their commercialization. Analytic models and experimental data show a polynomial relationship between write latency and endurance. Thus, we can use slow writes to introduce less wear, but it is challenging to design a system that meets the memory lifetime requirements without losing performance. We address this challenge in a multiprocessor non-uniform memory architecture (NUMA) environment through memory bandwidth sharing between processing nodes.
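
The latency-endurance trade-off described above can be illustrated with a small numeric sketch. It assumes a simple power-law model in which endurance scales with write latency raised to some exponent. The exponent, the baseline values, and the function names are illustrative assumptions and are not taken from the paper.

```python
def endurance_at_latency(base_endurance: float, base_latency_ns: float,
                         latency_ns: float, exponent: float) -> float:
    """Estimate cell endurance (writes) at a given write latency.

    Assumed model: endurance grows polynomially with write latency,
    endurance = base_endurance * (latency / base_latency) ** exponent.
    The exponent is a hypothetical device parameter.
    """
    if base_latency_ns <= 0 or latency_ns <= 0:
        raise ValueError("latencies must be positive")
    return base_endurance * (latency_ns / base_latency_ns) ** exponent


def lifetime_years(endurance: float, writes_per_cell_per_second: float) -> float:
    """Years until a cell wears out, assuming perfectly even wear."""
    seconds_per_year = 365 * 24 * 3600
    return endurance / (writes_per_cell_per_second * seconds_per_year)


# Hypothetical example: doubling write latency with a quadratic relationship.
normal = endurance_at_latency(1e6, 100.0, 100.0, 2.0)
slow = endurance_at_latency(1e6, 100.0, 200.0, 2.0)
print(slow / normal)
```

Under these assumed numbers, a write that takes twice as long wears the cell a quarter as much, which is why spare bandwidth can be traded for lifetime.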

While previous approaches have distributed data and computation to share memory capacity or parallelize applications, our main goal is to share memory bandwidth between NUMA nodes when running workloads with varying memory bandwidth needs. When a node has spare memory bandwidth and a write targets data residing in that node, a slow write can be issued, which improves lifetime. Data distribution, however, creates a new challenge: the latency of accesses to remote nodes' memory can degrade performance. To mitigate this degradation, we propose Cooperative NV-NUMA, which detects hot remote memory pages by monitoring Last Level Cache (LLC) evictions and caches these pages locally. We simulate a proof-of-concept design that explores the proposed technique for a suite of applications. We find that our approach can meet lifetime requirements while substantially improving the performance of the NUMA nodes with challenging lifetime requirements (by up to 48%) over previous work.
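
The two ideas in this paragraph, issuing slow writes when bandwidth is spare and caching hot remote pages found through LLC evictions, can be sketched as follows. The abstract does not specify the actual mechanism, so everything here is an assumption made for illustration: the per-page eviction counter, the threshold, the 4 KiB page size, the utilization cutoff, and all class and function names.

```python
from collections import Counter

PAGE_SIZE = 4096  # bytes; assumed page size


class HotRemotePageTracker:
    """Counts LLC evictions per remote page and flags pages to cache locally.

    Hypothetical policy: a remote page becomes "hot" once its eviction
    count reaches hot_threshold.
    """

    def __init__(self, local_node: int, hot_threshold: int):
        self.local_node = local_node
        self.hot_threshold = hot_threshold
        self.eviction_counts = Counter()
        self.locally_cached = set()

    def on_llc_eviction(self, address: int, home_node: int) -> bool:
        """Record one eviction; return True if the page just became hot."""
        if home_node == self.local_node:
            return False  # data already lives on this node
        page = address // PAGE_SIZE
        if page in self.locally_cached:
            return False  # already cached locally
        self.eviction_counts[page] += 1
        if self.eviction_counts[page] >= self.hot_threshold:
            self.locally_cached.add(page)
            return True
        return False


def choose_write_mode(bandwidth_utilization: float,
                      slow_write_limit: float = 0.5) -> str:
    """Pick a slow write when the node has spare bandwidth (assumed cutoff)."""
    return "slow" if bandwidth_utilization < slow_write_limit else "normal"


tracker = HotRemotePageTracker(local_node=0, hot_threshold=2)
tracker.on_llc_eviction(0x1000, home_node=1)         # first eviction, not hot yet
print(tracker.on_llc_eviction(0x1008, home_node=1))  # same page, reaches threshold
print(choose_write_mode(0.2))
```

A counter with a fixed threshold is only one possible way to identify hot pages; it is used here because it is the simplest policy that matches the description.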



Index Terms

Cooperative NV-NUMA: prolonging non-volatile memory lifetime through bandwidth sharing
1. Computer systems organization
  1. Architectures
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage


Published In

MEMSYS '18: Proceedings of the International Symposium on Memory Systems

October 2018

361 pages

ISBN: 9781450364751

DOI: 10.1145/3240302

General Chair: Bruce Jacob, University of Maryland

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MEMSYS '18

MEMSYS '18: The International Symposium on Memory Systems

October 1 - 4, 2018

Alexandria, Virginia, USA

Bibliometrics

Total Citations: 6
Total Downloads: 148
Downloads (last 12 months): 1
Downloads (last 6 weeks): 1

Reflects downloads up to 26 Nov 2024

Cited By

Singh D, Yeung D. (2024). MORSE: Memory Overwrite Time Guided Soft Writes to Improve ReRAM Energy and Endurance. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 26-39. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3676890
Bartolo A, Sabry Aly M, Michelogiannakis G, Mitra S. (2023). MC-ELMM: Multi-Chip Endurance-Limited Memory Management. Proceedings of the International Symposium on Memory Systems, 1-16. Online publication date: 2-Oct-2023.
https://dl.acm.org/doi/10.1145/3631882.3631905
Gupta S, Imani M, Sim J, Huang A, Wu F, Najafi M, Rosing T. (2020). SCRIMP: A General Stochastic Computing Architecture using ReRAM in-Memory Processing. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1598-1601. Online publication date: Mar-2020.
https://doi.org/10.23919/DATE48585.2020.9116338
Heidari M, Rafatirad S. (2020). Semantic Convolutional Neural Network model for Safe Business Investment by Using BERT. 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), 1-6. Online publication date: 14-Dec-2020.
https://doi.org/10.1109/SNAMS52053.2020.9336575
Imani M, Samragh Razlighi M, Kim Y, Gupta S, Koushanfar F, Rosing T. (2020). Deep Learning Acceleration with Neuron-to-Memory Transformation. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 1-14. Online publication date: Feb-2020.
https://doi.org/10.1109/HPCA47549.2020.00011
Zhang M, Zhang L, Chong F, Liu Z. (2019). Balancing Performance and Energy Efficiency of ONoC by Using Adaptive Bandwidth. 2019 IEEE 37th International Conference on Computer Design (ICCD), 664-667. Online publication date: Nov-2019.
https://doi.org/10.1109/ICCD46524.2019.00095
