DOI: 10.1145/3240302.3240308
research-article

Cooperative NV-NUMA: prolonging non-volatile memory lifetime through bandwidth sharing

Published: 01 October 2018

Abstract

Resistive memory technologies, such as ReRAM and PCM, are potentially promising replacements for DRAM technology. Their limited endurance (and thus short lifetime), however, is a major obstacle to their commercialization. Analytic models and experimental data show a polynomial relationship between write latency and endurance. Thus, we can use slow writes to introduce less wear, but it is challenging to design a system that meets the memory lifetime requirements without losing performance. We address this challenge in a multiprocessor non-uniform memory architecture (NUMA) environment through memory bandwidth sharing between processing nodes.
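The latency-endurance trade-off described above can be illustrated with a minimal sketch. It assumes a simple power-law form, endurance = base_endurance * (latency / base_latency) ** exponent. The function names, the baseline latency and endurance, and the exponent are all hypothetical placeholders chosen for illustration and are not values taken from the paper.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def endurance(write_latency_ns, base_latency_ns=150.0,
              base_endurance=1e7, exponent=2.0):
    """Writes a cell can sustain under an assumed polynomial model.

    All default values are illustrative assumptions, not measured data.
    """
    if write_latency_ns <= 0:
        raise ValueError("write latency must be positive")
    return base_endurance * (write_latency_ns / base_latency_ns) ** exponent


def lifetime_years(endurance_writes, writes_per_cell_per_second):
    """Expected lifetime of a cell that is written at a constant rate."""
    if writes_per_cell_per_second <= 0:
        raise ValueError("write rate must be positive")
    return endurance_writes / writes_per_cell_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Under the assumed exponent of 2, doubling latency quadruples endurance.
    for latency in (150.0, 300.0, 450.0):
        print(latency, endurance(latency))
```

Under such a model, a slower write buys a superlinear gain in endurance, which is why spare bandwidth is worth spending on slow writes.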
While previous approaches have distributed data and computation to share memory capacity or parallelize applications, our main goal is to share memory bandwidth between NUMA nodes when running workloads with varying memory bandwidth needs. When a node has spare memory bandwidth and a write targets data residing in that node, a slow write can be issued, improving lifetime. Data distribution, however, creates a new challenge: the latency of accesses to remote nodes' memory can degrade performance. To mitigate this degradation, we propose Cooperative NV-NUMA, which detects hot remote memory pages by monitoring Last Level Cache (LLC) evictions and caches these pages locally. We simulate a proof-of-concept design that explores the proposed technique for a suite of applications. We find that our approach can meet lifetime requirements while substantially improving the performance (by up to 48%) of the NUMA nodes with challenging lifetime requirements over previous work.
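The two mechanisms named in the abstract can be sketched roughly as follows. This is not the paper's design: the class and function names, the per-page eviction counter, the hotness threshold, the page size, and the bandwidth-utilization cutoff are all assumptions made for illustration.

```python
from collections import Counter


class HotRemotePageDetector:
    """Counts LLC evictions per remote page and flags hot pages (hypothetical)."""

    def __init__(self, node_id, hot_threshold=4, page_size=4096):
        self.node_id = node_id
        self.hot_threshold = hot_threshold  # assumed threshold, not from the paper
        self.page_size = page_size
        self.eviction_counts = Counter()
        self.cached_pages = set()

    def on_llc_eviction(self, address, home_node):
        """Record one eviction; return True if the page was just cached locally."""
        if home_node == self.node_id:
            return False  # local pages need no local caching
        page = address // self.page_size
        if page in self.cached_pages:
            return False  # already cached locally
        self.eviction_counts[page] += 1
        if self.eviction_counts[page] >= self.hot_threshold:
            self.cached_pages.add(page)
            del self.eviction_counts[page]
            return True
        return False


def choose_write_speed(is_local, bandwidth_utilization, slow_write_cutoff=0.5):
    """Issue a slow write only for local data when the node has spare bandwidth."""
    if is_local and bandwidth_utilization < slow_write_cutoff:
        return "slow"
    return "fast"


if __name__ == "__main__":
    detector = HotRemotePageDetector(node_id=0, hot_threshold=2)
    print(detector.on_llc_eviction(0x1000, home_node=1))
    print(detector.on_llc_eviction(0x1008, home_node=1))
    print(choose_write_speed(is_local=True, bandwidth_utilization=0.2))
```

A real implementation would live in hardware and would also need to bound the counter table and evict cold pages from the local cache, neither of which this sketch attempts.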

      Information & Contributors

      Information

      Published In

      MEMSYS '18: Proceedings of the International Symposium on Memory Systems
      October 2018
      361 pages
      ISBN:9781450364751
      DOI:10.1145/3240302
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. lifetime
      2. non-uniform memory architecture (NUMA)
      3. non-volatile memory
      4. write endurance

      Qualifiers

      • Research-article

      Conference

      MEMSYS '18: The International Symposium on Memory Systems
      October 1 - 4, 2018
      Alexandria, Virginia, USA

      Cited By

      • (2024) MORSE: Memory Overwrite Time Guided Soft Writes to Improve ReRAM Energy and Endurance. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 26-39. DOI: 10.1145/3656019.3676890. Online publication date: 14-Oct-2024.
      • (2023) MC-ELMM: Multi-Chip Endurance-Limited Memory Management. Proceedings of the International Symposium on Memory Systems, 1-16. DOI: 10.1145/3631882.3631905. Online publication date: 2-Oct-2023.
      • (2020) SCRIMP: A General Stochastic Computing Architecture using ReRAM in-Memory Processing. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1598-1601. DOI: 10.23919/DATE48585.2020.9116338. Online publication date: Mar-2020.
      • (2020) Semantic Convolutional Neural Network model for Safe Business Investment by Using BERT. 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), 1-6. DOI: 10.1109/SNAMS52053.2020.9336575. Online publication date: 14-Dec-2020.
      • (2020) Deep Learning Acceleration with Neuron-to-Memory Transformation. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 1-14. DOI: 10.1109/HPCA47549.2020.00011. Online publication date: Feb-2020.
      • (2019) Balancing Performance and Energy Efficiency of ONoC by Using Adaptive Bandwidth. 2019 IEEE 37th International Conference on Computer Design (ICCD), 664-667. DOI: 10.1109/ICCD46524.2019.00095. Online publication date: Nov-2019.
