DOI: 10.1145/3422575.3422795
Research Article | Open Access

Performance Modeling and Evaluation of a Production Disaggregated Memory System

Published: 21 March 2021

Abstract

High-performance computers rely on large memories to cache data and improve performance. However, managing the ever-increasing number of levels in the memory hierarchy is becoming more difficult. The Disaggregated Memory System (DMS) architecture was introduced in recent years to improve memory utilization. DMS provides a global memory pool that sits between the local memories and storage. To leverage DMS, we need a better understanding of its performance and of how to exploit its full potential. In this study, we first present a DMS performance model for performance evaluation and analysis. We then conduct a thorough performance evaluation to identify application-DMS characteristics under different system configurations. Experiments are conducted on the RAM Area Network (RAN), a DMS implementation available at Argonne National Laboratory. The results of these experiments are presented along with an analysis of the pros and cons of the RAN-DMS design and implementation. The counterintuitive performance results for the K-means application are analyzed at the code level to illustrate DMS performance. Finally, based on our findings, we discuss future DMS designs and their potential for AI applications.
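As a rough illustration of how such a performance model can be used, and since the author tags name C-AMAT (the Concurrent Average Memory Access Time metric of Sun and Wang), the sketch below estimates access time for a local memory backed by a remote disaggregated pool. It is a minimal, hypothetical example: the two-tier setup and all parameter values are assumptions made for illustration, not figures or code from the paper.

def c_amat(hit_time, hit_concurrency, pure_miss_rate,
           pure_miss_penalty, miss_concurrency):
    # C-AMAT = H / C_H + pMR * pAMP / C_M (Sun and Wang, 2014), where
    # C_H and C_M are hit and pure-miss concurrency, pMR the pure miss
    # rate, and pAMP the average pure miss penalty. With all concurrency
    # terms set to 1 this reduces to the classic
    # AMAT = hit_time + miss_rate * miss_penalty.
    return (hit_time / hit_concurrency
            + (pure_miss_rate * pure_miss_penalty) / miss_concurrency)

# Hypothetical numbers (in cycles), chosen only to show the shape of the model.
local_only = c_amat(hit_time=100, hit_concurrency=4,
                    pure_miss_rate=0.0, pure_miss_penalty=0,
                    miss_concurrency=1)
with_remote_pool = c_amat(hit_time=100, hit_concurrency=4,
                          pure_miss_rate=0.05, pure_miss_penalty=20000,
                          miss_concurrency=8)
print(f"local only:       {local_only:.1f} cycles per access")
print(f"with remote pool: {with_remote_pool:.1f} cycles per access")

The concurrency terms are the interesting part for a DMS: raising miss_concurrency models more in-flight remote transfers overlapping one another, which can hide much of the pool's network latency.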


Cited By

  • CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning. In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1154-1167, March 2024. DOI: 10.1109/HPCA57654.2024.00090

Published In

MEMSYS '20: Proceedings of the International Symposium on Memory Systems
September 2020
362 pages
ISBN:9781450388993
DOI:10.1145/3422575
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. C-AMAT
  2. Disaggregated Memory
  3. Performance Evaluation
  4. Performance Modeling
  5. RAN
  6. Utilization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS 2020
MEMSYS 2020: The International Symposium on Memory Systems
September 28 - October 1, 2020
Washington, DC, USA
