DOI: 10.1145/3422575.3422795
Research Article | Open Access

Performance Modeling and Evaluation of a Production Disaggregated Memory System

Published: 21 March 2021

Abstract

High-performance computers rely on large memories to cache data and improve performance. However, managing the ever-increasing number of levels in the memory hierarchy is becoming more difficult. The Disaggregated Memory System (DMS) architecture was introduced in recent years to improve memory utilization. DMS provides a global memory pool that sits between the local memories and storage. To leverage DMS, we need a better understanding of its performance and of how to exploit its full potential. In this study, we first present a DMS performance model for performance evaluation and analysis. We then conduct a thorough performance evaluation to identify application-DMS characteristics under different system configurations. Experiments are conducted on the RAM Area Network (RAN), a DMS implementation available at Argonne National Laboratory. The results of these experiments are presented along with an analysis of the pros and cons of the RAN-DMS design and implementation. The counterintuitive performance results for the K-means application are analyzed at the code level to illustrate DMS performance. Finally, based on our findings, we discuss future DMS designs and their potential for AI applications.
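As a rough illustration of how such a performance model can be used, and since the author tags name C-AMAT (the Concurrent Average Memory Access Time metric of Sun and Wang), the sketch below estimates access time for a local memory backed by a remote disaggregated pool. It is a minimal, hypothetical example: the two-tier setup and all parameter values are assumptions made for illustration, not figures or code from the paper.

def c_amat(hit_time, hit_concurrency, pure_miss_rate,
           pure_miss_penalty, miss_concurrency):
    # C-AMAT = H / C_H + pMR * pAMP / C_M (Sun and Wang, 2014), where
    # C_H and C_M are hit and pure-miss concurrency, pMR the pure miss
    # rate, and pAMP the average pure miss penalty. With all concurrency
    # terms set to 1 this reduces to the classic
    # AMAT = hit_time + miss_rate * miss_penalty.
    return (hit_time / hit_concurrency
            + (pure_miss_rate * pure_miss_penalty) / miss_concurrency)

# Hypothetical numbers (in cycles), chosen only to show the shape of the model.
local_only = c_amat(hit_time=100, hit_concurrency=4,
                    pure_miss_rate=0.0, pure_miss_penalty=0,
                    miss_concurrency=1)
with_remote_pool = c_amat(hit_time=100, hit_concurrency=4,
                          pure_miss_rate=0.05, pure_miss_penalty=20000,
                          miss_concurrency=8)
print(f"local only:       {local_only:.1f} cycles per access")
print(f"with remote pool: {with_remote_pool:.1f} cycles per access")

The concurrency terms are the interesting part for a DMS: raising miss_concurrency models more in-flight remote transfers overlapping one another, which can hide much of the pool's network latency.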


Cited By

  • CHROME: Concurrency-Aware Holistic Cache Management Framework with Online Reinforcement Learning. In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1154-1167, March 2024. DOI: 10.1109/HPCA57654.2024.00090

Published In

MEMSYS '20: Proceedings of the International Symposium on Memory Systems
September 2020
362 pages
ISBN:9781450388993
DOI:10.1145/3422575
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. C-AMAT
  2. Disaggregated Memory
  3. Performance Evaluation
  4. Performance Modeling
  5. RAN
  6. Utilization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS 2020
MEMSYS 2020: The International Symposium on Memory Systems
September 28 - October 1, 2020
Washington, DC, USA
