Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3357526.3357575acmotherconferencesArticle/Chapter ViewAbstractPublication PagesmemsysConference Proceedingsconference-collections
research-article
Public Access

Portable application guidance for complex memory systems

Published: 30 September 2019 Publication History

Abstract

The recent emergence of new memory technologies and multi-tier memory architectures has disrupted the traditional view of memory as a single block of volatile storage with uniform performance. Several options for managing data on heterogeneous memory platforms now exist, but current approaches either rely on inflexible, and often inefficient, hardware-based caching, or they require expert knowledge and source code modification to utilize the different performance and capabilities within each memory tier. This paper introduces a new software-based framework to collect and apply memory management guidance for applications on heterogeneous memory platforms. The framework, together with new tools, combines a generic runtime API with automated program profiling and analysis to achieve data management that is both efficient and portable across multiple memory architectures.
To validate this combined software approach, we deployed it on two real-world heterogeneous memory platforms: 1) an Intel® Cascade Lake platform with conventional DDR4 alongside nonvolatile Optane™ DC memory with large capacity, and 2) an Intel® Knights Landing platform high-bandwidth, but limited capacity, MCDRAM backed by DDR4 SDRAM. We tested our approach on these platforms using three scientific mini-applications (LULESH, AMG, and SNAP) as well as one scalable scientific application (QMCPACK) from the CORAL benchmark suite. The experiments show that portable application guidance has potential to improve performance significantly, especially on the Cascade Lake platform -- which achieved peak speedups of 22x and 7.8x over the default unguided software- and hardware-based management policies. Additionally, this work evaluates the impact of various factors on the effectiveness of memory usage guidance, including: the amount of capacity that is available in the high performance tier, the input that is used during program profiling, and whether the profiling was conducted on the same architecture or transferred from a machine with a different architecture.

References

[1]
[n.d.]. Flang. https://github.com/flang-compiler/flang
[2]
[n.d.]. pagemap, from the userspace perspective. https://www.kernel.org/doc/Documentation/vm/pagemap.txt
[3]
2019. https://perf.wiki.kernel.org/index.php/Main_Page
[4]
2019. SICM: Simplified Interface to Complex Memory - Exascale Computing Project 2019. https://www.exascaleproject.org/project/sicm-simplified-interface-complex-memory/
[5]
2019. U.S. Department of Energy and Cray to Deliver Record-Setting Frontier Supercomputer at ORNL. https://www.ornl.gov/news/us-department-energy-and-cray-deliver-record-setting-frontier-supercomputer-ornl
[6]
2019. U.S. Department of Energy and Intel to deliver first exascale supercomputer. https://www.anl.gov/article/us-department-of-energy-and-intel-to-deliver-first-exascale-supercomputer
[7]
Neha Agarwal, David Nellans, Mark Stephenson, Mike O'Connor, and Stephen W. Keckler. 2015. Page Placement Strategies for GPUs Within Heterogeneous Memory Systems. SIGPLAN Not. 50, 4 (March 2015), 607--618.
[8]
Neha Agarwal and Thomas F. Wenisch. 2017. Thermostat: Application-transparent Page Management for Two-tiered Main Memory. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 631--644.
[9]
Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quema, and Mark Roth. 2013. Traffic management: a holistic approach to memory placement on NUMA systems. In ACM SIGPLAN Notices, Vol. 48. ACM, 381--394.
[10]
Subramanya R Dulloor, Amitabha Roy, Zheguang Zhao, Narayanan Sundaram, Nadathur Satish, Rajesh Sankaran, Jeff Jackson, and Karsten Schwan. 2016. Data tiering in heterogeneous memory systems. In Proceedings of the Eleventh European Conference on Computer Systems. ACM, 15.
[11]
T. Chad Effler, Adam P. Howard, Tong Zhou, Michael R. Jantz, Kshitij A. Doshi, and Prasad A. Kulkarni. 2018. On Automated Feedback-Driven Data Placement in Hybrid Memories. In LNCS International Conference on Architecture of Computing Systems (ARCS '18).
[12]
T. Chad Effler, Brandon Kammerdiener, Michael R. Jantz, Saikat Sengupta, Prasad A. Kulkarni, Kshitij A. Doshi, and Terry Jones. 2019. Evaluating the Effectiveness of Program Data Features for Guiding Data Management. In Proceedings of the International Symposium on Memory Systems (MemSys '19). ACM, New York, NY, USA.
[13]
Jason Evans. 2006. A Scalable Concurrent malloc (3) Implementation for FreeBSD. (2006).
[14]
Rentong Guo, Xiaofei Liao, Hai Jin, Jianhui Yue, and Guang Tan. 2015. Night-Watch: integrating lightweight and transparent cache pollution control into dynamic memory allocation systems. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 307--318.
[15]
Intel. 2019. Intel Optane DC Persistent Memory. https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html.
[16]
Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, Jishen Zhao, and Steven Swanson. 2019. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. CoRR abs/1903.05714 (2019). arXiv:1903.05714 http://arxiv.org/abs/1903.05714
[17]
Michael R. Jantz, Forrest J. Robinson, Prasad A. Kulkarni, and Kshitij A. Doshi. 2015. Cross-layer Memory Management for Managed Language Applications. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, New York, NY, USA, 488--504.
[18]
D. Kothe, S. Lee, and I. Qualters. 2019. Exascale Computing in the United States. Computing in Science Engineering 21, 1 (Jan 2019), 17--29.
[19]
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization (CGO '04). IEEE Computer Society, Washington, DC, USA, 75--. http://dl.acm.org/citation.cfm?id=977395.977673
[20]
Y. Li, S. Ghose, J. Choi, J. Sun, H. Wang, and O. Mutlu. 2017. Utility-Based Hybrid Memory Management. In 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[21]
LLNL. 2014. CORAL Benchmark Codes. https://asc.llnl.gov/CORAL-benchmarks.
[22]
M.R. Meswani, S. Blagodurov, D. Roberts, J. Slice, M. Ignatowski, and G.H. Loh. 2015. Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories. In High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on. 126--136.
[23]
Anurag Mukkara, Nathan Beckmann, and Daniel Sanchez. 2016. Whirlpool: Improving Dynamic Cache Management with Static Data Classification. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16). ACM, New York, NY, USA, 113--127.
[24]
Barack Obama. 2015. Executive Order -- Creating a National Strategic Computing Initiative. https://obamawhitehouse.archives.gov/the-press-office/2015/07/29/executive-order-creating-national-strategic-computing-initiative
[25]
Matthew Benjamin Olson, Joseph T. Teague, Divyani Rao, Michael R. JANTZ, Kshitij A. Doshi, and Prasad A. Kulkarni. 2018. Cross-Layer Memory Management to Improve DRAM Energy Efficiency. ACM Trans. Archit. Code Optim. 15, 2, Article 20 (May 2018), 27 pages.
[26]
M Ben Olson, Tong Zhou, Michael R Jantz, Kshitij A Doshi, M Graham Lopez, and Oscar Hernandez. 2018. MemBrain: Automated Application Guidance for Hybrid Memory Systems. In 2018 IEEE International Conference on Networking, Architecture and Storage (NAS). IEEE, 1--10.
[27]
H. Servat, A. J. PeÃśa, G. Llort, E. Mercadal, H. Hoppe, and J. Labarta. 2017. Automating the Application Data Placement in Hybrid Memory Systems. In 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[28]
Avinash Sodani. 2015. Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor. In Hot Chips 27 Symposium (HCS), 2015 IEEE. IEEE, 1--24.
[29]
Po-An Tsai, Nathan Beckmann, and Daniel Sanchez. 2017. Jenga: Software-Defined Cache Hierarchies. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 652--665.
[30]
GR Voskuilen, MP Frank, SD Hammond, and AF Rodrigues. [n.d.]. Evaluating the Opportunities for Multi-Level Memory--An ASC 2016 L2 Milestone. ([n. d.]).
[31]
Kai Wu, Yingchao Huang, and Dong Li. 2017. Unimem: Runtime Data Managementon Non-volatile Memory-based Heterogeneous Main Memory. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17). ACM, New York, NY, USA, Article 58, 14 pages.

Cited By

View all
  • (2024)The ECP SICM project: Managing complex memory hierarchies for exascale applicationsThe International Journal of High Performance Computing Applications10.1177/10943420241288243Online publication date: 3-Nov-2024
  • (2022)Online Application Guidance for Heterogeneous Memory SystemsACM Transactions on Architecture and Code Optimization10.1145/353385519:3(1-27)Online publication date: 6-Jul-2022
  • (2021)Dancing in the Dark: Profiling for Tiered Memory2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS49936.2021.00011(13-22)Online publication date: May-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
MEMSYS '19: Proceedings of the International Symposium on Memory Systems
September 2019
517 pages
ISBN:9781450372060
DOI:10.1145/3357526
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. 3DXPoint
  2. application guidance
  3. heterogeneous memory
  4. memory management
  5. optane
  6. performance
  7. program profiling

Qualifiers

  • Research-article

Funding Sources

Conference

MEMSYS '19
MEMSYS '19: The International Symposium on Memory Systems
September 30 - October 3, 2019
District of Columbia, Washington, USA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)109
  • Downloads (Last 6 weeks)7
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The ECP SICM project: Managing complex memory hierarchies for exascale applicationsThe International Journal of High Performance Computing Applications10.1177/10943420241288243Online publication date: 3-Nov-2024
  • (2022)Online Application Guidance for Heterogeneous Memory SystemsACM Transactions on Architecture and Code Optimization10.1145/353385519:3(1-27)Online publication date: 6-Jul-2022
  • (2021)Dancing in the Dark: Profiling for Tiered Memory2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS49936.2021.00011(13-22)Online publication date: May-2021
  • (2020)Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)10.1109/MCHPC51950.2020.00007(10-16)Online publication date: Nov-2020

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media