Multiple Physical Mappings: Dynamic DRAM Channel Sharing and Partitioning

Authors:

Marius Hillenbrand,

Mathias Gottschlag,

Frank Bellosa

APSys '17: Proceedings of the 8th Asia-Pacific Workshop on Systems

Article No.: 21, Pages 1 - 9

https://doi.org/10.1145/3124680.3124742

Published: 02 September 2017

Abstract

When an OS allocates memory to a process, it implicitly performs long-term scheduling of DRAM resources such as channels and banks: each mapped page frame allows memory operations to send requests to the channels and DRAM banks that back that page frame. The OS should be able to choose dynamically between sharing and dedicating these resources, yet it cannot do so on conventional systems.

We observed slowdowns from DRAM interference of up to 36% on our 4-core prototype platform for some combinations of workloads, caused by the uncontrolled sharing of DRAM channels in the typical configuration of channel interleaving. Previous work proposed channel partitioning to mitigate that interference, but partitioning reduces the maximum throughput of individual applications even when workloads do not interfere.

With our approach, we enable the OS to choose between channel interleaving and partitioning at run time, at the granularity of address space (AS) segments. For that purpose, we map DRAM into the physical AS multiple times: once as a dedicated region per channel for partitioning, and once more as a region that interleaves all channels. We implement this approach on commodity hardware. We change the OS's memory management so that, by choosing page frames from the appropriate region, it can either dedicate channels to processes or share channels between processes with interleaving. As a result, we can switch at application run time (e.g., when workloads change) to the configuration that achieves optimum execution speed and system throughput, whereas a conventional system would have to choose between interleaving and partitioning at boot time.
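The multiple-mapping idea can be illustrated with a toy address decoder. The sketch below is not the authors' implementation. The layout (one interleaved region followed by one dedicated region per channel), the 64-byte interleaving granularity, the 4 KiB page size, and all names (`Layout`, `decode`, `interleaved_frames_for`) are assumptions chosen for illustration; real memory controllers use platform-specific mappings.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAGE = 4096  # assumed page-frame size in bytes


@dataclass(frozen=True)
class Layout:
    """Hypothetical physical AS: DRAM appears twice, interleaved then per channel."""
    channels: int    # number of DRAM channels (assumed)
    chan_bytes: int  # capacity of each channel in bytes (assumed)
    granule: int     # interleaving granularity in bytes (assumed)

    @property
    def dram_bytes(self) -> int:
        return self.channels * self.chan_bytes

    def decode(self, pa: int) -> Tuple[int, int]:
        """Translate a physical address to (channel, offset within channel)."""
        if not 0 <= pa < 2 * self.dram_bytes:
            raise ValueError("address outside both mappings")
        if pa < self.dram_bytes:
            # Interleaved region: consecutive granules rotate over channels.
            block, within = divmod(pa, self.granule)
            offset = (block // self.channels) * self.granule + within
            return block % self.channels, offset
        # Dedicated regions: one contiguous range per channel.
        return divmod(pa - self.dram_bytes, self.chan_bytes)

    def interleaved_alias(self, channel: int, offset: int) -> int:
        """Address of a DRAM location inside the interleaved region."""
        block, within = divmod(offset, self.granule)
        return (block * self.channels + channel) * self.granule + within

    def dedicated_alias(self, channel: int, offset: int) -> int:
        """Address of the same DRAM location inside its channel's region."""
        return self.dram_bytes + channel * self.chan_bytes + offset


def interleaved_frames_for(layout: Layout, channel: int, frame: int) -> List[int]:
    """Interleaved-region frame numbers that alias one dedicated frame.

    Assumes the granule size divides the page size.
    """
    start = frame * PAGE
    return sorted({
        layout.interleaved_alias(channel, off) // PAGE
        for off in range(start, start + PAGE, layout.granule)
    })
```

In this model with two channels, `interleaved_frames_for(Layout(2, 1 << 20, 64), 1, 3)` yields frames 6 and 7: a single dedicated frame shares DRAM cells with several interleaved frames. An allocator built on such a scheme would therefore need to keep those aliases off the interleaved free list while the dedicated frame is in use, and vice versa.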

Index Terms

Multiple Physical Mappings: Dynamic DRAM Channel Sharing and Partitioning
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Allocation / deallocation strategies
        Main memory

Published In

APSys '17: Proceedings of the 8th Asia-Pacific Workshop on Systems

September 2017

207 pages

ISBN: 9781450351973

DOI: 10.1145/3124680

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

APSys '17

Sponsor:

SIGOPS

APSys '17: 8th Asia-Pacific Workshop on Systems

September 2, 2017

Mumbai, India

Acceptance Rates

APSys '17 Paper Acceptance Rate 27 of 51 submissions, 53%;

Overall Acceptance Rate 169 of 430 submissions, 39%

Cited By

Zhao Y, Gao M, Liu F, Hu Y, Wang Z, Lin H, Li J, Xian H, Dong H, Yang T, Jing N, Liang X, Jiang L. (2024). UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 644-659. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00053
Bakita J, Anderson J. (2022). Enabling GPU Memory Oversubscription via Transparent Paging to an NVMe SSD. 2022 IEEE Real-Time Systems Symposium (RTSS), 370-382. Online publication date: Dec-2022.
https://doi.org/10.1109/RTSS55097.2022.00039
Bakita J, Ahmed S, Osborne S, Tang S, Chen J, Smith F, Anderson J. (2021). Simultaneous Multithreading in Mixed-Criticality Real-Time Systems. 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), 278-291. Online publication date: May-2021.
https://doi.org/10.1109/RTAS52030.2021.00030
Vijaykumar N, Jain A, Majumdar D, Hsieh K, Pekhimenko G, Ebrahimi E, Hajinazar N, Gibbons P, Mutlu O. (2018). A case for richer cross-layer abstractions. Proceedings of the 45th Annual International Symposium on Computer Architecture, 207-220. Online publication date: 2-Jun-2018.
https://dl.acm.org/doi/10.1109/ISCA.2018.00027
