DOI: 10.1145/3124680.3124742
research-article

Multiple Physical Mappings: Dynamic DRAM Channel Sharing and Partitioning

Published: 02 September 2017

Abstract

When an OS allocates memory to a process, it implicitly performs long-term scheduling on DRAM resources such as channels and banks: each mapped page frame allows memory operations to send requests to the channels and DRAM banks that back that page frame. The OS should be able to choose dynamically between sharing and dedicating these resources -- yet it cannot do so on conventional systems.
We observed slowdowns from DRAM interference of up to 36% on our 4-core prototype platform for some combinations of workloads, caused by the uncontrolled sharing of DRAM channels in the typical configuration of channel interleaving. Previous work proposed channel partitioning to mitigate that interference, but partitioning reduces the maximum throughput of individual applications even when workloads do not interfere.
With our approach, we enable the OS to choose between channel interleaving and partitioning at run-time, at the granularity of address space (AS) segments. For that purpose, we map DRAM into the physical AS multiple times: once as a dedicated region per channel for partitioning, and once more as a region that interleaves all channels. We implement this approach on commodity hardware. We change the OS's memory management so that it can dedicate channels to processes, or share channels between processes with interleaving, by choosing page frames from the appropriate region. As a result, we can switch to the configuration that achieves optimum execution speed and system throughput at application run-time (e.g., when workloads change), whereas a conventional system would have to choose interleaving or partitioning while booting.
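
The idea of choosing page frames from the appropriate region can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class `MultiMappedMemory`, its fields, the policy strings, the 4 KiB frame size, and the region layout (dedicated regions first, interleaved region last) are all hypothetical. It also assumes simple round-robin interleaving, so that a group of `channels` consecutive interleaved frames covers the same DRAM as one dedicated frame in each channel. Under that assumption, a group may be used through only one of the two mappings at a time.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes; assumed page-frame size


@dataclass
class MultiMappedMemory:
    """Toy model: the same DRAM appears once per channel and once interleaved."""
    channels: int            # number of DRAM channels (assumed)
    frames_per_channel: int  # page frames backed by each channel (assumed)
    # Per frame group: absent = free, else "dedicated" or "interleaved".
    group_mode: dict = field(default_factory=dict)
    # Allocated frames, stored as (region, frame index within region).
    used: set = field(default_factory=set)

    def region_base(self, region):
        """Physical base address of a region (hypothetical layout).

        Regions 0..channels-1 are the dedicated per-channel mappings;
        region `channels` is the mapping that interleaves all channels.
        """
        return region * self.frames_per_channel * PAGE_SIZE

    def alloc(self, policy, channel=0):
        """Return the physical address of a free frame for `policy`."""
        for group in range(self.frames_per_channel):
            mode = self.group_mode.get(group)
            if mode not in (None, policy):
                continue  # this DRAM is already in use via the other mapping
            if policy == "dedicated":
                candidates = [(channel, group)]
            else:
                first = group * self.channels
                candidates = [(self.channels, first + i)
                              for i in range(self.channels)]
            for region, index in candidates:
                if (region, index) not in self.used:
                    self.used.add((region, index))
                    self.group_mode[group] = policy
                    return self.region_base(region) + index * PAGE_SIZE
        raise MemoryError("no free frame for policy " + policy)
```

With two channels and four frames per channel, `alloc("dedicated", channel=1)` returns an address inside the region of channel 1, while a following `alloc("interleaved")` skips the group already claimed by the dedicated mapping and returns an address inside the interleaved region. Freeing frames and migrating pages between regions, which a run-time switch would need, are left out of this sketch.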



Published In

APSys '17: Proceedings of the 8th Asia-Pacific Workshop on Systems
September 2017
207 pages
ISBN:9781450351973
DOI:10.1145/3124680

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Channel Interleaving
  2. DRAM Partitioning
  3. Dynamic DRAM Address Mapping
  4. Interference
  5. Memory Controllers

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APSys '17

Acceptance Rates

APSys '17 Paper Acceptance Rate: 27 of 51 submissions (53%)
Overall Acceptance Rate: 169 of 430 submissions (39%)

Article Metrics

  • Downloads (last 12 months): 51
  • Downloads (last 6 weeks): 8
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 644-659. DOI: 10.1109/ISCA59077.2024.00053. Online publication date: 29-Jun-2024
  • (2022) Enabling GPU Memory Oversubscription via Transparent Paging to an NVMe SSD. 2022 IEEE Real-Time Systems Symposium (RTSS), 370-382. DOI: 10.1109/RTSS55097.2022.00039. Online publication date: Dec-2022
  • (2021) Simultaneous Multithreading in Mixed-Criticality Real-Time Systems. 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), 278-291. DOI: 10.1109/RTAS52030.2021.00030. Online publication date: May-2021
  • (2018) A case for richer cross-layer abstractions. Proceedings of the 45th Annual International Symposium on Computer Architecture, 207-220. DOI: 10.1109/ISCA.2018.00027. Online publication date: 2-Jun-2018
