DOI: 10.1145/3297858.3304024

Nimble Page Management for Tiered Memory Systems

Published: 04 April 2019

Abstract

Software-controlled heterogeneous memory systems have the potential to increase the performance and cost efficiency of computing systems. However, they can only deliver on this promise if supported by efficient page management policies and mechanisms within the operating system (OS). Current OS implementations do not support efficient tiering of data between heterogeneous memories. Instead, they rely on expensive offlining of memory or swapping data to disk as a means of profiling and migrating hot or cold data between memory nodes. They also leave numerous optimizations on the table; for example, multi-threaded hardware is not leveraged to maximize page migration throughput, resulting in up to 95% under-utilization of available memory bandwidth. To remedy these shortcomings, we propose and implement a general-purpose, OS-integrated multi-level memory management system that reuses current OS page tracking structures to tier pages directly between memories with no additional monitoring overhead. We augment this system with four additional optimizations: native support for transparent huge page migration, multi-threaded migration of a page, concurrent migration of multiple pages, and symmetric exchange of pages. Combined, these optimizations dramatically reduce kernel software overheads and improve raw page migration throughput by over 15×. Implemented in Linux and evaluated on x86, Power, and ARM64 systems, our OS support for heterogeneous memories improves application performance by 40% over baseline Linux for a suite of real-world memory-intensive workloads utilizing a multi-level disaggregated memory system.
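As a rough illustration of two of the optimizations named in the abstract (multi-threaded migration of a page and symmetric exchange of pages), the following user-space Python sketch models a page as a `bytearray`. It is an analogy only, not the kernel implementation described in the paper: the function names `copy_page_parallel` and `exchange_pages`, the worker count, and the block size are all hypothetical choices made for this example. Python's interpreter lock may also prevent any real speedup here, so the sketch shows the structure of the idea rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_page_parallel(src, dst, n_workers=4):
    """Copy src into dst by splitting the page into contiguous chunks,
    with each chunk copied by a separate worker thread (hypothetical helper)."""
    if len(src) != len(dst):
        raise ValueError("pages must be the same size")
    size = len(src)
    if size == 0:
        return
    # Ceiling division so the chunks cover the whole page.
    chunk = (size + n_workers - 1) // n_workers
    src_view, dst_view = memoryview(src), memoryview(dst)

    def copy_chunk(start):
        end = min(start + chunk, size)
        dst_view[start:end] = src_view[start:end]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Consume the iterator so any worker exception is re-raised here.
        list(pool.map(copy_chunk, range(0, size, chunk)))


def exchange_pages(a, b, block=4096):
    """Swap the contents of two equal-sized pages in place, block by block.
    Only one small block buffer is needed, never a whole spare page."""
    if len(a) != len(b):
        raise ValueError("pages must be the same size")
    for start in range(0, len(a), block):
        end = min(start + block, len(a))
        tmp = bytes(a[start:end])
        a[start:end] = b[start:end]
        b[start:end] = tmp
```

In this sketch, exchanging two pages touches only the two existing buffers plus a small temporary block, which mirrors the intuition that a symmetric exchange avoids setting aside a free destination page for each direction of the move.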


Published In

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019
1126 pages
ISBN: 9781450362405
DOI: 10.1145/3297858
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. heterogeneous memory management
  2. operating system
  3. page migration

Qualifiers

  • Research-article

Conference

ASPLOS '19

Acceptance Rates

ASPLOS '19 Paper Acceptance Rate 74 of 351 submissions, 21%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 973
  • Downloads (last 6 weeks): 131
Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) Enhancing QoS in Multicore Systems with Heterogeneous Memory Configurations. Electronics 13:17 (3492). DOI: 10.3390/electronics13173492. Online publication date: 3-Sep-2024
  • (2024) Moses: Heap Partitioning for Semantic Data Tiering. Proceedings of the 2nd Workshop on Disruptive Memory Systems (25-32). DOI: 10.1145/3698783.3699386. Online publication date: 3-Nov-2024
  • (2024) Virtual Memory Revisited for Tiered Memory. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems (1-7). DOI: 10.1145/3678015.3680475. Online publication date: 4-Sep-2024
  • (2024) Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper. Proceedings of the 53rd International Conference on Parallel Processing (199-209). DOI: 10.1145/3673038.3673110. Online publication date: 12-Aug-2024
  • (2024) Trimma: Trimming Metadata Storage and Latency for Hybrid Memory Systems. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques (108-120). DOI: 10.1145/3656019.3689612. Online publication date: 14-Oct-2024
  • (2024) IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (69-82). DOI: 10.1145/3625549.3658659. Online publication date: 3-Jun-2024
  • (2024) FaaSMem: Improving Memory Efficiency of Serverless Computing with Memory Pool Architecture. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (331-348). DOI: 10.1145/3620666.3651355. Online publication date: 27-Apr-2024
  • (2024) GMT: GPU Orchestrated Memory Tiering for the Big Data Era. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (464-478). DOI: 10.1145/3620666.3651353. Online publication date: 27-Apr-2024
  • (2024) Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (834-847). DOI: 10.1109/ISCA59077.2024.00065. Online publication date: 29-Jun-2024
  • (2024) A Three-Tier Buffer Manager Integrating CXL Device Memory for Database Systems. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW) (395-401). DOI: 10.1109/ICDEW61823.2024.00063. Online publication date: 13-May-2024
