Article

Coordinated and efficient huge page management with Ingens

Authors:

Christopher J. Rossbach,

Emmett Witchel

OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation

Pages 705 - 721

Published: 02 November 2016

Abstract

Modern computing is hungry for RAM, with today's enormous capacities eagerly consumed by diverse workloads. Hardware address translation overheads have grown with memory capacity, motivating hardware manufacturers to provide TLBs with thousands of entries for large page sizes (called huge pages). Operating systems and hypervisors support huge pages with a hodge-podge of best-effort algorithms and spot fixes that made sense for architectures with limited huge page support, but the time has come for a more fundamental redesign.

Ingens is a framework for huge page support that relies on a handful of basic primitives to provide transparent huge page support in a principled, coordinated way. By managing contiguity as a first-class resource and by tracking utilization and access frequency of memory pages, Ingens is able to eliminate a number of fairness and performance pathologies that plague current systems. Experiments with our prototype demonstrate fairness improvements, performance improvements (up to 18%), tail-latency reduction (up to 41%), and reduction of memory bloat from 69% to less than 1% for important applications like Web services (e.g., the Cloudstone benchmark) and the Redis key-value store.

Published In

OSDI'16: Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation

November 2016

786 pages

ISBN:9781931971331

Program Chairs: Kimberly Keeton (Hewlett Packard Labs), Timothy Roscoe (ETH Zurich)

Sponsors

VMware
NetApp
Google Inc.
Microsoft
Facebook

In-Cooperation

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

USENIX Association

United States
