Large pages and lightweight memory management in virtualized environments: can you have it both ways?

Authors:

Gabriel H. Loh,

Abhishek Bhattacharjee

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture

Pages 1 - 12

https://doi.org/10.1145/2830772.2830773

Published: 05 December 2015

Abstract

Large pages have long been used to mitigate address translation overheads on big-memory systems, particularly in virtualized environments where Translation Lookaside Buffer (TLB) miss overheads are severe. We show, however, that far from being a panacea, large pages are used sparingly by modern virtualization software. This is because large pages often preclude lightweight memory management, and the resulting costs can outweigh their TLB benefits. For example, large pages reduce opportunities to deduplicate memory among virtual machines in overcommitted systems, interfere with lightweight memory monitoring, and hamper the agility of virtual machine (VM) migrations. While many of these problems are particularly severe in overcommitted systems with scarce memory resources, they can (and often do) exist generally in cloud deployments. In response, virtualization software often (though not necessarily) splinters guest operating system (OS) large pages into small system physical pages, sacrificing address translation performance for overall system-level benefits. We introduce simple hardware that bridges this fundamental conflict, using speculative techniques to group contiguous, aligned small page translations such that they approach the address translation performance of large pages. Our Generalized Large-page Utilization Enhancements (GLUE) allow system hypervisors to splinter large pages for agile memory management, while retaining almost all of the TLB performance of unsplintered large pages.
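To make "contiguous, aligned small page translations" concrete, the following is a minimal software sketch, not the hardware design proposed in the paper. It assumes x86-64 page sizes (512 small 4 KB pages per 2 MB large page) and models a splintered mapping as a plain dictionary from small virtual page numbers to physical frame numbers. The names `aligned_fraction` and `speculate`, and the `threshold` parameter, are hypothetical and chosen only for illustration.

```python
PAGES_PER_LARGE = 512  # assumption: 2 MB large page / 4 KB small page (x86-64)


def aligned_fraction(page_table, large_vpn):
    """Find the dominant large-page-aligned physical base for one 2 MB virtual region.

    page_table maps small virtual page numbers to physical frame numbers.
    Returns (base_frame, fraction of the 512 small pages that sit at
    base_frame + offset), or (None, 0.0) if no page is aligned that way.
    """
    base_vpn = large_vpn * PAGES_PER_LARGE
    votes = {}
    for offset in range(PAGES_PER_LARGE):
        pfn = page_table.get(base_vpn + offset)
        if pfn is None:
            continue  # unmapped small page
        candidate = pfn - offset
        # The page counts only if it lies where an unsplintered large page would put it.
        if candidate >= 0 and candidate % PAGES_PER_LARGE == 0:
            votes[candidate] = votes.get(candidate, 0) + 1
    if not votes:
        return None, 0.0
    base = max(votes, key=votes.get)
    return base, votes[base] / PAGES_PER_LARGE


def speculate(page_table, vpn, threshold=0.9):
    """Guess the frame for vpn from its region's dominant base (hypothetical policy).

    Returns None when the region is not contiguous enough to group. A guess can
    be wrong, so it would still have to be checked against the real translation.
    """
    large_vpn, offset = divmod(vpn, PAGES_PER_LARGE)
    base, fraction = aligned_fraction(page_table, large_vpn)
    if base is None or fraction < threshold:
        return None
    return base + offset


# Example: virtual region 3 was splintered from a large page based at frame 1024,
# then one small page (offset 7) was remapped elsewhere, e.g. after deduplication.
table = {3 * PAGES_PER_LARGE + i: 1024 + i for i in range(PAGES_PER_LARGE)}
table[3 * PAGES_PER_LARGE + 7] = 99999

print(aligned_fraction(table, 3))
print(speculate(table, 3 * PAGES_PER_LARGE + 5), table[3 * PAGES_PER_LARGE + 5])
print(speculate(table, 3 * PAGES_PER_LARGE + 7), table[3 * PAGES_PER_LARGE + 7])
```

In the example, the guess for offset 5 matches the real mapping, while the guess for the remapped page at offset 7 does not. This is why any grouping of this kind is speculative: each guess needs verification against the actual page table contents.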

Index Terms

1. Computer systems organization
  1. Architectures

Published In

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture

December 2015

787 pages

ISBN: 9781450340342

DOI: 10.1145/2830772

General Chair:
Milos Prvulovic
Georgia Tech

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IEEE Computer Society TC-uARCH
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Foundation

Conference

MICRO-48

Sponsor:

SIGMICRO

MICRO-48: The 48th Annual IEEE/ACM International Symposium on Microarchitecture

December 5 - 9, 2015

Waikiki, Hawaii

Acceptance Rates

MICRO-48 Paper Acceptance Rate: 61 of 283 submissions, 22%

Overall Acceptance Rate 484 of 2,242 submissions, 22%
