DOI:10.1145/2830772.2830773

Large pages and lightweight memory management in virtualized environments: can you have it both ways?

Published: 05 December 2015

Abstract

Large pages have long been used to mitigate address translation overheads on big-memory systems, particularly in virtualized environments where Translation Lookaside Buffer (TLB) miss overheads are severe. We show, however, that far from being a panacea, large pages are used sparingly by modern virtualization software. This is because large pages often preclude lightweight memory management, and the resulting costs can outweigh their TLB benefits. For example, large pages reduce opportunities to deduplicate memory among virtual machines (VMs) in overcommitted systems, interfere with lightweight memory monitoring, and hamper the agility of VM migrations. While many of these problems are particularly severe in overcommitted systems with scarce memory resources, they can (and often do) arise in cloud deployments generally. In response, virtualization software often, though not necessarily, splinters guest operating system (OS) large pages into small system physical pages, sacrificing address translation performance for overall system-level benefits. We introduce simple hardware that bridges this fundamental conflict, using speculative techniques to group contiguous, aligned small page translations such that they approach the address translation performance of large pages. Our Generalized Large-page Utilization Enhancements (GLUE) allow system hypervisors to splinter large pages for agile memory management, while retaining almost all of the TLB performance of unsplintered large pages.
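
To make the phrase "contiguous, aligned small page translations" concrete, the following is a minimal software sketch of the property such grouping relies on. It is not the GLUE hardware described in the paper. The 4 KiB and 2 MiB page sizes, the dictionary-based page table, and the function names `is_contiguous_aligned` and `predict_pfn` are all assumptions made for illustration.

```python
# Illustrative sketch only: page sizes and all names are assumptions,
# not taken from the paper.
SMALL_PAGE = 4 * 1024                        # assumed small page size (4 KiB)
LARGE_PAGE = 2 * 1024 * 1024                 # assumed large page size (2 MiB)
PAGES_PER_LARGE = LARGE_PAGE // SMALL_PAGE   # 512 small pages per large page


def is_contiguous_aligned(page_table, large_vpn):
    """Check whether a splintered large-page region is still contiguous and aligned.

    page_table maps small virtual page numbers to physical frame numbers.
    large_vpn is the index of the large-page-sized virtual region.
    """
    base_vpn = large_vpn * PAGES_PER_LARGE
    base_pfn = page_table.get(base_vpn)
    # The first frame must exist and sit on a large-page boundary.
    if base_pfn is None or base_pfn % PAGES_PER_LARGE != 0:
        return False
    # Every following small page must map to the next physical frame.
    return all(
        page_table.get(base_vpn + i) == base_pfn + i
        for i in range(PAGES_PER_LARGE)
    )


def predict_pfn(known_vpn, known_pfn, vpn):
    """Guess the frame for vpn from one known translation in the same region.

    The guess assumes the region is contiguous and aligned, so it is only a
    speculation and must be verified against the real page table before use.
    """
    if known_vpn // PAGES_PER_LARGE != vpn // PAGES_PER_LARGE:
        return None  # different large-page region, no basis for a guess
    region_base_pfn = known_pfn - (known_vpn % PAGES_PER_LARGE)
    return region_base_pfn + (vpn % PAGES_PER_LARGE)
```

In this sketch, a region that was splintered but left in place passes the check, and a single cached translation is enough to guess any other translation in the same region. Remapping even one small page, for instance after deduplication, makes the check fail and makes such a guess wrong for that page, which is why a speculative design needs a verification step.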

    Published In

    MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
    December 2015
    787 pages
    ISBN:9781450340342
    DOI:10.1145/2830772
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. TLB
    2. speculation
    3. virtual memory
    4. virtualization

    Qualifiers

    • Research-article

    Conference

    MICRO-48
    Acceptance Rates

    MICRO-48 Paper Acceptance Rate 61 of 283 submissions, 22%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%
