DOI: 10.1145/3352460.3358294

Prefetched Address Translation

Published: 12 October 2019

Abstract

With explosive growth in dataset sizes and increasing machine memory capacities, per-application memory footprints are commonly reaching into hundreds of GBs. Such huge datasets pressure the TLB, resulting in frequent misses that must be resolved through a page walk -- a long-latency pointer chase through multiple levels of the in-memory radix tree-based page table.
Anticipating further growth in dataset sizes and their adverse effect on TLB hit rates, this work seeks to accelerate page walks while fully preserving existing virtual memory abstractions and mechanisms -- a must for software compatibility and generality. Our idea is to enable direct indexing into a given level of the page table, thus eliding the need to first fetch pointers from the preceding levels. A key contribution of our work is in showing that this can be done by simply ordering the pages containing the page table in physical memory to match the order of the virtual memory pages they map to. Doing so enables direct indexing into the page table using base-plus-offset arithmetic.
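The base-plus-offset arithmetic described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes x86-64-style parameters (4 KiB pages, 512 eight-byte entries per page-table page), and the function name `direct_entry_address` and its parameters are hypothetical. It also assumes that the page-table pages of one level covering a virtual region sit contiguously in physical memory, in virtual-address order, starting at `level_base_pa`.

```python
PAGE_SHIFT = 12   # assumed: 4 KiB base pages
INDEX_BITS = 9    # assumed: 512 entries per page-table page
PTE_SIZE = 8      # assumed: 8-byte page-table entries


def direct_entry_address(va, region_base_va, level_base_pa, level=1):
    """Return the physical address of the level-`level` entry for `va`.

    Level 1 is the leaf level (each entry maps one 4 KiB page), level 2
    entries each map 2 MiB, and so on. No pointers from preceding levels
    are read: the address is computed from a base and an offset only.
    """
    # Number of address bits mapped by a single entry at this level.
    shift = PAGE_SHIFT + INDEX_BITS * (level - 1)
    # Virtual-address span covered by one whole page-table page at this level.
    table_span = 1 << (shift + INDEX_BITS)
    if region_base_va % table_span != 0:
        raise ValueError("region base must be aligned to one page-table page's span")
    if va < region_base_va:
        raise ValueError("address lies below the region")
    # Offset in entries from the first entry of the region.
    entry_index = (va - region_base_va) >> shift
    return level_base_pa + entry_index * PTE_SIZE
```

Under these assumptions, an address 2 MiB + 4 KiB past the region base resolves to entry 513, which falls in the second leaf page-table page. The computation stays valid across page-table page boundaries only because those pages are ordered to match the virtual pages they map.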
We introduce Address Translation with Prefetching (ASAP), a new approach for reducing the latency of address translation to a single access to the memory hierarchy. Upon a TLB miss, ASAP launches prefetches to the deeper levels of the page table, bypassing the preceding levels. These prefetches happen concurrently with a conventional page walk, which observes a latency reduction due to prefetching while guaranteeing that only correctly-predicted entries are consumed. ASAP requires minimal extensions to the OS and trivial microarchitectural support. Moreover, ASAP is fully legacy-preserving, requiring no modifications to the existing radix tree-based page table, TLBs and other software and hardware mechanisms for address translation. Our evaluation on a range of memory-intensive workloads shows that under SMT colocation, ASAP is able to reduce page walk latency by an average of 25% (42% max) in native execution, and 45% (55% max) under virtualization.
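The interaction between the prefetches and the conventional walk can be illustrated with a toy timing model. Everything here is an assumption made for illustration: the latencies are arbitrary round numbers, the four-level layout follows x86-64 conventions, and the names `entry_address` and `walk` are hypothetical. Prefetches are modeled as being issued at time 0 (the TLB miss) and arriving after one memory latency. The walk still follows real pointers level by level, so a prefetched line helps only if the walk itself arrives at that exact address.

```python
PAGE_SHIFT, INDEX_BITS, PTE_SIZE = 12, 9, 8   # assumed x86-64-style layout
MEM_LATENCY = 100    # hypothetical cycles for an access served by memory
CACHE_LATENCY = 10   # hypothetical cycles for an access served by a cache


def entry_address(table_pa, va, level):
    """Physical address of the entry for `va` inside the table at `table_pa`."""
    index = (va >> (PAGE_SHIFT + INDEX_BITS * (level - 1))) & ((1 << INDEX_BITS) - 1)
    return table_pa + index * PTE_SIZE


def walk(va, root_pa, memory, prefetched=frozenset(), levels=4):
    """Conventional radix walk; returns (translation, latency in cycles).

    `memory` maps entry addresses to their contents (next table base, or the
    final frame at the leaf). `prefetched` holds addresses whose prefetch was
    issued at time 0. Each entry is always read from `memory` at the address
    the walk computed, so a mispredicted prefetch is simply never consumed.
    """
    now = 0
    table_pa = root_pa
    for level in range(levels, 0, -1):
        pa = entry_address(table_pa, va, level)
        if pa in prefetched:
            # Data arrives no earlier than the prefetch completes.
            now = max(now + CACHE_LATENCY, MEM_LATENCY)
        else:
            now += MEM_LATENCY
        table_pa = memory[pa]
    return table_pa, now
```

In this model, correct prefetches to the two deepest levels overlap with the serial accesses to the upper levels and shorten the walk. Prefetches to wrong addresses leave both the result and the latency of the conventional walk unchanged.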

Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN: 9781450369381
DOI: 10.1145/3352460
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. microarchitecture
  2. virtual memory
  3. virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '52

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 234
  • Downloads (last 6 weeks): 33
Reflects downloads up to 20 Nov 2024

Cited By

  • (2024) SmartNIC-Enabled Live Migration for Storage-Optimized VMs. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, pp. 45-52. DOI: 10.1145/3678015.3680487. Online publication date: 4-Sep-2024
  • (2024) Direct Memory Translation for Virtualized Clouds. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 287-304. DOI: 10.1145/3620665.3640358. Online publication date: 27-Apr-2024
  • (2024) DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 1129-1143. DOI: 10.1109/ISCA59077.2024.00085. Online publication date: 29-Jun-2024
  • (2024) Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 834-847. DOI: 10.1109/ISCA59077.2024.00065. Online publication date: 29-Jun-2024
  • (2024) Counter-light Memory Encryption. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 724-738. DOI: 10.1109/ISCA59077.2024.00058. Online publication date: 29-Jun-2024
  • (2024) CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 76-88. DOI: 10.1109/IPDPS57955.2024.00016. Online publication date: 27-May-2024
  • (2023) Accelerating Extra Dimensional Page Walks for Confidential Computing. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 654-669. DOI: 10.1145/3613424.3614293. Online publication date: 28-Oct-2023
  • (2023) IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1163-1177. DOI: 10.1145/3613424.3614269. Online publication date: 28-Oct-2023
  • (2023) Mosaic Pages: Big TLB Reach with Small Pages. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 433-448. DOI: 10.1145/3582016.3582021. Online publication date: 25-Mar-2023
  • (2023) Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-15. DOI: 10.1145/3579371.3589079. Online publication date: 17-Jun-2023
