research-article

PreTrans: reducing TLB CAM-search via page number prediction and speculative pre-translation

Authors:

Mithuna Thottethodi

ISLPED '13: Proceedings of the 2013 International Symposium on Low Power Electronics and Design

Pages 341 - 346

Published: 04 September 2013

Abstract

Fast address translation must complete within a tight window (after effective address computation but before the L1 tag check), which imposes many design constraints. Freedom from such constraints can potentially lead to lower TLB energy costs. In this paper, we observe that (1) data accesses commonly use base-displacement addressing modes, in which the effective address is computed as the sum of a base and a displacement, and (2) the effective page numbers are predictable once the base address is known. Further, it is easy to cache address translations alongside the predicted page numbers, thus enabling speculative address translation that can filter accesses to the TLB. These two observations enable our PreTrans design, in which (a) a speculative translation is available based solely on the base address, and (b) the translation is available simultaneously with the effective (virtual) address. PreTrans replaces most of the energy-expensive CAM lookups for TLB access with RAM lookups, which translates to significant power improvements in the TLB.
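
To make the idea concrete, the following is a minimal software sketch of speculative pre-translation, not the hardware design from the paper. It assumes 4 KiB pages, a small direct-mapped table indexed by the page number of the base address, and a Python dictionary standing in for the CAM-searched TLB. The names `PreTranslator`, `Entry`, `translate`, the table size, and the verification step are all hypothetical; the abstract does not specify how the real design indexes, sizes, or verifies its predictions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def vpn(addr: int) -> int:
    """Return the virtual page number of a virtual address."""
    return addr >> PAGE_SHIFT


@dataclass
class Entry:
    base_vpn: int       # tag: page number of the base address
    predicted_vpn: int  # page number last observed for base + displacement
    ppn: int            # cached translation of predicted_vpn


class PreTranslator:
    """Hypothetical direct-mapped, RAM-style table indexed by the base address."""

    def __init__(self, page_table: Dict[int, int], num_entries: int = 64):
        self.page_table = page_table  # stands in for the CAM-searched TLB
        self.table: List[Optional[Entry]] = [None] * num_entries
        self.ram_hits = 0
        self.cam_lookups = 0

    def translate(self, base: int, disp: int) -> int:
        # The table is read using the base address alone, so in hardware this
        # read could overlap with the base + displacement addition.
        index = vpn(base) % len(self.table)
        entry = self.table[index]

        effective = base + disp
        target_vpn = vpn(effective)
        offset = effective & ((1 << PAGE_SHIFT) - 1)

        if (entry is not None
                and entry.base_vpn == vpn(base)
                and entry.predicted_vpn == target_vpn):
            # Prediction verified: the CAM search is skipped.
            self.ram_hits += 1
            ppn = entry.ppn
        else:
            # Misprediction or cold entry: fall back to the TLB and retrain.
            self.cam_lookups += 1
            ppn = self.page_table[target_vpn]
            self.table[index] = Entry(vpn(base), target_vpn, ppn)
        return (ppn << PAGE_SHIFT) | offset


# Usage: two accesses that stay on the base's page, then one that crosses it.
pt = PreTranslator({0x10: 0x80, 0x11: 0x81})
base = 0x10F00
a1 = pt.translate(base, 0x8)    # cold entry, needs the TLB
a2 = pt.translate(base, 0x10)   # same page as predicted, served from the table
a3 = pt.translate(base, 0x200)  # crosses into page 0x11, falls back to the TLB
```

In this toy run only the second access avoids the stand-in TLB. The energy argument in the abstract rests on most real accesses behaving like that second one, which this sketch illustrates but does not measure.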

Cited By

V. Baoni, A. Mittal, and G. Sohi (2021). Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 366-379. DOI: 10.1145/3466752.3480104. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480104

Index Terms

1. Applied computing
  1. Physical sciences and engineering
    1. Electronics
2. Hardware

Published In

ISLPED '13: Proceedings of the 2013 International Symposium on Low Power Electronics and Design

September 2013

440 pages

ISBN:9781479912353

General Chairs: Pai Chou (UC Irvine / NTHU Taiwan), Ru Huang (Peking University)

Program Chairs: Yuan Xie (Penn State / AMD), Tanay Karnik (Intel)

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ISLPED'13

Sponsor:

SIGDA

ISLPED'13: International Symposium on Low Power Electronics and Design

September 4 - 6, 2013

Beijing, China

Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
