research-article

The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup

Authors:

Andreas Sembrant,

Erik Hagersten,

David Black-Schaffer

ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture

Pages 133 - 144

Published: 14 June 2014

Abstract

Modern processors optimize for cache energy and performance by employing multiple levels of caching that address bandwidth, latency, and capacity requirements. A request typically traverses the cache hierarchy level by level until the data is found, thereby wasting time and energy at each level that does not hold the data. In this paper, we present the Direct-to-Data (D2D) cache, which locates data across the entire cache hierarchy with a single lookup. To navigate the cache hierarchy, D2D extends the TLB with per-cache-line location information that indicates in which cache and in which way each cache line is located. This allows the D2D cache to: 1) skip levels in the hierarchy (by accessing the right cache level directly), 2) eliminate extra data array reads (by reading the right way directly), 3) avoid tag comparisons (by eliminating the tag arrays), and 4) go directly to DRAM on cache misses (by checking the TLB). This reduces the L2 latency by 40% and saves 5-17% of the total cache hierarchy energy.
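The single-lookup load path described above can be illustrated with a small software model. This is a minimal sketch of the idea as stated in the abstract, not the paper's hardware design. The following are all hypothetical assumptions made for illustration:

- the page and line sizes;
- the names `LineLocation`, `TLBEntry`, `DataArray`, and `D2DSketch`;
- the choice to index sets by physical address;
- the string encoding of cache levels.

TLB misses, fills, evictions, and keeping the location records consistent are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 4096  # assumed 4 KiB pages (hypothetical)
LINE_SIZE = 64    # assumed 64-byte cache lines (hypothetical)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE
MEM = "MEM"       # location value meaning "not in any on-chip cache"


@dataclass
class LineLocation:
    """Hypothetical encoding of where one cache line lives."""
    level: str = MEM           # "L1", "L2", or MEM
    way: Optional[int] = None  # way within the indexed set (unused for MEM)


@dataclass
class TLBEntry:
    """A TLB entry extended with one location record per cache line in the page."""
    vpn: int
    ppn: int
    lines: List[LineLocation] = field(
        default_factory=lambda: [LineLocation() for _ in range(LINES_PER_PAGE)])


class DataArray:
    """A tag-less cache data array: (set, way) selects the line directly."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.data: Dict[Tuple[int, int], bytes] = {}

    def set_index(self, paddr: int) -> int:
        return (paddr // LINE_SIZE) % self.num_sets

    def read(self, paddr: int, way: int) -> bytes:
        # One data-array read of one way; no tag array, no tag comparison.
        return self.data[(self.set_index(paddr), way)]


class D2DSketch:
    """Single-lookup load path: TLB -> exact cache level and way, or DRAM."""

    def __init__(self, caches: Dict[str, DataArray], dram: Dict[int, bytes]) -> None:
        self.tlb: Dict[int, TLBEntry] = {}  # keyed by virtual page number
        self.caches = caches                # e.g. {"L1": ..., "L2": ...}
        self.dram = dram                    # keyed by line-aligned physical address

    def load(self, vaddr: int) -> Tuple[str, bytes]:
        entry = self.tlb[vaddr // PAGE_SIZE]  # TLB misses are not modeled
        offset = vaddr % PAGE_SIZE
        paddr = entry.ppn * PAGE_SIZE + offset
        loc = entry.lines[offset // LINE_SIZE]
        if loc.level == MEM:
            # The TLB already reports a miss in every cache: go straight to DRAM.
            return MEM, self.dram[paddr - paddr % LINE_SIZE]
        assert loc.way is not None  # a cached line always records its way
        # Skip the other levels and read only the recorded way.
        return loc.level, self.caches[loc.level].read(paddr, loc.way)


if __name__ == "__main__":
    l1, l2 = DataArray(num_sets=64, ways=8), DataArray(num_sets=512, ways=8)
    d2d = D2DSketch({"L1": l1, "L2": l2}, dram={})
    d2d.tlb[0x10] = TLBEntry(vpn=0x10, ppn=0x80)
    # Place line 3 of the page in L2, way 5, and record that in the TLB entry.
    pa = 0x80 * PAGE_SIZE + 3 * LINE_SIZE
    l2.data[(l2.set_index(pa), 5)] = b"hello"
    d2d.tlb[0x10].lines[3] = LineLocation("L2", 5)
    print(d2d.load(0x10 * PAGE_SIZE + 3 * LINE_SIZE + 8))  # ('L2', b'hello')
```

The location record is consulted before any cache is touched. A line recorded in L2 is therefore read from L2's data array at a single way, without probing L1 or any tag array. A line recorded as MEM goes to DRAM without any cache lookup. A real design would also have to update these records on every fill, eviction, and movement between levels, which this sketch leaves out.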

D2D's lower L2 latency directly improves the performance of L2-sensitive applications by 5-14%. More significantly, we can take advantage of the L2 latency reduction to optimize other parts of the microarchitecture. For example, we can reduce the ROB size for L2-bound applications by 25%, or we can reduce the L1 cache size, delivering an overall 21% energy savings across all benchmarks without hurting performance.





Information & Contributors

Information

Published In


ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture

June 2014

566 pages

ISBN:9781479943944

General Chairs:
Pen-Chung Yew, University of Minnesota
Antonia Zhai, University of Minnesota

Program Chair:
Steve Keckler, NVIDIA/University of Texas at Austin

ACM SIGARCH Computer Architecture News Volume 42, Issue 3
ISCA '14
June 2014
552 pages
ISSN:0163-5964
DOI:10.1145/2678373
Editor:
Doug DeGroot

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

IEEE Press

Publication History

Published: 14 June 2014


Qualifiers

Research-article

Conference

ISCA'14

Sponsor:

IEEE TCCA
SIGARCH

ISCA'14: The 41st Annual International Symposium on Computer Architecture

June 14 - 18, 2014

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%



Bibliometrics

Article Metrics

Total Citations: 15
Total Downloads: 885
Downloads (last 12 months): 33
Downloads (last 6 weeks): 1

Reflects downloads up to 14 Dec 2024


Citations

Cited By

Tran K, Jimborean A, Carlson T, Koukos K, Själander M, Kaxiras S (2018). SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. ACM SIGPLAN Notices 53:4, 328-343. DOI: 10.1145/3296979.3192393. Online publication date: 11-Jun-2018.
https://dl.acm.org/doi/10.1145/3296979.3192393

Tran K, Jimborean A, Carlson T, Koukos K, Själander M, Kaxiras S (2018). SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (Foster J, Grossman D, eds.), 328-343. DOI: 10.1145/3192366.3192393. Online publication date: 11-Jun-2018.
https://dl.acm.org/doi/10.1145/3192366.3192393

Tsai P, Gan Y, Sanchez D (2018). Rethinking the memory hierarchy for modern languages. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (Oskin M, Inoue K, eds.), 203-216. DOI: 10.1109/MICRO.2018.00025. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00025

Kim Y, Lee Y, Moon B (2017). A study of partitioned DIMM tree management for multimedia server systems. Multimedia Tools and Applications 76:17, 17937-17954. DOI: 10.1007/s11042-016-3382-6. Online publication date: 1-Sep-2017.
https://dl.acm.org/doi/10.1007/s11042-016-3382-6

Gandhi J, Basu A, Hill M, Swift M (2014). Efficient Memory Virtualization. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (Flautner K, Wenisch T, Ozer E, Ferdman M, eds.), 178-189. DOI: 10.1109/MICRO.2014.37. Online publication date: 13-Dec-2014.
https://dl.acm.org/doi/10.1109/MICRO.2014.37

Jamet A, Vavouliotis G, Jiménez D, Alvarez L, Casas M (2024). A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 528-542. DOI: 10.1109/HPCA57654.2024.00046. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00046

Bera R, Kanellopoulos K, Balachandran S, Novo D, Olgun A, Sadrosadati M, Mutlu O (2022). Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (Hardavellas N, Campanoni S, Grot B, Karpuzcu U, eds.), 1-18. DOI: 10.1109/MICRO56248.2022.00015. Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00015

Wang Z, Weng J, Lowe-Power J, Gaur J, Nowatzki T (2021). Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 640-653. DOI: 10.1109/HPCA51647.2021.00060. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00060

Oliveira G, Gomez-Luna J, Orosa L, Ghose S, Vijaykumar N, Fernandez I, Sadrosadati M, Mutlu O (2021). DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access 9, 134457-134502. DOI: 10.1109/ACCESS.2021.3110993. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3110993

Holtryd N, Manivannan M, Stenstrom P, Pericas M (2020). DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 578-589. DOI: 10.1109/IPDPS47924.2020.00066. Online publication date: May-2020.
https://doi.org/10.1109/IPDPS47924.2020.00066
