DOI: 10.5555/2665671.2665694
research-article

The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup

Published: 14 June 2014

Abstract

Modern processors optimize for cache energy and performance by employing multiple levels of caching that address the needs for bandwidth, low latency, and high capacity. A request typically traverses the cache hierarchy, level by level, until the data is found, thereby wasting time and energy in each level. In this paper, we present the Direct-to-Data (D2D) cache that locates data across the entire cache hierarchy with a single lookup. To navigate the cache hierarchy, D2D extends the TLB with per-cache-line location information that indicates in which cache and way the cache line is located. This allows the D2D cache to: 1) skip levels in the hierarchy (by accessing the right cache level directly), 2) eliminate extra data array reads (by reading the right way directly), 3) avoid tag comparisons (by eliminating the tag arrays), and 4) go directly to DRAM on cache misses (by checking the TLB). This reduces the L2 latency by 40% and saves 5-17% of the total cache hierarchy energy.
D2D's lower L2 latency directly improves the performance of L2-sensitive applications by 5-14%. More significantly, we can take advantage of the L2 latency reduction to optimize other parts of the micro-architecture. For example, we can reduce the ROB size for the L2-bound applications by 25%, or we can reduce the L1 cache size, delivering an overall 21% energy savings across all benchmarks, without hurting performance.
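
The single-lookup idea in the abstract can be illustrated with a toy software model. The sketch below is not the paper's hardware design: the class names (`ToyD2D`, `ExtendedTlbEntry`, `Location`), the 64-byte line size, and the 4 KiB page size are all assumptions made for illustration. It only shows how a TLB entry that carries one (cache level, way) record per cache line lets a request learn where its data lives, or that it must go to DRAM, without probing each level in turn. TLB misses, evictions, and coherence are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum

LINE_SIZE = 64                            # bytes per cache line (assumed)
PAGE_SIZE = 4096                          # bytes per page (assumed)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class Level(Enum):
    L1 = "L1"
    L2 = "L2"
    MEM = "MEM"                           # not cached: go straight to DRAM


@dataclass
class Location:
    level: Level = Level.MEM
    way: int = 0


def _uncached_lines():
    # Every line of a freshly mapped page starts out uncached.
    return [Location() for _ in range(LINES_PER_PAGE)]


@dataclass
class ExtendedTlbEntry:
    physical_page: int
    # One location record per cache line in the page (hypothetical layout).
    lines: list = field(default_factory=_uncached_lines)


class ToyD2D:
    """Toy model: the TLB maps a virtual page to per-line locations."""

    def __init__(self):
        self.tlb = {}                     # virtual page number -> entry

    def map_page(self, vpn, ppn):
        self.tlb[vpn] = ExtendedTlbEntry(physical_page=ppn)

    def place(self, vaddr, level, way):
        # Record that the line containing vaddr now lives in (level, way).
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        self.tlb[vpn].lines[offset // LINE_SIZE] = Location(level, way)

    def locate(self, vaddr):
        # A single lookup yields both the cache level and the way.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        loc = self.tlb[vpn].lines[offset // LINE_SIZE]
        return loc.level, loc.way


d2d = ToyD2D()
d2d.map_page(vpn=0x10, ppn=0x99)
addr = 0x10 * PAGE_SIZE + 3 * LINE_SIZE + 8
first = d2d.locate(addr)                  # line not cached yet
d2d.place(addr, Level.L2, way=5)
second = d2d.locate(addr)                 # now points directly at L2, way 5
```

In this model a `Level.MEM` result corresponds to point 4 in the abstract (going directly to DRAM), while a `Level.L2` result corresponds to points 1 and 2 (skipping L1 and reading only the named way).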

Published In

ISCA '14: Proceeding of the 41st annual international symposium on Computer architecture
June 2014
566 pages
ISBN: 9781479943944

Publisher

IEEE Press

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2018) SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. ACM SIGPLAN Notices 53:4 (328-343). DOI: 10.1145/3296979.3192393. Online publication date: 11-Jun-2018
  • (2018) SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores. Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (328-343). DOI: 10.1145/3192366.3192393. Online publication date: 11-Jun-2018
  • (2018) Rethinking the memory hierarchy for modern languages. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (203-216). DOI: 10.1109/MICRO.2018.00025. Online publication date: 20-Oct-2018
  • (2017) A study of partitioned DIMM tree management for multimedia server systems. Multimedia Tools and Applications 76:17 (17937-17954). DOI: 10.1007/s11042-016-3382-6. Online publication date: 1-Sep-2017
  • (2014) Efficient Memory Virtualization. Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (178-189). DOI: 10.1109/MICRO.2014.37. Online publication date: 13-Dec-2014
  • (2024) A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (528-542). DOI: 10.1109/HPCA57654.2024.00046. Online publication date: 2-Mar-2024
  • (2022) Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (1-18). DOI: 10.1109/MICRO56248.2022.00015. Online publication date: 1-Oct-2022
  • (2021) Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (640-653). DOI: 10.1109/HPCA51647.2021.00060. Online publication date: Feb-2021
  • (2021) DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access 9 (134457-134502). DOI: 10.1109/ACCESS.2021.3110993. Online publication date: 2021
  • (2020) DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (578-589). DOI: 10.1109/IPDPS47924.2020.00066. Online publication date: May-2020
