Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Set-associative cache simulation using generalized binomial trees

Published: 01 February 1995 Publication History

Abstract

Set-associative caches are widely used in CPU memory hierarchies, I/O subsystems, and file systems to reduce average access times. This article proposes an efficient simulation technique for simulating a group of set-associative caches in a single pass through the address trace, where all caches have the same line size but varying associativities and varying number of sets. The article also introduces a generalization of the ordinary binomial tree and presents a representation of caches in this class using the Generalized Binomial Tree (gbt). The tree representation permits efficient search and update of the caches. Theoretically, the new algorithm, GBF_LS, based on the gbt structure, always takes fewer comparisons than the two earlier algorithms for the same class of caches: all-associativity and generalized forest simulation. Experimentally, the new algorithm shows performance gains in the range of 1.2 to 3.8 over the earlier algorithms on address traces of the SPEC benchmarks. A related algorithm for simulating multiple alternative direct-mapped caches with fixed cache size, but varying line size, is also presented.

References

[1]
BENNETT, B. T AND KRUSKAL, V.J. 1975. LRU stack processing. IBM J. Res. Devel. (July), 353 357.
[2]
HEIDELBERGER, P. AND STONE, H. S. 1990. Parallel trace-driven cache simulation by time partitioning. In Proceedings of the Winter Simulation Conference IEEE, New York, 734-737.
[3]
HILL, M. D AND SMITH, A. J. 1989. Evaluating associativity in CPU caches. IEEE Trans. Comput. 38, 12 (Dec.), 1612-1630.
[4]
LAHA, S., PATEL, J. H., AND IYER, R. K. 1988. Accurate low-cost methods for performance evaluation of cache memory systems. IEEE Trans. Comput. C-37, 11 (Nov.), 1925-1936.
[5]
MATTSON, R. L., GECSEI, J., SLUTZ, D. R., AND TRAIGER, I.L. 1970. Evaluation techniques for storage hierarchies. IBM Syst. J. 9, 2, 78-117.
[6]
OLKEN, F. 1981. Efficient methods for calculating the success function of fixed space replacement policies. Tech. Rep. LBL-12370, Lawrence Berkeley Laboratory, Berkeley, Calif.
[7]
PUZAK, T. R. 1985. Analysis of cache replacement algorithms. Ph.D. thesis, Univ. of Massachusetts, Amherst, Mass.
[8]
SMITH, A.J. 1985. Disk cache--miss ratio analysis and design considerations. ACM Trans. Comput. Syst. 3, 3, 161-203.
[9]
SMITH, A.J. 1982. Cache memories. ACM Comput Surv. 14, 3 (Sept.), 473-530.
[10]
SMITH, A.J. 1977. Two methods for the efficient analysis of memory address trace data. IEEE Trans. Softw. Eng. SE-3, 1 (Jan.), 94-101.
[11]
STONE, H.S. 1987. High-Performance Computer Architecture. 2nd ed. Addison-Wesley, Readrag, Mass.
[12]
SUGUMAR, R.A. 1993. Mult-configuration simulation algorithms for the evaluation of computer architecture designs. Ph.D. thesis, Univ. of Michigan, Ann Arbor, Mich.
[13]
SUGUMAR, R. A. AND ABRAHAM, S. G. 1993. Efficient simulation of caches under optimal replacement with applications to miss characterization. In Proceedings ofACM SIGMETRICS Conference. ACM, New York. 24-35.
[14]
THOMPSON, J.G. 1987. Efficient analysis of caching systems. Ph.D. thesis, Univ. of California, Berkeley.
[15]
TRAIGER, I. L. AND SLUTZ, D. R. 1971. One pass techniques for the evaluation of memory hierarchies. Tech. Rep. RJ 892, IBM, Armonk, N.Y.
[16]
WANe, W.-H. ANO BAER, J.-L. 1990. Efficient trace-driven simulation methods for cache performance analysis. In Proceedings ofACM SIGMETRICS Conference. ACM, New York, 27-36.

Cited By

View all
  • (2020)A Machine Learning Methodology for Cache Memory Design Based on Dynamic InstructionsACM Transactions on Embedded Computing Systems10.1145/337692019:2(1-20)Online publication date: 11-Mar-2020
  • (2019)Adaptive Thread Scheduling in Chip MultiprocessorsInternational Journal of Parallel Programming10.1007/s10766-019-00637-yOnline publication date: 14-May-2019
  • (2019)Efficient Cache Simulation for Affine ComputationsLanguages and Compilers for Parallel Computing10.1007/978-3-030-35225-7_6(65-85)Online publication date: 15-Nov-2019
  • Show More Cited By

Recommendations

Reviews

Mircea Raducu Stan

Caches are used to decrease average access times in CPU memory hierarchies, file systems, and so on. The performance of a cache under a given workload is determined by three parameters: cache size, line size, and degree of associativity. The current wisdom for choosing an optimal set of cache parameters is to simulate various cache configurations on address traces of representative workloads (such as the SPEC89 suite). Cache simulation is time-intensive, so several methods are used to reduce simulation time: reduced-trace simulation, parallel simulation, and single-pass simulation. The authors propose two efficient single-pass simulation techniques. The first algorithm uses generalized binomial trees to simulate in a single pass a group of set-associative caches with the same line size but varying numbers of sets and varying associativity. The second, related algorithm can be used for simulating in a single pass a group of direct-mapped caches with fixed size but varying line sizes. Part 1, “Introduction,” and part 2, “Related Work,” present a good review of previous work on single-step set-associative cache simulation, with emphasis on the AA (all-associativity) and FS+ (forest simulation) algorithms. Part 3, “Set-associative Cache Simulation,” lays the theoretical foundation for single-step cache simulation, introduces the concepts of generalized binomial tree and forest with their properties, and then presents the main GBF-LS (generalized binomial forest—fixed line size) algorithm. The authors show that the new algorithm is always more efficient than the AA and FS+ algorithms. Part 4, “Direct-mapped Caches,” presents a related single-step algorithm for direct-mapped caches with varying line sizes, and part 5, “Empirical Performance Evaluations,” shows experimental comparisons between the new and previously proposed algorithms. The appendix gives formal definitions for generalized binomial trees and proves some of their properties. Anyone involved in cache simulation will find this paper useful, especially since the single-step simulation techniques described here can be used with reduced-trace and parallel simulation for maximum efficiency.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Computer Systems
ACM Transactions on Computer Systems  Volume 13, Issue 1
Feb. 1995
88 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/200912
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 1995
Published in TOCS Volume 13, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. all-associativity simulation
  2. binomial tree
  3. cache modeling
  4. inclusion properties
  5. set-associative caches
  6. single-pass simulation
  7. trace-driven simulation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)59
  • Downloads (Last 6 weeks)11
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)A Machine Learning Methodology for Cache Memory Design Based on Dynamic InstructionsACM Transactions on Embedded Computing Systems10.1145/337692019:2(1-20)Online publication date: 11-Mar-2020
  • (2019)Adaptive Thread Scheduling in Chip MultiprocessorsInternational Journal of Parallel Programming10.1007/s10766-019-00637-yOnline publication date: 14-May-2019
  • (2019)Efficient Cache Simulation for Affine ComputationsLanguages and Compilers for Parallel Computing10.1007/978-3-030-35225-7_6(65-85)Online publication date: 15-Nov-2019
  • (2018)Analytical Miss Rate Calculation of L2 Cache from the RD Profile of L1 CacheIEEE Transactions on Computers10.1109/TC.2017.272387867:1(9-15)Online publication date: 1-Jan-2018
  • (2018)Analytical Two-Level Near Threshold Cache Exploration for Low Power Biomedical ApplicationsAdvanced Computer Architecture10.1007/978-981-13-2423-9_8(95-108)Online publication date: 13-Sep-2018
  • (2015)Fast and precise cache performance estimation for out-of-order executionProceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition10.5555/2755753.2757075(1132-1137)Online publication date: 9-Mar-2015
  • (2015)Exploring Multilevel Cache Hierarchies in Application Specific MPSoCsIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2015.244573634:12(1991-2003)Online publication date: 18-Nov-2015
  • (2015)Speeding up single pass simulation of PLRUt cachesThe 20th Asia and South Pacific Design Automation Conference10.1109/ASPDAC.2015.7059091(695-700)Online publication date: Jan-2015
  • (2014)Hardware-based fast exploration of cache hierarchies in application specific MPSoCsProceedings of the conference on Design, Automation & Test in Europe10.5555/2616606.2617020(1-6)Online publication date: 24-Mar-2014
  • (2014)MASH{fifo}Proceedings of the 51st Annual Design Automation Conference10.1145/2593069.2593159(1-6)Online publication date: 1-Jun-2014
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media