DOI: 10.5555/956417.956579

TLC: Transmission Line Caches

Published: 03 December 2003

Abstract

It is widely accepted that the disproportionate scaling of transistor and conventional on-chip interconnect performance presents a major barrier to future high performance systems. Previous research has focused on wire-centric designs that use parallelism, locality, and on-chip wiring bandwidth to compensate for long wire latency. An alternative approach to this problem is to exploit newly-emerging on-chip transmission line technology to reduce communication latency. Compared to conventional RC wires, transmission lines can reduce delay by up to a factor of 30 for global wires, while eliminating the need for repeaters. However, this latency reduction comes at the cost of a comparable reduction in bandwidth.

In this paper, we investigate using transmission lines to access large level-2 on-chip caches. We propose a family of Transmission Line Cache (TLC) designs that represent different points in the latency/bandwidth spectrum. Compared to the recently-proposed Dynamic Non-Uniform Cache Architecture (DNUCA) design, the base TLC design reduces the required cache area by 18% and reduces the interconnection network's dynamic power consumption by an average of 61%. The optimized TLC designs attain similar performance using fewer transmission lines but with some additional complexity. Simulation results using full-system simulation show that TLC provides more consistent performance than the DNUCA design across a wide variety of workloads. TLC caches are logically simpler than DNUCA designs, but require greater circuit and manufacturing complexity.

Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
December 2003
412 pages
ISBN: 076952043X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

MICRO-36

Acceptance Rates

MICRO 36 Paper Acceptance Rate 35 of 134 submissions, 26%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
