DOI: 10.1145/2540708.2540729 (research article)

DESC: energy-efficient data exchange using synchronized counters

Published: 07 December 2013

Abstract

Increasing cache sizes in modern microprocessors require long wires to connect cache arrays to processor cores. As a result, the last-level cache (LLC) has become a major contributor to processor energy, necessitating techniques to increase the energy efficiency of data exchange over LLC interconnects.
This paper presents an energy-efficient data exchange mechanism using synchronized counters. The key idea is to represent information by the delay between two consecutive pulses on a set of wires, which makes the number of state transitions on the interconnect independent of the data patterns, and significantly lowers the activity factor. Simulation results show that the proposed technique reduces overall processor energy by 7%, and the L2 cache energy by 1.81× on a set of sixteen parallel applications. This efficiency gain is attained at a cost of less than 1% area overhead to the L2 cache, and a 2% delay overhead to execution time.
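
The key idea in the abstract can be illustrated with a small toy model. The sketch below is not the paper's circuit or protocol: the 4-bit chunk width, the one-pulse-per-window framing, the choice to count each pulse as a single wire transition, and all function names are assumptions made only for illustration. It compares the number of wire toggles on a conventional parallel bus, which depends on the data, with a delay-based scheme in which each chunk is sent as one pulse whose position in a counting window carries the value.

```python
from typing import List

CHUNK_BITS = 4  # assumed chunk width, chosen only for illustration


def binary_bus_transitions(words: List[int]) -> int:
    """Toggles on a conventional parallel bus: the Hamming distance
    between consecutive words, starting from an all-zero bus."""
    prev, toggles = 0, 0
    for word in words:
        toggles += bin(prev ^ word).count("1")
        prev = word
    return toggles


def send_chunk(value: int) -> List[bool]:
    """Hypothetical sender: one pulse on the data wire, placed `value`
    counter cycles after the start of the transfer window."""
    window = 1 << CHUNK_BITS
    if not 0 <= value < window:
        raise ValueError("value does not fit in one chunk")
    return [cycle == value for cycle in range(window)]


def receive_chunk(pulses: List[bool]) -> int:
    """Hypothetical receiver: a counter synchronized with the sender runs
    from the window start and is latched when the pulse arrives."""
    for counter, pulse in enumerate(pulses):
        if pulse:
            return counter
    raise ValueError("no pulse seen in this window")


def delay_coded_transitions(values: List[int]) -> int:
    """Pulses on the data wire, assuming each pulse costs one transition.
    The shared window-start strobe is left out of this toy count."""
    return sum(sum(send_chunk(v)) for v in values)


if __name__ == "__main__":
    alternating = [0b0000, 0b1111, 0b0000, 0b1111]
    constant = [0b0101, 0b0101, 0b0101, 0b0101]
    for name, data in (("alternating", alternating), ("constant", constant)):
        print(name, binary_bus_transitions(data), delay_coded_transitions(data))
```

In this model the delay-coded count depends only on how many chunks are sent, while the binary count varies with the data pattern. The model also shows the trade-off the abstract mentions: each chunk occupies a window of up to 2^CHUNK_BITS counter cycles, which is consistent with the reported delay overhead, although the numbers here say nothing about the paper's actual results.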

      Published In

      MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
      December 2013
      498 pages
      ISBN:9781450326384
      DOI:10.1145/2540708
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. caches
      2. data encoding
      3. interconnect
      4. low power
      5. signaling

      Qualifiers

      • Research-article

      Conference

      MICRO-46
      Acceptance Rates

      MICRO-46 Paper Acceptance Rate 39 of 239 submissions, 16%;
      Overall Acceptance Rate 484 of 2,242 submissions, 22%
