DOI: 10.1109/CGO.2007.6

Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems

Published: 11 March 2007

Abstract

As a result of process parameter variations, a large variability in circuit delay occurs in scaled technologies. This delay or latency variation problem is particularly pressing for memory components because of the minimum-sized transistors used to build them. Current memory design techniques mostly cope with such variations by adopting a worst-case design option, which simply assumes that all memory locations operate under the worst possible latency, whereas in reality some memory locations could be much faster than others. Note that uniformly assuming any latency other than the worst case for all memory locations can lead to reliability problems, since the data may not be ready when the assumed latency has elapsed. Instead of operating under the worst-case design option, this paper proposes and experimentally evaluates a compiler-driven approach that operates an on-chip scratch-pad memory (SPM) assuming different latencies for the different SPM lines. Our goal is to reduce execution cycles without creating any reliability problems due to variations in access latencies. The proposed scheme achieves its goal by evaluating the reuse of different data items and adopting a reuse- and latency-aware data-to-SPM placement. It also employs data migration within the SPM when doing so further cuts the number of execution cycles. We also discuss an alternate scheme that can reduce the latency of select SPM locations by controlling a circuit-level mechanism in software to further improve performance. We implemented our approach within an optimizing compiler and tested its effectiveness through extensive simulations. Our experiments with twelve embedded application codes show that the proposed approach performs much better than the worst-case based design paradigm (16.2% improvement on average) and comes close (within 5.7%) to a hypothetical best-case design (i.e., one with no process variation) in which every SPM location uniformly has low latency.
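To make the idea of reuse- and latency-aware placement concrete, the following is a minimal sketch and not the paper's actual algorithm, which the abstract does not specify. It assumes one data block per SPM line, a known per-line latency in cycles, and a compiler-estimated reuse count per block; the names `DataBlock`, `place_blocks`, `total_cycles`, and `worst_case_cycles` are hypothetical. It greedily pairs the most-reused blocks with the fastest lines, then compares the resulting access cycles against a worst-case design that charges every access the slowest line's latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBlock:
    name: str
    reuse: int  # assumed compiler estimate of how often the block is accessed


def place_blocks(blocks, line_latencies):
    """Map the most-reused blocks to the lowest-latency SPM lines.

    Returns a dict from block name to SPM line index. Blocks that do not
    fit (more blocks than lines) are simply left out of the SPM.
    """
    by_reuse = sorted(blocks, key=lambda b: b.reuse, reverse=True)
    by_speed = sorted(range(len(line_latencies)), key=lambda i: line_latencies[i])
    return {block.name: line for block, line in zip(by_reuse, by_speed)}


def total_cycles(blocks, placement, line_latencies):
    """Access cycles when each line is charged its own latency."""
    return sum(
        b.reuse * line_latencies[placement[b.name]]
        for b in blocks
        if b.name in placement
    )


def worst_case_cycles(blocks, placement, line_latencies):
    """Access cycles when every line is charged the slowest line's latency."""
    worst = max(line_latencies)
    return sum(b.reuse * worst for b in blocks if b.name in placement)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    blocks = [DataBlock("A", 100), DataBlock("B", 10), DataBlock("C", 50)]
    latencies = [3, 1, 2]  # cycles per access for SPM lines 0, 1, 2

    placement = place_blocks(blocks, latencies)
    print(placement)
    print(total_cycles(blocks, placement, latencies))
    print(worst_case_cycles(blocks, placement, latencies))
```

Under the one-block-per-line assumption, pairing the largest reuse counts with the smallest latencies minimizes the sum of reuse times latency (this follows from the rearrangement inequality). The sketch ignores block sizes, program phases, and the data migration and circuit-level latency control that the abstract mentions.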

Cited By

  • (2009) Process variation aware thread mapping for chip multiprocessors. Proceedings of the Conference on Design, Automation and Test in Europe, pages 821-826. DOI: 10.5555/1874620.1874821. Online publication date: 20-Apr-2009.

Information & Contributors

Information

Published In

CGO '07: Proceedings of the International Symposium on Code Generation and Optimization
March 2007
346 pages
ISBN: 0769527647

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

CGO07

Acceptance Rates

CGO '07 Paper Acceptance Rate: 27 of 84 submissions, 32%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%
