Article

Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems

Authors:

M. Karakoy

CGO '07: Proceedings of the International Symposium on Code Generation and Optimization

Pages 232 - 243

https://doi.org/10.1109/CGO.2007.6

Published: 11 March 2007

Abstract

As a result of process parameter variations, circuit delay varies widely in scaled technologies. This delay (latency) variation problem is particularly pressing for memory components because they are built from minimum-sized transistors. Current memory design techniques mostly cope with such variations by adopting a worst-case design option, which simply assumes that all memory locations operate under the worst possible latency, whereas in reality some memory locations could be much faster than others. Note that uniformly assuming any latency other than the worst case for all memory locations can lead to reliability problems, since the data may not be ready when the assumed latency has passed. Instead of operating under the worst-case design option, this paper proposes and experimentally evaluates a compiler-driven approach that operates an on-chip scratch-pad memory (SPM) assuming different latencies for different SPM lines. Our goal is to reduce execution cycles without creating any reliability problems due to variations in access latencies. The proposed scheme achieves this goal by evaluating the reuse of different data items and adopting a reuse- and latency-aware data-to-SPM placement. It also employs data migration within the SPM when doing so further cuts down the number of execution cycles. We also discuss an alternate scheme that can reduce the latency of select SPM locations by controlling a circuit-level mechanism in software to further improve performance. We implemented our approach within an optimizing compiler and tested its effectiveness through extensive simulations. Our experiments with twelve embedded application codes show that the proposed approach performs much better than the worst-case based design paradigm (16.2% improvement on average) and comes close (within 5.7%) to a hypothetical best-case design (i.e., one with no process variation) in which all SPM locations uniformly have low latency.
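The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of what a reuse- and latency-aware data-to-SPM placement could look like, under stated assumptions. It assumes that per-block reuse counts and per-line latencies are already known, that each block fits in exactly one SPM line, and that a simple greedy rule is used (the most-reused block goes to the fastest line). The names `place_by_reuse`, `spm_cycles`, and `worst_case_cycles` are hypothetical, and data migration and the circuit-level latency control mentioned above are not modelled.

```python
from typing import Dict, List


def place_by_reuse(reuse: Dict[str, int], line_latency: List[int]) -> Dict[str, int]:
    """Greedy sketch: give the most-reused block the fastest SPM line.

    reuse        -- hypothetical access count per data block
    line_latency -- hypothetical access latency (cycles) per SPM line
    Returns block name -> SPM line index. Blocks beyond the number of
    lines are left unplaced (assumed to stay in off-chip memory).
    """
    # Fastest lines first; ties broken by line index for determinism.
    lines = sorted(range(len(line_latency)), key=lambda i: (line_latency[i], i))
    # Most-reused blocks first; ties broken by name for determinism.
    blocks = sorted(reuse, key=lambda b: (-reuse[b], b))
    # zip stops at the shorter sequence, so extra blocks are dropped.
    return dict(zip(blocks, lines))


def spm_cycles(reuse: Dict[str, int], line_latency: List[int],
               placement: Dict[str, int]) -> int:
    """Total SPM access cycles when each line runs at its own latency."""
    return sum(reuse[b] * line_latency[line] for b, line in placement.items())


def worst_case_cycles(reuse: Dict[str, int], line_latency: List[int],
                      placement: Dict[str, int]) -> int:
    """Total SPM access cycles when every line is assumed to be the slowest."""
    worst = max(line_latency)
    return sum(reuse[b] * worst for b in placement)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    reuse = {"a": 100, "b": 10, "c": 50}
    line_latency = [3, 1, 2]
    placement = place_by_reuse(reuse, line_latency)
    print(placement)
    print(spm_cycles(reuse, line_latency, placement))
    print(worst_case_cycles(reuse, line_latency, placement))
```

With these made-up inputs the heavily reused block lands on the one-cycle line, which is the intuition behind pairing reuse with latency; the paper's actual compiler analysis may differ substantially.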

Cited By

Hong S, Narayanan S, Kandemir M, Özturk Ö, Benini L, De Micheli G, Al-Hashimi B, Mueller W (2009). Process variation aware thread mapping for chip multiprocessors. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 821-826. DOI: 10.5555/1874620.1874821. Online publication date: 20-Apr-2009.
https://dl.acm.org/doi/10.5555/1874620.1874821

Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Information & Contributors

Information

Published In

CGO '07: Proceedings of the International Symposium on Code Generation and Optimization

March 2007

346 pages

ISBN: 0769527647

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

CGO07

Sponsor:

CGO07: 5th Annual IEEE / ACM International Symposium on Code Generation and Optimization

March 11 - 14, 2007

Acceptance Rates

CGO '07 Paper Acceptance Rate: 27 of 84 submissions, 32%

Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 1
Total Downloads: 196
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Feb 2025
