Abstract
Digital signal processor (DSP) cores with very-long-instruction-word (VLIW) architectures are widely used in embedded and multimedia systems-on-chips, and improving their power efficiency has become a crucial design issue. This paper proposes a novel approach, energy-proportional parallel computing, to improve the power efficiency of a VLIW-DSP core. The central idea is to adapt parallelism to performance demand and to power-gate idle hardware. The main challenge is reducing the power dissipated in the register files. Energy proportionality is realized through an architecture featuring distributed, power-gated register files. The compiler exploits this architecture through instruction scheduling and register allocation. Energy-aware list scheduling is proposed to reduce register file power while still building an as-soon-as-possible schedule. Following the scheduling outcome, register allocation is performed by weighted graph coloring with energy-oriented decision rules. Evaluation with the MiBench benchmark suite shows a significant power saving and a significant scaling range for register file power. Moreover, the evaluation confirms that a power-gated processor maintains good energy efficiency across applications with diverse behavior. These results suggest that energy-proportional parallel computing is an attractive direction for future processor design in the deep-submicron era.
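To make the register allocation idea concrete, the following is a minimal sketch of greedy weighted graph coloring with one energy-oriented rule: prefer a free register in a bank that is already powered on, and wake another bank only when none fits. It is not the paper's algorithm, whose decision rules are not given in the abstract. The function name `allocate_registers`, the bank model (`num_banks`, `regs_per_bank`), and the use of a per-register weight as coloring priority are all assumptions made for illustration.

```python
def allocate_registers(interference, weights, num_banks, regs_per_bank):
    """Greedy weighted coloring that prefers registers in already-active banks.

    interference: dict mapping each virtual register to the set of virtual
                  registers live at the same time (assumed symmetric).
    weights:      dict mapping each virtual register to a priority
                  (hypothetical, e.g. its access count).
    Returns (assignment, active_banks). assignment maps a virtual register to
    a (bank, index) pair, or to None when it would have to be spilled.
    """
    assignment = {}
    active_banks = []  # banks powered on so far, in order of activation

    # Color heavier nodes first; break ties by name for determinism.
    for vreg in sorted(interference, key=lambda v: (-weights.get(v, 0), v)):
        # Physical registers already held by interfering neighbours.
        taken = {assignment[n] for n in interference[vreg]
                 if assignment.get(n) is not None}
        choice = None
        # Rule 1 (assumed): reuse a bank that is already powered on.
        for bank in active_banks:
            choice = next(((bank, i) for i in range(regs_per_bank)
                           if (bank, i) not in taken), None)
            if choice is not None:
                break
        # Rule 2 (assumed): only then power on one more bank.
        if choice is None and len(active_banks) < num_banks:
            bank = len(active_banks)
            active_banks.append(bank)
            choice = (bank, 0)  # a freshly activated bank is entirely free
        assignment[vreg] = choice  # None means "spill"
    return assignment, active_banks


if __name__ == "__main__":
    graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
    prio = {"a": 3, "b": 2, "c": 1, "d": 1}
    print(allocate_registers(graph, prio, num_banks=2, regs_per_bank=2))
```

In this sketch, banks that never appear in `active_banks` hold no live values and are the candidates for power gating. A real allocator would also have to account for which functional units can reach which bank and for the cost of waking a gated bank, neither of which is modeled here.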
Cite this article
Ma, YC., Chao, WS. & Liu, TA. Enabling energy-proportional computing on instruction-level parallel processors. J Supercomput 71, 391–447 (2015). https://doi.org/10.1007/s11227-014-1301-z
DOI: https://doi.org/10.1007/s11227-014-1301-z