Abstract
Architectural resources and program recurrences are the main limitations to the amount of Instruction-Level Parallelism (ILP) exploitable from loops. To increase the number of operations per second, current designs use high degrees of resource replication for memory ports and functional units. However, the high power and cycle-time costs of this technique limit the degree of replication.
Clustering is a technique aimed at decentralizing the design of future wide-issue cores, enabling them to meet technology constraints on cycle time, area and power. Another way to reduce the complexity of recent cores is to use wide functional units. This technique requires only minor modifications to the underlying hardware, but it also imposes a penalty on the exploitable parallelism.
In this paper we evaluate a broad range of VLIW configurations that make use of these two techniques. From this study we conclude that applying both techniques yields configurations with very good power-performance efficiency.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pericàs, M., Ayguadé, E., Zalamea, J., Llosa, J., Valero, M. (2004). Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units. In: Pimentel, A.D., Vassiliadis, S. (eds) Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2004. Lecture Notes in Computer Science, vol 3133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27776-7_10
DOI: https://doi.org/10.1007/978-3-540-27776-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22377-1
Online ISBN: 978-3-540-27776-7
eBook Packages: Springer Book Archive