Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units

Miquel Pericàs¹⁷,
Eduard Ayguadé¹⁷,
Javier Zalamea¹⁷,
Josep Llosa¹⁷ &
Mateo Valero¹⁷

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3133)

Included in the following conference series:

International Workshop on Embedded Computer Systems

Abstract

Architectural resources and program recurrences are the main limitations to the amount of Instruction-Level Parallelism (ILP) exploitable from loops. To increase the number of operations per second, current designs use high degrees of resource replication for memory ports and functional units. However, the high cost of this technique in terms of power and cycle time limits the degree of replication.
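
The interplay between the two limits named above can be illustrated with the standard lower bound on the initiation interval used in modulo scheduling, MII = max(ResMII, RecMII). The abstract itself gives no formula, so the following is only a minimal sketch under that assumption; the function names, the resource classes ("mem", "fpu") and the operation counts are hypothetical and are not taken from the paper.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: the most heavily used resource class sets the pace."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def rec_mii(recurrences):
    """Recurrence bound over (total_latency, total_distance) pairs,
    one pair per dependence cycle in the loop body."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)

def mii(op_counts, unit_counts, recurrences):
    """Minimum initiation interval: the tighter of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(recurrences))

# Hypothetical loop body: 6 memory operations and 8 FP operations per iteration.
ops = {"mem": 6, "fpu": 8}
narrow = {"mem": 2, "fpu": 4}      # baseline machine (assumed)
replicated = {"mem": 4, "fpu": 8}  # same machine with resources doubled

# Without recurrences, replication lowers the bound.
no_rec_narrow = mii(ops, narrow, [])
no_rec_replicated = mii(ops, replicated, [])

# With one recurrence of latency 6 and distance 1, replication does not help.
rec_narrow = mii(ops, narrow, [(6, 1)])
rec_replicated = mii(ops, replicated, [(6, 1)])
```

In this toy case, doubling the memory ports and functional units only pays off for the loop without a recurrence; the recurrence-bound loop keeps the same initiation interval however many resources are added.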

Clustering is a technique aimed at decentralizing the design of future wide-issue cores, enabling them to meet technology constraints in terms of cycle time, area and power. Another way to reduce the complexity of recent cores is to use wide functional units. This technique requires only minor modifications to the underlying hardware, but it also imposes a penalty on the exploitable parallelism.

In this paper we evaluate a broad range of VLIW configurations that make use of these two techniques. From this study we conclude that applying both techniques yields configurations with very good power-performance efficiency.

Author information

Authors and Affiliations

Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
Miquel Pericàs, Eduard Ayguadé, Javier Zalamea, Josep Llosa & Mateo Valero

Editor information

Editors and Affiliations

Computer Systems Architecture Group, University of Amsterdam, The Netherlands
Andy D. Pimentel
Computer Engineering Lab, TU Delft, Postbus 5031, 2600 GA, Delft, The Netherlands
Stamatis Vassiliadis

About this paper

Cite this paper

Pericàs, M., Ayguadé, E., Zalamea, J., Llosa, J., Valero, M. (2004). Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units. In: Pimentel, A.D., Vassiliadis, S. (eds) Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2004. Lecture Notes in Computer Science, vol 3133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27776-7_10

DOI: https://doi.org/10.1007/978-3-540-27776-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22377-1
Online ISBN: 978-3-540-27776-7
eBook Packages: Springer Book Archive
