High-performance, general-purpose microprocessors serve as compute engines for computers ranging from personal computers to supercomputers. Sequential programs constitute a major portion of the real-world software that runs on these computers. State-of-the-art microprocessors exploit instruction level parallelism (ILP) to achieve high performance on such applications by searching for independent instructions in a dynamic window of instructions and executing them on a wide-issue pipeline. However, increasing the window size and the issue width to extract more ILP may hinder high clock speeds, limiting overall performance.
The Multiscalar architecture employs multiple small windows and many narrow-issue processing units to exploit ILP at high clock speeds. Sequential programs are partitioned into code fragments called tasks, which are speculatively executed in parallel. Inter-task register dependences are honored via communication and synchronization, while inter-task control flow and memory dependences are handled by speculation and verification in hardware.
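The speculate-then-verify discipline described above can be illustrated with a toy, software-only model. This is a sketch of the idea, not the hardware mechanism: the class and function names (`Task`, `run_multiscalar`) and the memory model are illustrative assumptions, not taken from the thesis.

```python
# Toy model of Multiscalar-style task speculation: tasks run speculatively
# against the initial memory image, then commit in program order; a task
# whose speculative loads were invalidated by an earlier task's stores is
# squashed and re-executed. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    reads: set     # memory addresses the task loads
    writes: dict   # address -> value the task stores

    def run(self, memory):
        # Record the values read so commit-time verification can detect
        # a memory dependence violation.
        observed = {a: memory.get(a, 0) for a in self.reads}
        return observed, dict(self.writes)

def run_multiscalar(tasks, memory):
    """Execute all tasks speculatively 'in parallel', then commit in
    program order, squashing mis-speculated tasks."""
    spec = [t.run(memory) for t in tasks]      # speculative pass
    squashes = 0
    for t, (observed, stores) in zip(tasks, spec):
        # Verify: did an earlier task's commit change a value we read?
        if any(memory.get(a, 0) != v for a, v in observed.items()):
            squashes += 1                      # mis-speculation: squash...
            _, stores = t.run(memory)          # ...and re-execute
        memory.update(stores)                  # commit in program order
    return memory, squashes
```

For example, a second task that loads an address stored to by the first task reads a stale value speculatively, is squashed at verification, and is re-executed before its stores commit.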
Since this thesis is the first attempt at investigating the problem of compiling for the Multiscalar architecture, I identify the fundamental interactions between applications and the Multiscalar architecture from the standpoint of performance and explore a few compiler optimization opportunities instead of proposing the best technique for a specific problem.
Control flow speculation, register communication, memory dependence speculation, load imbalance, and task overheads are the key performance issues. To extract high degrees of ILP, compiler heuristics partition programs into large tasks comprising multiple basic blocks. To maintain high prediction accuracy and avoid delays due to inter-task register communication, the heuristics control the number of successors of each task while including register dependences within tasks. Inter-task register communication is generated and scheduled so that computation and communication overlap.
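A minimal sketch of a task-selection heuristic in this spirit follows. It is an assumption-laden illustration, not the thesis's algorithm: the CFG encoding, the `max_blocks` budget, and the rule of ending a task at a branch point (so each task keeps few successors, which helps inter-task prediction) are all simplifications.

```python
# Hedged sketch of a greedy task-partitioning heuristic: grow tasks from
# basic blocks along straight-line paths, and end a task at any block with
# multiple successors so that tasks keep few exit points. The CFG format
# (block -> successor list) and the size budget are illustrative.
def partition_tasks(cfg, entry, max_blocks=3):
    """Return a list of tasks, each a list of basic blocks."""
    tasks, claimed, work = [], set(), [entry]
    while work:
        b = work.pop(0)
        if b in claimed:
            continue
        task = []
        while b is not None and b not in claimed and len(task) < max_blocks:
            claimed.add(b)
            task.append(b)
            succs = cfg.get(b, [])
            if len(succs) == 1:        # straight-line code: keep growing
                b = succs[0]
            else:                      # branch point: end the task here;
                work.extend(succs)     # successors seed new tasks
                b = None
        if b is not None:              # stopped at a size limit or a
            work.append(b)             # claimed block: resume from there
        tasks.append(task)
    return tasks
```

On a small diamond-shaped CFG (A to B, which branches to C and D, both reaching E), this groups A and B into one task and starts new tasks at the branch targets, so every block lands in exactly one task.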
For the SPEC95 benchmarks, the heuristics increase task sizes significantly while improving control flow speculation accuracy with respect to basic blocks, enabling large window spans from which to extract parallelism. Including register dependences within tasks improves performance considerably. Sophisticated register communication generation and scheduling are effective in boosting performance. Dead register analysis reduces register communication traffic considerably. All the optimizations grow in importance for larger numbers of processing units (PUs).
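The dead-register idea can be sketched with standard backward liveness dataflow: a register written in a task needs to be forwarded only if some successor task may read it before redefining it. The sketch below assumes a toy three-address instruction format (`(dest, srcs)`) and illustrative names; it shows the analysis style, not the thesis's implementation.

```python
# Hedged sketch of dead-register analysis over tasks: compute live-in sets
# by iterating backward dataflow to a fixed point, then forward from each
# task only the registers it defines that are live into a successor.
def forwarded_registers(tasks_code, succ):
    """tasks_code: task id -> list of (dest, srcs) instructions.
    succ: task id -> list of successor task ids.
    Returns task id -> set of registers that must be forwarded."""
    def use_def(code):
        use, defs = set(), set()
        for dest, srcs in code:
            use |= {s for s in srcs if s not in defs}  # upward-exposed uses
            defs.add(dest)
        return use, defs

    ud = {t: use_def(c) for t, c in tasks_code.items()}
    live_in = {t: set() for t in tasks_code}

    def out_set(t):                    # union of successors' live-in sets
        out = set()
        for s in succ.get(t, []):
            out |= live_in[s]
        return out

    changed = True
    while changed:                     # iterate to a fixed point
        changed = False
        for t in tasks_code:
            use, defs = ud[t]
            new_in = use | (out_set(t) - defs)
            if new_in != live_in[t]:
                live_in[t], changed = new_in, True

    # A register is forwarded iff it is defined here and live afterward;
    # everything else defined in the task is dead communication traffic.
    return {t: ud[t][1] & out_set(t) for t in tasks_code}
```

In a two-task example where the first task defines r1 and r2 but only r1 is read by the second task, the analysis forwards r1 alone, eliminating the dead r2 message.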
Cited By
- Estebanez A, Llanos D and Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques, ACM Computing Surveys, 49:2, (1-39), Online publication date: 11-Nov-2016.
- Colohan C, Ailamaki A, Steffan J and Mowry T (2008). Incrementally parallelizing database transactions with thread-level speculation, ACM Transactions on Computer Systems (TOCS), 26:1, (1-50), Online publication date: 1-Feb-2008.
- Zhai A, Steffan J, Colohan C and Mowry T (2008). Compiler and hardware support for reducing the synchronization of speculative threads, ACM Transactions on Architecture and Code Optimization (TACO), 5:1, (1-33), Online publication date: 1-May-2008.
- Dou J and Cintra M (2007). A compiler cost model for speculative parallelization, ACM Transactions on Architecture and Code Optimization (TACO), 4:2, (12-es), Online publication date: 1-Jun-2007.
- Colohan C, Ailamaki A, Steffan J and Mowry T Tolerating Dependences Between Large Speculative Threads Via Sub-Threads Proceedings of the 33rd annual international symposium on Computer Architecture, (216-226)
- Colohan C, Ailamaki A, Steffan J and Mowry T (2006). Tolerating Dependences Between Large Speculative Threads Via Sub-Threads, ACM SIGARCH Computer Architecture News, 34:2, (216-226), Online publication date: 1-May-2006.
- Quiñones C, Madriles C, Sánchez J, Marcuello P, González A and Tullsen D (2005). Mitosis compiler, ACM SIGPLAN Notices, 40:6, (269-279), Online publication date: 12-Jun-2005.
- Quiñones C, Madriles C, Sánchez J, Marcuello P, González A and Tullsen D Mitosis compiler Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, (269-279)
- Marcuello P, González A and Tubella J (2004). Thread Partitioning and Value Prediction for Exploiting Speculative Thread-Level Parallelism, IEEE Transactions on Computers, 53:2, (114-125), Online publication date: 1-Feb-2004.
- Du Z, Lim C, Li X, Yang C, Zhao Q and Ngai T A cost-driven compilation framework for speculative parallelization of sequential programs Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, (71-81)
- Du Z, Lim C, Li X, Yang C, Zhao Q and Ngai T (2004). A cost-driven compilation framework for speculative parallelization of sequential programs, ACM SIGPLAN Notices, 39:6, (71-81), Online publication date: 9-Jun-2004.
- Dou J and Cintra M Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (203-214)
- Zhai A, Colohan C, Steffan J and Mowry T Compiler optimization of scalar value communication between speculative threads Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, (171-183)
- Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGPLAN Notices, 37:10, (171-183), Online publication date: 1-Oct-2002.
- Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGARCH Computer Architecture News, 30:5, (171-183), Online publication date: 1-Dec-2002.
- Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGOPS Operating Systems Review, 36:5, (171-183), Online publication date: 1-Dec-2002.
- Codrescu L, Wills D and Meindl J (2001). Architecture of the Atlas Chip-Multiprocessor, IEEE Transactions on Computers, 50:1, (67-82), Online publication date: 1-Jan-2001.
- Rotenberg E and Smith J Control independence in trace processors Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, (4-15)
- Vijaykumar T and Sohi G Task selection for a multiscalar processor Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, (81-92)