Compiling for the multiscalar architecture
Publisher: The University of Wisconsin - Madison
ISBN: 978-0-591-82604-3
Order Number: AAI9813127
Pages: 207
Abstract

High-performance, general-purpose microprocessors serve as the compute engines for machines ranging from personal computers to supercomputers. Sequential programs constitute a major portion of the real-world software that runs on these machines. State-of-the-art microprocessors achieve high performance on such applications by exploiting instruction-level parallelism (ILP): they search for independent instructions within a dynamic window of instructions and execute them on a wide-issue pipeline. Increasing the window size and the issue width to extract more ILP, however, may prevent high clock speeds, limiting overall performance.
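As a rough illustration of this dynamic-window model, the C sketch below scans a small window of decoded instructions and issues, up to an assumed issue width, those whose source registers are not produced by an earlier, still-pending instruction. The instruction encoding, window contents, and issue width are illustrative assumptions, not details of any particular processor.

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy decoded instruction: one destination and two source registers.
     * A register number of -1 means "unused". Purely illustrative. */
    typedef struct {
        int dest;
        int src1, src2;
        bool issued;            /* already sent to an execution unit */
    } Insn;

    #define NINSN       4       /* instructions currently in the window */
    #define ISSUE_WIDTH 2       /* assumed issue width */

    /* True if w[i] has a read-after-write dependence on an earlier,
     * not-yet-issued instruction in the window. */
    static bool has_pending_producer(const Insn *w, int i) {
        for (int j = 0; j < i; j++) {
            if (w[j].issued || w[j].dest == -1)
                continue;
            if (w[j].dest == w[i].src1 || w[j].dest == w[i].src2)
                return true;
        }
        return false;
    }

    int main(void) {
        /* r3 = r1 + r2;  r4 = r3 * r5;  r6 = r7 - r8;  r9 = r6 + r4 */
        Insn window[NINSN] = {
            { 3, 1, 2, false },
            { 4, 3, 5, false },
            { 6, 7, 8, false },
            { 9, 6, 4, false },
        };

        /* One "cycle": pick up to ISSUE_WIDTH independent instructions,
         * marking them issued only after the whole window is scanned. */
        bool pick[NINSN] = { false };
        int issued = 0;
        for (int i = 0; i < NINSN && issued < ISSUE_WIDTH; i++) {
            if (!window[i].issued && !has_pending_producer(window, i)) {
                pick[i] = true;
                issued++;
            }
        }
        for (int i = 0; i < NINSN; i++) {
            if (pick[i]) {
                window[i].issued = true;
                printf("issue instruction %d (writes r%d)\n", i, window[i].dest);
            }
        }
        return 0;
    }

In this example the first and third instructions issue together; the second must wait for r3 and the fourth for both r6 and r4, which is exactly the kind of dependence check the dynamic window performs.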

The Multiscalar architecture employs multiple small windows and many narrow-issue processing units to exploit ILP at high clock speeds. Sequential programs are partitioned into code fragments called tasks, which are executed speculatively in parallel. Inter-task register dependences are honored through communication and synchronization, while inter-task control flow and memory dependences are handled by hardware speculation and verification.
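The following C sketch is a minimal model of this execution style: two tasks drawn from consecutive pieces of a loop, with the first forwarding a register value that the second consumes. The forward_reg/consume_reg helpers are hypothetical stand-ins for the architecture's hardware register forwarding, and the tasks simply run back to back here rather than on separate processing units.

    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical model of one forwarded register value between tasks.
     * In the real architecture forwarding happens in hardware; these
     * names are illustrative stand-ins. */
    typedef struct {
        int value;
        bool ready;   /* set when the producing task forwards the value */
    } FwdReg;

    static void forward_reg(FwdReg *r, int v) { r->value = v; r->ready = true; }

    static int consume_reg(const FwdReg *r) {
        /* A real consumer task would stall here until the value arrives. */
        return r->ready ? r->value : 0;
    }

    /* Task A: the earlier task; produces the value needed downstream. */
    static void task_a(const int *data, FwdReg *sum_out) {
        int sum = data[0] + data[1];
        forward_reg(sum_out, sum);     /* inter-task register communication */
        /* ... remaining work of task A that does not touch sum ... */
    }

    /* Task B: the next task, launched speculatively on another
     * processing unit; it needs the forwarded sum. */
    static void task_b(const int *data, const FwdReg *sum_in) {
        int sum = consume_reg(sum_in); /* waits for task A's value */
        sum += data[2] + data[3];
        printf("sum after task B: %d\n", sum);
    }

    int main(void) {
        int data[4] = { 1, 2, 3, 4 };
        FwdReg sum = { 0, false };
        /* On Multiscalar hardware the two tasks would run on different
         * processing units; here they execute sequentially. */
        task_a(data, &sum);
        task_b(data, &sum);
        return 0;
    }

Note that the consumer can start its independent work before the forwarded value arrives; only the instruction that reads the forwarded register must wait, which is what makes early generation and scheduling of the communication matter.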

Because this thesis is the first attempt to investigate the problem of compiling for the Multiscalar architecture, I identify the fundamental interactions between applications and the Multiscalar architecture from the standpoint of performance and explore several compiler optimization opportunities, rather than proposing the best technique for any single problem.

Control flow speculation, register communication, memory dependence speculation, load imbalance, and task overheads are the key performance issues. To extract high degrees of ILP, compiler heuristics partition programs into large tasks comprising multiple basic blocks. To maintain high prediction accuracy and to avoid delays due to inter-task register communication, the heuristics limit the number of successors of each task while keeping register dependences within tasks. Inter-task register communication is then generated and scheduled so that it overlaps with computation.
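A minimal sketch of such a task-selection heuristic follows, assuming a toy control flow graph and a cap on the number of task successors; the data structures, the cap, and the greedy extension rule are illustrative assumptions, not the exact heuristics developed in the thesis.

    #include <stdbool.h>
    #include <stdio.h>

    #define NBLOCKS  7
    #define MAX_SUCC 2          /* assumed cap on a task's successor count */

    /* Toy basic block: up to two CFG successors (-1 = none) and a flag
     * marking whether the block reads a register defined earlier in the
     * region being grown into a task. */
    typedef struct {
        int  succ[2];
        bool uses_prior_def;
    } Block;

    /* Number of distinct successor blocks outside the range [start, end],
     * i.e. the task's exit targets if it covered exactly those blocks. */
    static int exit_targets(const Block *b, int start, int end) {
        bool seen[NBLOCKS] = { false };
        int n = 0;
        for (int i = start; i <= end; i++) {
            for (int s = 0; s < 2; s++) {
                int t = b[i].succ[s];
                if (t == -1 || (t >= start && t <= end) || seen[t])
                    continue;
                seen[t] = true;
                n++;
            }
        }
        return n;
    }

    /* Greedy task selection over a straight-line block numbering: keep
     * extending the task while its successor count stays within MAX_SUCC,
     * but extend past the cap when the next block has a register
     * dependence on blocks already chosen, so the dependence stays inside
     * the task instead of becoming inter-task communication. */
    static int select_task_end(const Block *b, int start) {
        int end = start;
        while (end + 1 < NBLOCKS) {
            bool keeps_dep_inside = b[end + 1].uses_prior_def;
            if (exit_targets(b, start, end + 1) > MAX_SUCC && !keeps_dep_inside)
                break;
            end++;
        }
        return end;
    }

    int main(void) {
        Block cfg[NBLOCKS] = {
            { {  1, -1 }, false },  /* B0 */
            { {  2,  5 }, false },  /* B1: branch, one arm leaves to B5 */
            { {  3,  6 }, true  },  /* B2: uses a value defined in B1   */
            { {  4, -1 }, false },  /* B3 */
            { { -1, -1 }, false },  /* B4 */
            { { -1, -1 }, false },  /* B5 */
            { { -1, -1 }, false },  /* B6 */
        };
        int end = select_task_end(cfg, 0);
        printf("task covers blocks B0..B%d\n", end);
        return 0;
    }

In this toy graph the task stops before B3, whose inclusion would leave three distinct exit targets, while B2 is pulled in despite exceeding the cap because it consumes a value defined in B1, keeping that register dependence inside the task.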

For the SPEC95 benchmarks, the heuristics increase task sizes significantly while improving control flow speculation accuracy relative to basic-block tasks, enabling large window spans from which to extract parallelism. Keeping register dependences within tasks improves performance considerably. Sophisticated generation and scheduling of register communication are effective in boosting performance, and dead register analysis reduces register communication traffic considerably. All of these optimizations grow in importance as the number of processing units increases.

Cited By

  1. Estebanez A, Llanos D and Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques, ACM Computing Surveys, 49:2, (1-39), Online publication date: 11-Nov-2016.
  2. Colohan C, Ailamaki A, Steffan J and Mowry T (2008). Incrementally parallelizing database transactions with thread-level speculation, ACM Transactions on Computer Systems (TOCS), 26:1, (1-50), Online publication date: 1-Feb-2008.
  3. Zhai A, Steffan J, Colohan C and Mowry T (2008). Compiler and hardware support for reducing the synchronization of speculative threads, ACM Transactions on Architecture and Code Optimization (TACO), 5:1, (1-33), Online publication date: 1-May-2008.
  4. Dou J and Cintra M (2007). A compiler cost model for speculative parallelization, ACM Transactions on Architecture and Code Optimization (TACO), 4:2, (12-es), Online publication date: 1-Jun-2007.
  5. Colohan C, Ailamaki A, Steffan J and Mowry T, Tolerating Dependences Between Large Speculative Threads Via Sub-Threads, Proceedings of the 33rd annual international symposium on Computer Architecture, (216-226)
  6. Colohan C, Ailamaki A, Steffan J and Mowry T (2006). Tolerating Dependences Between Large Speculative Threads Via Sub-Threads, ACM SIGARCH Computer Architecture News, 34:2, (216-226), Online publication date: 1-May-2006.
  7. Quiñones C, Madriles C, Sánchez J, Marcuello P, González A and Tullsen D (2005). Mitosis compiler, ACM SIGPLAN Notices, 40:6, (269-279), Online publication date: 12-Jun-2005.
  8. Quiñones C, Madriles C, Sánchez J, Marcuello P, González A and Tullsen D, Mitosis compiler, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, (269-279)
  9. Marcuello P, González A and Tubella J (2004). Thread Partitioning and Value Prediction for Exploiting Speculative Thread-Level Parallelism, IEEE Transactions on Computers, 53:2, (114-125), Online publication date: 1-Feb-2004.
  10. Du Z, Lim C, Li X, Yang C, Zhao Q and Ngai T, A cost-driven compilation framework for speculative parallelization of sequential programs, Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, (71-81)
  11. Du Z, Lim C, Li X, Yang C, Zhao Q and Ngai T (2004). A cost-driven compilation framework for speculative parallelization of sequential programs, ACM SIGPLAN Notices, 39:6, (71-81), Online publication date: 9-Jun-2004.
  12. Dou J and Cintra M, Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (203-214)
  13. Zhai A, Colohan C, Steffan J and Mowry T, Compiler optimization of scalar value communication between speculative threads, Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, (171-183)
  14. Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGPLAN Notices, 37:10, (171-183), Online publication date: 1-Oct-2002.
  15. Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGARCH Computer Architecture News, 30:5, (171-183), Online publication date: 1-Dec-2002.
  16. Zhai A, Colohan C, Steffan J and Mowry T (2002). Compiler optimization of scalar value communication between speculative threads, ACM SIGOPS Operating Systems Review, 36:5, (171-183), Online publication date: 1-Dec-2002.
  17. Codrescu L, Wills D and Meindl J (2001). Architecture of the Atlas Chip-Multiprocessor, IEEE Transactions on Computers, 50:1, (67-82), Online publication date: 1-Jan-2001.
  18. Rotenberg E and Smith J, Control independence in trace processors, Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, (4-15)
  19. Vijaykumar T and Sohi G, Task selection for a multiscalar processor, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, (81-92)
Contributors
  • Purdue University
  • University of Wisconsin-Madison
