Compiler optimization of scalar value communication between speculative threads
A Zhai, CB Colohan, JG Steffan, TC Mowry - Proceedings of the 10th international conference on Architectural support …, 2002 - dl.acm.org
While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS: stalls due to forwarding scalar values between threads, values that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2–28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.
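The abstract does not spell out the three scheduling techniques. What follows is only a toy sketch of what a "critical forwarding path" is and why shortening it helps. In the model, each speculative thread runs one loop iteration. It waits to receive scalar `x` from its predecessor and can only forward `x` to its successor once its signal executes. Several pieces are hypothetical and are not the paper's dataflow algorithms: the straight-line IR (`Instr`), the `wait`/`signal` pair, the two transformations (`hoist_signal`, `schedule_critical_first`), and the timing model `finish_time`. The sketch also assumes scalar operands only, with no memory operations and no control flow.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical straight-line IR instruction for one speculative thread."""
    op: str                 # "wait", "signal", or an ordinary opcode
    defs: tuple = ()        # scalars written
    uses: tuple = ()        # scalars read
    latency: int = 1        # cycles (ignored for wait/signal in this model)


def forwarding_path(body, var):
    """Cycles spent strictly between wait(var) and signal(var)."""
    w = next(i for i, ins in enumerate(body) if ins.op == "wait" and var in ins.defs)
    s = next(i for i, ins in enumerate(body) if ins.op == "signal" and var in ins.uses)
    return sum(ins.latency for ins in body[w + 1:s])


def hoist_signal(body, var):
    """Assumed simplest transformation: move signal(var) to just after the last def of var."""
    sig = next(ins for ins in body if ins.op == "signal" and var in ins.uses)
    rest = [ins for ins in body if ins is not sig]
    last_def = max(i for i, ins in enumerate(rest) if var in ins.defs)
    return rest[:last_def + 1] + [sig] + rest[last_def + 1:]


def schedule_critical_first(body, var):
    """Assumed more aggressive variant: keep only the instructions the final value of
    var needs (plus anything they conflict with) between wait and signal."""
    wait = body[0]  # assumption: the thread body starts with wait(var)
    sig = next(ins for ins in body if ins.op == "signal" and var in ins.uses)
    ops = [ins for ins in body[1:] if ins is not sig]
    last_def = max((i for i, ins in enumerate(ops) if var in ins.defs), default=-1)
    keep = [False] * len(ops)
    s_defs, s_uses = set(), set()
    # Backward walk from the last def of var. An instruction joins the slice if it has
    # a true, anti, or output dependence on a later slice member, so that moving the
    # non-slice instructions below the signal cannot change any computed value.
    for i in range(last_def, -1, -1):
        d, u = set(ops[i].defs), set(ops[i].uses)
        if i == last_def or d & s_uses or u & s_defs or d & s_defs:
            keep[i] = True
            s_defs |= d
            s_uses |= u
    slice_ops = [ins for ins, k in zip(ops, keep) if k]
    others = [ins for ins, k in zip(ops, keep) if not k]
    return [wait] + slice_ops + [sig] + others


def finish_time(n_threads, path, body_cycles):
    """Idealized model (assumed): one processor per thread, wait at the top of the body,
    zero-cost forwarding. Thread i receives var at i * path, then runs its whole body."""
    return (n_threads - 1) * path + body_cycles


body = [
    Instr("wait", defs=("x",)),                          # receive x from the previous thread
    Instr("load", defs=("a",), latency=3),               # work unrelated to x
    Instr("mul", defs=("b",), uses=("a",), latency=2),
    Instr("add", defs=("x",), uses=("x",)),              # x = x + 1: the only update of x
    Instr("mul", defs=("c",), uses=("b",), latency=2),
    Instr("signal", uses=("x",)),                        # forward x to the next thread
]
work = sum(ins.latency for ins in body if ins.op not in ("wait", "signal"))
for name, sched in [("original", body),
                    ("hoisted", hoist_signal(body, "x")),
                    ("critical-first", schedule_critical_first(body, "x"))]:
    p = forwarding_path(sched, "x")
    print(f"{name:15} path={p} cycles, 4 threads finish at {finish_time(4, p, work)}")
```

In this toy example, the forwarding path shrinks from 8 cycles (original) to 6 (signal hoisted) to 1 (only the update of `x` kept before the signal). The total amount of work per thread does not change. Because each successor thread stalls for one forwarding path, the shorter path is what reduces the overall finish time in the idealized model.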