Removing architectural bottlenecks to the scalability of speculative parallelization

Authors:

Milos Prvulovic,

María Jesús Garzarán,

Lawrence Rauchwerger,

Josep Torrellas

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture

Pages 204 - 215

https://doi.org/10.1145/379240.379264

Published: 01 May 2001

Abstract

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
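
To put the quoted speedups in perspective, the short sketch below converts them into parallel efficiency (speedup divided by processor count). This is a standard derived metric and not one the abstract reports; the labels, the helper name `parallel_efficiency`, and the reading of "reach" as the best-performing case are assumptions made for illustration. The implied unoptimized average is only an estimate obtained by dividing the reported average by the quoted factor of "nearly four".

```python
# Parallel efficiency = speedup / processor count (standard definition;
# the abstract itself reports only speedups).
def parallel_efficiency(speedup: float, processors: int) -> float:
    return speedup / processors

# Figures quoted in the abstract: (label, processors, speedup).
# The "highest reported" labels are our reading of "reach", an assumption.
reported = [
    ("highest reported, 128 processors", 128, 63),
    ("highest reported, 64 processors", 64, 48),
    ("average, 64 processors", 64, 32),
]

efficiencies = {label: parallel_efficiency(s, p) for label, p, s in reported}

# "Nearly four times higher" implies an unoptimized average speedup of
# roughly 32 / 4 = 8 on 64 processors (slightly more, since the ratio is
# just under four). This is an estimate, not a reported number.
implied_baseline_average = 32 / 4

for label, eff in efficiencies.items():
    print(f"{label}: efficiency {eff:.2f}")
print(f"implied unoptimized average speedup (64 processors): about {implied_baseline_average:.0f}")
```

Read this way, the average code keeps about half of the 64 processors usefully busy, which is the sense in which the abstract calls the optimized system scalable.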

Published In

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture

June 2001

289 pages

ISBN: 0769511627

DOI: 10.1145/379240

Chairman:
Per Stenström
Chalmers Univ. of Technology

ACM SIGARCH Computer Architecture News Volume 29, Issue 2
Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
May 2001
262 pages
ISSN: 0163-5964
DOI: 10.1145/384285
Editor:
Per Stenström
Chalmers Univ. of Technology

Copyright © 2001 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA01: 28th International Symposium on Computer Architecture

June 30 - July 4, 2001

Göteborg, Sweden

Acceptance Rates

ISCA '01 Paper Acceptance Rate: 24 of 163 submissions, 15%

Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 72
Total Downloads: 542
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 09 Nov 2024
Cited By

Duan Y, Zhou X, Ahn W, Torrellas J (2012). BulkCompactor. Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture, pages 1-12. DOI: 10.1109/HPCA.2012.6169040. Online publication date: 25-Feb-2012.
https://dl.acm.org/doi/10.1109/HPCA.2012.6169040
Tian C, Lin C, Feng M, Gupta R (2011). Enhanced speculative parallelization via incremental recovery. ACM SIGPLAN Notices 46:8, pages 189-200. DOI: 10.1145/2038037.1941580. Online publication date: 12-Feb-2011.
https://dl.acm.org/doi/10.1145/2038037.1941580
Tian C, Lin C, Feng M, Gupta R, Cascaval C, Yew P (2011). Enhanced speculative parallelization via incremental recovery. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, pages 189-200. DOI: 10.1145/1941553.1941580. Online publication date: 12-Feb-2011.
https://dl.acm.org/doi/10.1145/1941553.1941580
Mustafa G, Waheed A, Mahmood W (2011). SeekBin: An automated tool for analyzing thread level speculative parallelization potential. 2011 7th International Conference on Emerging Technologies, pages 1-6. DOI: 10.1109/ICET.2011.6048489. Online publication date: Sep-2011.
https://doi.org/10.1109/ICET.2011.6048489
Tian C, Feng M, Gupta R (2010). Speculative parallelization using state separation and multiple value prediction. ACM SIGPLAN Notices 45:8, pages 63-72. DOI: 10.1145/1837855.1806663. Online publication date: 5-Jun-2010.
https://dl.acm.org/doi/10.1145/1837855.1806663
Tian C, Feng M, Gupta R, Vitek J, Lea D (2010). Speculative parallelization using state separation and multiple value prediction. Proceedings of the 2010 international symposium on Memory management, pages 63-72. DOI: 10.1145/1806651.1806663. Online publication date: 5-Jun-2010.
https://dl.acm.org/doi/10.1145/1806651.1806663
Ioannou N, Singer J, Khan S, Xekalakis P, Yiapanis P, Pocock A, Brown G, Lujan M, Watson I, Cintra M (2010). Toward a more accurate understanding of the limits of the TLS execution paradigm. Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10), pages 1-12. DOI: 10.1109/IISWC.2010.5649169. Online publication date: 2-Dec-2010.
https://dl.acm.org/doi/10.1109/IISWC.2010.5649169
Wiesner B, Brinda T (2009). How do robots foster the learning of basic concepts in informatics? ACM SIGCSE Bulletin 41:3, pages 403-403. DOI: 10.1145/1595496.1563045. Online publication date: 6-Jul-2009.
https://dl.acm.org/doi/10.1145/1595496.1563045
Cassel L, LeBlanc R, McGettrick A, Wrinn M (2009). Concurrency and parallelism in the computing ontology. ACM SIGCSE Bulletin 41:3, pages 402-402. DOI: 10.1145/1595496.1563044. Online publication date: 6-Jul-2009.
https://dl.acm.org/doi/10.1145/1595496.1563044
Rubio-Sánchez M, Velázquez-Iturbide J (2009). Tail recursion by using function generalization. ACM SIGCSE Bulletin 41:3, pages 394-394. DOI: 10.1145/1595496.1563036. Online publication date: 6-Jul-2009.
https://dl.acm.org/doi/10.1145/1595496.1563036
