DOI: 10.1145/379240.379264
Article

Removing architectural bottlenecks to the scalability of speculative parallelization

Published: 01 May 2001

Abstract

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far show that it is hard to deliver scalable speedups. Often, the problem is not true dependence violations, but sub-optimal architectural design. Consequently, we attempt to identify and eliminate major architectural bottlenecks that limit the scalability of speculative parallelization. The solutions that we propose are: low-complexity commit in constant time to eliminate the task commit bottleneck, a memory-based overflow area to eliminate stall due to speculative buffer overflow, and exploiting high-level access patterns to minimize speculation-induced traffic. To show that the resulting system is truly scalable, we perform simulations with up to 128 processors. With our optimizations, the speedups for 128 and 64 processors reach 63 and 48, respectively. The average speedup for 64 processors is 32, nearly four times higher than without our optimizations.
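The speedups quoted above can be related to processor counts with a little arithmetic. The sketch below computes parallel efficiency (speedup divided by processor count), a standard metric that the abstract itself does not report. The dictionary layout, the labels "best" and "average", and the function names are assumptions made for illustration; only the speedup figures come from the abstract.

```python
# Speedups quoted in the abstract, keyed by processor count.
# "best" is the highest reported speedup; an "average" is quoted only
# for 64 processors. The labels are illustrative, not the paper's terms.
reported = {
    128: {"best": 63.0},
    64: {"best": 48.0, "average": 32.0},
}


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup / processors (1.0 would be ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


def efficiency_table(data):
    """Build {(processors, label): efficiency} for every quoted speedup."""
    return {
        (procs, label): parallel_efficiency(value, procs)
        for procs, entries in data.items()
        for label, value in entries.items()
    }


if __name__ == "__main__":
    for (procs, label), eff in sorted(efficiency_table(reported).items()):
        print(f"{procs:>3} processors, {label:<7} speedup: efficiency {eff:.2f}")
```

Read this way, the figures suggest that the best case at 64 processors stays closer to linear scaling than the best case at 128, though the abstract gives no per-application breakdown to confirm why.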



Published In

ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture
June 2001
289 pages
ISBN:0769511627
DOI:10.1145/379240
  • ACM SIGARCH Computer Architecture News, Volume 29, Issue 2
    Special Issue: Proceedings of the 28th annual international symposium on Computer architecture (ISCA '01)
    May 2001
    262 pages
    ISSN:0163-5964
    DOI:10.1145/384285

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ISCA01

Acceptance Rates

ISCA '01 Paper Acceptance Rate 24 of 163 submissions, 15%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2012) BulkCompactor. Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture, pages 1-12. DOI: 10.1109/HPCA.2012.6169040. Online publication date: 25-Feb-2012
  • (2011) Enhanced speculative parallelization via incremental recovery. ACM SIGPLAN Notices, 46:8, pages 189-200. DOI: 10.1145/2038037.1941580. Online publication date: 12-Feb-2011
  • (2011) Enhanced speculative parallelization via incremental recovery. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, pages 189-200. DOI: 10.1145/1941553.1941580. Online publication date: 12-Feb-2011
  • (2011) SeekBin: An automated tool for analyzing thread level speculative parallelization potential. 2011 7th International Conference on Emerging Technologies, pages 1-6. DOI: 10.1109/ICET.2011.6048489. Online publication date: Sep-2011
  • (2010) Speculative parallelization using state separation and multiple value prediction. ACM SIGPLAN Notices, 45:8, pages 63-72. DOI: 10.1145/1837855.1806663. Online publication date: 5-Jun-2010
  • (2010) Speculative parallelization using state separation and multiple value prediction. Proceedings of the 2010 international symposium on Memory management, pages 63-72. DOI: 10.1145/1806651.1806663. Online publication date: 5-Jun-2010
  • (2010) Toward a more accurate understanding of the limits of the TLS execution paradigm. Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10), pages 1-12. DOI: 10.1109/IISWC.2010.5649169. Online publication date: 2-Dec-2010
  • (2009) How do robots foster the learning of basic concepts in informatics? ACM SIGCSE Bulletin, 41:3, page 403. DOI: 10.1145/1595496.1563045. Online publication date: 6-Jul-2009
  • (2009) Concurrency and parallelism in the computing ontology. ACM SIGCSE Bulletin, 41:3, page 402. DOI: 10.1145/1595496.1563044. Online publication date: 6-Jul-2009
  • (2009) Tail recursion by using function generalization. ACM SIGCSE Bulletin, 41:3, page 394. DOI: 10.1145/1595496.1563036. Online publication date: 6-Jul-2009
