Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Speculative parallelization using state separation and multiple value prediction

Published: 05 June 2010 Publication History

Abstract

With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized due to the presence of dependences. To solve this problem, different solutions have been proposed. Some of them make the optimistic assumption that such dependences rarely manifest themselves at runtime. However, when this assumption is violated, the recovery causes very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are not able to speed up the execution. In this paper we propose a compiler technique which uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed in separate memory states simultaneously. After the execution, if one of these versions is correct (i.e., its predicted values are found to be correct), then we merge its state and the state of the preceding iteration because the dependence between the two iterations is correctly resolved. The memory states of other incorrect versions are completely discarded. Based on this idea, we further propose a runtime adaptive scheme that not only gives a good performance but also achieves better CPU utilization. We conducted experiments on 10 benchmark programs on a real machine. The results show that our technique can achieve 1.7x speedup on average across all used benchmarks.

References

[1]
H. Agrawal and J. R. Horgan. Dynamic program slicing. In PLDI '90, pages 246--256.
[2]
P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark. Multipath execution: Opportunities and limits. In Supercomputing'98, pages 101--108.
[3]
M. G. Burke and R. K. Cytron. Interprocedural dependence analysis and parallelization. In PLDI '86, pages 162--175.
[4]
M. Cintra and D. R. Llanos. Toward efficient and robust software speculative parallelization on multiprocessors. In PPoPP '03, pages 13--24.
[5]
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07, pages 1--12.
[6]
M. Franklin and G. S. Sohi. Arb: A hardware mechanism for dynamic reordering of memory references. IEEE Transactions on Computers, 45(5):552--571, 1996.
[7]
M. J. Garzaran, M. Prvulovic, and J. M. Llaberia. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors. TACO, 2(3):247--279, 2005.
[8]
S. Gopal, T. N. Vijaykumar, J. E. Smith, and G. S. Sohi. Speculative versioning cache. In HPCA '98, pages 195--205.
[9]
M. Gupta and R. Nim. Techniques for speculative run-time parallelization of loops. In Supercomputing '98, pages 1--12.
[10]
L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor. In ASPLOS '98, pages 58--69.
[11]
K. Kelsey, T. Bai, C. Ding, and C. Zhang. Fast track: A software system for speculative program optimization. In CGO '09, pages 157--168.
[12]
K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[13]
M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Casçaval. How much parallelism is there in irregular applications? In PPoPP '09, pages 3--14.
[14]
M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS '08, pages 233--243.
[15]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07, pages 211--222.
[16]
L. Lamport. The parallel execution of do loops. Commun. ACM, 17(2):83--93, 1974.
[17]
C. Lattner and V. Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In CGO '04, pages 75--88.
[18]
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Comput., 24(3-4):445--475, 1998.
[19]
M. H. Lipasti, C. B.Wilkerson, and J. P. Shen. Value locality and load value prediction. In ASPLOS '96, pages 138--147.
[20]
C.-K. Luk, R. Cohn, R.Muth, H. Patil, A. Klauser, G. Lowney, S.Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI '05, pages 190--200.
[21]
P. Marcuello and A. González. Clustered speculative multithreaded processors. In In Supercomputing '99, pages 365--372.
[22]
M. Prvulovic, M. J. Garzarán, L. Rauchwerger, and J. Torrellas. Removing architectural bottlenecks to the scalability of speculative parallelization. In ISCA '01, pages 204--215.
[23]
C. G. Quiñones, C. Madriles, F. J. Sánchez, P. Marcuello, A. González, and D. M. Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In PLDI '06, pages 313--325.
[24]
L. Rauchwerger and D. A. Padua. The lrpd test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.
[25]
Y. Sazeides, Y. Sazeides, J. E. Smith, and J. E. Smith. The predictability of data values. In MICRO '97, pages 248--258.
[26]
G. Sohi, S. E. Breach, and T. N. Vijaykumar. Multiscalar processors. In ISCA '95, pages 414--425.
[27]
C. Tian, M. Feng, and R. Gupta. Supporting speculative parallelization in the presence of dynamic data structures. In PLDI '10.
[28]
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08, pages 330--341.
[29]
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Speculative parallelization of sequential loops on multicores. International Journal of Parallel Programming, 37(5):508--535, 2009.
[30]
J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, 48(9):881--902, 1999.
[31]
D. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In ISCA '95, pages 392--403.
[32]
A. K. Uht, V. Sindagi, and K. Hall. Disjoint eager execution: an optimal form of speculative execution. In MICRO '95, pages 313--325.
[33]
S. Wallace, B. Calder, and D. M. Tullsen. Threaded multiple path execution. In ISCA '98, pages 238--249.
[34]
K. Wang and M. Franklin. Highly accurate data value prediction using hybrid predictors. In MICRO '97, pages 281--290.
[35]
A. Zhai, C. B. Colohan, J. G. Steffan, and T. C. Mowry. Compiler optimization of scalar value communication between speculative threads. In ASPLOS '02, pages 171--183.
[36]
Y. Zhang, L. Rauchwerger, and J. Torrellas. Hardware for speculative parallelization of partially-parallel loops in dsm multiprocessors. In HPCA '99, pages 135--141.

Cited By

View all
  • (2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
  • (2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
  • (2021)Scalable FSM parallelization via path fusion and higher-order speculationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446705(887-901)Online publication date: 19-Apr-2021
  • Show More Cited By

Index Terms

  1. Speculative parallelization using state separation and multiple value prediction

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 45, Issue 8
    ISMM '10
    August 2010
    129 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1837855
    Issue’s Table of Contents
    • cover image ACM Conferences
      ISMM '10: Proceedings of the 2010 international symposium on Memory management
      June 2010
      140 pages
      ISBN:9781450300544
      DOI:10.1145/1806651
      • General Chair:
      • Jan Vitek,
      • Program Chair:
      • Doug Lea
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 June 2010
    Published in SIGPLAN Volume 45, Issue 8

    Check for updates

    Author Tags

    1. multicore processors
    2. speculative parallelization

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)3
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 25 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
    • (2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
    • (2021)Scalable FSM parallelization via path fusion and higher-order speculationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446705(887-901)Online publication date: 19-Apr-2021
    • (2020)SCAF: a speculation-aware collaborative dependence analysis frameworkProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386028(638-654)Online publication date: 11-Jun-2020
    • (2020)Challenging Sequential Bitstream Processing via Principled Bitwise SpeculationProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378461(607-621)Online publication date: 9-Mar-2020
    • (2018)Revealing parallel scans and reductions in recurrences through function reconstructionProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243204(1-13)Online publication date: 1-Nov-2018
    • (2017)Software‐Based Speculative ParallelizationProgramming multi‐core and many‐core computing systems10.1002/9781119332015.ch10(205-225)Online publication date: 27-Jan-2017
    • (2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
    • (2016)Fast Task Submission in Software Thread Level Speculation Systems2016 IEEE Trustcom/BigDataSE/ISPA10.1109/TrustCom.2016.0332(2160-2166)Online publication date: Aug-2016
    • (2016)A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation2016 IEEE Trustcom/BigDataSE/ISPA10.1109/TrustCom.2016.0327(2127-2132)Online publication date: Aug-2016
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media