research-article

Speculative parallelization using state separation and multiple value prediction

Authors:

Rajiv GuptaAuthors Info & Claims

ACM SIGPLAN Notices, Volume 45, Issue 8

Pages 63 - 72

https://doi.org/10.1145/1837855.1806663

Published: 05 June 2010 Publication History

Abstract

With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized due to the presence of dependences. To solve this problem, different solutions have been proposed. Some of them make the optimistic assumption that such dependences rarely manifest themselves at runtime. However, when this assumption is violated, the recovery causes very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are not able to speed up the execution. In this paper we propose a compiler technique which uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed in separate memory states simultaneously. After the execution, if one of these versions is correct (i.e., its predicted values are found to be correct), then we merge its state and the state of the preceding iteration because the dependence between the two iterations is correctly resolved. The memory states of other incorrect versions are completely discarded. Based on this idea, we further propose a runtime adaptive scheme that not only gives a good performance but also achieves better CPU utilization. We conducted experiments on 10 benchmark programs on a real machine. The results show that our technique can achieve 1.7x speedup on average across all used benchmarks.

References

[1]

H. Agrawal and J. R. Horgan. Dynamic program slicing. In PLDI '90, pages 246--256.

Digital Library

[2]

P. S. Ahuja, K. Skadron, M. Martonosi, and D. W. Clark. Multipath execution: Opportunities and limits. In Supercomputing'98, pages 101--108.

Digital Library

[3]

M. G. Burke and R. K. Cytron. Interprocedural dependence analysis and parallelization. In PLDI '86, pages 162--175.

Digital Library

[4]

M. Cintra and D. R. Llanos. Toward efficient and robust software speculative parallelization on multiprocessors. In PPoPP '03, pages 13--24.

Digital Library

[5]

C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07, pages 1--12.

Digital Library

[6]

M. Franklin and G. S. Sohi. Arb: A hardware mechanism for dynamic reordering of memory references. IEEE Transactions on Computers, 45(5):552--571, 1996.

Digital Library

[7]

M. J. Garzaran, M. Prvulovic, and J. M. Llaberia. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors. TACO, 2(3):247--279, 2005.

Digital Library

[8]

S. Gopal, T. N. Vijaykumar, J. E. Smith, and G. S. Sohi. Speculative versioning cache. In HPCA '98, pages 195--205.

Digital Library

[9]

M. Gupta and R. Nim. Techniques for speculative run-time parallelization of loops. In Supercomputing '98, pages 1--12.

Digital Library

[10]

L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor. In ASPLOS '98, pages 58--69.

Digital Library

[11]

K. Kelsey, T. Bai, C. Ding, and C. Zhang. Fast track: A software system for speculative program optimization. In CGO '09, pages 157--168.

Digital Library

[12]

K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.

Digital Library

[13]

M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Casçaval. How much parallelism is there in irregular applications? In PPoPP '09, pages 3--14.

Digital Library

[14]

M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In ASPLOS '08, pages 233--243.

Digital Library

[15]

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07, pages 211--222.

Digital Library

[16]

L. Lamport. The parallel execution of do loops. Commun. ACM, 17(2):83--93, 1974.

Digital Library

[17]

C. Lattner and V. Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In CGO '04, pages 75--88.

Digital Library

[18]

A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Comput., 24(3-4):445--475, 1998.

Digital Library

[19]

M. H. Lipasti, C. B.Wilkerson, and J. P. Shen. Value locality and load value prediction. In ASPLOS '96, pages 138--147.

Digital Library

[20]

C.-K. Luk, R. Cohn, R.Muth, H. Patil, A. Klauser, G. Lowney, S.Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI '05, pages 190--200.

Digital Library

[21]

P. Marcuello and A. González. Clustered speculative multithreaded processors. In In Supercomputing '99, pages 365--372.

Digital Library

[22]

M. Prvulovic, M. J. Garzarán, L. Rauchwerger, and J. Torrellas. Removing architectural bottlenecks to the scalability of speculative parallelization. In ISCA '01, pages 204--215.

Digital Library

[23]

C. G. Quiñones, C. Madriles, F. J. Sánchez, P. Marcuello, A. González, and D. M. Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In PLDI '06, pages 313--325.

[24]

L. Rauchwerger and D. A. Padua. The lrpd test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Parallel Distrib. Syst., 10(2):160--180, 1999.

Digital Library

[25]

Y. Sazeides, Y. Sazeides, J. E. Smith, and J. E. Smith. The predictability of data values. In MICRO '97, pages 248--258.

Digital Library

[26]

G. Sohi, S. E. Breach, and T. N. Vijaykumar. Multiscalar processors. In ISCA '95, pages 414--425.

Digital Library

[27]

C. Tian, M. Feng, and R. Gupta. Supporting speculative parallelization in the presence of dynamic data structures. In PLDI '10.

Digital Library

[28]

C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08, pages 330--341.

Digital Library

[29]

C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Speculative parallelization of sequential loops on multicores. International Journal of Parallel Programming, 37(5):508--535, 2009.

Digital Library

[30]

J.-Y. Tsai, J. Huang, C. Amlo, D. J. Lilja, and P.-C. Yew. The superthreaded processor architecture. IEEE Transactions on Computers, 48(9):881--902, 1999.

Digital Library

[31]

D. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In ISCA '95, pages 392--403.

Digital Library

[32]

A. K. Uht, V. Sindagi, and K. Hall. Disjoint eager execution: an optimal form of speculative execution. In MICRO '95, pages 313--325.

Digital Library

[33]

S. Wallace, B. Calder, and D. M. Tullsen. Threaded multiple path execution. In ISCA '98, pages 238--249.

Digital Library

[34]

K. Wang and M. Franklin. Highly accurate data value prediction using hybrid predictors. In MICRO '97, pages 281--290.

Digital Library

[35]

A. Zhai, C. B. Colohan, J. G. Steffan, and T. C. Mowry. Compiler optimization of scalar value communication between speculative threads. In ASPLOS '02, pages 171--183.

Digital Library

[36]

Y. Zhang, L. Rauchwerger, and J. Torrellas. Hardware for speculative parallelization of partially-parallel loops in dsm multiprocessors. In HPCA '99, pages 135--141.

Digital Library

Cited By

Xia YJiang PAgrawal GAmaral JKulkarni M(2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3302516.3307354
Estebanez ALlanos DOrden DPalop B(2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
https://doi.org/10.1371/journal.pone.0267602
Qiu JSun XSabet AZhao ZSherwood TBerger EKozyrakis C(2021)Scalable FSM parallelization via path fusion and higher-order speculationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446705(887-901)Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3445814.3446705
Show More Cited By

Index Terms

Speculative parallelization using state separation and multiple value prediction
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Speculative parallelization using state separation and multiple value prediction
ISMM '10: Proceedings of the 2010 international symposium on Memory management

With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be ...
A cost-driven compilation framework for speculative parallelization of sequential programs
PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation

The emerging hardware support for thread-level speculation opens new opportunities to parallelize sequential programs beyond the traditional limits. By speculating that many data dependences are unlikely during runtime, consecutive iterations of a ...
Speculative parallelization on GPGPUs
PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming

This paper overviews the first speculative parallelization technique for GPUs that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 45, Issue 8

ISMM '10

August 2010

129 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1837855

Issue’s Table of Contents

ISMM '10: Proceedings of the 2010 international symposium on Memory management
June 2010
140 pages
ISBN:9781450300544
DOI:10.1145/1806651
General Chair:
Jan Vitek
Purdue University
,
Program Chair:
Doug Lea
State University of New York at Oswego

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2010

Published in SIGPLAN Volume 45, Issue 8

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

31
Total Citations
View Citations
363
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)1

Reflects downloads up to 25 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Xia YJiang PAgrawal GAmaral JKulkarni M(2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3302516.3307354
Estebanez ALlanos DOrden DPalop B(2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
https://doi.org/10.1371/journal.pone.0267602
Qiu JSun XSabet AZhao ZSherwood TBerger EKozyrakis C(2021)Scalable FSM parallelization via path fusion and higher-order speculationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446705(887-901)Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3445814.3446705
Apostolakis SXu ZTan ZChan GCampanoni SAugust DDonaldson ATorlak E(2020)SCAF: a speculation-aware collaborative dependence analysis frameworkProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386028(638-654)Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3386028
Qiu JJiang LZhao ZLarus JCeze LStrauss K(2020)Challenging Sequential Bitstream Processing via Principled Bitwise SpeculationProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378461(607-621)Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.1145/3373376.3378461
Jiang PChen LAgrawal GEvripidou SStenström PO'Boyle M(2018)Revealing parallel scans and reductions in recurrences through function reconstructionProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques10.1145/3243176.3243204(1-13)Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243204
Tian CFeng MGupta R(2017)Software‐Based Speculative ParallelizationProgramming multi‐core and many‐core computing systems10.1002/9781119332015.ch10(205-225)Online publication date: 27-Jan-2017
https://doi.org/10.1002/9781119332015.ch10
Estebanez ALlanos DGonzalez-Escribano A(2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
https://dl.acm.org/doi/10.1145/2938369
Xu YWang JLiu YShen L(2016)Fast Task Submission in Software Thread Level Speculation Systems2016 IEEE Trustcom/BigDataSE/ISPA10.1109/TrustCom.2016.0332(2160-2166)Online publication date: Aug-2016
https://doi.org/10.1109/TrustCom.2016.0332
Wang YAn HLiu ZLi LHuang J(2016)A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation2016 IEEE Trustcom/BigDataSE/ISPA10.1109/TrustCom.2016.0327(2127-2132)Online publication date: Aug-2016
https://doi.org/10.1109/TrustCom.2016.0327
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents