research-article

Speculative separation for privatization and reductions

Authors:

Nick P. Johnson,

Prakash Prabhu,

David I. AugustAuthors Info & Claims

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 359 - 370

https://doi.org/10.1145/2254064.2254107

Published: 11 June 2012 Publication History

Abstract

Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic parallelization. Privatization relieves these constraints by replicating data structures, thus enabling scalable parallelization. Prior privatization schemes are limited to arrays and scalar variables because they are sensitive to the layout of dynamic data structures. This work presents Privateer, the first fully automatic privatization system to handle dynamic and recursive data structures, even in languages with unrestricted pointers. To reduce sensitivity to memory layout, Privateer speculatively separates memory objects. Privateer's lightweight runtime system validates speculative separation and speculative privatization to ensure correct parallel execution. Privateer enables automatic parallelization of general-purpose C/C++ applications, yielding a geomean whole-program speedup of 11.4x over best sequential execution on 24 cores, while non-speculative parallelization yields only 0.93x.

References

[1]

E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, 2009.

Digital Library

[2]

C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008.

Digital Library

[3]

H.-J. Boehm. Simple garbage-collector-safety. In Proceedings of the ACM SIGPLAN 1996 conference on Programming Language Design and Implementation, pages 89--98, New York, NY, 1996. ACM.

Digital Library

[4]

T. Chen, J. Lin, X. Dai, W.-C. Hsu, and P.-C. Yew. Data dependence profiling for speculative optimizations. In E. Duesterwald, editor, Compiler Construction, volume 2985 of Lecture Notes in Computer Science, pages 2733--2733. Springer Berlin / Heidelberg, 2004.

[5]

W. Y. Chen, S. A. Mahlke, and W. W. Hwu. Tolerating first level memory access latency in high-performance systems. In Proceedings of the 1992 International Conference on Parallel Processing, pages 36--43, Boca Raton, Florida, 1992. CRC Press.

[6]

R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451--490, October 1991.

Digital Library

[7]

F. H. Dang, H. Yu, and L. Rauchwerger. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium, pages 20--29, 2002.

Digital Library

[8]

D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Distributed Computing, pages 194--208, 2006.

Digital Library

[9]

C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 223--234, New York, NY, 2007. ACM.

Digital Library

[10]

P. Feautrier. Array expansion. In Proceedings of the 2nd International Conference on Supercomputing, pages 429--441. ACM, 1988.

Digital Library

[11]

F. Gabbay and A. Mendelson. Can program profiling support value prediction? In Proceedings of the 30th annual ACM/IEEE International Symposium on Microarchitecture, pages 270--280, Washington, DC, 1997. IEEE Computer Society.

Digital Library

[12]

M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, 2001. IEEE Computer Society.

Digital Library

[13]

H. Kim, N. P. Johnson, J. W. Lee, S. A. Mahlke, and D. I. August. Automatic speculative DOALL for clusters. Proceedings of the 10th IEEE/ACM International Symposium on Code Generation and Optimization, March 2012.

Digital Library

[14]

K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 107--120, 1998.

Digital Library

[15]

C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the Annual International Symposium on Code Generation and Optimization, pages 75--86, 2004.

Digital Library

[16]

D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Array-data flow analysis and its use in array privatization. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 2--15, New York, NY, 1993. ACM.

Digital Library

[17]

M. Mehrara, J. Hao, P.-C. Hsu, and S. Mahlke. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009.

Digital Library

[18]

Y. Ni, A. Welc, A.-R. Adl-Tabatabai, M. Bach, S. Berkowits, J. Cownie, R. Geva, S. Kozhukow, R. Narayanaswamy, J. Olivier, S. Preis, B. Saha, A. Tal, and X. Tian. Design and implementation of transactional constructs for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, pages 195--212, 2008.

Digital Library

[19]

C. G. Quiünones, C. Madriles, J. Sánchez, P. Marcuello, A. Gonzleáz, and D. M. Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In Proceedings of the 2005 ACM SIGPLAN conference on Programming Language Design and Implementation, pages 269--279, New York, NY, 2005. ACM.

Digital Library

[20]

E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the Annual International Symposium on Code Generation and Optimization, 2008.

Digital Library

[21]

L. Rauchwerger and D. Padua. The Privatizing DOALL test: A run-time technique for DOALL loop identification and array privatization. In Proceedings of the 8th International Conference on Supercomputing, pages 33--43, New York, NY, 1994. ACM.

Digital Library

[22]

L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. ACM SIGPLAN Notices, 30(6):218--232, 1995.

Digital Library

[23]

S. Rus, G. He, C. Alias, and L. Rauchwerger. Region Array SSA. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 43--52. ACM, 2006.

Digital Library

[24]

S. Rus, L. Rauchwerger, and J. Hoeflinger. Hybrid analysis: static & dynamic memory reference analysis. International Journal of Parallel Programming, 31:251--283, August 2003.

Digital Library

[25]

Standard Performance Evaluation Corporation. http://spec.org.

[26]

The GNU Project. GNU Binutils. http://gnu.org/software/binutils.

[27]

C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010.

Digital Library

[28]

Trimaran. Trimaran Benchmarks Packages. http://trimaran.org.

[29]

P. Tu and D. A. Padua. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 500--521, 1994.

Digital Library

[30]

N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 49--59, Washington, DC, 2007. IEEE Computer Society.

Digital Library

[31]

P. Vanbroekhoven, G. Janssens, M. Bruynooghe, and F. Catthoor. A practical dynamic single assignment transformation. ACM Transactions on Design Automation of Electronic Systems, 12, September 2007.

Digital Library

[32]

H. Vandierendonck, S. Rul, and K. De Bosschere. The Paralax infrastructure: Automatic parallelization with a helping hand. In Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques.

Digital Library

[33]

K. Veeraraghavan, D. Lee, B.Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. Doubleplay: parallelizing sequential logging and replay. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 15--26, New York, NY, 2011. ACM.

Digital Library

[34]

Q. Wu, A. Pyatakov, A. N. Spiridonov, E. Raman, D. W. Clark, and D. I. August. Exposing memory access regularities using objectrelative memory profiling. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, 2004.

Digital Library

[35]

H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008

Cited By

Xu ZChon YSu YTan ZApostolakis SCampanoni SAugust D(2024)PROMPT: A Fast and Extensible Memory Profiling FrameworkProceedings of the ACM on Programming Languages10.1145/36498278:OOPSLA1(449-473)Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649827
Bhosale AEigenmann RLee IChabbi MSteuwer M(2024)Recurrence Analysis for Automatic Parallelization of Subscripted SubscriptsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638493(80-93)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638493
Zhang YLiu TSun ZChen ZLi XZuo ZJust RFraser G(2023)Catamaran: Low-Overhead Memory Safety Enforcement via Parallel AccelerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598098(816-828)Online publication date: 12-Jul-2023
https://dl.acm.org/doi/10.1145/3597926.3598098
Show More Cited By

Index Terms

Speculative separation for privatization and reductions
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Speculative separation for privatization and reductions
PLDI '12

Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic ...
Perspective: A Sensible Approach to Speculative Automatic Parallelization
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

The promise of automatic parallelization, freeing programmers from the error-prone and time-consuming process of making efficient use of parallel processing resources, remains unrealized. For decades, the imprecision of memory analysis limited the ...
An evaluation of speculative instruction execution on simultaneous multithreaded processors

Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2012

572 pages

ISBN:9781450312059

DOI:10.1145/2254064

General Chairs:
Jan Vitek
Purdue University
,
Haibo Lin
Microsoft China
,
Program Chair:
Frank Tip
IBM T.J. Watson Research Center

ACM SIGPLAN Notices Volume 47, Issue 6
PLDI '12
June 2012
534 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2345156
Issue’s Table of Contents

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PLDI '12

Sponsor:

SIGPLAN

PLDI '12: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 11 - 16, 2012

Beijing, China

Acceptance Rates

PLDI '12 Paper Acceptance Rate 48 of 255 submissions, 19%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

45
Total Citations
View Citations
467
Total Downloads

Downloads (Last 12 months)23
Downloads (Last 6 weeks)2

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Xu ZChon YSu YTan ZApostolakis SCampanoni SAugust D(2024)PROMPT: A Fast and Extensible Memory Profiling FrameworkProceedings of the ACM on Programming Languages10.1145/36498278:OOPSLA1(449-473)Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649827
Bhosale AEigenmann RLee IChabbi MSteuwer M(2024)Recurrence Analysis for Automatic Parallelization of Subscripted SubscriptsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638493(80-93)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1145/3627535.3638493
Zhang YLiu TSun ZChen ZLi XZuo ZJust RFraser G(2023)Catamaran: Low-Overhead Memory Safety Enforcement via Parallel AccelerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598098(816-828)Online publication date: 12-Jul-2023
https://dl.acm.org/doi/10.1145/3597926.3598098
Apostolakis SXu ZTan ZChan GCampanoni SAugust DDonaldson ATorlak E(2020)SCAF: a speculation-aware collaborative dependence analysis frameworkProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386028(638-654)Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3386028
Ying VJeffrey MSanchez DMartínez JDuato JEeckhout L(2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00024
Yan XJoa AWong BCassell BSzepesi TNaouach MLam DPierre GFerreira PShrira L(2018)SpecRPCProceedings of the 19th International Middleware Conference10.1145/3274808.3274829(266-278)Online publication date: 26-Nov-2018
https://dl.acm.org/doi/10.1145/3274808.3274829
Ginsbach PO'Boyle MReddi VSmith ATang L(2017)Discovery and exploitation of general reductions: a constraint based approachProceedings of the 2017 International Symposium on Code Generation and Optimization10.5555/3049832.3049862(269-280)Online publication date: 4-Feb-2017
https://dl.acm.org/doi/10.5555/3049832.3049862
Li ZLiu LDeng YYin SWang YWei S(2017)Aggressive Pipelining of Irregular Applications on Reconfigurable HardwareACM SIGARCH Computer Architecture News10.1145/3140659.308022845:2(575-586)Online publication date: 24-Jun-2017
https://dl.acm.org/doi/10.1145/3140659.3080228
Li ZLiu LDeng YYin SWang YWei S(2017)Aggressive Pipelining of Irregular Applications on Reconfigurable HardwareProceedings of the 44th Annual International Symposium on Computer Architecture10.1145/3079856.3080228(575-586)Online publication date: 24-Jun-2017
https://dl.acm.org/doi/10.1145/3079856.3080228
Oh TBeard SJohnson NPopovych SAugust D(2017)A Generalized Framework for Automatic Scripting Language Parallelization2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2017.28(356-369)Online publication date: Sep-2017
https://doi.org/10.1109/PACT.2017.28
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents