Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Speculative separation for privatization and reductions

Published: 11 June 2012 Publication History

Abstract

Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic parallelization. Privatization relieves these constraints by replicating data structures, thus enabling scalable parallelization. Prior privatization schemes are limited to arrays and scalar variables because they are sensitive to the layout of dynamic data structures. This work presents Privateer, the first fully automatic privatization system to handle dynamic and recursive data structures, even in languages with unrestricted pointers. To reduce sensitivity to memory layout, Privateer speculatively separates memory objects. Privateer's lightweight runtime system validates speculative separation and speculative privatization to ensure correct parallel execution. Privateer enables automatic parallelization of general-purpose C/C++ applications, yielding a geomean whole-program speedup of 11.4x over best sequential execution on 24 cores, while non-speculative parallelization yields only 0.93x.

References

[1]
E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, 2009.
[2]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008.
[3]
H.-J. Boehm. Simple garbage-collector-safety. In Proceedings of the ACM SIGPLAN 1996 conference on Programming Language Design and Implementation, pages 89--98, New York, NY, 1996. ACM.
[4]
T. Chen, J. Lin, X. Dai, W.-C. Hsu, and P.-C. Yew. Data dependence profiling for speculative optimizations. In E. Duesterwald, editor, Compiler Construction, volume 2985 of Lecture Notes in Computer Science, pages 2733--2733. Springer Berlin / Heidelberg, 2004.
[5]
W. Y. Chen, S. A. Mahlke, and W. W. Hwu. Tolerating first level memory access latency in high-performance systems. In Proceedings of the 1992 International Conference on Parallel Processing, pages 36--43, Boca Raton, Florida, 1992. CRC Press.
[6]
R. Cytron, J. Ferrante, B. K. Rosen, M. N.Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451--490, October 1991.
[7]
F. H. Dang, H. Yu, and L. Rauchwerger. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium, pages 20--29, 2002.
[8]
D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Distributed Computing, pages 194--208, 2006.
[9]
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 223--234, New York, NY, 2007. ACM.
[10]
P. Feautrier. Array expansion. In Proceedings of the 2nd International Conference on Supercomputing, pages 429--441. ACM, 1988.
[11]
F. Gabbay and A. Mendelson. Can program profiling support value prediction? In Proceedings of the 30th annual ACM/IEEE International Symposium on Microarchitecture, pages 270--280, Washington, DC, 1997. IEEE Computer Society.
[12]
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, 2001. IEEE Computer Society.
[13]
H. Kim, N. P. Johnson, J. W. Lee, S. A. Mahlke, and D. I. August. Automatic speculative DOALL for clusters. Proceedings of the 10th IEEE/ACM International Symposium on Code Generation and Optimization, March 2012.
[14]
K. Knobe and V. Sarkar. Array SSA form and its use in parallelization. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 107--120, 1998.
[15]
C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the Annual International Symposium on Code Generation and Optimization, pages 75--86, 2004.
[16]
D. E. Maydan, S. P. Amarasinghe, and M. S. Lam. Array-data flow analysis and its use in array privatization. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 2--15, New York, NY, 1993. ACM.
[17]
M. Mehrara, J. Hao, P.-C. Hsu, and S. Mahlke. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009.
[18]
Y. Ni, A. Welc, A.-R. Adl-Tabatabai, M. Bach, S. Berkowits, J. Cownie, R. Geva, S. Kozhukow, R. Narayanaswamy, J. Olivier, S. Preis, B. Saha, A. Tal, and X. Tian. Design and implementation of transactional constructs for C/C++. In Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems Languages and Applications, pages 195--212, 2008.
[19]
C. G. Quiünones, C. Madriles, J. Sánchez, P. Marcuello, A. Gonzleáz, and D. M. Tullsen. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In Proceedings of the 2005 ACM SIGPLAN conference on Programming Language Design and Implementation, pages 269--279, New York, NY, 2005. ACM.
[20]
E. Raman, G. Ottoni, A. Raman, M. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the Annual International Symposium on Code Generation and Optimization, 2008.
[21]
L. Rauchwerger and D. Padua. The Privatizing DOALL test: A run-time technique for DOALL loop identification and array privatization. In Proceedings of the 8th International Conference on Supercomputing, pages 33--43, New York, NY, 1994. ACM.
[22]
L. Rauchwerger and D. Padua. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. ACM SIGPLAN Notices, 30(6):218--232, 1995.
[23]
S. Rus, G. He, C. Alias, and L. Rauchwerger. Region Array SSA. In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pages 43--52. ACM, 2006.
[24]
S. Rus, L. Rauchwerger, and J. Hoeflinger. Hybrid analysis: static & dynamic memory reference analysis. International Journal of Parallel Programming, 31:251--283, August 2003.
[25]
Standard Performance Evaluation Corporation. http://spec.org.
[26]
The GNU Project. GNU Binutils. http://gnu.org/software/binutils.
[27]
C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010.
[28]
Trimaran. Trimaran Benchmarks Packages. http://trimaran.org.
[29]
P. Tu and D. A. Padua. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 500--521, 1994.
[30]
N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 49--59, Washington, DC, 2007. IEEE Computer Society.
[31]
P. Vanbroekhoven, G. Janssens, M. Bruynooghe, and F. Catthoor. A practical dynamic single assignment transformation. ACM Transactions on Design Automation of Electronic Systems, 12, September 2007.
[32]
H. Vandierendonck, S. Rul, and K. De Bosschere. The Paralax infrastructure: Automatic parallelization with a helping hand. In Proceedings of the 19th International Conference on Parallel Architecture and Compilation Techniques.
[33]
K. Veeraraghavan, D. Lee, B.Wester, J. Ouyang, P. M. Chen, J. Flinn, and S. Narayanasamy. Doubleplay: parallelizing sequential logging and replay. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 15--26, New York, NY, 2011. ACM.
[34]
Q. Wu, A. Pyatakov, A. N. Spiridonov, E. Raman, D. W. Clark, and D. I. August. Exposing memory access regularities using objectrelative memory profiling. In Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, 2004.
[35]
H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008

Cited By

View all
  • (2024)Using Hardware-Transactional-Memory Support to Implement Speculative Task ExecutionJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104939(104939)Online publication date: Jun-2024
  • (2022)Using Hardware Transactional Memory to Implement Speculative Privatization in OpenMPLanguages and Compilers for Parallel Computing10.1007/978-3-030-95953-1_5(57-73)Online publication date: 16-Feb-2022
  • (2017)Performance Evaluation and Enhancement of Process-Based Parallel Loop ExecutionInternational Journal of Parallel Programming10.1007/s10766-015-0394-145:1(185-198)Online publication date: 1-Feb-2017
  • Show More Cited By

Index Terms

  1. Speculative separation for privatization and reductions

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 6
    PLDI '12
    June 2012
    534 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2345156
    Issue’s Table of Contents
    • cover image ACM Conferences
      PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2012
      572 pages
      ISBN:9781450312059
      DOI:10.1145/2254064
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 June 2012
    Published in SIGPLAN Volume 47, Issue 6

    Check for updates

    Author Tags

    1. automatic parallelization
    2. separation
    3. speculation

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)22
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Using Hardware-Transactional-Memory Support to Implement Speculative Task ExecutionJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104939(104939)Online publication date: Jun-2024
    • (2022)Using Hardware Transactional Memory to Implement Speculative Privatization in OpenMPLanguages and Compilers for Parallel Computing10.1007/978-3-030-95953-1_5(57-73)Online publication date: 16-Feb-2022
    • (2017)Performance Evaluation and Enhancement of Process-Based Parallel Loop ExecutionInternational Journal of Parallel Programming10.1007/s10766-015-0394-145:1(185-198)Online publication date: 1-Feb-2017
    • (2013)Exploring Irregular Reduction Support in Transactional MemoryAlgorithms and Architectures for Parallel Processing10.1007/978-3-319-03859-9_22(257-266)Online publication date: 18-Dec-2013
    • (2024)PROMPT: A Fast and Extensible Memory Profiling FrameworkProceedings of the ACM on Programming Languages10.1145/36498278:OOPSLA1(449-473)Online publication date: 29-Apr-2024
    • (2024)Recurrence Analysis for Automatic Parallelization of Subscripted SubscriptsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638493(80-93)Online publication date: 2-Mar-2024
    • (2023)Catamaran: Low-Overhead Memory Safety Enforcement via Parallel AccelerationProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3597926.3598098(816-828)Online publication date: 12-Jul-2023
    • (2020)SCAF: a speculation-aware collaborative dependence analysis frameworkProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386028(638-654)Online publication date: 11-Jun-2020
    • (2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
    • (2018)SpecRPCProceedings of the 19th International Middleware Conference10.1145/3274808.3274829(266-278)Online publication date: 26-Nov-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media