DOI: 10.1145/2967938.2967959
research-article
Public Access

Speculatively Exploiting Cross-Invocation Parallelism

Published: 11 September 2016

Abstract

Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques parallelize independent loops and insert global synchronization between loop invocations. For programs with many loop invocations, frequent synchronization often becomes the performance bottleneck. Some techniques exploit cross-invocation parallelism to overcome this problem. Using static analysis, they partition iterations among threads to avoid cross-thread dependences. However, this approach may fail if dependence pattern information is not available at compile time. To address this limitation, this work proposes SpecCross--the first automatic parallelization technique to exploit cross-invocation parallelism using speculation. With speculation, iterations from different loop invocations can execute concurrently, and the program synchronizes only on misspeculation. This allows SpecCross to adapt to dependence patterns that only manifest on particular inputs at runtime. Evaluation on eight programs shows that SpecCross achieves a geomean speedup of 3.43x over parallel execution without cross-invocation parallelization.
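
The technique the abstract summarizes targets a common program shape: an outer loop that repeatedly invokes an inner parallel loop, with a global barrier after every invocation. The C/OpenMP sketch below is illustrative only, not the paper's SpecCross implementation; the array names, loop bounds, and stencil body are invented for this example (compile with -fopenmp, otherwise the pragmas are simply ignored). It marks where the per-invocation barriers fall and why cross-thread dependences arise only at chunk boundaries, which is what makes speculatively removing the barriers, and synchronizing only on misspeculation, profitable.

    /* Illustrative sketch of the barrier-synchronized pattern that
     * cross-invocation parallelization targets (hypothetical example,
     * not the paper's SpecCross runtime). */
    #include <stdio.h>

    #define STEPS 100   /* number of loop invocations (outer time steps) */
    #define N     1024  /* iterations per invocation */

    static double a[N], b[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            a[i] = (double)i;

        for (int t = 0; t < STEPS; t++) {
            /* Invocation 1: with static scheduling, a[(i + 1) % N] crosses
             * into a neighboring thread's chunk only at chunk boundaries,
             * so most iterations do not depend on work done by other
             * threads in the previous invocation. */
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                b[i] = 0.5 * (a[i] + a[(i + 1) % N]);
            /* Implicit barrier: every thread waits here, once per step. */

            /* Invocation 2: each thread reads only the b[i] values it
             * produced above, yet it still waits at both barriers.
             * SpecCross would elide these barriers and recover only on an
             * actual cross-thread dependence violation (misspeculation). */
            #pragma omp parallel for
            for (int i = 0; i < N; i++)
                a[i] = b[i];
            /* Implicit barrier again, before the next invocation. */
        }

        printf("a[0] = %f\n", a[0]);
        return 0;
    }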

Cited By

  • (2020) SCAF: A Speculation-Aware Collaborative Dependence Analysis Framework. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 638-654. DOI: 10.1145/3385412.3386028. Online publication date: 11 June 2020.


Published In

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016
474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. automatic parallelization
  2. barrier speculation
  3. code generation

Qualifiers

  • Research-article

Conference

PACT '16
Sponsors:
  • IFIP WG 10.3
  • IEEE TCCA
  • SIGARCH
  • IEEE CS TCPP

Acceptance Rates

PACT '16 Paper Acceptance Rate: 31 of 119 submissions, 26%
Overall Acceptance Rate: 121 of 471 submissions, 26%
