Article

Scalable Speculative Parallelization on Commodity Clusters

Authors:

David I. August

MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 3 - 14

https://doi.org/10.1109/MICRO.2010.19

Published: 04 December 2010

Abstract

Clusters of commodity servers and switches are the most popular form of large-scale parallel computer, yet many programs are not easily parallelized for execution on them. In particular, high inter-node communication cost and the lack of globally shared memory appear to make clusters suitable only for two kinds of workload: server applications with abundant task-level parallelism, and scientific applications with regular, independent units of work. On shared-memory multicore machines, careful use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the cost of inter-thread communication. This paper presents Distributed Software Multi-threaded Transactional memory (DSMTX), a runtime system that makes these techniques applicable to clusters without shared memory, allowing them to address inter-node communication costs efficiently. Initial results suggest that DSMTX enables efficient cluster execution of a wider range of application types. For 11 sequential C programs parallelized for a 32-node cluster of 4-core nodes (128 cores in total) without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.
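The abstract compares DSMTX against a TLS-only baseline. The following minimal, single-process Python sketch illustrates what thread-level speculation means for a single loop. It is a generic software TLS scheme, not DSMTX's protocol. Each speculative iteration runs against a private snapshot of program state, which stands in for the absence of globally shared memory. The iteration records the values it reads and buffers its writes. A commit step then validates and applies iterations in original loop order, and it re-executes any iteration whose reads were invalidated by an earlier iteration. The names `speculative_loop` and `body`, the dict-based memory model, and the batch-at-a-time scheduling are all illustrative assumptions. Python threads under the GIL give no real speedup here, so the sketch only shows the semantics.

```python
from concurrent.futures import ThreadPoolExecutor


def speculative_loop(memory, body, n_iters, workers=4):
    """Toy software TLS (hypothetical helper): speculate, then commit in order.

    `memory` is a dict standing in for program state. `body(i, read, write)`
    is a hypothetical loop body that touches state only through the `read`
    and `write` callbacks, so every access can be tracked.
    Returns (final_state, number_of_misspeculated_iterations).
    """

    def run(i, view):
        reads, writes = {}, {}

        def read(addr):
            if addr in writes:             # see this iteration's own writes
                return writes[addr]
            reads[addr] = view.get(addr)   # record the value seen, for validation
            return reads[addr]

        def write(addr, value):
            writes[addr] = value           # buffered; invisible to other iterations

        body(i, read, write)
        return reads, writes

    committed = dict(memory)
    misspeculations = 0
    start = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while start < n_iters:
            batch = range(start, min(start + workers, n_iters))
            # All iterations in the batch read the same private copy of state;
            # nothing is shared while they run.
            snapshot = dict(committed)
            results = list(pool.map(lambda j: run(j, snapshot), batch))
            # Validate and commit strictly in original loop order.
            for j, (reads, writes) in zip(batch, results):
                stale = any(committed.get(a) != v for a, v in reads.items())
                if stale:
                    # An earlier iteration changed a value this one read:
                    # discard its buffered writes and re-execute it in order.
                    misspeculations += 1
                    _, writes = run(j, committed)
                committed.update(writes)
            start += len(batch)
    return committed, misspeculations


# Hypothetical loop: a[i] = i*i is independent across iterations, but odd
# iterations also update a running total (a loop-carried dependence).
def body(i, read, write):
    write(("a", i), i * i)
    if i % 2 == 1:
        write("total", read("total") + i)


state, misspec = speculative_loop({"total": 0}, body, n_iters=16, workers=4)
print(state["total"], misspec)  # 64 4
```

In each batch of four iterations, the second odd iteration reads a stale `total`. Validation catches this at commit time and the iteration is re-executed, so the final state matches sequential execution. The sketch does not model pipeline (DSWP or Spec-DSWP) parallelism, in which a single iteration is split across several threads.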




Recommendations

Speculative parallelization using software multi-threaded transactions
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative ...


Information

Published In


MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

December 2010

542 pages

ISBN:9780769542997

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States


Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%



Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 13
Total Downloads: 288
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Oct 2024

Citations

Cited By

Apostolakis S, Xu Z, Tan Z, Chan G, Campanoni S, August D, Donaldson A, Torlak E (2020). SCAF: a speculation-aware collaborative dependence analysis framework. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 638-654. doi:10.1145/3385412.3386028. Online publication date: 11-Jun-2020.
https://dl.acm.org/doi/10.1145/3385412.3386028
Estebanez A, Llanos D, Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys, 49(2), pp. 1-39. doi:10.1145/2938369. Online publication date: 30-Jun-2016.
https://dl.acm.org/doi/10.1145/2938369
Yiapanis P, Brown G, Luján M (2015). Compiler-Driven Software Speculation for Thread-Level Parallelism. ACM Transactions on Programming Languages and Systems, 38(2), pp. 1-45. doi:10.1145/2821505. Online publication date: 22-Dec-2015.
https://dl.acm.org/doi/10.1145/2821505
Bondhugula U, Gropp W, Matsuoka S (2013). Compiling affine loop nests for distributed-memory parallel architectures. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-12. doi:10.1145/2503210.2503289. Online publication date: 17-Nov-2013.
https://dl.acm.org/doi/10.1145/2503210.2503289
Yiapanis P, Rosas-Ham D, Brown G, Luján M (2013). Optimizing software runtime systems for speculative parallelization. ACM Transactions on Architecture and Code Optimization, 9(4), pp. 1-27. doi:10.1145/2400682.2400698. Online publication date: 20-Jan-2013.
https://dl.acm.org/doi/10.1145/2400682.2400698
Barreto J, Dragojevic A, Ferreira P, Filipe R, Guerraoui R (2012). Unifying thread-level speculation and transactional memory. Proceedings of the 13th International Middleware Conference, pp. 187-207. doi:10.5555/2442626.2442639. Online publication date: 3-Dec-2012.
https://dl.acm.org/doi/10.5555/2442626.2442639
Luo Y, Zhai A (2012). Dynamically dispatching speculative threads to improve sequential execution. ACM Transactions on Architecture and Code Optimization, 9(3), pp. 1-31. doi:10.1145/2355585.2355586. Online publication date: 5-Oct-2012.
https://dl.acm.org/doi/10.1145/2355585.2355586
Kim H, Johnson N, Lee J, Mahlke S, August D, Eidt C, Holler A, Srinivasan U, Amarasinghe S (2012). Automatic speculative DOALL for clusters. Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp. 94-103. doi:10.1145/2259016.2259029. Online publication date: 31-Mar-2012.
https://dl.acm.org/doi/10.1145/2259016.2259029
Campanoni S, Jones T, Holloway G, Reddi V, Wei G, Brooks D, Eidt C, Holler A, Srinivasan U, Amarasinghe S (2012). HELIX. Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp. 84-93. doi:10.1145/2259016.2259028. Online publication date: 31-Mar-2012.
https://dl.acm.org/doi/10.1145/2259016.2259028
Xu S, Chen L, Guo M, Huang Z (2012). Shared work list. Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 124-133. doi:10.1145/2141702.2141716. Online publication date: 26-Feb-2012.
https://dl.acm.org/doi/10.1145/2141702.2141716
