DOI: 10.1109/MICRO.2010.19
Article

Scalable Speculative Parallelization on Commodity Clusters

Published: 04 December 2010

Abstract

While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared memory appear to make clusters suitable only for server applications with abundant task-level parallelism and scientific applications with regular and independent units of work. Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore machines. This paper presents Distributed Software Multi-threaded Transactional memory (DSMTX), a runtime system which makes these techniques applicable to non-shared memory clusters, allowing them to efficiently address inter-node communication costs. Initial results suggest that DSMTX enables efficient cluster execution of a wider set of application types. For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.
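
The headline numbers in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only: it computes parallel efficiency (speedup divided by core count) from the figures quoted above and shows how a geometric mean of speedups is formed. The helper names `geomean` and `parallel_efficiency` and the list `hypothetical_speedups` are assumptions introduced here; the per-program speedups behind the reported 49x are not given on this page.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores


# Figures quoted in the abstract.
CORES = 4 * 32            # 4 cores per node, 32 nodes
DSMTX_GEOMEAN = 49.0      # geomean speedup reported for DSMTX
TLS_ONLY_GEOMEAN = 15.0   # geomean speedup reported for TLS-only support

dsmtx_eff = parallel_efficiency(DSMTX_GEOMEAN, CORES)
tls_eff = parallel_efficiency(TLS_ONLY_GEOMEAN, CORES)
relative_gain = DSMTX_GEOMEAN / TLS_ONLY_GEOMEAN

# Hypothetical per-program speedups, NOT data from the paper; they only
# demonstrate how a geometric mean summarizes a set of speedups.
hypothetical_speedups = [2.0, 8.0, 32.0]

if __name__ == "__main__":
    print(f"cores: {CORES}")
    print(f"DSMTX efficiency: {dsmtx_eff:.1%}")
    print(f"TLS-only efficiency: {tls_eff:.1%}")
    print(f"DSMTX vs. TLS-only: {relative_gain:.2f}x")
    print(f"geomean of hypothetical speedups: {geomean(hypothetical_speedups):.2f}")
```

Under these figures, 49x on 128 cores corresponds to roughly 38% of ideal linear scaling, against roughly 12% for the TLS-only baseline. The geometric mean is the usual choice for averaging speedup ratios because it is less dominated by a single outlier than the arithmetic mean.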


Published In

MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
December 2010
542 pages
ISBN: 9780769542997

Publisher

IEEE Computer Society

United States

Author Tags

  1. distributed systems
  2. loop-level parallelism
  3. multi-threaded transactions
  4. pipelined parallelism
  5. software transactional memory
  6. thread-level speculation

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

  • (2020) SCAF: a speculation-aware collaborative dependence analysis framework. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 638-654. DOI: 10.1145/3385412.3386028. Online publication date: 11-Jun-2020
  • (2016) A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2, pages 1-39. DOI: 10.1145/2938369. Online publication date: 30-Jun-2016
  • (2015) Compiler-Driven Software Speculation for Thread-Level Parallelism. ACM Transactions on Programming Languages and Systems 38:2, pages 1-45. DOI: 10.1145/2821505. Online publication date: 22-Dec-2015
  • (2013) Compiling affine loop nests for distributed-memory parallel architectures. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 1-12. DOI: 10.1145/2503210.2503289. Online publication date: 17-Nov-2013
  • (2013) Optimizing software runtime systems for speculative parallelization. ACM Transactions on Architecture and Code Optimization 9:4, pages 1-27. DOI: 10.1145/2400682.2400698. Online publication date: 20-Jan-2013
  • (2012) Unifying thread-level speculation and transactional memory. Proceedings of the 13th International Middleware Conference, pages 187-207. DOI: 10.5555/2442626.2442639. Online publication date: 3-Dec-2012
  • (2012) Dynamically dispatching speculative threads to improve sequential execution. ACM Transactions on Architecture and Code Optimization 9:3, pages 1-31. DOI: 10.1145/2355585.2355586. Online publication date: 5-Oct-2012
  • (2012) Automatic speculative DOALL for clusters. Proceedings of the Tenth International Symposium on Code Generation and Optimization, pages 94-103. DOI: 10.1145/2259016.2259029. Online publication date: 31-Mar-2012
  • (2012) HELIX. Proceedings of the Tenth International Symposium on Code Generation and Optimization, pages 84-93. DOI: 10.1145/2259016.2259028. Online publication date: 31-Mar-2012
  • (2012) Shared work list. Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, pages 124-133. DOI: 10.1145/2141702.2141716. Online publication date: 26-Feb-2012
