research-article

Effective parallelization of loops in the presence of I/O operations

Authors:

Iulian Neamtiu

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 487 - 498

https://doi.org/10.1145/2254064.2254122

Published: 11 June 2012

Abstract

Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors. However, none of the previous efforts deal with efficient parallelization of hybrid loops, i.e., loops that contain a mix of computation and I/O operations. In this paper, we propose a set of techniques for efficiently parallelizing hybrid loops. Our techniques apply DOALL parallelism to hybrid loops by breaking the cross-iteration dependences caused by I/O operations. We also support speculative execution of I/O operations to enable speculative parallelization of hybrid loops. Helper threading is used to reduce the I/O bus contention caused by the improved parallelism. We provide an easy-to-use programming model for exploiting parallelism in loops with I/O operations. Parallelizing hybrid loops using our model requires few modifications to the code. We have developed a prototype implementation of our programming model. We have evaluated our implementation on a 24-core machine using eight applications, including a widely-used genomic sequence assembler and a multi-player game server, and others from PARSEC and SPEC CPU2000 benchmark suites. The hybrid loops in these applications take 23%-99% of the total execution time on our 24-core machine. The parallelized applications achieve speedups of 3.0x-12.8x with hybrid loop parallelization over the sequential versions of the same applications. Compared to the versions of applications where only computation loops are parallelized, hybrid loop parallelization improves the application performance by 68% on average.
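The idea of breaking an I/O-induced cross-iteration dependence can be sketched in a few lines. The following is only an illustration under stated assumptions, not the paper's programming model or implementation: the fixed record size, the `process` function, and both loop functions are hypothetical names chosen for this example. In the sequential loop every read advances a shared file position, so each iteration depends on the previous one. The DOALL-style variant gives each iteration its own file handle and an explicit offset, buffers each iteration's output, and commits the buffers in iteration order.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD_SIZE = 8  # hypothetical fixed record width in bytes (assumption)


def process(record: bytes) -> bytes:
    """Stand-in for the computation part of the loop body (assumption)."""
    return record[::-1].upper()


def sequential_hybrid_loop(in_path: str, out_path: str) -> None:
    # Each read advances the shared file position, so iteration i+1
    # depends on iteration i even though the computation is independent.
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            record = src.read(RECORD_SIZE)
            if not record:
                break
            dst.write(process(record))


def _iteration(in_path: str, index: int) -> bytes:
    # A private file handle plus an explicit offset removes the
    # dependence on a shared file position.
    with open(in_path, "rb") as src:
        src.seek(index * RECORD_SIZE)
        return process(src.read(RECORD_SIZE))


def doall_hybrid_loop(in_path: str, out_path: str, workers: int = 4) -> None:
    # Ceiling division so a trailing partial record is still processed.
    count = -(-os.path.getsize(in_path) // RECORD_SIZE)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in iteration order, so the buffered outputs
        # are written in the same order as in the sequential loop.
        results = pool.map(lambda i: _iteration(in_path, i), range(count))
        with open(out_path, "wb") as dst:
            for chunk in results:
                dst.write(chunk)
```

Because CPython threads share a global interpreter lock, this sketch shows the dependence-breaking structure rather than a realistic speedup. It also assumes the input file is not modified while the loop runs.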

Index Terms

Effective parallelization of loops in the presence of I/O operations
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types

Published In

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2012

572 pages

ISBN:9781450312059

DOI:10.1145/2254064

General Chairs: Jan Vitek (Purdue University), Haibo Lin (Microsoft China)

Program Chair: Frank Tip (IBM T.J. Watson Research Center)

ACM SIGPLAN Notices Volume 47, Issue 6
PLDI '12
June 2012
534 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2345156

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '12

Sponsor:

SIGPLAN

PLDI '12: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 11 - 16, 2012

Beijing, China

Acceptance Rates

PLDI '12 Paper Acceptance Rate 48 of 255 submissions, 19%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Cited By

Estebanez A, Llanos D, Orden D, Palop B (2022). On the choice of the best chunk size for the speculative execution of loops. PLOS ONE 17:5 (e0267602). Online publication date: 17-May-2022.
https://doi.org/10.1371/journal.pone.0267602

Estebanez A, Llanos D, Gonzalez-Escribano A (2016). A Survey on Thread-Level Speculation Techniques. ACM Computing Surveys 49:2 (1-39). Online publication date: 30-Jun-2016.
https://dl.acm.org/doi/10.1145/2938369

Yu H, Ko H, Li Z (2013). General data structure expansion for multi-threading. ACM SIGPLAN Notices 48:6 (243-252). Online publication date: 16-Jun-2013.
https://dl.acm.org/doi/10.1145/2499370.2462182

Yu H, Ko H, Li Z, Boehm H, Flanagan C (2013). General data structure expansion for multi-threading. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (243-252). Online publication date: 16-Jun-2013.
https://dl.acm.org/doi/10.1145/2491956.2462182

Lu X, Chen L, Li Z (2017). Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution. International Journal of Parallel Programming 45:1 (185-198). Online publication date: 1-Feb-2017.
https://dl.acm.org/doi/10.1007/s10766-015-0394-1
