Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2254064.2254122acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Effective parallelization of loops in the presence of I/O operations

Published: 11 June 2012 Publication History

Abstract

Software-based thread-level parallelization has been widely studied for exploiting data parallelism in purely computational loops to improve program performance on multiprocessors. However, none of the previous efforts deal with efficient parallelization of hybrid loops, i.e., loops that contain a mix of computation and I/O operations. In this paper, we propose a set of techniques for efficiently parallelizing hybrid loops. Our techniques apply DOALL parallelism to hybrid loops by breaking the cross-iteration dependences caused by I/O operations. We also support speculative execution of I/O operations to enable speculative parallelization of hybrid loops. Helper threading is used to reduce the I/O bus contention caused by the improved parallelism. We provide an easy-to-use programming model for exploiting parallelism in loops with I/O operations. Parallelizing hybrid loops using our model requires few modifications to the code. We have developed a prototype implementation of our programming model. We have evaluated our implementation on a 24-core machine using eight applications, including a widely-used genomic sequence assembler and a multi-player game server, and others from PARSEC and SPEC CPU2000 benchmark suites. The hybrid loops in these applications take 23%-99% of the total execution time on our 24-core machine. The parallelized applications achieve speedups of 3.0x-12.8x with hybrid loop parallelization over the sequential versions of the same applications. Compared to the versions of applications where only computation loops are parallelized, hybrid loop parallelization improves the application performance by 68% on average.

References

[1]
DDBJ sequence read archive.\\ http://trace.ddbj.nig.ac.jp/dra/index_e.shtml.
[2]
Space tyrant. http://spacetyrant.com/st.c.
[3]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 72--81, 2008.
[4]
C. Blundell, E. C. Lewis, and M. M. K. Martin. Unrestricted transactional memory: Supporting I/O and system calls within transactions. Technical Report TR-CIS-06-09, University of Pennsylvania, 2006.
[5]
A. D. Brown, T. C. Mowry, and O. Krieger. Compiler-based I/O prefetching for out-of-core applications. ACM Transactions on Computer Systems, 19: 111--170, May 2001.
[6]
P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA), pages 519--538, 2005.
[7]
U. Consortium. UPC language specifications, v1.2. Berkeley Lab Technical Report LBNL-59208, 2005.
[8]
L. Dagum and R. Menon. Openmp: An industry-standard api for shared-memory programming. IEEE computational science & engineering, 5 (1): 46--55, 1998.
[9]
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In Proceedings of the ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), pages 223--234, 2007.
[10]
M. Feng, R. Gupta, and Y. Hu. SpiceC: scalable parallelism via implicit copying and explicit commit. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 69--80, 2011.
[11]
W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message Passing Interface. The MIT Press, 1994.
[12]
J. L. Henning. SPEC CPU2000: Measuring cpu performance in the new millennium. Computer, 33: 28--35, July 2000.
[13]
K. Kelsey, T. Bai, C. Ding, and C. Zhang. Fast track: A software system for speculative program optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO), pages 157--168, 2009.
[14]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In Proceedings of the ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), pages 211--222, 2007.
[15]
M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic parallelism benefits from data partitioning. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 233--243, 2008.
[16]
M. Kulkarni, M. Burtscher, C. Cascaval, and K. Pingali. Lonestar: A suite of parallel irregular programs. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 65--76, 2009.
[17]
C. D. Polychronopoulos and D. J. Kuck. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Transactions on Computers, 36: 1425--1439, 1987.
[18]
D. Quinlan. Rose: Compiler support for object-oriented framework. In Proceedings of the Workshop on Compilers for Parallel Computers (CPC), 2000.
[19]
A. Raman, H. Kim, T. R. Mason, T. B. Jablin, and D. I. August. Speculative parallelization using software multi-threaded transactions. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 65--76, 2010.
[20]
J. Reinders. Intel threading building blocks. O'Reilly Media, 2007.
[21]
M. Scott, M. F. Spear, L. Dalessandro, and V. J. Marathe. Delaunay triangulation with transactions and barriers. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), 2007.
[22]
A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts. Wiley Publishing, 2008.
[23]
S. W. Son, S. P. Muralidhara, O. Ozturk, M. Kandemir, I. Kolcu, and M. Karakoy. Profiler and compiler assisted adaptive I/O prefetching for shared storage caches. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 112--121, 2008.
[24]
R. Thakur, W. Gropp, and E. Lusk. An abstract-device interface for implementing portable parallel-I/O interfaces. In Proceedings of the Symposium on the Frontiers of Massively Parallel Computation (FRONTIERS), pages 180--187, 1996.
[25]
R. Thakur, W. Gropp, and E. Lusk. Data sieving and collective I/O in ROMIO. In Proceedings of the Symposium on the Frontiers of Massively Parallel Computation (FRONTIERS), pages 182--191, 1999.
[26]
C. Tian, M. Feng, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 330--341, 2008.
[27]
C. Tian, M. Feng, and R. Gupta. Supporting speculative parallelization in the presence of dynamic data structures. In Proceedings of the ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI), pages 62--73, 2010.
[28]
C. Tian, C. Lin, M. Feng, and R. Gupta. Enhanced speculative parallelization via incremental recovery. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 189--200, 2011.
[29]
N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 49--59, 2007.
[30]
D. R. Zerbino and E. Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome Research, 18: 821--829, 2008.

Cited By

View all

Index Terms

  1. Effective parallelization of loops in the presence of I/O operations

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2012
    572 pages
    ISBN:9781450312059
    DOI:10.1145/2254064
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 47, Issue 6
      PLDI '12
      June 2012
      534 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2345156
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 June 2012

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. doall parallelization
    2. helper threading
    3. i/o contention
    4. speculative parallelization

    Qualifiers

    • Research-article

    Conference

    PLDI '12
    Sponsor:

    Acceptance Rates

    PLDI '12 Paper Acceptance Rate 48 of 255 submissions, 19%;
    Overall Acceptance Rate 406 of 2,067 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 02 Oct 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)On the choice of the best chunk size for the speculative execution of loopsPLOS ONE10.1371/journal.pone.026760217:5(e0267602)Online publication date: 17-May-2022
    • (2016)A Survey on Thread-Level Speculation TechniquesACM Computing Surveys10.1145/293836949:2(1-39)Online publication date: 30-Jun-2016
    • (2013)General data structure expansion for multi-threadingACM SIGPLAN Notices10.1145/2499370.246218248:6(243-252)Online publication date: 16-Jun-2013
    • (2013)General data structure expansion for multi-threadingProceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/2491956.2462182(243-252)Online publication date: 16-Jun-2013
    • (2017)Performance Evaluation and Enhancement of Process-Based Parallel Loop ExecutionInternational Journal of Parallel Programming10.1007/s10766-015-0394-145:1(185-198)Online publication date: 1-Feb-2017

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media