
DOI: 10.1145/263764.263768
Article
Free access

A new model for integrated nested task and data parallel programming

Published: 21 June 1997

Abstract

High Performance Fortran (HPF) has emerged as a standard language for data parallel computing. However, a wide variety of scientific applications are best programmed by a combination of task and data parallelism. Therefore, a good model of task parallelism is important for the continued success of HPF in parallel programming. This paper presents a task parallelism model that is simple, elegant, and relatively easy to implement in an HPF environment. Task parallelism is exploited by mechanisms for dividing processors into subgroups and mapping computations and data onto processor subgroups. This model of task parallelism has been implemented in the Fx compiler at Carnegie Mellon University. The paper addresses the main issues in compiling integrated task and data parallel programs and reports on the use of this model for programming various flat and nested task structures. Performance results are presented for a set of programs spanning signal processing, image processing, computer vision, and environment modeling. A variant of this task model is a newly approved extension of HPF, and this paper offers insight into the expressive power and ease of implementation of this extension.
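The core mechanism the abstract describes, dividing the available processors into subgroups and mapping a data parallel computation onto each subgroup, can be sketched loosely outside of HPF. The Python sketch below is only an analogy, not the Fx compiler's or HPF's actual mechanism; the `split` and `run_on_subgroup` helpers are hypothetical names introduced here for illustration. Two task-parallel computations each run on a disjoint subgroup, and within each subgroup the work is data parallel across that subgroup's "processors" (threads).

```python
from concurrent.futures import ThreadPoolExecutor

def split(processors, n_groups):
    """Divide a list of processor ids into n_groups contiguous subgroups
    (analogous in spirit to splitting a processor set in the paper's model)."""
    size, rem = divmod(len(processors), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < rem else 0)
        groups.append(processors[start:end])
        start = end
    return groups

def run_on_subgroup(subgroup, func, data):
    """Data parallelism within one subgroup: each 'processor' (thread)
    applies func to its own strided slice of the data."""
    n = len(subgroup)
    slices = [data[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(func, slices))

# Eight "processors" divided into two task-parallel subgroups of four.
procs = list(range(8))
g1, g2 = split(procs, 2)
data = list(range(100))

# Two independent tasks, one per subgroup, each internally data parallel:
sums = run_on_subgroup(g1, sum, data)   # task 1: partial sums
maxes = run_on_subgroup(g2, max, data)  # task 2: partial maxima
assert sum(sums) == sum(data)
assert max(maxes) == max(data)
```

In the paper's model this decomposition is expressed with directives in an HPF program and handled by the compiler; the sketch only conveys the shape of the idea, nested data parallelism inside task-parallel subgroups.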



Published In

PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

PPoPP97: Principles & Practices of Parallel Programming
June 18-21, 1997
Las Vegas, Nevada, USA

Acceptance Rates

PPOPP '97 paper acceptance rate: 26 of 86 submissions, 30%
Overall acceptance rate: 230 of 1,014 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 85
  • Downloads (last 6 weeks): 22
Reflects downloads up to 02 Oct 2024

Cited By
  • (2024) BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 100-112. DOI: 10.1145/3640537.3641568. Online publication date: 17-Feb-2024.
  • (2011) OoOJava. ACM SIGPLAN Notices 46(8), pp. 57-68. DOI: 10.1145/2038037.1941563. Online publication date: 12-Feb-2011.
  • (2011) OoOJava. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, pp. 57-68. DOI: 10.1145/1941553.1941563. Online publication date: 12-Feb-2011.
  • (2011) Semi-dynamic Scheduling of Parallel Tasks for Heterogeneous Clusters. 2011 10th International Symposium on Parallel and Distributed Computing, pp. 1-8. DOI: 10.1109/ISPDC.2011.11. Online publication date: Jul-2011.
  • (2010) Software Architectures for Flexible Task-Oriented Program Execution on Multicore Systems. Complex Systems Design & Management, pp. 123-135. DOI: 10.1007/978-3-642-15654-0_9. Online publication date: 2010.
  • (2007) Communicating Multiprocessor-Tasks. Languages and Compilers for Parallel Computing, pp. 292-307. DOI: 10.1007/978-3-540-85261-2_20. Online publication date: 1-Oct-2007.
  • (2006) An improved two-step algorithm for task and data parallel scheduling in distributed memory machines. Parallel Computing 32(10), pp. 759-774. DOI: 10.1016/j.parco.2006.08.004. Online publication date: 1-Nov-2006.
  • (2005) Exploiting processor groups to extend scalability of the GA shared memory programming model. Proceedings of the 2nd conference on Computing frontiers, pp. 262-272. DOI: 10.1145/1062261.1062305. Online publication date: 4-May-2005.
  • (2005) OpenGR. Parallel Computing 31(10-12), pp. 1140-1154. DOI: 10.1016/j.parco.2005.03.016. Online publication date: 1-Oct-2005.
  • (2005) Experiences with optimizing two stream-based applications for cluster execution. Journal of Parallel and Distributed Computing 65(6), pp. 678-691. DOI: 10.1016/j.jpdc.2005.02.002. Online publication date: 1-Jun-2005.
