
DOI: 10.1145/263764.263768
Article
Free access

A new model for integrated nested task and data parallel programming

Published: 21 June 1997

Abstract

High Performance Fortran (HPF) has emerged as a standard language for data parallel computing. However, a wide variety of scientific applications are best programmed by a combination of task and data parallelism. Therefore, a good model of task parallelism is important for the continued success of HPF in parallel programming. This paper presents a task parallelism model that is simple, elegant, and relatively easy to implement in an HPF environment. Task parallelism is exploited by mechanisms for dividing processors into subgroups and mapping computations and data onto processor subgroups. This model of task parallelism has been implemented in the Fx compiler at Carnegie Mellon University. The paper addresses the main issues in compiling integrated task and data parallel programs and reports on the use of this model for programming various flat and nested task structures. Performance results are presented for a set of programs spanning signal processing, image processing, computer vision, and environment modeling. A variant of this task model is a newly approved extension of HPF, and this paper offers insight into the expressive power and ease of implementation of this extension.
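The core mechanism the abstract describes, dividing the available processors into subgroups and mapping a data parallel computation onto each subgroup, can be sketched loosely outside of HPF. The Python sketch below is only an analogy, not the Fx compiler's or HPF's actual mechanism; the `split` and `run_on_subgroup` helpers are hypothetical names introduced here for illustration. Two task-parallel computations each run on a disjoint subgroup, and within each subgroup the work is data parallel across that subgroup's "processors" (threads).

```python
from concurrent.futures import ThreadPoolExecutor

def split(processors, n_groups):
    """Divide a list of processor ids into n_groups contiguous subgroups
    (analogous in spirit to splitting a processor set in the paper's model)."""
    size, rem = divmod(len(processors), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < rem else 0)
        groups.append(processors[start:end])
        start = end
    return groups

def run_on_subgroup(subgroup, func, data):
    """Data parallelism within one subgroup: each 'processor' (thread)
    applies func to its own strided slice of the data."""
    n = len(subgroup)
    slices = [data[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(func, slices))

# Eight "processors" divided into two task-parallel subgroups of four.
procs = list(range(8))
g1, g2 = split(procs, 2)
data = list(range(100))

# Two independent tasks, one per subgroup, each internally data parallel:
sums = run_on_subgroup(g1, sum, data)   # task 1: partial sums
maxes = run_on_subgroup(g2, max, data)  # task 2: partial maxima
assert sum(sums) == sum(data)
assert max(maxes) == max(data)
```

In the paper's model this decomposition is expressed with directives in an HPF program and handled by the compiler; the sketch only conveys the shape of the idea, nested data parallelism inside task-parallel subgroups.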



Published In

PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Conference

PPoPP97: Principles & Practices of Parallel Programming
June 18-21, 1997
Las Vegas, Nevada, USA

Acceptance Rates

PPOPP '97 paper acceptance rate: 26 of 86 submissions, 30%
Overall acceptance rate: 230 of 1,014 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 85
  • Downloads (last 6 weeks): 22
Reflects downloads up to 02 Oct 2024

Cited By
  • (2024) BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 100-112. DOI: 10.1145/3640537.3641568. Online publication date: 17-Feb-2024.
  • (2011) OoOJava. ACM SIGPLAN Notices 46(8), pp. 57-68. DOI: 10.1145/2038037.1941563. Online publication date: 12-Feb-2011.
  • (2011) OoOJava. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, pp. 57-68. DOI: 10.1145/1941553.1941563. Online publication date: 12-Feb-2011.
  • (2011) Semi-dynamic Scheduling of Parallel Tasks for Heterogeneous Clusters. 2011 10th International Symposium on Parallel and Distributed Computing, pp. 1-8. DOI: 10.1109/ISPDC.2011.11. Online publication date: Jul-2011.
  • (2010) Software Architectures for Flexible Task-Oriented Program Execution on Multicore Systems. Complex Systems Design & Management, pp. 123-135. DOI: 10.1007/978-3-642-15654-0_9. Online publication date: 2010.
  • (2007) Communicating Multiprocessor-Tasks. Languages and Compilers for Parallel Computing, pp. 292-307. DOI: 10.1007/978-3-540-85261-2_20. Online publication date: 1-Oct-2007.
  • (2006) An improved two-step algorithm for task and data parallel scheduling in distributed memory machines. Parallel Computing 32(10), pp. 759-774. DOI: 10.1016/j.parco.2006.08.004. Online publication date: 1-Nov-2006.
  • (2005) Exploiting processor groups to extend scalability of the GA shared memory programming model. Proceedings of the 2nd conference on Computing frontiers, pp. 262-272. DOI: 10.1145/1062261.1062305. Online publication date: 4-May-2005.
  • (2005) OpenGR. Parallel Computing 31(10-12), pp. 1140-1154. DOI: 10.1016/j.parco.2005.03.016. Online publication date: 1-Oct-2005.
  • (2005) Experiences with optimizing two stream-based applications for cluster execution. Journal of Parallel and Distributed Computing 65(6), pp. 678-691. DOI: 10.1016/j.jpdc.2005.02.002. Online publication date: 1-Jun-2005.
