Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1654059.1654103acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

PFunc: modern task parallelism for modern high performance computing

Published: 14 November 2009 Publication History

Abstract

HPC today faces new challenges due to paradigm shifts in both hardware and software. The ubiquity of multi-cores, many-cores, and GPGPUs is forcing traditional serial as well as distributed-memory parallel applications to be parallelized for these architectures. Emerging applications in areas such as informatics are placing unique requirements on parallel programming tools that have not yet been addressed. Although, of all the available parallel programming models, task parallelism appears to be the most promising in meeting these new challenges, current solutions for task parallelism are inadequate. In this paper, we introduce PFunc, a new library for task parallelism that extends the feature set of current solutions for task parallelism with custom task scheduling, task priorities, task affinities, multiple completion notifications and task groups. These features enable PFunc to naturally and efficiently parallelize a wide variety of modern HPC applications and to support the SPMD model of parallel programming. We present three case studies: demand-driven DAG execution, frequent pattern mining and iterative sparse solvers to demonstrate the utility of PFunc's new features.

References

[1]
Cray XMT Programming Environment User's Guide. Cray. http://docs.cray.com.
[2]
Performance Application Programming Interface. Innovative Computing Laboratory, Knoxville, TN.
[3]
Datasets of the workshops on Frequent Itemset Mining Implementations (FIMI). University of Helsinki, 2004. http://fimi.cs.helsinki.fi/.
[4]
Martín Abadi and Leslie Lamport. Composing specifications. ACM Transactions on Programming Languages and Systems (TOPLAS, 15(1), Jan 1993.
[5]
Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. Mining association rules between sets of items in large databases. In SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pages 207--216, New York, NY, USA, 1993. ACM.
[6]
Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases, pages 487--499, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.
[7]
Eric Allen, David Chase, Joe Hallett, Victor Luchangco, Jan-Willem Maessen, Sukyoung Ryu, Guy L. Steele Jr., and Sam Tobin-Hochstadt. The Fortress Language Specification, Version 1.0. Technical report, Sun Microsystems, Inc., 2008.
[8]
P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: A standard template adaptive parallel C++ library. In International Workshop on Advanced Compiler Technology for High Performance and Embedded Processors, page 10, July 2001.
[9]
Ping An, Alin Jula, Silvius Rus, Steven Saunders, Tim Smith, Gabriel Tanase, Nathan Thomas, Nancy Amato, and Lawrence Rauchwerger. STAPL: An adaptive, generic parallel programming library for C++. In Workshop on Languages and Compilers for Parallel Computing, pages 193--208, August 2001.
[10]
B Bacci, S Gorlatch, C Lengauer, and S Pelagatti. Skeletons and transformations in an integrated parallel programming environment. Parallel Computing Technologies (PaCT-99), Jan 1999.
[11]
Micheal A. Bender and Micheal O. Rabin. Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk. Theory of Computing Systems Special Issue on SPAA00, 35:289--304, 2002.
[12]
Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509--517, 1975.
[13]
Robert D. Blumofe and Charles E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS, pages 356--368, 1994.
[14]
E. Boiten, A. Geerling, and H. Partsch. Transformational derivation of (parallel) programs using skeletons.
[15]
Ansgar Brüll and Herbert Kuchen. TPascal - A Language for Task Parallel Programming. In Euro-Par '96: Proceedings of the Second International Euro-Par Conference on Parallel Processing, pages 654--659, London, UK, 1996. Springer-Verlag.
[16]
Bradford L. Chamberlain, David Callahan, and Hans P. Zima. Parallel Programmability and the Chapel Language. International Journal of High Performance Computing Applications, Jan 2007.
[17]
K Chandy and C Kesselman. CC++: A declarative concurrent object oriented programming notation. Research Directions in Concurrent Object-Oriented Programming, Jan 1993.
[18]
Philippe Charles, Christopher Donawa, Kemal Ebcioglu, Chritstian Grothoff, Allan Kielstra, Christoph von Praun, Vijay Sarawsat, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. Proceedings of the 20th annual ACM SIGPLAN conference on Object Oriented Programming, Systems, Languages and Applications, Jan 2005.
[19]
M Cole. Algorithmic skeletons: structured management of parallel computation. homepages.inf.ed.ac.uk, Jan 1989.
[20]
M Cole. Bringing skeletons out of the closet: A pragmatic manifesto for skeletal parallel programming. Parallel Computing, Jan 2004.
[21]
Guojing Cong, Sreedhar Kodali, Sriram Krishnamoorthy, Doug Lea, Vijay Saraswat, and Tong Wen. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing. In ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing, pages 536--545, Washington, DC, USA, 2008. IEEE Computer Society.
[22]
Frederica Darema. SPMD model: past, present and future, Recent Advances in Parallel Virtual Machine and Message Passing Interface. volume 1 of 8th European PVM/MPI Users' Group Meeting, pages 23--26, 2001.
[23]
J. Darlington, A. J. Field, P. G. Harrison, P. H. J. Kelly, D. W. N. Sharp, Q. Wu, and R. L. While. Parallel programming using skeleton functions. In A. Bode, M. Reeve, and G. Wolf, editors, PARLE '93: Parallel Architectures and Languages Europe, pages 146--160. Springer-Verlag, Berlin, DE, 1993.
[24]
J Darlington, Y Guo, H To, and J Yang. Parallel skeletons for structured composition. 5th ACM SIGPLAN Symposium on Principles and Practice of parallel programming, Jan 1995.
[25]
Timothy Davis. The University of Florida Sparse Matrix Collection,. Technical report, University of Florida, 1998. http://www.cise.ufl.edu/research/sparse/matrices/.
[26]
I Foster, R Olson, and S Tuecke. Programming in fortran m. osti.gov, Jan 1993.
[27]
I Foster and S Taylor. Strand: A practical parallel programming language. North American Conference on Logic Programming, 2008.
[28]
Ian Foster. Compositional parallel programming languages. ACM Transactions on Programming Languages and Systems (TOPLAS, 18(4), Jul 1996.
[29]
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Quebec, Canada, June 1998. Proceedings published in ACM SIGPLAN Notices, Vol. 33, No. 5, May, 1998.
[30]
Dennis Gannon, Peter Beckman, Elizabeth Johnson, Todd Green, and Mike Levine. HPC++ and the HPC++Lib Toolkit. The High Performance C++ consortium.
[31]
Amol Ghoting, Gregory Buehrer, Srinivasan Parthasarathy, Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, and Pradeep Dubey. Cache-conscious frequent pattern mining on modern and emerging processors. The VLDB Journal, 16(1):77--96, 2007.
[32]
Thomas Gross, David R. O'Hallaron, and Jaspal Subhlok. Task Parallelism in a High Performance Fortran Framework. IEEE Parallel Distrib. Technol., 2(3):16--26, 1994.
[33]
Yi Guo, Rajkishore Barik, Raghavan Raman, and Vivek Sarkar. Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism. In Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (to appear), May 2009.
[34]
Anshul Gupta. Improved symbolic and numerical factorization algorithms for unsymmetric sparse matrices. SIAM Journal on Matrix Analysis and Applications, 24(2):529--552, 2002.
[35]
Anshul Gupta. WSMP: Watson Sparse Matrix Package (Part-II: Direct solution of general sparse systems). Technical Report RC 21888, IBM T. J. Watson Research Center, Yorktown Heights, NY, November 2000. http://www.cs.umn.edu/~agupta/wsmp.
[36]
Robert H. Halstead, Jr. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst., 7(4):501--538, 1985.
[37]
Eric Juan, Jeffrey Tsai, and Tadao Murata. Compositional verification of concurrent systems using petri-net-based condensation rules. ACM Transactions on Programming Languages and Systems (TOPLAS, 20(5), Sep 1998.
[38]
Laxmikant V. Kale and Sanjeev Krishnan. Charm++: A portable concurrent object oriented system based on c. In In Proceedings of the Conference on Object Oriented Programming Systems, Languages and Applications, pages 91--108. ACM Press, 1993.
[39]
Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta, and Andrew Lumsdaine. Extending Task Parallelism For Frequent Pattern Mining. In Proceedings of the International Conference on Parallel Computing (ParCO), Lyon, France, September 2009.
[40]
Prabhanjan Kambadur, Torsten Hoefler, Anshul Gupta, and Andrew Lumsdaine. Demand-driven Execution of Static Direct Acyclic Graphs Using Task Parallelism. In International Conference on High Performance Computing (HiPC), Kochi, India, December 2009.
[41]
George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20:359--392, 1998.
[42]
H Kuchen. A skeleton library. Springer, Jan 2002.
[43]
Ulrich Meyer and Peter Sanders. Δ-stepping: A parallelizable shortest path algorithm. J. Algorithms, 49(1):114--152, 2003.
[44]
Aaftab Munshi. OpenCL: Parallel Computing on the GPU and CPU. Presentation at the Special Interest Group on Graphics and Interactive Techniques (SIGGRAPH), 2008. http://s08.idav.ucdavis.edu/munshi-opencl.pdf.
[45]
OpenMP Architecture Review Board. OpenMP Application Program Interface, version 3.0. May 2008.
[46]
James Reinders. Intel Threading Building Blocks. O'Reilly, 2007.
[47]
Yousef Saad. Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, 2003.
[48]
Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Micheal Abrash, Pradeep Dubey, Stephen Jenkins, Adam Lake, Jeremy Sugerman, Robert Cavin nad Roger Espasa, Ed Grochowski, Toni Juan, and Pat Hanrahan. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Transactions on Graphics, 27(3), August 2008.
[49]
Mohammed J Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li. New Algorithms for Fast Discovery of Association Rules. Technical report, Rochester, NY, USA, 1997.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN:9781605587448
DOI:10.1145/1654059
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 November 2009

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

SC '09
Sponsor:

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)1
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)Placement Strategies: Structured Skeleton Composition with Location-Aware Remote DataTrends in Functional Programming10.1007/978-3-030-57761-2_11(229-248)Online publication date: 18-Aug-2020
  • (2019)FunctionFlowFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-016-6286-813:1(73-85)Online publication date: 1-Feb-2019
  • (2017)Extensibility and Composability of a Multi-Stencil Domain Specific FrameworkInternational Journal of Parallel Programming10.1007/s10766-017-0539-5Online publication date: 21-Nov-2017
  • (2016)An Efficient Nonlinear Regression Approach for Genome-wide Detection of Marginal and Interacting Genetic VariationsJournal of Computational Biology10.1089/cmb.2015.020223:5(372-389)Online publication date: May-2016
  • (2016)A sparse memory allocation data structure for sequential and parallel association rule miningThe Journal of Supercomputing10.1007/s11227-015-1566-x72:2(347-370)Online publication date: 1-Feb-2016
  • (2015)An Efficient Nonlinear Regression Approach for Genome-Wide Detection of Marginal and Interacting Genetic VariationsResearch in Computational Molecular Biology10.1007/978-3-319-16706-0_17(167-187)Online publication date: 26-Mar-2015
  • (2013)Hybrid parallel task placement in X10Proceedings of the third ACM SIGPLAN X10 Workshop10.1145/2481268.2481277(31-38)Online publication date: 20-Jun-2013
  • (2013)On the Merits of Distributed Work-Stealing on Selective Locality-Aware TasksProceedings of the 2013 42nd International Conference on Parallel Processing10.1109/ICPP.2013.19(100-109)Online publication date: 1-Oct-2013
  • (2012)Next challenges for adaptive learning systemsACM SIGKDD Explorations Newsletter10.1145/2408736.240874614:1(48-55)Online publication date: 10-Dec-2012
  • (2012)Sparse methods for biomedical dataACM SIGKDD Explorations Newsletter10.1145/2408736.240873914:1(4-15)Online publication date: 10-Dec-2012
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media