Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3295500.3356205acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Public Access

D2P: from recursive formulations to distributed-memory codes

Published: 17 November 2019 Publication History

Abstract

Recursive formulations of programs are straightforward to reason about and write, often have good locality properties, and readily expose parallelism. We observe that it is easier to automatically generate distributed-memory codes for recursive formulations with certain properties: i) inclusive---a recursive method's parameters summarize the data access done within the method body. ii) Intersection---data-set intersection tests among method invocations can be computed efficiently. In this paper we present D2P, a system that automatically generates distributed-memory codes for recursive divide-conquer algorithms with these properties. D2P produces MPI-based implementations starting from shared-memory specifications of the recursive algorithms. We evaluate D2P with recursive Dynamic Programming (DP) algorithms, since these algorithms have the desired properties and are well known. We show that the generated implementations are scalable and efficient: D2P-generated implementations execute faster than implementations generated by recent distributed DP frameworks, and are competitive with (and often faster than) hand-written implementations.

References

[1]
Marco Aldinucci, Massimiliano Meneghin, and Massimo Torquati. 2010. Efficient Smith-Waterman on multi-core with FastFlow. In Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on. IEEE, 195--199.
[2]
Matthew Baker, Aaron Welch, and Manjunath Gorentla Venkata. 2014. Parallelizing the Smith-Waterman algorithm using OpenSHMEM and MPI-3 one-sided interfaces. In Workshop on OpenSHMEM and Related Technologies. Springer, 178--191.
[3]
Ayon Basumallik and Rudolf Eigenmann. 2006. Optimizing irregular shared-memory applications for distributed-memory systems. In Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming. ACM, 119--128.
[4]
Michael Bauer, Sean Treichler, Elliott Slaughter, and Alex Aiken. 2012. Legion: Expressing locality and independence with logical regions. In High Performance Computing, Networking, Storage and Analysis (SC), 2012 International Conference for. IEEE, 1--11.
[5]
Richard Bellman. 1957. Dynamic Programming (1 ed.). Princeton University Press, Princeton, NJ, USA. http://books.google.com/books?id=fyVtp3EMxasC&pg=PR5&dq=dynamic+programming+richard+e+bellman&client=firefox-a#v=onepage&q=dynamic%20programming%20richard%20e%20bellman&f=false
[6]
Uday Bondhugula. 2011. Automatic Distributed Memory Code Generation using the Polyhedral Framework. Technical Report IISc-CSA-TR-2011-3. Indian Institute of Science, Bangalore.
[7]
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Mathieu Faverge, Thomas Hérault, and Jack J Dongarra. 2013. Parsec: Exploiting heterogeneity to enhance scalability. Computing in Science & Engineering 15, 6 (2013), 36--45.
[8]
caida07 [n. d.]. The CAIDA AS Relationships Dataset, Nov'5, 2007. http://www.caida.org/data/as-relationships/. ([n. d.]).
[9]
Sanjay Chatterjee, Sagnak Tasirlar, Zoran Budimlic, Vincent Cave, Milind Chabbi, Max Grossman, Vivek Sarkar, and Yonghong Yan. 2013. Integrating asynchronous task parallelism with MPI. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. IEEE, 712--725.
[10]
Rezaul Chowdhury, Pramod Ganapathi, Jesmin Jahan Tithi, Charles Bachmeier, Bradley C Kuszmaul, Charles E Leiserson, Armando Solar-Lezama, and Yuan Tang. 2016. Autogen: Automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 10.
[11]
Rezaul Alan Chowdhury, Hai-Son Le, and Vijaya Ramachandran. 2010. Cache-oblivious dynamic programming for bioinformatics. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 7, 3 (2010), 495--510.
[12]
Rezaul Alam Chowdhury and Vijaya Ramachandran. 2008. Cache-efficient dynamic programming algorithms for multicores. In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures. ACM, 207--216.
[13]
CilkPlus [n. d.]. Intel CilkPlus. https://www.cilkplus.org. ([n. d.]).
[14]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.
[15]
Aaron E Darling, Lucas Carey, and W Chun Feng. 2003. The design, implementation, and evaluation of mpiBLAST. Technical Report. Los Alamos National Laboratory.
[16]
Jun Du, Ce Yu, Jizhou Sun, Chao Sun, Shanjiang Tang, and Yanlong Yin. 2013. EasyHPS: A multilevel hybrid parallel system for dynamic programming. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International. IEEE, 630--639.
[17]
Robert W Floyd. 1962. Algorithm 97: shortest path. Commun. ACM 5, 6 (1962), 345.
[18]
Matteo Frigo, Charles E Leiserson, and Keith H Randall. 1998. The implementation of the Cilk-5 multithreaded language. In ACM Sigplan Notices, Vol. 33. ACM, 212--223.
[19]
Zvi Galil and Kunsoo Park. 1994. Parallel algorithms for dynamic programming recurrences with more than O (1) dependency. J. Parallel and Distrib. Comput. 21, 2 (1994), 213--222.
[20]
Kang Su Gatlin and Larry Carter. 1999. Architecture-cognizant divide and conquer algorithms. In Proceedings of the 1999 ACM/IEEE conference on Supercomputing. ACM, 25.
[21]
Khaled Hamidouche, Fernando Machado Mendonca, Joel Falcou, Alba Cristina Magalhaes Alves de Melo, and Daniel Etiemble. 2013. Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++. International Journal of Parallel Programming 41, 1 (01 Feb 2013), 111--136.
[22]
Chao Huang and Laxmikant Kale. 2007. Charisma: Orchestrating migratable parallel objects. In Proceedings of the 16th international symposium on High performance distributed computing. ACM, 75--84.
[23]
intelmpi [n. d.]. Intel MPI Libraries:. https://software.intel.com/en-us/intel-mpi-library. ([n. d.]).
[24]
Shachar Itzhaky, Rohit Singh, Armando Solar-Lezama, Kuat Yessenov, Yongquan Lu, Charles Leiserson, and Rezaul Chowdhury. 2016. Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, 145--164.
[25]
GT Klincsek. 1980. Minimal triangulations of polygonal domains. Annals of Discrete Mathematics 9 (1980), 121--123.
[26]
Okwan Kwon, Fahed Jubair, Rudolf Eigenmann, and Samuel Midkiff. 2012. A hybrid approach of OpenMP for clusters. In ACM SIGPLAN Notices, Vol. 47. ACM, 75--84.
[27]
Jonathan Lifflander and Sriram Krishnamoorthy. 2017. Cache locality optimization for recursive programs. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 1--16.
[28]
Saeed Maleki, Madanlal Musuvathi, and Todd Mytkowicz. 2014. Parallelizing dynamic programming through rank convergence. ACM SIGPLAN Notices 49, 8 (2014), 219--232.
[29]
Guy M Morton. 1966. A computer oriented geodetic data base and a new technique in file sequencing. (1966).
[30]
Mahdi Noorian, Hamidreza Pooshfam, Zeinab Noorian, and Rosni Abdullah. 2009. Performance enhancement of smith-waterman algorithm using hybrid model: Comparing the mpi and hybrid programming paradigm on smp clusters. In Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on. IEEE, 492--497.
[31]
Ruth Nussinov, George Pieczenik, Jerrold R Griggs, and Daniel J Kleitman. 1978. Algorithms for loop matchings. SIAM Journal on Applied mathematics 35, 1 (1978), 68--82.
[32]
Mahesh Ravishankar, Roshan Dathathri, Venmugil Elango, Louis-Noël Pouchet, J Ramanujam, Atanas Rountev, and P Sadayappan. 2015. Distributed memory code generation for mixed irregular/regular computations. In ACM SIGPLAN Notices, Vol. 50. ACM, 65--75.
[33]
Vivek Sarkar. 1989. Partitioning and Scheduling Parallel Programs for Multiprocessors. MIT Press, Cambridge, MA, USA.
[34]
Stephen T Sherry, M-H Ward, M Kholodov, J Baker, Lon Phan, Elizabeth M Smigielski, and Karl Sirotkin. 2001. dbSNP: the NCBI database of genetic variation. Nucleic acids research 29, 1 (2001), 308--311.
[35]
T.F. Smith and M.S. Waterman. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147, 1 (1981), 195 -- 197.
[36]
Saurabh Srivastava, Sumit Gulwani, and Jeffrey S Foster. 2010. From program verification to program synthesis. In ACM Sigplan Notices, Vol. 45. ACM, 313--326.
[37]
Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An Approach for Code Generation in the Sparse Polyhedral Framework. Parallel Comput. 53, C (April 2016), 32--57.
[38]
J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, and N. Wilkins-Diehr. 2014. XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering 16, 5 (Sept.-Oct. 2014), 62--74.
[39]
upcsw [n. d.]. SmithWaterman with OpenMPI and OpenMP. https://www.alexjf.net/projects/distributed-systems/smith-waterman-openmp-and-openmpi/. ([n.d.]).
[40]
Chen Wang, Ce Yu, Shanjiang Tang, Jian Xiao, Jizhou Sun, and Xiangfei Meng. 2016. A general and fast distributed system for large-scale dynamic programming applications. Parallel Comput. 60 (2016), 1--21.
[41]
Wenduo Zhou and David K Lowenthal. 2006. A parallel, out-of-core algorithm for RNA secondary structure prediction. In Parallel Processing, 2006. ICPP 2006. International Conference on. IEEE, 74--81.
[42]
Michal Ziv-Ukelson, Irit Gat-Viks, Ydo Wexler, and Ron Shamir. 2008. A faster algorithm for RNA co-folding. In International Workshop on Algorithms in Bioinformatics. Springer, 174--185.

Cited By

View all
  • (2021)Scaling implicit parallelism via dynamic control replicationProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3437801.3441587(105-118)Online publication date: 17-Feb-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. distributed memory
  2. inspector-executor
  3. recursion

Qualifiers

  • Research-article

Funding Sources

Conference

SC '19
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)143
  • Downloads (Last 6 weeks)20
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Scaling implicit parallelism via dynamic control replicationProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3437801.3441587(105-118)Online publication date: 17-Feb-2021

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media