Article

High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels

Authors:

Rezaul Chowdhury

IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium

Pages 303 - 312

https://doi.org/10.1109/IPDPS.2015.107

Published: 25 May 2015

Abstract

Dynamic Programming (DP) problems arise in a wide range of application areas, from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of DP problems by reducing them to highly optimizable flexible kernels through cache-oblivious recursive divide-and-conquer (CORDAC). We implement parallel CORDAC algorithms for four non-trivial DP problems, namely the parenthesization problem, Floyd-Warshall's all-pairs shortest path (FW-APSP), sequence alignment with general gap penalty (gap problem) and protein accordion folding. To the best of our knowledge, our algorithms for protein accordion folding and the gap problem are novel. All four algorithms have asymptotically optimal cache performance, and all but FW-APSP have asymptotically more parallelism than their looping counterparts. We show that the base cases of our CORDAC algorithms are predominantly matrix-multiplication-like (MM-like) flexible kernels that expose many optimization opportunities not offered by traditional looping DP codes. As a result, one can obtain highly efficient DP implementations by optimizing those flexible kernels only. Our implementations achieve 5-150× speedup over their standard loop-based DP counterparts while consuming an order of magnitude less energy on modern multicore machines with 16-32 cores. We also compare our implementations with parallel tiled codes generated by existing polyhedral compilers (Polly, PoCC and PLuTo) and show that our implementations run significantly faster. Finally, we present results on manycores (Intel Xeon Phi) and clusters of multicores obtained using simple extensions for SIMD and shared-distributed-shared-memory architectures, respectively, demonstrating the versatility of our approach. Our optimization approach is highly systematic and suitable for automation.
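To make the idea of a recursive decomposition with an MM-like base case concrete, the following is a minimal sequential sketch for FW-APSP. It is an illustration, not the paper's implementation: the function names (`mm_like_kernel`, `cordac_fw`, `fw_loop`, `random_graph`), the `base` cutoff parameter, and the restriction to matrix sizes that are powers of two are all assumptions made here. The recursion order follows the well-known cache-oblivious scheme for Floyd-Warshall (a forward pass over the four quadrants using the first half of the intermediate vertices, then a backward pass using the second half). Parallelism, SIMD and energy measurement are not modeled.

```python
import random

INF = float("inf")


def mm_like_kernel(d, xi, xj, k0, n):
    """Base case: a triple loop shaped like matrix multiplication.

    Updates the n x n block of d with top-left corner (xi, xj), using
    intermediate vertices k0 .. k0+n-1 (min-plus instead of multiply-add).
    """
    for k in range(k0, k0 + n):
        for i in range(xi, xi + n):
            for j in range(xj, xj + n):
                cand = d[i][k] + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand


def cordac_fw(d, xi, xj, k0, n, base=2):
    """Recursive divide-and-conquer Floyd-Warshall on an n x n block.

    Assumes n is a power of two and d has no negative cycles.
    Call as cordac_fw(d, 0, 0, 0, len(d)) for the whole matrix.
    """
    if n <= base:
        mm_like_kernel(d, xi, xj, k0, n)
        return
    h = n // 2
    # Forward pass: intermediate vertices in the first half [k0, k0+h).
    cordac_fw(d, xi, xj, k0, h, base)              # top-left quadrant
    cordac_fw(d, xi, xj + h, k0, h, base)          # top-right quadrant
    cordac_fw(d, xi + h, xj, k0, h, base)          # bottom-left quadrant
    cordac_fw(d, xi + h, xj + h, k0, h, base)      # bottom-right quadrant
    # Backward pass: intermediate vertices in the second half [k0+h, k0+n).
    cordac_fw(d, xi + h, xj + h, k0 + h, h, base)  # bottom-right quadrant
    cordac_fw(d, xi + h, xj, k0 + h, h, base)      # bottom-left quadrant
    cordac_fw(d, xi, xj + h, k0 + h, h, base)      # top-right quadrant
    cordac_fw(d, xi, xj, k0 + h, h, base)          # top-left quadrant


def fw_loop(d):
    """Standard loop-based Floyd-Warshall, used here only as a reference."""
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                cand = d[i][k] + d[k][j]
                if cand < d[i][j]:
                    d[i][j] = cand


def random_graph(n, seed=0, density=0.4):
    """Hypothetical helper: random digraph with positive integer weights."""
    rng = random.Random(seed)
    return [
        [0 if i == j else (rng.randint(1, 9) if rng.random() < density else INF)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    g = random_graph(16, seed=42)
    rec = [row[:] for row in g]
    ref = [row[:] for row in g]
    cordac_fw(rec, 0, 0, 0, len(rec), base=4)
    fw_loop(ref)
    print("recursive result matches loop-based result:", rec == ref)
```

In this sketch all of the arithmetic happens in `mm_like_kernel`, so that one function is the only place where tuning (blocking, vectorization, and so on) would need to be applied. That is the general point the abstract makes about optimizing the flexible kernels only.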

1. Theory of computation
  1. Design and analysis of algorithms
    1. Algorithm design techniques

Published In

IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium

May 2015

1110 pages

ISBN: 9781479986491

Publisher

IEEE Computer Society

United States

Cited By

Ding X, Gu Y, Sun Y, Agrawal K, Petrank E (2024). Parallel and (Nearly) Work-Efficient Dynamic Programming. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 219-232. doi:10.1145/3626183.3659958. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659958
Shubham Prakash S, Ganapathi P (2022). An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix Transposition. Information Processing Letters, 173:C. doi:10.1016/j.ipl.2021.106166. Online publication date: 22-Apr-2022.
https://dl.acm.org/doi/10.1016/j.ipl.2021.106166
Chowdhury R, Ganapathi P, Tschudi S, Tithi J, Bachmeier C, Leiserson C, Solar-Lezama A, Kuszmaul B, Tang Y (2017). Autogen. ACM Transactions on Parallel Computing, 4:1, pp. 1-30. doi:10.1145/3125632. Online publication date: 5-Oct-2017.
https://dl.acm.org/doi/10.1145/3125632
Chowdhury R, Ganapathi P, Tang Y, Tithi J, Scheideler C, Hajiaghayi M (2017). Provably Efficient Scheduling of Cache-oblivious Wavefront Algorithms. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 339-350. doi:10.1145/3087556.3087586. Online publication date: 24-Jul-2017.
https://dl.acm.org/doi/10.1145/3087556.3087586
Itzhaky S, Singh R, Solar-Lezama A, Yessenov K, Lu Y, Leiserson C, Chowdhury R (2016). Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations. ACM SIGPLAN Notices, 51:10, pp. 145-164. doi:10.1145/3022671.2983993. Online publication date: 19-Oct-2016.
https://dl.acm.org/doi/10.1145/3022671.2983993
Chowdhury R, Ganapathi P, Tithi J, Bachmeier C, Kuszmaul B, Leiserson C, Solar-Lezama A, Tang Y (2016). AUTOGEN. ACM SIGPLAN Notices, 51:8, pp. 1-12. doi:10.1145/3016078.2851167. Online publication date: 27-Feb-2016.
https://dl.acm.org/doi/10.1145/3016078.2851167
Itzhaky S, Singh R, Solar-Lezama A, Yessenov K, Lu Y, Leiserson C, Chowdhury R, Visser E, Smaragdakis Y (2016). Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations. Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 145-164. doi:10.1145/2983990.2983993. Online publication date: 19-Oct-2016.
https://dl.acm.org/doi/10.1145/2983990.2983993
Chowdhury R, Ganapathi P, Tithi J, Bachmeier C, Kuszmaul B, Leiserson C, Solar-Lezama A, Tang Y, Asenjo R, Harris T (2016). AUTOGEN. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1-12. doi:10.1145/2851141.2851167. Online publication date: 27-Feb-2016.
https://dl.acm.org/doi/10.1145/2851141.2851167
Chowdhury R, Ganapathi P, Pradhan V, Tithi J, Xiao Y (2016). An Efficient Cache-oblivious Parallel Viterbi Algorithm. Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, pp. 574-587. doi:10.1007/978-3-319-43659-3_42. Online publication date: 24-Aug-2016.
https://dl.acm.org/doi/10.1007/978-3-319-43659-3_42
