Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1109/IPDPS.2015.107guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels

Published: 25 May 2015 Publication History

Abstract

Dynamic Programming (DP) problems arise in wide range of application areas spanning from logistics to computational biology. In this paper, we show how to obtain high-performing parallel implementations for a class of Problems by reducing them to highly utilizable flexible kernels through cache-oblivious recursive divide- and-conquer(CORDAC). We implement parallel CORDAC algorithms for four non-trivial DP problems, namely the parenthesization problem, Floyd-Warshall's all-pairs shortest path (FW-APSP), sequence alignment with general gap penalty (gap problem)and protein accordion folding. To the best of our knowledge our algorithms for protein accordion folding and the gap problem are novel. All four algorithms have asymptotically optimal cache performance, and all but FW-APSP have asymptotically more parallelism than their looping counterparts. We show that the base cases of our CORDAC algorithms are predominantly matrix-multiplication-like (MM-like) flexible kernels that expose many optimization opportunities not offered by traditional looping DP codes. As a result, one can obtain highly efficient DP implementations by optimizing those flexible kernels only. Our implementations achieve 5--150 speedup over their standard loop based DP counterparts while consuming order-of-magnitude less energy on modern multicore machines with 16--32 cores. We also compareour implementations with parallel tiled codes generated by existing polyhedral compilers: Polly, PoCC and PLuTo, and show that our implementations run significantly faster. Finally, we present results on manicures (Intel Xeon Phi) and clusters of multicores obtained using simple extensions for SIMD and shared-distributed-shared-memory architectures, respectively, demonstrating the versatility of our approach. Our optimization approach is highly systematic and suitable for automation.

Cited By

View all
  • (2024)Parallel and (Nearly) Work-Efficient Dynamic ProgrammingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659958(219-232)Online publication date: 17-Jun-2024
  • (2022)An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix TranspositionInformation Processing Letters10.1016/j.ipl.2021.106166173:COnline publication date: 22-Apr-2022
  • (2017)AutogenACM Transactions on Parallel Computing10.1145/31256324:1(1-30)Online publication date: 5-Oct-2017
  • Show More Cited By
  1. High-Performance Energy-Efficient Recursive Dynamic Programming with Matrix-Multiplication-Like Flexible Kernels

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    IPDPS '15: Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium
    May 2015
    1110 pages
    ISBN:9781479986491

    Publisher

    IEEE Computer Society

    United States

    Publication History

    Published: 25 May 2015

    Author Tags

    1. cache-oblivious
    2. divide-and-conquer
    3. dynamic programming
    4. flexible kernel
    5. polyhedral compiler
    6. recursive

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Parallel and (Nearly) Work-Efficient Dynamic ProgrammingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659958(219-232)Online publication date: 17-Jun-2024
    • (2022)An Algorithm for the Sequence Alignment with Gap Penalty Problem using Multiway Divide-and-Conquer and Matrix TranspositionInformation Processing Letters10.1016/j.ipl.2021.106166173:COnline publication date: 22-Apr-2022
    • (2017)AutogenACM Transactions on Parallel Computing10.1145/31256324:1(1-30)Online publication date: 5-Oct-2017
    • (2017)Provably Efficient Scheduling of Cache-oblivious Wavefront AlgorithmsProceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3087556.3087586(339-350)Online publication date: 24-Jul-2017
    • (2016)Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformationsACM SIGPLAN Notices10.1145/3022671.298399351:10(145-164)Online publication date: 19-Oct-2016
    • (2016)AUTOGENACM SIGPLAN Notices10.1145/3016078.285116751:8(1-12)Online publication date: 27-Feb-2016
    • (2016)Deriving divide-and-conquer dynamic programming algorithms using solver-aided transformationsProceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications10.1145/2983990.2983993(145-164)Online publication date: 19-Oct-2016
    • (2016)AUTOGENProceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/2851141.2851167(1-12)Online publication date: 27-Feb-2016
    • (2016)An Efficient Cache-oblivious Parallel Viterbi AlgorithmProceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 983310.1007/978-3-319-43659-3_42(574-587)Online publication date: 24-Aug-2016

    View Options

    View options

    Get Access

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media