Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

Automatic Hierarchical Parallelization of Linear Recurrences

Published: 19 March 2018 Publication History

Abstract

Linear recurrences encompass many fundamental computations including prefix sums and digital filters. Later result values depend on earlier result values in recurrences, making it a challenge to compute them in parallel. We present a new work- and space-efficient algorithm to compute linear recurrences that is amenable to automatic parallelization and suitable for hierarchical massively-parallel architectures such as GPUs. We implemented our approach in a domain-specific code generator that emits optimized CUDA code. Our evaluation shows that, for standard prefix sums and single-stage IIR filters, the generated code reaches the throughput of memory copy for large inputs, which cannot be surpassed. On higher-order prefix sums, it performs nearly as well as the fastest handwritten code from the literature. On tuple-based prefix sums and digital filters, our automatically parallelized code outperforms the fastest prior implementations.

References

[1]
Alg3: https://github.com/andmax/gpufilter/, accessed 8/8/2017.
[2]
G.E. Blelloch. "Scans as Primitive Parallel Operations." IEEE Transactions on Computers, 38(11):1526--1538. 1989.
[3]
G.E. Blelloch. "Prefix Sums and Their Applications." In John H. Reif (Ed.), Synthesis of Parallel Algorithms, Morgan Kaufmann, 1990.
[4]
G. Chaurasia, J. Ragan-Kelley, S. Paris, G. Drettakis, and F. Durand. "Compiling High Performance Recursive Filters." In Proceedings of the 7th Conference on High-Performance Graphics, pp. 85--94. 2015.
[5]
CUB: https://nvlabs.github.io/cub/, accessed 8/8/2017.
[6]
Y. Dotsenko, N.K. Govindaraju, P.P. Sloan, C. Boyd, and J. Manferdelli. "Fast Scan Algorithms on Graphics Processors." In Proceedings of the 22nd Annual International Conference on Supercomputing, pp. 205--213. 2008.
[7]
J. Hensley, T. Scheuermann, G. Coombe, M. Singh, and A. Lastra. "Fast Summed-Area Table Generation and its Applications." Computer Graphics Forum, 24(3):547--555. 2005.
[8]
W.D. Hillis and G.L. Steele. "Data Parallel Algorithms." Communications of the ACM, 29(12): 1170--1183. 1986.
[9]
R.M. Karp, R.E. Miller, and S. Winograd. "The Organization of Computations for Uniform Recurrence equations." Journal of the ACM, 14:3, pp. 563--590. 1967.
[10]
P.M. Kogge and H.S. Stone. "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations." IEEE Transactions on Computers, 22(8):786--793. 1973.
[11]
S. Maleki, A. Yang, and M. Burtscher. "Higher-Order and Tuple-Based Massively-Parallel Prefix Sums." In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 539--552. 2016.
[12]
B. Merry. "A Performance Comparison of Sort and Scan Libraries for GPUs." World Scientific Publishing Company. 2014.
[13]
D. Merrill and M. Garland. "Single-Pass Parallel Prefix Scan with Decoupled Look-back." NVIDIA Technical Report NVR-2016-002. 2016.
[14]
n-nacci numbers: https://en.wikipedia.org/wiki/Generalizations_of_Fibonacci_numbers, accessed 8/8/2017.
[15]
D. Nehab, A. Maximo, R.S. Lima, and H. Hoppe. "GPU-Efficient Recursive Filtering and Summed-Area Tables." In Proceedings of the SIGGRAPH Asia Conference, pp. 176:1--176:12. 2011.
[16]
A.V. Oppenheim and R.W. Schafer. "Discrete-Time Signal Processing." 3rd Edition. Prentice Hall. 2009.
[17]
Rec: https://github.com/mit-gfx/recfilter, accessed 8/8/2017.
[18]
SAM: http://cs.txstate.edu/~burtscher/research/SAM/, accessed 8/8/2017.
[19]
S. Sengupta, A.E. Lefohn, and J.D. Owens. "A Work-Efficient Step-Efficient Prefix Sum Algorithm." In Proceedings of the Workshop on Edge Computing Using New Commodity Architectures, pp. 26--27. 2006.
[20]
S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens. "Scan Primitives for GPU Computing." In Proceedings of Graphics Hardware, pp. 97--106. 2007.
[21]
S. Sengupta, M. Harris, and M. Garland. "Efficient Parallel Scan Algorithms for GPUs." NVIDIA. 2008 - gpucomputing.net.
[22]
S.W. Smith. "Digital Signal Processing: A Practical Guide for Engineers and Scientists." Newnes, 2002. ISBN 0--7506--7444-X.
[23]
H.S. Stone. "An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations." Journal of the ACM, 20(1):27--38. 1973.
[24]
W. Sung and S. Mitra. "Efficient Multi-Processor Implementation of Recursive Digital Filters." In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, 11:257--260. 1986.
[25]
W. Thies, M. Karczmarek, and S.P. Amarasinghe. "StreamIt: A Language for Streaming Applications." In Proceedings of the 11th International Conference on Compiler Construction, pp. 179--196. 2002.

Cited By

View all
  • (2021)Parallel Tiled Code for Computing General Linear Recurrence EquationsElectronics10.3390/electronics1017205010:17(2050)Online publication date: 25-Aug-2021
  • (2021)BiPartProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3437801.3441611(161-174)Online publication date: 17-Feb-2021
  • (2023)cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End PerformanceProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607048(1-13)Online publication date: 12-Nov-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2018
    827 pages
    ISBN:9781450349116
    DOI:10.1145/3173162
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 March 2018
Published in SIGPLAN Volume 53, Issue 2

Check for updates

Author Tags

  1. automatic parallelization
  2. code optimization
  3. linear recurrences
  4. prefix sums
  5. recursive filters

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)116
  • Downloads (Last 6 weeks)17
Reflects downloads up to 29 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Parallel Tiled Code for Computing General Linear Recurrence EquationsElectronics10.3390/electronics1017205010:17(2050)Online publication date: 25-Aug-2021
  • (2021)BiPartProceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3437801.3441611(161-174)Online publication date: 17-Feb-2021
  • (2023)cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End PerformanceProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3581784.3607048(1-13)Online publication date: 12-Nov-2023
  • (2021)GPU efficient 1D and 3D recursive filteringDigital Signal Processing10.1016/j.dsp.2021.103076114(103076)Online publication date: Jul-2021
  • (2020)Scaling out speculative execution of finite-state machines with parallel mergeProceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3332466.3374524(160-172)Online publication date: 19-Feb-2020
  • (2020)An efficient parallel strategy for high-cost prefix operationThe Journal of Supercomputing10.1007/s11227-020-03473-xOnline publication date: 5-Nov-2020
  • (2019)Enabling prefix sum parallelism pattern for recurrences with principled function reconstructionProceedings of the 28th International Conference on Compiler Construction10.1145/3302516.3307354(17-28)Online publication date: 16-Feb-2019

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media