Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1152154.1152164acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
Article

Compiling for stream processing

Published: 16 September 2006 Publication History

Abstract

This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses information about the program structure and estimates of kernel and memory operation execution times to overlap kernel execution with memory transfers, maximizing performance, and to optimize use of scarce on-chip memory, significantly reducing external memory bandwidth. Our compiler applies optimizations such as strip-mining, loop unrolling, and software pipelining, at the level of kernels and stream memory operations. We evaluate the performance of our compiler on a suite of media and scientific benchmarks. Our results show that compiler management of on-chip storage reduces external memory bandwidth by 35% to 93% and reduces execution time by 23% to 72% compared to cachelike LRU management of the same storage. We show that strip-mining stream applications enables producer-consumer locality to be captured in on-chip storage reducing external bandwidth by 50% to 80%. We also evaluate the sensitivity of performance to the scheduling methods used and to critical resources. Overall, our compiler is able to overlap memory operations and manage local storage so that 78% to 96% of program execution time is spent in running computational kernels.

References

[1]
The tendra open source project. http://www.tendra.org.
[2]
T. Adam, K. Chandy, and J. Dickson. A comparison of list scheduling for parallel processing systems. J. Communications of the ACM, vol. 17, no. 17, pages 685--690, December 1974.
[3]
J. H. Ahn, W. J. Dally, B. Khailany, U. J. Kapasi, and A. Das. Evaluating the imagine stream architecture. In Proceedings of the Annual International Symposium on Computer Architecture, June 2004.
[4]
ClearSpeed. CSX600 datasheet. http://www.clearspeed.com/downloads/CSX600Processor.pdf, 2005.
[5]
T. F. Coleman and Y. Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, pages 418--445, 1996.
[6]
T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1989.
[7]
W. J. Dally, P. Hanrahan, M. Erez, T. J. Knight, F. Labonte, J. H. Ahn, N. Jayasena, U. Kapasi, A. Das, J. Gummaraju, and I. Buck. Merrimac: Supercomputing with streams. In Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, page 35, Phoenix, Arizona, 2003.
[8]
M. Erez, J. H. Ahn, A. Garg, W. J. Dally, and E. Darve. Analysis and performance results of a molecular modeling application on merrimac. In Proceedings of the SC'04 Conference, Pittsburgh, Pennsylvaniva, November, 2004.
[9]
J. Fabri. Automatic storage optimization. In Proceedings of the ACM SIGPLAN 1979 Symposium on Compiler Construction, pages 83--91, 1979.
[10]
J. Gergov. Algorithms for compile-time memory optimization. In Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 907--908, 1999.
[11]
D. R. Kaiser. Loop Optimization Techniques on Multi-Issue Architectures. PhD thesis, University of Michigan, 1994.
[12]
T. Kanade, A. Yoshida, K. Oda, H. Kano, and M. Tanaka. A stereo machine for video-rate dense depth mapping and its new applications. In Proceedings of the 15th Computer Vision and Pattern Recognition Conference, pages 196--202, San Francisco, CA, June 18--20, 1996.
[13]
H. Kasahara and S. Narita. Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Transactions on Computers, no. 11, pages 1023--1029, November 1984.
[14]
B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, and A. Chang. Imagine: Media processing with streams. IEEE Micro, pages 35--46, Mar/Apr 2001.
[15]
M. Lam. Software pipelining: An effective scheduling technique for vliw machines. In Proceedings of the ACM SIGPLAN 88 Conference on Programming Language Design and Implementation (PLDI), pages 318--328, 1988.
[16]
P. Mattson. A Programming System for the Imagine Media Processor. PhD thesis, Stanford University, 2001.
[17]
P. Mattson, W. J. Dally, S. Rixner, U. J. Kapasi, and J. D. Owens. Communication scheduling. In Proceedings of the 9th international conference on Architectural support for programming languages and operating systems(ASPLOS), volume 35, pages 82--92, Cambridge, MA, November 2000.
[18]
J. D. Owens. Computer Graphics on a Stream Architecture. PhD thesis, Stanford University, 2002.
[19]
D. Pham, S. A. M. Bollinger, M. N. Day, H. P. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D.Stasiak, M. S. an M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, and K. Yazawa. The design and implementation of a first generation cell processor. In ISSCC Digest of Technical Papers, San Francisco, CA, Feb, 2005.
[20]
S. Rixner. Stream Processor Architecture. Kluwer Academic Publishers, Boston, MA, 2001.
[21]
D. van der Spoel et al. Gromacs user manual version 3.1. http://www.gromacs.org, 2001.
[22]
M. Wolfe. More iteration space tiling. In Proceedings of the Supercomputing 89, pages 655--664, New York, NY, November 1989.

Cited By

View all
  • (2024)The Dataflow Abstract Machine Simulator Framework2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00046(532-547)Online publication date: 29-Jun-2024
  • (2023)Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071026(1262-1274)Online publication date: Feb-2023
  • (2020)Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00056(596-608)Online publication date: Oct-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques
September 2006
308 pages
ISBN:159593264X
DOI:10.1145/1152154
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. SRF allocation
  2. Stream Operation Precedence (SOP) graph
  3. StreamC
  4. coarse-grained operations
  5. producer-consumer locality
  6. scoreboard slot assignment
  7. software-pipelining
  8. stream programming model
  9. stream scheduling
  10. strip-mining
  11. task level parallelism

Qualifiers

  • Article

Conference

PACT06
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)29
  • Downloads (Last 6 weeks)4
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The Dataflow Abstract Machine Simulator Framework2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00046(532-547)Online publication date: 29-Jun-2024
  • (2023)Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA56546.2023.10071026(1262-1274)Online publication date: Feb-2023
  • (2020)Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00056(596-608)Online publication date: Oct-2020
  • (2018)Hardware AlgorithmsPrinciples and Structures of FPGAs10.1007/978-981-13-0824-6_6(137-177)Online publication date: 4-Sep-2018
  • (2015)Scheduling stream programs with improving arithmetic unit usage on NoC-based VLIW multi-core architecturesProceedings of the 12th ACM International Conference on Computing Frontiers10.1145/2742854.2742872(1-8)Online publication date: 6-May-2015
  • (2015)Towards Semi-automated Parallelization of Data Stream ProcessingIntelligent Distributed Computing IX10.1007/978-3-319-25017-5_22(235-245)Online publication date: 18-Oct-2015
  • (2014)BobolangProceedings of the 23rd international symposium on High-performance parallel and distributed computing10.1145/2600212.2600711(311-314)Online publication date: 23-Jun-2014
  • (2014)A High-Utilization Scheduling Schemeof Stream Programs on ClusteredVLIW Stream ArchitecturesIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2013.8025:4(840-850)Online publication date: 1-Apr-2014
  • (2014)Exploiting the Inter-cluster Record Reuse for Stream ProcessorsProceedings of the 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS)10.1109/HPCC.2014.159(916-921)Online publication date: 20-Aug-2014
  • (2014)A case study of different task implementations for multioutput stages in non-trivial parallel pipeline applicationsParallel Computing10.1016/j.parco.2014.05.00340:8(374-393)Online publication date: Aug-2014
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media