Nothing Special   »   [go: up one dir, main page]

skip to main content
article

A stream compiler for communication-exposed architectures

Published: 01 October 2002 Publication History

Abstract

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.

References

[1]
Streamit homepage. http://compiler.lcs.mit.edu/streamit.]]
[2]
The Transputer Databook. Inmos Corporation, 1988.]]
[3]
G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2), 1992.]]
[4]
S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.]]
[5]
G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete. Cyclo-static dataflow. IEEE Trans. on Signal Processing, 1996.]]
[6]
S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Supercomputing, 1988.]]
[7]
E. Caspi, M. Chu, R. Huang, J. Yeh, J. Wawrzynek, and A. DeHon. Stream Computations Organized for Reconfigurable Execution (SCORE): Extended Abstract. In Proceedings of the Conference on Field Programmable Logic and Applications, 2000.]]
[8]
M. B. T. et. al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]]
[9]
J. Gaudiot, W. Bohm, T. DeBoni, J. Feo, and P. Mille". The Sisal Model of Functional Programming and its Implementation. In Proceedings of the Second Aizu International Symposium on Parallel Algorithms/Architectures Synthesis, 1997.]]
[10]
T. Gautier, P. L. Guernic, and L. Besnard. Signal: A declarative language for synchronous programming of real-time systems. Springer Verlag Lecture Notes in Computer Science, 274, 1987.]]
[11]
V. Gay-Para, T. Graf, A.-G. Lemonnier, and E. Wais. Kopi Reference manual. http://www.dms.at/kopi/docs/kopi.html, 2001.]]
[12]
T. Gross and D. R. O'Halloron. iWarp, Anatomy of a Parallel Computing System. MIT Press, 1998.]]
[13]
N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proc. of the IEEE, 79(9), 1991.]]
[14]
R. Ho, K. Mai, and M. Horowitz. The Future of Wires. In Proc. of the IEEE, 2001.]]
[15]
C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.]]
[16]
Inmos Corporation. Occam 2 Reference Manual. Prentice Hall, 1988.]]
[17]
U. J. Kapasi, P. Mattson, W. J. Dally, J. D. Owens, and B. Towles. Stream scheduling. In Proc. of the 3rd Workshop on Media and Streaming Processors, 2001.]]
[18]
S. Kirkpatrick, J. C. D. Gelatt, and M. Vecchi. Optimization by Simulated Annealing. Science, 220(4598), May 1983.]]
[19]
J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]
[20]
E. A. Lee. Overview of the Ptolemy Project. UCB/ERL Technical Memorandum UCB/ERL M01/11, Dept. EECS, University of California, Berkeley, CA, March 2001.]]
[21]
W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. P. Amarasinghe. Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine. In Architectural Support for Programming Languages and Operating Systems, 1998.]]
[22]
K. Mai, T. Paaske, N. Jayasena, R. Ho, W. Dally, and M. Horowitz. Smart memories: A modular recongurable architecture. In ISCA 2000, Vancouver, BC, Canada.]]
[23]
D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]]
[24]
A. Mitschele-Thiel. Automatic Configuration and Optimization of Parallel Transputer Applications. Transputer Applications and Systems '93, 1993.]]
[25]
D. R. O'Hallaron. The ASSIGN Parallel Program Generator. Carnegie Mellon Technical Report CMU-CS-91-141, 1991.]]
[26]
T. A. Proebsting and S. A. Watterson. Filter Fusion. In POPL, 1996.]]
[27]
S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A bandwidth-efficient architecture for media processing. In International Symposium on Microarchitecture, 1998.]]
[28]
K. Sankaralingam, R. Nagarajan, S. Keckler, and D. Burger. A Technology-Scalable Architecture for Fast Clocks and High ILP. University of Texas at Austin, Dept. of Computer Sciences Technical Report TR-01-02, 2001.]]
[29]
R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]
[30]
M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures. Technical Report MIT-LCS-TR-859, Mass. Inst. of Technology, July 2002.]]
[31]
W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the International Conference on Compiler Construction, to appear, Grenoble, France, 2002.]]
[32]
W. Thies, M. Karczmarek, M. Gordon, D. Maze, J. Wong, H. Hoffmann, M. Brown, and S. Amarasinghe. StreamIt: A Compiler for Streaming Applications. MIT-LCS Technical Memo LCS-TM-622, Cambridge, MA, 2001.]]
[33]
E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9), 1997.]]

Cited By

View all
  • (2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
  • (2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
  • (2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 37, Issue 10
October 2002
296 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/605432
Issue’s Table of Contents
  • cover image ACM Conferences
    ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
    October 2002
    318 pages
    ISBN:1581135742
    DOI:10.1145/605397
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2002
Published in SIGPLAN Volume 37, Issue 10

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)23
  • Downloads (Last 6 weeks)1
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
  • (2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
  • (2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
  • (2017)Tightening Contention Delays While Scheduling Parallel Applications on Multi-core ArchitecturesACM Transactions on Embedded Computing Systems10.1145/312649616:5s(1-20)Online publication date: 27-Sep-2017
  • (2016)Using fusion to enable late design decisions for pipelined computationsProceedings of the 5th International Workshop on Functional High-Performance Computing10.1145/2975991.2975993(9-16)Online publication date: 8-Sep-2016
  • (2016)Communication-aware mapping of stream graphs for multi-GPU platformsProceedings of the 2016 International Symposium on Code Generation and Optimization10.1145/2854038.2854055(94-104)Online publication date: 29-Feb-2016
  • (2016)A Retargetable Compilation Framework for Heterogeneous Reconfigurable ComputingACM Transactions on Reconfigurable Technology and Systems10.1145/28439469:4(1-22)Online publication date: 11-Aug-2016
  • (2016)Modeling Multi-Periodic Simulink Systems by Synchronous Dataflow Graphs2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS.2016.7461343(1-10)Online publication date: Apr-2016
  • (2015)Compiling high performance recursive filtersProceedings of the 7th Conference on High-Performance Graphics10.1145/2790060.2790063(85-94)Online publication date: 7-Aug-2015
  • (2015)Orchestrating Multiple Data-Parallel Kernels on Multiple DevicesProceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)10.1109/PACT.2015.14(355-366)Online publication date: 18-Oct-2015
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media