article

A stream compiler for communication-exposed architectures

Authors:

Michael I. Gordon,

Michal Karczmarek,

Andrew A. Lamb,

Henry Hoffmann,

Saman AmarasingheAuthors Info & Claims

ACM SIGPLAN Notices, Volume 37, Issue 10

Pages 291 - 303

https://doi.org/10.1145/605432.605428

Published: 01 October 2002 Publication History

Abstract

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.In this paper, we describe our compiler for StreamIt: a high-level, architecture-independent language for streaming applications. We focus on our backend for the Raw processor. Though StreamIt exposes the parallelism and communication patterns of stream programs, some analysis is needed to adapt a stream program to a software-exposed processor. We describe a partitioning algorithm that employs fission and fusion transformations to adjust the granularity of a stream graph, a layout algorithm that maps a stream graph to a given network topology, and a scheduling strategy that generates a fine-grained static communication pattern for each computational element.We have implemented a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations. Using the cycle-accurate Raw simulator, we demonstrate that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance. We consider this work to be a first step towards a portable programming model for communication-exposed architectures.

References

[1]

Streamit homepage. http://compiler.lcs.mit.edu/streamit.]]

[2]

The Transputer Databook. Inmos Corporation, 1988.]]

[3]

G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2), 1992.]]

Digital Library

[4]

S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.]]

Digital Library

[5]

G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete. Cyclo-static dataflow. IEEE Trans. on Signal Processing, 1996.]]

Digital Library

[6]

S. Borkar, R. Cohn, G. Cox, S. Gleason, T. Gross, H. T. Kung, M. Lam, B. Moore, C. Peterson, J. Pieper, L. Rankin, P. S. Tseng, J. Sutton, J. Urbanski, and J. Webb. iWarp: An integrated solution to high-speed parallel computing. In Supercomputing, 1988.]]

Digital Library

[7]

E. Caspi, M. Chu, R. Huang, J. Yeh, J. Wawrzynek, and A. DeHon. Stream Computations Organized for Reconfigurable Execution (SCORE): Extended Abstract. In Proceedings of the Conference on Field Programmable Logic and Applications, 2000.]]

Digital Library

[8]

M. B. T. et. al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General Purpose Programs. IEEE Micro vol 22, Issue 2, 2002.]]

Digital Library

[9]

J. Gaudiot, W. Bohm, T. DeBoni, J. Feo, and P. Mille". The Sisal Model of Functional Programming and its Implementation. In Proceedings of the Second Aizu International Symposium on Parallel Algorithms/Architectures Synthesis, 1997.]]

Digital Library

[10]

T. Gautier, P. L. Guernic, and L. Besnard. Signal: A declarative language for synchronous programming of real-time systems. Springer Verlag Lecture Notes in Computer Science, 274, 1987.]]

Digital Library

[11]

V. Gay-Para, T. Graf, A.-G. Lemonnier, and E. Wais. Kopi Reference manual. http://www.dms.at/kopi/docs/kopi.html, 2001.]]

[12]

T. Gross and D. R. O'Halloron. iWarp, Anatomy of a Parallel Computing System. MIT Press, 1998.]]

Digital Library

[13]

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proc. of the IEEE, 79(9), 1991.]]

[14]

R. Ho, K. Mai, and M. Horowitz. The Future of Wires. In Proc. of the IEEE, 2001.]]

[15]

C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8), 1978.]]

Digital Library

[16]

Inmos Corporation. Occam 2 Reference Manual. Prentice Hall, 1988.]]

[17]

U. J. Kapasi, P. Mattson, W. J. Dally, J. D. Owens, and B. Towles. Stream scheduling. In Proc. of the 3rd Workshop on Media and Streaming Processors, 2001.]]

[18]

S. Kirkpatrick, J. C. D. Gelatt, and M. Vecchi. Optimization by Simulated Annealing. Science, 220(4598), May 1983.]]

[19]

J. Lebak. Polymorphous Computing Architecture (PCA) Example Applications and Description. External Report, Lincoln Laboratory, Mass. Inst. of Technology, 2001.]]

[20]

E. A. Lee. Overview of the Ptolemy Project. UCB/ERL Technical Memorandum UCB/ERL M01/11, Dept. EECS, University of California, Berkeley, CA, March 2001.]]

[21]

W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. P. Amarasinghe. Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine. In Architectural Support for Programming Languages and Operating Systems, 1998.]]

Digital Library

[22]

K. Mai, T. Paaske, N. Jayasena, R. Ho, W. Dally, and M. Horowitz. Smart memories: A modular recongurable architecture. In ISCA 2000, Vancouver, BC, Canada.]]

Digital Library

[23]

D. May, R. Shepherd, and C. Keane. Communicating Process Architecture: Transputers and Occam. Future Parallel Computers: An Advanced Course, Pisa, Lecture Notes in Computer Science, 272, June 1987.]]

Digital Library

[24]

A. Mitschele-Thiel. Automatic Configuration and Optimization of Parallel Transputer Applications. Transputer Applications and Systems '93, 1993.]]

[25]

D. R. O'Hallaron. The ASSIGN Parallel Program Generator. Carnegie Mellon Technical Report CMU-CS-91-141, 1991.]]

[26]

T. A. Proebsting and S. A. Watterson. Filter Fusion. In POPL, 1996.]]

Digital Library

[27]

S. Rixner, W. J. Dally, U. J. Kapasi, B. Khailany, A. Lopez-Lagunas, P. R. Mattson, and J. D. Owens. A bandwidth-efficient architecture for media processing. In International Symposium on Microarchitecture, 1998.]]

Digital Library

[28]

K. Sankaralingam, R. Nagarajan, S. Keckler, and D. Burger. A Technology-Scalable Architecture for Fast Clocks and High ILP. University of Texas at Austin, Dept. of Computer Sciences Technical Report TR-01-02, 2001.]]

Digital Library

[29]

R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.]]

[30]

M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures. Technical Report MIT-LCS-TR-859, Mass. Inst. of Technology, July 2002.]]

[31]

W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the International Conference on Compiler Construction, to appear, Grenoble, France, 2002.]]

Digital Library

[32]

W. Thies, M. Karczmarek, M. Gordon, D. Maze, J. Wong, H. Hoffmann, M. Brown, and S. Amarasinghe. StreamIt: A Compiler for Streaming Applications. MIT-LCS Technical Memo LCS-TM-622, Cambridge, MA, 2001.]]

[33]

E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it all to software: Raw machines. IEEE Computer, 30(9), 1997.]]

Digital Library

Cited By

Lin JQu HMa SJi XLi HLi XSong CZhang W(2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TC.2023.3311948
Mencagli GDazzi PTonci NPierre GFerreira PShrira L(2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
https://dl.acm.org/doi/10.1145/3274808.3274814
Khasanov RGoens ACastrillon J(2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
https://dl.acm.org/doi/10.1145/3183767.3183790
Show More Cited By

A stream compiler for communication-exposed architectures
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types

Recommendations

A stream compiler for communication-exposed architectures

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...
A stream compiler for communication-exposed architectures
ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...
A stream compiler for communication-exposed architectures
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating Systems

With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 37, Issue 10

October 2002

296 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/605432

Issue’s Table of Contents

ASPLOS X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
October 2002
318 pages
ISBN:1581135742
DOI:10.1145/605397
Conference Chair:
Kourosh Gharachorloo
Compaq Western Research Lab
,
Program Chair:
David A. Wood

Copyright © 2002 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 October 2002

Published in SIGPLAN Volume 37, Issue 10

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

331
Total Citations
View Citations
1,368
Total Downloads

Downloads (Last 12 months)23
Downloads (Last 6 weeks)1

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lin JQu HMa SJi XLi HLi XSong CZhang W(2024)SongC: A Compiler for Hybrid Near-Memory and In-Memory Many-Core ArchitectureIEEE Transactions on Computers10.1109/TC.2023.331194873:10(2420-2433)Online publication date: 1-Oct-2024
https://dl.acm.org/doi/10.1109/TC.2023.3311948
Mencagli GDazzi PTonci NPierre GFerreira PShrira L(2018)SpinStreamsProceedings of the 19th International Middleware Conference10.1145/3274808.3274814(66-79)Online publication date: 26-Nov-2018
https://dl.acm.org/doi/10.1145/3274808.3274814
Khasanov RGoens ACastrillon J(2018)Implicit Data-Parallelism in Kahn Process NetworksProceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms10.1145/3183767.3183790(20-25)Online publication date: 23-Jan-2018
https://dl.acm.org/doi/10.1145/3183767.3183790
Rouxel BDerrien SPuaut I(2017)Tightening Contention Delays While Scheduling Parallel Applications on Multi-core ArchitecturesACM Transactions on Embedded Computing Systems10.1145/312649616:5s(1-20)Online publication date: 27-Sep-2017
https://dl.acm.org/doi/10.1145/3126496
Karácsony MClaessen KDuke DKameyama Y(2016)Using fusion to enable late design decisions for pipelined computationsProceedings of the 5th International Workshop on Functional High-Performance Computing10.1145/2975991.2975993(9-16)Online publication date: 8-Sep-2016
https://dl.acm.org/doi/10.1145/2975991.2975993
Nguyen DLee JFranke BWu YRastello F(2016)Communication-aware mapping of stream graphs for multi-GPU platformsProceedings of the 2016 International Symposium on Code Generation and Optimization10.1145/2854038.2854055(94-104)Online publication date: 29-Feb-2016
https://dl.acm.org/doi/10.1145/2854038.2854055
Ul-Abdin ZSvensson B(2016)A Retargetable Compilation Framework for Heterogeneous Reconfigurable ComputingACM Transactions on Reconfigurable Technology and Systems10.1145/28439469:4(1-22)Online publication date: 11-Aug-2016
https://dl.acm.org/doi/10.1145/2843946
Klikpo EKhatib JMunier-Kordon A(2016)Modeling Multi-Periodic Simulink Systems by Synchronous Dataflow Graphs2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS.2016.7461343(1-10)Online publication date: Apr-2016
https://doi.org/10.1109/RTAS.2016.7461343
Chaurasia GRagan-Kelley JParis SDrettakis GDurand FEisemann EClarberg PDoggett MMolnar SFatahalian KMunkberg J(2015)Compiling high performance recursive filtersProceedings of the 7th Conference on High-Performance Graphics10.1145/2790060.2790063(85-94)Online publication date: 7-Aug-2015
https://dl.acm.org/doi/10.1145/2790060.2790063
Lee JSamadi MMahlke S(2015)Orchestrating Multiple Data-Parallel Kernels on Multiple DevicesProceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)10.1109/PACT.2015.14(355-366)Online publication date: 18-Oct-2015
https://dl.acm.org/doi/10.1109/PACT.2015.14
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents