MPSoC Design Using Application-Specific Architecturally Visible Communication

Theo Kluter⁶,
Philip Brisk⁶,
Edoardo Charbon⁶ &
Paolo Ienne⁶

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5409))

Included in the following conference series:

International Conference on High-Performance Embedded Architectures and Compilers

Abstract

This paper advocates the placement of Architecturally Visible Communication (AVC) buffers between adjacent cores in MPSoCs to provide high-throughput communication for streaming applications. Producer/consumer relationships map poorly onto cache-based MPSoCs. Instead, we instantiate application-specific AVC buffers on top of a distributed, consistent, and coherent cache-based system with shared main memory to provide the desired functionality. Using JPEG compression as a case study, we show that AVC buffers, combined with parallel execution via heterogeneous software pipelining, provide a speedup of up to 4.2x over a baseline single-processor system, while estimated memory energy consumption increases by only 1.6x. Additionally, we describe a method to integrate the AVC buffers into the L1 cache coherence protocol. This allows the runtime system to guarantee memory safety and coherence in situations where parallelizing the application may be unsafe because some pointers could not be resolved at compile time.
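The producer/consumer relationship mentioned in the abstract can be illustrated with a small software analogy. The sketch below is not the paper's design: AVC buffers are hardware structures placed between cores and integrated with the cache coherence protocol, none of which is modeled here. The names `StreamBuffer` and `run_pipeline`, the capacity of 2, and the two stage functions are all hypothetical, chosen only to show how a bounded FIFO decouples two adjacent pipeline stages.

```python
from collections import deque


class StreamBuffer:
    """Bounded FIFO standing in for a buffer between two adjacent stages.

    Hypothetical software analogy only; it does not model hardware or coherence.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def full(self):
        return len(self._items) >= self.capacity

    def empty(self):
        return not self._items

    def push(self, item):
        if self.full():
            raise BufferError("buffer full; producer must stall")
        self._items.append(item)

    def pop(self):
        if self.empty():
            raise BufferError("buffer empty; consumer must stall")
        return self._items.popleft()


def run_pipeline(blocks, stage_a, stage_b, capacity=2):
    """Interleave a producer stage and a consumer stage through one buffer."""
    buf = StreamBuffer(capacity)
    pending = deque(blocks)
    results = []
    while pending or not buf.empty():
        # Producer step: runs only when input remains and the buffer has space.
        if pending and not buf.full():
            buf.push(stage_a(pending.popleft()))
        # Consumer step: runs only when the buffer holds data.
        if not buf.empty():
            results.append(stage_b(buf.pop()))
    return results
```

For example, `run_pipeline([1, 2, 3], lambda x: x * 2, lambda x: x + 1)` passes each block through the first stage, across the buffer, and then through the second stage, preserving order. In a real MPSoC the two stages would run concurrently on separate cores; the sequential interleaving here only keeps the sketch deterministic.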

Author information

Authors and Affiliations

School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
Theo Kluter, Philip Brisk, Edoardo Charbon & Paolo Ienne

Editor information

Editors and Affiliations

IRISA, Campus de Beaulieu, 35042, Rennes Cedex, France
André Seznec
Intel Corporation, Massachusetts Microprocessor Design Center, 77 Reed Road, MA 01749, Hudson, USA
Joel Emer
School of Informatics, Institute for Computing Systems Architecture, King’s Buildings, EH9 3JZ, Edinburgh, United Kingdom
Michael O’Boyle
Department of Electrical Engineering, Princeton University, 34 Olden Street, NJ 08544-5263, Princeton, USA
Margaret Martonosi
Department of Computer Science, University of Augsburg, 86135, Augsburg, Germany
Theo Ungerer

About this paper

Cite this paper

Kluter, T., Brisk, P., Charbon, E., Ienne, P. (2009). MPSoC Design Using Application-Specific Architecturally Visible Communication. In: Seznec, A., Emer, J., O’Boyle, M., Martonosi, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2009. Lecture Notes in Computer Science, vol 5409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92990-1_15

DOI: https://doi.org/10.1007/978-3-540-92990-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92989-5
Online ISBN: 978-3-540-92990-1
eBook Packages: Computer Science, Computer Science (R0)