Article

A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Authors:

Stephen P. Crago,

Lakshmi Srinivasan,

Matthew C. French

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

Pages 410 - 421

https://doi.org/10.1145/859618.859665

Published: 01 May 2003

Abstract

Microprocessor trends toward larger die sizes, higher clock speeds, and smaller feature sizes have fueled rapidly increasing performance. However, the limited improvements in DRAM latency and bandwidth, together with the diminishing returns of increasing superscalar ILP and cache sizes, have led to proposals for new microprocessor architectures that implement processor-in-memory (PIM), stream processing, and tiled processing. Each architecture is typically evaluated separately and compared to a baseline architecture. In this paper, we evaluate the performance of processors that implement these architectures on a common set of signal processing kernels. The implementation results are compared with the measured performance of a conventional system based on the PowerPC with AltiVec. The results show that these new processors offer significant improvements over conventional systems and that each architecture has its own strengths and weaknesses.

Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture

June 2003

432 pages

ISBN:0769519458

DOI:10.1145/859618

Conference Chair: Allan Gottlieb (New York University & NEC Laboratories America)

Program Chair: Kai Li (Princeton University)

ACM SIGARCH Computer Architecture News Volume 31, Issue 2
ISCA 2003
May 2003
422 pages
ISSN:0163-5964
DOI:10.1145/871656

Copyright © 2003 Authors.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA03

Sponsor:

SIGARCH

ISCA03: International Symposium on Computer Architecture

June 9 - 11, 2003

San Diego, California

Acceptance Rates

ISCA '03 Paper Acceptance Rate 36 of 184 submissions, 20%;

Overall Acceptance Rate 543 of 3,203 submissions, 17%
