DOI: 10.1145/859618.859665

A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Published: 01 May 2003

Abstract

Increasing die sizes and clock speeds, together with shrinking feature sizes, have fueled rapid gains in microprocessor performance. However, limited improvements in DRAM latency and bandwidth, along with diminishing returns from greater superscalar ILP and larger caches, have led to proposals for new microprocessor architectures that implement processor-in-memory (PIM), stream processing, and tiled processing. Each architecture is typically evaluated separately and compared against a baseline architecture. In this paper, we evaluate the performance of processors that implement these architectures on a common set of signal processing kernels. The implementation results are compared with the measured performance of a conventional system based on the PowerPC with AltiVec. The results show that these new processors deliver significant improvements over conventional systems and that each architecture has its own strengths and weaknesses.

Published In

ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture
June 2003, 432 pages
ISBN: 0769519458
DOI: 10.1145/859618
Conference Chair: Allan Gottlieb
Program Chair: Kai Li

ACM SIGARCH Computer Architecture News, Volume 31, Issue 2 (ISCA 2003)
May 2003, 422 pages
ISSN: 0163-5964
DOI: 10.1145/871656

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

ISCA03: International Symposium on Computer Architecture
June 9-11, 2003
San Diego, California

Acceptance Rates

ISCA '03 paper acceptance rate: 36 of 184 submissions (20%)
Overall acceptance rate: 543 of 3,203 submissions (17%)
