Evaluating the Imagine Stream Architecture

Published: 02 March 2004

Abstract

This paper describes an experimental evaluation of the prototype Imagine stream processor. Imagine [Imagine: Media processing with streams] is a stream processor that employs a two-level register hierarchy with 9.7 Kbytes of local register file capacity and 128 Kbytes of stream register file (SRF) capacity to capture producer-consumer locality in stream applications. Parallelism is exploited using an array of 48 floating-point arithmetic units organized as eight SIMD clusters with a 6-wide VLIW per cluster. We evaluate the performance of each aspect of the Imagine architecture using a set of synthetic micro-benchmarks, key media processing kernels, and full applications. These micro-benchmarks show that the prototype hardware can attain 7.96 GFLOPS or 25.4 GOPS of arithmetic performance, 12.7 Gbytes/s of SRF bandwidth, 1.58 Gbytes/s of memory system bandwidth, and accept up to 2 million stream processor instructions per second from a host processor. On a set of media processing kernels, Imagine sustained an average of 43% of peak arithmetic performance. An evaluation of full applications provides a breakdown of where execution time is spent. Over full applications, Imagine achieves 39.4% of peak performance; of the remainder, on average 36.4% of time is lost due to load imbalance between arithmetic units in the VLIW clusters and limited instruction-level parallelism within kernel inner loops, 10.6% is due to kernel startup and shutdown overhead because of short stream lengths, 7.6% is due to memory stalls, and the rest is due to insufficient host processor bandwidth. Further analysis included in the paper presents the impact of host instruction bandwidth on application performance, particularly on smaller datasets. In summary, the experimental measurements described in this paper demonstrate the high performance and efficiency of stream processing: operating at 200 MHz, Imagine sustains 4.81 GFLOPS on QR decomposition while dissipating 7.42 Watts.
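The abstract leaves two figures implicit: the share of time attributed to host processor bandwidth ("the rest") and the energy efficiency implied by the QR decomposition result. The short Python sketch below derives both from the numbers quoted above. It assumes that the listed percentages are disjoint shares of total execution time that sum to 100%, which is one reading of the abstract rather than something it states outright; the variable names are illustrative only.

```python
# Execution-time breakdown over full applications, as quoted in the abstract (percent).
breakdown = {
    "achieved fraction of peak performance": 39.4,
    "load imbalance and limited ILP": 36.4,
    "kernel startup and shutdown overhead": 10.6,
    "memory stalls": 7.6,
}

# Assumption: the categories are disjoint and total 100%, so whatever is left
# over is the share the abstract attributes to host processor bandwidth.
host_bandwidth_share = round(100.0 - sum(breakdown.values()), 1)

# QR decomposition result quoted in the abstract (at 200 MHz).
qr_gflops = 4.81
power_watts = 7.42
gflops_per_watt = qr_gflops / power_watts

print(f"Host bandwidth share: {host_bandwidth_share}%")
print(f"QR decomposition efficiency: {gflops_per_watt:.3f} GFLOPS/W")
```

Under that assumption, about 6.0% of execution time is attributable to insufficient host processor bandwidth, and the QR decomposition figure works out to roughly 0.65 GFLOPS per watt.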

References

[1] K. C. Cain, J. A. Torres, and R. T. Williams. RT_STAP: Real-time space-time adaptive processing benchmark. Technical Report MTR 96B0000021, MITRE, February 1997.
[2] S. Chatterji, M. Narayanan, J. Duell, and L. Oliker. Performance evaluation of two emerging media processors: Viram and imagine. In International Parallel and Distributed Processing Symposium, pages 229-235, April 2003.
[3] W. J. Dally, P. Hanrahan, M. Erez, T. J. Knight, F. Labonté, J. H. Ahn, N. Jayasena, U. J. Kapasi, A. Das, J. Gummaraju, and I. Buck. Merrimac: Supercomputing with streams. In SC2003, November 2003.
[4] Intel®. Intel® pentium® m processor. http://www.intel.com/design/mobile/datashts/25261202.pdf.
[5] T. Kanade, A. Yoshida, K. Oda, H. Kano, and M. Tanaka. A stereo machine for video-rate dense depth mapping and its new applications. In Proceedings of the 15th Computer Vision and Pattern Recognition Conference, pages 196-202, San Francisco, CA, June 18-20, 1996.
[6] U. J. Kapasi, P. Mattson, W. J. Dally, J. D. Owens, and B. Towles. Stream scheduling. Concurrent VLSI Architecture Tech Report 122, Stanford University, Computer Systems Laboratory, March 2002.
[7] U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens. Programmable stream processors. IEEE Computer, pages 54-62, August 2003.
[8] B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, and A. Chang. Imagine: Media processing with streams. IEEE Micro, pages 35-46, Mar/Apr 2001.
[9] P. Mattson, W. J. Dally, S. Rixner, U. J. Kapasi, and J. D. Owens. Communication scheduling. In Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, pages 82-92, November 2000.
[10] K. Proudfoot, W. R. Mark, S. Tzvetkov, and P. Hanrahan. A real-time procedural shading system for programmable graphics hardware. In Proceedings of ACM SIGGRAPH, pages 159-170, August 2001.
[11] J. S. Rai. A feasibility study on the application of stream architectures for packet processing applications. Master's thesis, North Carolina State University, Raleigh, NC, 2003.
[12] S. Rajagopal, S. Rixner, and J. R. Cavallaro. A programmable baseband processor design for software defined radios. In 45th IEEE International Midwest Symposium on Circuits and Systems, volume 3, pages 413-416, August 2002.
[13] S. Rixner. Stream Processor Architecture. Kluwer Academic Publishers, Boston, MA, 2001.
[14] D. Sager, G. Hinton, M. Upton, T. Chappell, T. D. Fletcher, S. Samaan, and R. Murray. A 0.18µm CMOS IA32 micro-processor with a 4GHz integer execution unit. In 2001 IEEE International Solid-State Circuits Conference Digest of Technical Papers, pages 324-325, February 2001.
[15] J. Suh, E.-G. Kim, S. P. Crago, L. Srinivasan, and M. C. French. A performance analysis of pim, stream processing, and tiled processing on memory-intensive signal processing kernels. In 30th Annual International Symposium on Computer Architecture, pages 410-421, June 2003.
[16] Texas Instruments. TMS320C6713 Floating-Point Digital Signal Processors, 2003.03 edition.


Published In

ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture
June 2004
373 pages
ISBN: 0769521436
  • ACM SIGARCH Computer Architecture News, Volume 32, Issue 2
    ISCA 2004
    March 2004
    373 pages
    ISSN: 0163-5964
    DOI: 10.1145/1028176

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Acceptance Rates

ISCA '04 Paper Acceptance Rate 31 of 217 submissions, 14%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
