Tartan: evaluating spatial computation for whole program execution

Published: 20 October 2006

Abstract

Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system.

Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
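
The energy-delay metric named in the abstract is conventionally the product of the energy a run consumes and the time it takes, so lower is better. The sketch below shows how an "order of magnitude" improvement would be computed under that definition. The function names and all numbers are hypothetical illustrations; none of them are taken from the paper or its simulator.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product (EDP): energy multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_improvement(baseline: tuple[float, float],
                    candidate: tuple[float, float]) -> float:
    """Ratio of baseline EDP to candidate EDP; values above 1 favor the candidate."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical (energy in joules, delay in seconds) pairs, for illustration only.
superscalar_run = (2.0, 1.0)   # assumed baseline core
spatial_run = (0.25, 0.8)      # assumed core + reconfigurable fabric

ratio = edp_improvement(superscalar_run, spatial_run)
print(f"EDP improvement: {ratio:.1f}x")
```

Because the metric multiplies the two quantities, a design can reach a large improvement mostly through energy savings even when its speedup is modest, as in the made-up numbers above.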



Published In

ACM SIGPLAN Notices, Volume 41, Issue 11
Proceedings of the 2006 ASPLOS Conference
November 2006
425 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1168918
  • ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
    October 2006
    440 pages
    ISBN: 1595934510
    DOI: 10.1145/1168857

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. asynchronous circuits
  2. dataflow machine
  3. defect tolerance
  4. low power
  5. reconfigurable hardware
  6. spatial computation

Qualifiers

  • Article


Cited By

  • (2023) FLEX: Introducing FLEXible Execution on CGRA with Spatio-Temporal Vector Dataflow. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), pages 1-9. DOI: 10.1109/ICCAD57390.2023.10323612. Online publication date: 28-Oct-2023
  • (2023) Spatula: A Hardware Accelerator for Sparse Matrix Factorization. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pages 91-104. DOI: 10.1145/3613424.3623783. Online publication date: 28-Oct-2023
  • (2023) MESA: Microarchitecture Extensions for Spatial Architecture Generation. Proceedings of the 50th Annual International Symposium on Computer Architecture, pages 1-14. DOI: 10.1145/3579371.3589084. Online publication date: 17-Jun-2023
  • (2023) Program Balancing in Compilation for Buffered Hybrid Dataflow Processors. 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), pages 57-66. DOI: 10.1109/COMPSAC57700.2023.00018. Online publication date: Jun-2023
  • (2022) Buffer Allocation for Exposed Datapath Architectures. 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 18-25. DOI: 10.1109/MCSoC57363.2022.00013. Online publication date: Dec-2022
  • (2021) NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 507-521. DOI: 10.1145/3466752.3480094. Online publication date: 18-Oct-2021
  • (2021) Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pages 1064-1077. DOI: 10.1145/3466752.3480048. Online publication date: 18-Oct-2021
  • (2021) DiAG: a dataflow-inspired architecture for general-purpose processors. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 93-106. DOI: 10.1145/3445814.3446703. Online publication date: 19-Apr-2021
  • (2020) DSAGEN. Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, pages 268-281. DOI: 10.1109/ISCA45697.2020.00032. Online publication date: 30-May-2020
