A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. “Dark Silicon”): only a small fraction of the chip can run at full speed at any given time. Designing accelerators for specific applications or algorithms is considered one of the most promising approaches to improving energy efficiency. However, most current accelerator design methods are dedicated to particular applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster the dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications and to use the resulting DFG clusters for accelerator design, because a DFG is the largest program unit that is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, extract over 300 hotspot DFGs using the LLVM compiler tools, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks by 6.2X on average.
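As a rough illustration of what clustering DFGs by graph similarity can look like, the following Python sketch groups toy DFGs by the overlap of their operation-to-operation edges. It is only a sketch under stated assumptions: the DFG encoding, the multiset Jaccard measure, the greedy clustering, the 0.5 threshold, and all names (`edge_profile`, `similarity`, `cluster`, `mac_a`, `mac_b`, `cmp`) are hypothetical and are not taken from the paper, whose actual similarity measure and clustering procedure are not described in the abstract.

```python
from collections import Counter


def edge_profile(dfg):
    """Multiset of (producer opcode, consumer opcode) pairs of a DFG.

    Assumed encoding: dfg["ops"] maps node id -> opcode string,
    dfg["edges"] is a list of (src, dst) data dependences.
    """
    return Counter((dfg["ops"][s], dfg["ops"][d]) for s, d in dfg["edges"])


def similarity(a, b):
    """Multiset Jaccard similarity in [0, 1] (a stand-in measure)."""
    pa, pb = edge_profile(a), edge_profile(b)
    inter = sum((pa & pb).values())  # element-wise minimum of counts
    union = sum((pa | pb).values())  # element-wise maximum of counts
    return inter / union if union else 1.0


def cluster(dfgs, threshold=0.5):
    """Greedy clustering: a DFG joins the first cluster whose first
    member is at least `threshold` similar, else starts a new cluster."""
    clusters = []
    for name, graph in dfgs.items():
        for members in clusters:
            if similarity(dfgs[members[0]], graph) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Three toy hotspot DFGs: two multiply-accumulate shapes and one compare-branch.
dfgs = {
    "mac_a": {"ops": {0: "load", 1: "load", 2: "mul", 3: "add"},
              "edges": [(0, 2), (1, 2), (2, 3)]},
    "mac_b": {"ops": {0: "load", 1: "load", 2: "mul", 3: "add", 4: "store"},
              "edges": [(0, 2), (1, 2), (2, 3), (3, 4)]},
    "cmp":   {"ops": {0: "load", 1: "icmp", 2: "br"},
              "edges": [(0, 1), (1, 2)]},
}

print(similarity(dfgs["mac_a"], dfgs["mac_b"]))  # 0.75
print(cluster(dfgs))  # [['mac_a', 'mac_b'], ['cmp']]
```

In this toy setting the two multiply-accumulate graphs share three of four distinct edge instances and fall into one cluster, which would then correspond to one candidate accelerator, while the compare-branch graph forms its own cluster.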

Corresponding author

Correspondence to Peng Chen.

Additional information

This paper is supported by the National Natural Science Foundation of China under Grant Nos. 601173006, 61221062, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.

About this article

Cite this article

Chen, P., Zhang, L., Han, YH. et al. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications. J. Comput. Sci. Technol. 29, 239–246 (2014). https://doi.org/10.1007/s11390-014-1426-9

  • DOI: https://doi.org/10.1007/s11390-014-1426-9
