Abstract
The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. “Dark Silicon”): only a small fraction of a chip can run at full speed at any given time. Designing accelerators for specific applications or algorithms is considered one of the most promising approaches to improving energy efficiency. However, most current accelerator design methods target particular applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. First, we propose clustering the dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications and using the resulting DFG clusters for accelerator design, because a DFG is the largest program unit that is not specific to a single application. We analyze 17 benchmarks from SPEC CPU 2006, extract over 300 hotspot DFGs using the LLVM compiler tools, and divide them into 15 clusters based on graph similarity. Second, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and used by applications. Our results show that the proposed DFG clustering and FISC design speed up the SPEC benchmarks by 6.2x on average.
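To make the clustering step concrete, the following is a minimal sketch of grouping basic-block DFGs by graph similarity. It is an illustration under stated assumptions, not the method used in the paper: the abstract does not specify the similarity measure or the clustering algorithm, so the sketch assumes a multiset Jaccard similarity over opcode-labeled edges and a greedy threshold clustering. The DFG encoding, the names `edge_profile`, `similarity`, and `cluster`, the threshold of 0.5, and the three example basic blocks are all hypothetical.

```python
from collections import Counter

# Assumption: a DFG is modeled as a list of (producer_opcode, consumer_opcode)
# edges. Real DFGs extracted from LLVM IR carry more information than this.


def edge_profile(dfg):
    """Count how often each opcode-to-opcode edge occurs in a DFG."""
    return Counter(dfg)


def similarity(a, b):
    """Multiset Jaccard similarity of two edge profiles, in [0.0, 1.0].

    This measure is an assumption chosen for illustration only.
    """
    pa, pb = edge_profile(a), edge_profile(b)
    union = sum((pa | pb).values())  # element-wise max of counts
    if union == 0:
        return 1.0  # two empty graphs are treated as identical
    return sum((pa & pb).values()) / union  # element-wise min of counts


def cluster(dfgs, threshold=0.5):
    """Greedy clustering: a DFG joins the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it starts
    a new cluster."""
    clusters = []
    for name, dfg in dfgs.items():
        for members in clusters:
            if similarity(dfgs[members[0]], dfg) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical hotspot basic blocks: two multiply-accumulate variants and
# one compare-and-select block.
dfgs = {
    "bb_mac_a": [("load", "mul"), ("load", "mul"), ("mul", "add"),
                 ("add", "store")],
    "bb_mac_b": [("load", "mul"), ("load", "mul"), ("mul", "add"),
                 ("add", "add"), ("add", "store")],
    "bb_cmp": [("load", "icmp"), ("icmp", "select"), ("select", "store")],
}

clusters = cluster(dfgs)
print(clusters)
```

In this toy input the two multiply-accumulate blocks share four of five edges and fall into one cluster, while the compare-and-select block shares none and forms its own. Each resulting cluster would then correspond to one candidate accelerator.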
Additional information
This paper is supported by the National Natural Science Foundation of China under Grant Nos. 601173006, 61221062, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010403.
Cite this article
Chen, P., Zhang, L., Han, YH. et al. A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications. J. Comput. Sci. Technol. 29, 239–246 (2014). https://doi.org/10.1007/s11390-014-1426-9
DOI: https://doi.org/10.1007/s11390-014-1426-9