DOI: 10.1145/3466752.3480113

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Published: 17 October 2021

Abstract

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is equally critical but even more challenging. The hurdles arise from poor data locality and redundant computation, which stem from the large size, high sparsity, and irregular non-zero distribution of real-world graphs.
In this paper, we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation. The key mechanism is a new online graph restructuring algorithm we refer to as islandization, which finds clusters of nodes with strong internal but weak external connections. Islandization yields two major benefits. First, by processing islands rather than individual nodes, I-GCN achieves better on-chip data reuse and fewer off-chip memory accesses. Second, it performs less redundant computation, as aggregation results for common/shared neighbors within an island can be reused. The parallel search, identification, and exploitation of graph islands are handled purely in hardware at runtime, operating as an incremental pipeline, without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations, leading to average speedups of 5549×, 403×, and 5.7× over CPUs, GPUs, and prior-art GCN accelerators, respectively.
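To make the two ideas concrete, here is a minimal software sketch of islandization-style clustering and shared-neighbor aggregation reuse. It is an illustration under simplifying assumptions, not the paper's hardware algorithm: the function names (find_islands, aggregate_island), the BFS-based greedy clustering heuristic, and the coarse caching scheme (reusing a partial sum only when two nodes have an identical in-island neighbor set) are all hypothetical stand-ins for I-GCN's runtime hardware mechanisms.

```python
# A minimal sketch of islandization-style clustering and aggregation
# reuse. Hypothetical illustration only; NOT the paper's hardware design.
from collections import deque

import numpy as np


def find_islands(adj, max_island_size=64):
    """Greedily grow islands via BFS from unvisited seeds, preferring
    neighbors with many edges back into the island (strong internal,
    weak external connectivity). `adj` maps node -> set of neighbors."""
    visited, islands = set(), []
    for seed in adj:
        if seed in visited:
            continue
        island, frontier = set(), deque([seed])
        while frontier and len(island) < max_island_size:
            v = frontier.popleft()
            if v in visited:
                continue
            visited.add(v)
            island.add(v)
            # Visit neighbors with the most in-island links first.
            for u in sorted(adj[v], key=lambda u: -len(adj[u] & island)):
                if u not in visited:
                    frontier.append(u)
        islands.append(island)
    return islands


def aggregate_island(adj, features, island):
    """Sum-aggregation over one island, computing the partial sum for an
    identical in-island neighbor set only once (a coarse stand-in for
    I-GCN's finer-grained reuse of shared-neighbor aggregations)."""
    cache, out = {}, {}
    for v in island:
        inside = frozenset(adj[v] & island)
        if inside not in cache:  # a cache hit is a reused aggregation
            cache[inside] = sum((features[u] for u in inside),
                                np.zeros_like(features[v]))
        # Off-island neighbors would still require external fetches.
        out[v] = cache[inside] + sum((features[u] for u in adj[v] - island),
                                     np.zeros_like(features[v]))
    return out


# Toy usage: nodes 0 and 1 share the neighbor set {2, 3}, so node 1's
# in-island partial sum is a cache hit rather than a recomputation.
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
feats = {v: np.full(4, float(v)) for v in adj}
for isl in find_islands(adj, max_island_size=4):
    print(sorted(isl), aggregate_island(adj, feats, isl))
```

Processing islands in this order means both feature vectors and intermediate sums stay in the (on-chip) working set while an island is active, which is the locality benefit the abstract describes.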

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Locality
  2. Graph Neural Network
  3. Hardware Accelerator
  4. High-Performance Computing
  5. Machine Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '21

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
