DOI: 10.1145/3466752.3480113

I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

Published: 17 October 2021

Abstract

Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is equally critical but even more challenging. The hurdles arise from poor data locality and redundant computation, which stem from the large size, high sparsity, and irregular non-zero distribution of real-world graphs.
In this paper, we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation. The key mechanism is a new online graph restructuring algorithm we refer to as islandization, which finds clusters of nodes with strong internal but weak external connections. Islandization yields two major benefits. First, by processing islands rather than individual nodes, I-GCN achieves better on-chip data reuse and fewer off-chip memory accesses. Second, it performs less redundant computation, as aggregation results for common/shared neighbors within an island can be reused. The parallel search, identification, and exploitation of graph islands are handled purely in hardware at runtime, operating as an incremental pipeline, without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations, leading to average speedups of 5549×, 403×, and 5.7× over CPUs, GPUs, and prior-art GCN accelerators, respectively.
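To make the two ideas concrete, here is a minimal software sketch of islandization-style clustering and shared-neighbor aggregation reuse. It is an illustration under simplifying assumptions, not the paper's hardware algorithm: the function names (find_islands, aggregate_island), the BFS-based greedy clustering heuristic, and the coarse caching scheme (reusing a partial sum only when two nodes have an identical in-island neighbor set) are all hypothetical stand-ins for I-GCN's runtime hardware mechanisms.

```python
# A minimal sketch of islandization-style clustering and aggregation
# reuse. Hypothetical illustration only; NOT the paper's hardware design.
from collections import deque

import numpy as np


def find_islands(adj, max_island_size=64):
    """Greedily grow islands via BFS from unvisited seeds, preferring
    neighbors with many edges back into the island (strong internal,
    weak external connectivity). `adj` maps node -> set of neighbors."""
    visited, islands = set(), []
    for seed in adj:
        if seed in visited:
            continue
        island, frontier = set(), deque([seed])
        while frontier and len(island) < max_island_size:
            v = frontier.popleft()
            if v in visited:
                continue
            visited.add(v)
            island.add(v)
            # Visit neighbors with the most in-island links first.
            for u in sorted(adj[v], key=lambda u: -len(adj[u] & island)):
                if u not in visited:
                    frontier.append(u)
        islands.append(island)
    return islands


def aggregate_island(adj, features, island):
    """Sum-aggregation over one island, computing the partial sum for an
    identical in-island neighbor set only once (a coarse stand-in for
    I-GCN's finer-grained reuse of shared-neighbor aggregations)."""
    cache, out = {}, {}
    for v in island:
        inside = frozenset(adj[v] & island)
        if inside not in cache:  # a cache hit is a reused aggregation
            cache[inside] = sum((features[u] for u in inside),
                                np.zeros_like(features[v]))
        # Off-island neighbors would still require external fetches.
        out[v] = cache[inside] + sum((features[u] for u in adj[v] - island),
                                     np.zeros_like(features[v]))
    return out


# Toy usage: nodes 0 and 1 share the neighbor set {2, 3}, so node 1's
# in-island partial sum is a cache hit rather than a recomputation.
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
feats = {v: np.full(4, float(v)) for v in adj}
for isl in find_islands(adj, max_island_size=4):
    print(sorted(isl), aggregate_island(adj, feats, isl))
```

Processing islands in this order means both feature vectors and intermediate sums stay in the (on-chip) working set while an island is active, which is the locality benefit the abstract describes.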

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN:9781450385572
DOI:10.1145/3466752

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Locality
  2. Graph Neural Network
  3. Hardware Accelerator
  4. High-Performance Computing
  5. Machine Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '21

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
