DOI: 10.1145/3613424.3614305
Research Article | Open Access

TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization

Published: 08 December 2023

Abstract

Training Graph Neural Networks on large graphs is challenging because of the need to store graph data and move it through the memory hierarchy. In this work, we tackle this challenge by compressing the graph embedding matrix so that model training can be carried out entirely with on-chip compute and memory resources. Specifically, we leverage the graph homophily property and use a Tensor-train representation of the graph embedding, which allows nodes with similar neighborhoods to partially share their feature representations.
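To make the embedding-sharing idea concrete, the following is a minimal NumPy sketch of a TT-matrix embedding table in the spirit of TT-Rec and tensorized embedding layers, which the paper builds on. The factor shapes, ranks, and function names are illustrative assumptions, not the configuration used in TT-GNN.

```python
import numpy as np

# Illustrative TT-matrix embedding table (hypothetical shapes and ranks).
# Factorize the table dimensions: N = n1*n2*n3 rows, D = d1*d2*d3 features.
n_factors = (8, 10, 12)   # N = 960 nodes
d_factors = (4, 4, 8)     # D = 128 features per node
ranks = (1, 16, 16, 1)    # TT ranks r0..r3, with r0 = r3 = 1

rng = np.random.default_rng(0)
# Core k has shape (r_{k-1}, n_k, d_k, r_k). Storage is the sum of core
# sizes (12,288 values here) instead of N*D = 122,880 for the dense table.
cores = [rng.standard_normal((ranks[k], n_factors[k], d_factors[k], ranks[k + 1])) * 0.1
         for k in range(3)]

def tt_embedding_row(node_id):
    """Reconstruct one row of the (virtual) N x D embedding matrix."""
    # Map the flat node id to a multi-index (i1, i2, i3); nodes whose
    # multi-indices overlap share the corresponding core slices.
    idx = np.unravel_index(node_id, n_factors)
    result = cores[0][:, idx[0], :, :]            # shape (1, d1, r1)
    for k in range(1, 3):
        slice_k = cores[k][:, idx[k], :, :]       # shape (r_{k-1}, d_k, r_k)
        # Contract over the shared rank dimension, then merge feature dims.
        result = np.einsum('adr,rek->adek', result, slice_k)
        result = result.reshape(1, -1, slice_k.shape[-1])
    return result.reshape(-1)                     # length D = d1*d2*d3

row = tt_embedding_row(123)
print(row.shape)   # (128,)
```

Because a node's embedding is assembled from per-factor core slices, nodes with similar index prefixes reuse the same partial products, which is the partial feature sharing described above.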
While the Tensor-train representation reduces the size of the graph embedding, it poses several challenges for hardware design. On one hand, the low-rank representation requires features to be decompressed before being fed to the GNN model, which introduces extra computation. On the other hand, the decompressed features may still exceed on-chip memory capacity even in the minibatch setting, causing inefficient off-chip memory accesses. We therefore propose the TT-GNN hardware accelerator with a specialized dataflow tailored for on-chip Tensor-train GNN learning. Based on the on-chip memory capacity and the training configuration, TT-GNN adaptively breaks a minibatch into smaller microbatches that fit on chip. The microbatch composition and scheduling order are designed to maximize data reuse and reduce redundant computation both across and within microbatches. To mitigate the TT computation overhead, we further propose a unified algorithm that jointly handles TT decompression during forward propagation and TT gradient derivation during backward propagation. Evaluated on a series of benchmarks, the proposed software-hardware solution outperforms existing CPU-GPU training systems in both training performance (1.55x to 4210x) and energy efficiency (2.83x to 2254x). We believe TT-GNN offers a new perspective on large-scale GNN training and makes it possible to train GNN models even under a severely constrained resource budget.
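The microbatching step can be pictured with the hedged sketch below: a sampled minibatch is greedily split into microbatches whose decompressed features would fit within an assumed on-chip buffer. The memory budget, cost model, and id-based grouping heuristic are placeholders for illustration only; the accelerator's actual scheduler also optimizes data reuse across microbatches.

```python
# Hedged sketch of minibatch-to-microbatch partitioning under a memory budget.
# All constants and the grouping heuristic are assumptions for illustration.

def split_into_microbatches(node_ids, feature_dim, bytes_per_elem=4,
                            onchip_budget_bytes=2048):
    """Greedily pack nodes into microbatches whose features fit on chip."""
    per_node_bytes = feature_dim * bytes_per_elem
    max_nodes = max(1, onchip_budget_bytes // per_node_bytes)
    # Sorting by node id groups nodes with shared TT index prefixes, so
    # partially computed core products can be reused within a microbatch.
    ordered = sorted(node_ids)
    return [ordered[i:i + max_nodes] for i in range(0, len(ordered), max_nodes)]

minibatch = [7, 959, 3, 4, 5, 123, 124, 125]
for mb in split_into_microbatches(minibatch, feature_dim=128):
    print(mb)   # two microbatches of four nodes each under this toy budget
```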



Information

Published In

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2023


Author Tags

  1. Graph Neural Networks
  2. Hardware Accelerator
  3. Tensor-train Decomposition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '23

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

