DOI: 10.1145/3394486.3403168
research-article

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

Published: 20 August 2020

Abstract

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks, such as node classification, similarity search, and graph classification, have benefited from its recent developments. However, prior work on graph representation learning focuses on domain-specific problems and trains a dedicated model for each graph dataset, which is usually not transferable to out-of-domain data. Inspired by recent advances in pre-training in natural language processing and computer vision, we design Graph Contrastive Coding (GCC), a self-supervised graph neural network pre-training framework, to capture universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination within and across networks and leverage contrastive learning to empower graph neural networks to learn intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve performance competitive with or better than its task-specific, trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.
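
The abstract does not spell out the loss behind "subgraph instance discrimination", so the following is only a rough sketch of a generic InfoNCE-style contrastive objective, which is one common way to implement instance discrimination. The function name `info_nce_loss`, the L2 normalization, and the temperature value 0.07 are assumptions made for illustration; the embeddings stand in for encoder outputs of two subgraphs sampled around the same node (query and positive) and of subgraphs sampled elsewhere (negatives).

```python
import math

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query embedding (illustrative sketch).

    query, positive: embedding vectors assumed to come from the same instance.
    negatives: list of embedding vectors assumed to come from other instances.
    temperature: softmax temperature (0.07 is an assumed default).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    q = normalize(query)
    # The positive key is placed at index 0, followed by the negative keys.
    keys = [normalize(positive)] + [normalize(k) for k in negatives]
    logits = [dot(q, k) / temperature for k in keys]

    # Cross-entropy with the positive as the target class,
    # using a max-shifted log-sum-exp for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Hypothetical 2-dimensional embeddings, for illustration only.
q = [1.0, 0.0]
aligned = info_nce_loss(q, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss(q, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

The loss is small when the query is closer to its positive than to any negative, and grows when a negative is the better match, which is the behavior a discriminative pre-training signal of this kind relies on.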

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph neural network
  2. graph representation learning
  3. pre-training

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • NSFC for Distinguished Young Scholar
  • NSFC

Conference

KDD '20
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 674
  • Downloads (last 6 weeks): 69
Reflects downloads up to 12 Nov 2024

Cited By

  • (2025) Collaborative graph neural networks for augmented graphs: A local-to-global perspective. Pattern Recognition 158 (111020). DOI: 10.1016/j.patcog.2024.111020. Online publication date: Feb-2025
  • (2025) Understanding and mitigating dimensional collapse of Graph Contrastive Learning: A non-maximum removal approach. Neural Networks 181 (106652). DOI: 10.1016/j.neunet.2024.106652. Online publication date: Jan-2025
  • (2025) Multi-level discriminator based contrastive learning for multiplex networks. Neurocomputing 613 (128754). DOI: 10.1016/j.neucom.2024.128754. Online publication date: Jan-2025
  • (2025) Fusing temporal and semantic dependencies for session-based recommendation. Information Processing & Management 62:1 (103896). DOI: 10.1016/j.ipm.2024.103896. Online publication date: Jan-2025
  • (2025) Contrastive meta-reinforcement learning for heterogeneous graph neural architecture search. Expert Systems with Applications 260 (125433). DOI: 10.1016/j.eswa.2024.125433. Online publication date: Jan-2025
  • (2025) C-KGE: Curriculum learning-based Knowledge Graph Embedding. Computer Speech & Language 89 (101689). DOI: 10.1016/j.csl.2024.101689. Online publication date: Jan-2025
  • (2024) Grasper: A Generalist Pursuer for Pursuit-Evasion Problems. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (1147-1155). DOI: 10.5555/3635637.3662971. Online publication date: 6-May-2024
  • (2024) Toward Unified AI Drug Discovery with Multimodal Knowledge. Health Data Science 4. DOI: 10.34133/hds.0113. Online publication date: 23-Feb-2024
  • (2024) A Negative Sample-Free Graph Contrastive Learning Algorithm. Mathematics 12:10 (1581). DOI: 10.3390/math12101581. Online publication date: 18-May-2024
  • (2024) Triple Generative Self-Supervised Learning Method for Molecular Property Prediction. International Journal of Molecular Sciences 25:7 (3794). DOI: 10.3390/ijms25073794. Online publication date: 28-Mar-2024
