DOI: 10.5555/3540261.3541479

Adversarial graph augmentation to improve graph contrastive learning

Published: 10 June 2024

Abstract

Self-supervised learning of graph neural networks (GNNs) is sorely needed because of the widespread label scarcity in real-world graph/network data. Graph contrastive learning (GCL), which trains GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, can yield robust and transferable GNNs even without using labels. However, GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and deliver sub-par performance in downstream tasks. Here, we propose a novel principle, termed adversarial-GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during training by optimizing the adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL by comparing it with state-of-the-art GCL methods, achieving performance gains of up to 14% in unsupervised, 6% in transfer, and 3% in semi-supervised learning settings across 18 benchmark datasets for the tasks of molecule property regression and classification, and social network classification.
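In terms of the training objective, the principle above can be sketched as a min-max game: standard GCL performs only the inner maximization with a fixed, hand-picked augmentation, whereas AD-GCL additionally optimizes the augmentation against the encoder. The following fragment is a reconstruction from the abstract's description; the paper's exact formulation, constraint set, and regularization are in the full text.

```latex
% Sketch of the AD-GCL principle: the encoder f maximizes a mutual-information
% lower bound I(f(G); f(T(G))) between a graph G and its augmented view T(G),
% while the augmentation T is optimized adversarially over a constrained
% family \mathcal{T} (e.g., learnable edge dropping).
\min_{T \in \mathcal{T}} \; \max_{f} \; I\bigl( f(G);\, f(T(G)) \bigr)
```

Below is a minimal, self-contained PyTorch sketch of this min-max training with a trainable edge-dropping augmenter, the instantiation named in the abstract. It is illustrative only: all names (TinyGNN, EdgeDropAugmenter, embed_batch) are hypothetical, the encoder is a toy one-layer message-passing network rather than the paper's GIN encoder, and the drop-ratio penalty weight is an assumed placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGNN(nn.Module):
    """Toy encoder: one round of message passing over a (soft) adjacency, then sum pooling."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        h = F.relu(self.lin(adj @ x + x))  # aggregate neighbors plus self-loop
        return h.sum(dim=0)                # graph-level embedding

class EdgeDropAugmenter(nn.Module):
    """Learns a keep-logit per edge; samples soft Bernoulli masks via Gumbel-sigmoid."""
    def __init__(self, n_edges, temp=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_edges))
        self.temp = temp

    def soft_adj(self, edge_index, n_nodes):
        u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log1p(-u)        # Logistic(0, 1) noise
        keep = torch.sigmoid((self.logits + noise) / self.temp)
        adj = torch.zeros(n_nodes, n_nodes)
        adj[edge_index[0], edge_index[1]] = keep      # differentiable soft edge mask
        return adj, keep

def info_nce(z1, z2, tau=0.5):
    """InfoNCE loss: matching (graph, augmented-view) pairs are in-batch positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    return F.cross_entropy(z1 @ z2.t() / tau, torch.arange(z1.size(0)))

def embed_batch(encoder, graphs, augmenters, n_nodes=10):
    """Encode each graph alongside its adversarially augmented view."""
    z_a, z_v, keeps = [], [], []
    for (x, ei), aug in zip(graphs, augmenters):
        dense = torch.zeros(n_nodes, n_nodes)
        dense[ei[0], ei[1]] = 1.0                     # original adjacency
        adj, keep = aug.soft_adj(ei, n_nodes)
        z_a.append(encoder(x, dense))
        z_v.append(encoder(x, adj))
        keeps.append(keep.mean())
    return torch.stack(z_a), torch.stack(z_v), torch.stack(keeps)

# Toy batch: 8 random graphs, each with 10 nodes, 16 features, and 30 edges.
torch.manual_seed(0)
graphs = [(torch.randn(10, 16), torch.randint(0, 10, (2, 30))) for _ in range(8)]

encoder = TinyGNN(16, 32)
augmenters = [EdgeDropAugmenter(30) for _ in graphs]  # one augmenter per graph, for simplicity
enc_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
aug_opt = torch.optim.Adam([p for a in augmenters for p in a.parameters()], lr=1e-3)

for step in range(200):
    # Encoder step: MINIMIZE the InfoNCE loss, i.e. maximize the MI lower bound.
    z_a, z_v, _ = embed_batch(encoder, graphs, augmenters)
    loss = info_nce(z_a, z_v)
    enc_opt.zero_grad(); loss.backward(); enc_opt.step()

    # Augmenter step: MAXIMIZE the same loss, penalizing the expected drop ratio
    # (weight 0.5 is an assumed placeholder) so the degenerate solution of
    # deleting every edge is ruled out; AD-GCL likewise regularizes its augmenter.
    z_a, z_v, keeps = embed_batch(encoder, graphs, augmenters)
    aug_loss = -info_nce(z_a, z_v) + 0.5 * (1.0 - keeps).mean()
    aug_opt.zero_grad(); aug_loss.backward(); aug_opt.step()
```

The Gumbel-sigmoid relaxation keeps edge dropping differentiable, so the augmenter can be trained by gradient descent alongside the encoder; the drop-ratio penalty bounds how aggressive the adversary can be, which is the role the paper's constrained augmentation family plays.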

Supplementary Material

Supplemental material (3540261.3541479_supp.pdf)



Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems, December 2021, 30517 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited
