Self-attentive Rationalization for Interpretable Graph Contrastive Learning

Published: 15 February 2025

Abstract

Graph augmentation is the key component for revealing the instance-discriminative features of a graph as its rationale, i.e., an interpretation of the graph, in graph contrastive learning (GCL). Existing rationale-aware augmentation mechanisms in GCL frameworks roughly fall into two categories, each with inherent limitations: (1) non-heuristic methods guided by domain knowledge to preserve salient features, which require expensive expertise and lack generality, or (2) heuristic augmentations with a co-trained auxiliary model to identify crucial substructures, which face not only the dilemma between system complexity and transformation diversity but also the instability stemming from co-training two separate sub-models. Inspired by recent studies on transformers, we propose self-attentive rationale-guided GCL (SR-GCL), which integrates the rationale generator and the encoder into a single model, leverages the self-attention values in the transformer module as natural guidance to delineate semantically informative substructures from both node- and edge-wise perspectives, and contrasts on rationale-aware augmented pairs. On real-world biochemistry datasets, visualization results verify the effectiveness and interpretability of self-attentive rationalization, and results on downstream tasks demonstrate that SR-GCL achieves state-of-the-art performance for graph model pre-training. Code is available at https://github.com/lsh0520/SR-GCL.
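The abstract describes the core mechanism: self-attention values from a transformer encoder act as importance scores over a graph's nodes (and edges), and the highest-scoring substructure is kept as the rationale when generating augmented views for contrastive learning. Below is a minimal PyTorch sketch of that idea; the head-averaging, top-k node selection, the `keep_ratio` parameter, and the InfoNCE loss are illustrative assumptions on my part, not SR-GCL's actual implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' code) of rationale-aware augmentation driven
# by self-attention scores. Top-k node selection and InfoNCE are illustrative
# choices; SR-GCL's exact node-/edge-wise design is in the official repository.
import torch
import torch.nn.functional as F

def attention_node_scores(attn: torch.Tensor) -> torch.Tensor:
    """Collapse a multi-head self-attention map [heads, N, N] into one
    importance score per node by averaging the attention each node receives."""
    return attn.mean(dim=0).mean(dim=0)  # [N]

def rationale_view(x: torch.Tensor, scores: torch.Tensor,
                   keep_ratio: float = 0.8) -> torch.Tensor:
    """Keep the top-k highest-scoring nodes (the rationale) and mask the rest,
    yielding one rationale-aware augmented view of the node features [N, d]."""
    k = max(1, int(keep_ratio * x.size(0)))
    keep = torch.topk(scores, k).indices
    mask = torch.zeros(x.size(0), dtype=torch.bool, device=x.device)
    mask[keep] = True
    return x * mask.unsqueeze(-1)  # features of non-rationale nodes are zeroed

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Standard InfoNCE over a batch of graph embeddings [B, d]: each graph's
    two augmented views are positives; other graphs in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                      # [B, B] similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```

In this sketch, two calls to `rationale_view` with different `keep_ratio` values (or independent noise) would produce the augmented pair whose graph-level embeddings are contrasted by `info_nce`; the key point is that the augmentation is steered by the encoder's own attention rather than by a separately trained auxiliary model.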




Published In

ACM Transactions on Knowledge Discovery from Data, Volume 19, Issue 2 (February 2025), 651 pages
EISSN: 1556-472X
DOI: 10.1145/3703012

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 15 February 2025
Online AM: 23 May 2024
Accepted: 01 May 2024
Revised: 11 February 2024
Received: 08 September 2023


Author Tags

  1. Self-supervised learning
  2. interpretability
  3. graph contrastive learning
  4. self-attention mechanism

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • University Synergy Innovation Program of Anhui Province
