Ada-MIP: Adaptive Self-supervised Graph Representation Learning via Mutual Information and Proximity Optimization
Abstract
1 Introduction
2 Related Work
2.1 Graph Kernels
2.2 Graph Contrastive Learning
3 Preliminaries
3.1 Graph Neural Networks
3.2 Weisfeiler-Lehman Test
4 Proposed Ada-MIP
4.1 Top-k Graph Augmenter
4.2 Multi-head Encoder
4.3 Time Complexity Analysis
5 Semi-supervised Ada-MIP
6 Experiments
6.1 Datasets
Dataset | NCI1 | PROTEINS | DD | PTC-MR | MUTAG | COLLAB | IMDB-B | IMDB-M | RDT-B |
---|---|---|---|---|---|---|---|---|---|
Category | Biochemical | Biochemical | Biochemical | Biochemical | Biochemical | Social | Social | Social | Social |
#Graphs | 4,110 | 1,113 | 1,178 | 344 | 188 | 5,000 | 1,000 | 1,500 | 2,000 |
Avg. #Nodes | 29.87 | 39.06 | 284.32 | 14.29 | 17.93 | 74.5 | 19.77 | 13.0 | 508.52 |
Avg. #Edges | 32.30 | 72.82 | 715.66 | 14.69 | 19.79 | 2,457.78 | 96.53 | 65.94 | 497.75 |
#Classes | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 3 | 2 |
6.2 Model Configuration
6.3 Baseline Methods
6.4 Experimental Results on Unsupervised Learning
Dataset | NCI1 | PROTEINS | DD | PTC-MR | MUTAG | COLLAB | IMDB-B | IMDB-M | RDT-B |
---|---|---|---|---|---|---|---|---|---|
SP | \(73.2\pm 0.3\) | \(69.8 \pm 1.2\) | \(70.4 \pm 3.6\) | \(58.2\pm 2.4\) | \(85.2\pm 2.4\) | \(64.5 \pm 2.1\) | \(55.6\pm 0.2\) | \(38.0\pm 0.3\) | \(64.1\pm 0.1\) |
WL | \(80.0\pm 0.5\) | \(72.9\pm 0.6\) | \(74.0 \pm 2.2\) | \(58.0\pm 0.5\) | \(80.7\pm 3.0\) | \(69.3 \pm 3.4\) | \(72.3\pm 3.4\) | \(47.0\pm 0.5\) | \(68.8\pm 0.4\) |
PM | \(73.3\pm 0.3\) | \(71.8 \pm 0.9\) | \(75.1 \pm 1.6\) | \(61.5\pm 1.3\) | \(84.9\pm 1.2\) | \(71.2 \pm 0.9\) | \(70.7\pm 0.6\) | \(47.8\pm 0.6\) | \(82.3\pm 0.2\) |
MLG | \(74.1 \pm 0.6\) | \(72.4 \pm 1.3\) | \(75.9 \pm 2.4\) | \(63.3\pm 1.5\) | \(87.9\pm 1.6\) | \(72.7 \pm 0.8\) | \(66.6\pm 0.3\) | \(41.2\pm 0.03\) | >1 Day |
Node2vec | \(54.9 \pm 1.6\) | \(57.5 \pm 3.6\) | \(67.1 \pm 3.5\) | \(58.6\pm 8.0\) | \(72.6\pm 10.2\) | \(66.2 \pm 5.3\) | \(56.4 \pm 2.8\) | \(38.1 \pm 3.3\) | \(69.7 \pm 4.1\) |
Sub2vec | \(52.8 \pm 1.5\) | \(53.0 \pm 5.6\) | \(67.5 \pm 1.3\) | \(60.0\pm 6.4\) | \(61.1\pm 15.8\) | \(67.4 \pm 1.7\) | \(55.3\pm 1.5\) | \(36.7\pm 0.8\) | \(71.5\pm 0.4\) |
Graph2vec | \(73.2 \pm 1.8\) | \(71.9 \pm 1.8\) | \(68.7 \pm 2.3\) | \(60.2\pm 6.9\) | \(83.2\pm 9.3\) | \(72.0 \pm 1.3\) | \(67.0\pm 0.6\) | \(44.6\pm 0.5\) | \(75.8\pm 1.0\) |
GVAE | \(74.4 \pm 0.2\) | \(72.9 \pm 0.4\) | \(71.1 \pm 1.3\) | \(61.2\pm 1.8\) | \(87.7\pm 0.7\) | \(75.1 \pm 0.2\) | \(70.7\pm 0.7\) | \(49.3\pm 0.4\) | \(87.1\pm 0.1\) |
InfoGraph | \(76.2\pm 1.1\) | \(74.4\pm 0.3\) | \(72.9\pm 1.8\) | \(61.7\pm 1.4\) | \(89.0\pm 1.1\) | \(70.7 \pm 1.1\) | \(73.0\pm 0.9\) | \(49.7\pm 0.5\) | \(82.5\pm 1.4\) |
GraphCL | \(77.9\pm 0.4\) | \(74.4\pm 0.5\) | \(\underline{78.6\pm 0.4}\) | \(60.2 \pm 1.7\) | \(86.8\pm 1.3\) | \(71.4 \pm 1.1\) | \(71.1\pm 0.4\) | \(48.5\pm 0.6\) | \(89.5\pm 0.8\) |
JOAO | \(77.5 \pm 0.9\) | \(74.6\pm 0.4\) | \(77.3\pm 0.5\) | \(60.4 \pm 1.7\) | \(87.4\pm 1.0\) | \(69.5 \pm 0.4\) | \(70.2\pm 3.1\) | \(48.8 \pm 0.8\) | \(85.3\pm 1.4\) |
JOAOv2 | \(78.2 \pm 1.4\) | \(74.1\pm 1.1\) | \(77.4\pm 1.2\) | \(61.0 \pm 2.8\) | \(87.7\pm 0.8\) | \(69.3 \pm 0.4\) | \(70.8\pm 0.3\) | \(49.2 \pm 0.5\) | \(86.4\pm 1.5\) |
MVGRL | \(75.1\pm 0.5\) | \(71.5\pm 0.3\) | OOM | \(62.5\pm 1.7\) | \(89.7\pm 1.1\) | OOM | \(\underline{74.2\pm 0.7}\) | \(51.2\pm 0.5\) | \(84.5\pm 0.6\) |
AD-GCL | \(75.8\pm 0.5\) | \(75.0\pm 0.5\) | \(75.4\pm 0.4\) | \(63.2 \pm 2.4\) | \(88.6\pm 1.1\) | \(74.8 \pm 0.4\) | \(71.5\pm 1.0\) | \(50.6 \pm 0.7\) | \(\mathbf {92.0\pm 0.4}\) |
GASSL | \(80.2\pm 1.9\) | \(-\) | \(-\) | \(\underline{64.6\pm 6.1}\) | \(\underline{90.9\pm 7.9}\) | \(\underline{78.0 \pm 2.0}\) | \(\underline{74.2\pm 0.5}\) | \(\underline{51.7\pm 2.5}\) | \(-\) |
AutoGCL | \(\mathbf {82.2\pm 0.3}\) | \(75.8 \pm 0.4\) | \(77.6 \pm 0.6\) | \(63.1 \pm 2.3\) | \(88.6\pm 1.2\) | \(70.1 \pm 0.7\) | \(73.3\pm 0.4\) | \(50.6 \pm 0.8\) | \(88.6\pm 0.5\) |
SimGRACE | \(79.1\pm 0.4\) | \(75.4 \pm 0.1\) | \(77.4 \pm 1.1\) | \(63.2 \pm 3.1\) | \(89.0\pm 1.3\) | \(71.7 \pm 0.8\) | \(71.3\pm 0.8\) | \(50.9 \pm 0.9\) | \(89.5\pm 0.9\) |
Ada-MIP | \(\underline{80.7\pm 0.7}\) | \(\mathbf {77.0\pm 0.5}\) | \(\mathbf {81.7\pm 1.2}\) | \(\mathbf {66.0\pm 1.4}\) | \(\mathbf {91.5\pm 2.5}\) | \(\mathbf {79.5 \pm 1.1}\) | \(\mathbf {75.4\pm 1.9}\) | \(\mathbf {52.1\pm 0.5}\) | \(\underline{91.1\pm 1.1}\) |
6.5 Experimental Results on Semi-supervised Learning
Target | Mu(0) | Alpha(1) | HOMO(2) | LUMO(3) | Gap(4) | R2(5) | ZPVE(6) | U0(7) | U(8) | H(9) | G(10) | Cv(11) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Sup-GNN | 0.254 | 0.554 | 0.171 | 0.168 | 0.259 | 3.91 | 0.0096 | 7.69 | 9.30 | 7.32 | 7.63 | 0.182 |
InfoGraph | 0.236 | \(\underline{0.517}\) | 0.170 | 0.159 | 0.249 | 3.87 | 0.0093 | 7.58 | 8.29 | 6.17 | 6.39 | 0.180 |
GraphCL | 0.242 | 0.551 | 0.167 | 0.163 | 0.254 | 3.54 | 0.0091 | 7.49 | 8.85 | 7.14 | 6.70 | 0.177 |
MVGRL | 0.248 | 0.537 | 0.165 | 0.165 | 0.251 | 3.57 | 0.0088 | 5.55 | 5.96 | 6.45 | 6.38 | 0.181 |
AD-GCL | 0.250 | 0.536 | 0.168 | 0.166 | 0.234 | 3.17 | 0.0085 | 7.20 | 5.91 | 6.98 | 6.08 | 0.175 |
Ada-MIP | 0.230 | 0.549 | 0.161 | 0.149 | \(\underline{0.228}\) | 2.42 | 0.0077 | 5.46 | 4.98 | 5.59 | 5.14 | 0.176 |
Ada-MIP-AD | \(\underline{0.227}\) | \(\mathbf {0.511}\) | \(\mathbf {0.159}\) | \(\mathbf {0.147}\) | 0.231 | \(\mathbf {2.14}\) | \(\mathbf {0.0074}\) | \(\underline{4.38}\) | \(\mathbf {4.08}\) | \(\underline{3.98}\) | \(\underline{3.91}\) | \(\underline{0.173}\) |
Ada-MIP-SAD | \(\mathbf {0.223}\) | 0.535 | \(\underline{0.160}\) | \(\underline{0.148}\) | \(\mathbf {0.225}\) | \(\underline{2.28}\) | \(\underline{0.0076}\) | \(\mathbf {4.06}\) | \(\underline{4.23}\) | \(\mathbf {3.39}\) | \(\mathbf {3.67}\) | \(\mathbf {0.172}\) |
6.6 Ablation Studies
Dataset | PROTEINS | DD | PTC-MR | IMDB-B | IMDB-M |
---|---|---|---|---|---|
Ada-MIP w/o CH | 73.5 | 73.4 | 60.3 | 72.5 | 48.5 |
Ada-MIP w/o PH | 75.7 | 80.4 | 63.6 | 74.5 | 51.7 |
Ada-MIP w/o TGA | 76.0 | 79.4 | 64.6 | 74.8 | 51.5 |
Ada-MIP w SP | 76.5 | 80.9 | 64.2 | 75.3 | \(\mathbf {52.1}\) |
Ada-MIP w PM | 76.5 | 80.7 | 64.5 | 75.0 | 51.8 |
Ada-MIP w MLG | 76.7 | 80.2 | 63.9 | 74.4 | 51.9 |
Ada-MIP w WL | \(\mathbf {77.0}\) | \(\mathbf {81.7}\) | \(\mathbf {66.0}\) | \(\mathbf {75.4}\) | \(\mathbf {52.1}\) |
2 Views (Avg) | 76.3 | 80.5 | 64.3 | 74.3 | 51.6 |
2 Views (Max) | 76.7 | 80.9 | 65.3 | 74.7 | 51.7 |
3 Views (Avg) | 76.4 | 81.6 | 65.3 | 74.4 | 51.9 |
3 Views (Max) | 76.5 | \(\mathbf {82.5}\) | \(\mathbf {66.2}\) | 74.8 | \(\mathbf {52.1}\) |
4 Views | \(\mathbf {77.0}\) | 81.7 | 66.0 | \(\mathbf {75.4}\) | \(\mathbf {52.1}\) |
6.7 Hyper-parameter Sensitivity Analysis
Dataset | PROTEINS | DD | PTC-MR | IMDB-B | IMDB-M |
---|---|---|---|---|---|
\(\alpha =0.001\) | 76.9 | 81.1 | 66.4 | 74.3 | 51.7 |
\(\alpha =0.005\) | \(\mathbf {77.0}\) | \(\mathbf {81.7}\) | 66.0 | \(\mathbf {75.4}\) | 52.1 |
\(\alpha =0.01\) | 76.8 | 81.5 | \(\mathbf {66.1}\) | 74.7 | \(\mathbf {52.2}\) |
\(\alpha =0.05\) | 76.5 | 80.7 | 64.8 | 74.7 | 51.9 |
GraphCL | 74.4 | 78.6 | 60.2 | 71.1 | 48.5 |
AutoGCL | 75.0 | 77.4 | 63.1 | 73.3 | 50.6 |
\(k_e^{EP}=0.95\) | 75.1 | 79.7 | 63.1 | 73.2 | \(\mathbf {51.3}\) |
\(k_e^{EP}=0.9\) | 75.2 | 79.9 | 63.4 | 73.6 | 51.0 |
\(k_e^{EP}=0.85\) | \(\mathbf {75.5}\) | \(\mathbf {80.1}\) | \(\mathbf {63.9}\) | \(\mathbf {73.8}\) | 50.7 |
\(k_e^{EP}=0.8\) | 74.9 | 79.5 | 63.3 | 73.1 | 50.4 |
\(k^{ND}_n=0.95\) | 75.1 | 79.9 | 63.3 | 72.9 | 50.1 |
\(k^{ND}_n=0.9\) | \(\mathbf {75.3}\) | \(\mathbf {80.2}\) | \(\mathbf {64.1}\) | 73.4 | 50.4 |
\(k^{ND}_n=0.85\) | 74.8 | 79.8 | 63.5 | \(\mathbf {73.5}\) | \(\mathbf {50.7}\) |
\(k^{ND}_n=0.8\) | 74.5 | 79.8 | 63.1 | 73.1 | 50.2 |
\(k^{FM}_n=0.95\) | 75.1 | 79.4 | 62.9 | 73.9 | 50.9 |
\(k^{FM}_n=0.9\) | 75.4 | \(\mathbf {79.6}\) | \(\mathbf {63.2}\) | 74.1 | \(\mathbf {51.3}\) |
\(k^{FM}_n=0.85\) | \(\mathbf {75.5}\) | 79.2 | 62.7 | \(\mathbf {74.2}\) | 51.0 |
\(k^{FM}_n=0.8\) | 75.2 | 79.0 | 62.3 | 73.7 | 50.7 |
\(l=1\) | 75.1 | 79.3 | 62.1 | 73.8 | 50.5 |
\(l=2\) | \(\mathbf {75.7}\) | 79.7 | \(\mathbf {63.0}\) | 74.0 | 50.8 |
\(l=3\) | 75.4 | 79.4 | 62.3 | \(\mathbf {74.4}\) | \(\mathbf {51.1}\) |
6.8 Contrastive Loss Plot
6.9 Model Running Time
Dataset | GraphCL Total | GraphCL Time per epoch | GVAE Total | GVAE Time per epoch | Ada-MIP Total | Ada-MIP Precomputing | Ada-MIP Time per epoch |
---|---|---|---|---|---|---|---|
RDT-B | 939.3 | \(3.13 \pm 0.32\) | 2,364.6 | \(7.88 \pm 0.47\) | 1,194.25 | 18.25 | \(3.92\pm 0.28\) |
DD | 864.4 | \(2.88 \pm 0.35\) | 2,016.8 | \(6.72 \pm 0.38\) | 1,116.6 | 15.60 | \(3.67 \pm 0.39\) |
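The running-time figures can be cross-checked with simple arithmetic. The sketch below assumes that each total decomposes as precomputing time plus the number of epochs times the mean time per epoch, and that GraphCL and GVAE have no precomputing step because the table reports none for them. Both assumptions are ours, and the function name `implied_epochs` is hypothetical. Under these assumptions every entry implies roughly the same epoch count, which suggests the totals were measured over a common training budget.

```python
# Values copied from the running-time table; units are as reported there.
# Each tuple is (total, precomputing, mean time per epoch).
# Assumption: a precomputing time of 0.0 for GraphCL and GVAE, since the
# table lists no precomputing column for them.
ROWS = {
    "RDT-B": {
        "GraphCL": (939.3, 0.0, 3.13),
        "GVAE": (2364.6, 0.0, 7.88),
        "Ada-MIP": (1194.25, 18.25, 3.92),
    },
    "DD": {
        "GraphCL": (864.4, 0.0, 2.88),
        "GVAE": (2016.8, 0.0, 6.72),
        "Ada-MIP": (1116.6, 15.60, 3.67),
    },
}


def implied_epochs(total: float, precompute: float, per_epoch: float) -> float:
    """Epoch count implied by total = precompute + epochs * per_epoch."""
    return (total - precompute) / per_epoch


def all_implied_epochs() -> list[float]:
    """Implied epoch counts for every (dataset, method) pair in ROWS."""
    return [
        implied_epochs(*values)
        for methods in ROWS.values()
        for values in methods.values()
    ]


if __name__ == "__main__":
    for dataset, methods in ROWS.items():
        for method, values in methods.items():
            total, precompute, _ = values
            epochs = implied_epochs(*values)
            share = precompute / total  # fraction of total spent precomputing
            print(f"{dataset:6s} {method:8s} epochs={epochs:6.1f} precompute_share={share:.3f}")
```

The same script also prints the fraction of Ada-MIP's total time spent on precomputing, which puts the one-off cost in proportion to the per-epoch cost.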
6.10 Visualization
7 Conclusion
A Algorithms of Augmentation Heads
B Proof of Theorem 1
References
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- NSF
- 100-Talents Program of Xinhua News Agency, and the Program of Shanghai Academic/Technology Research Leader