DOI: 10.1145/3580305.3599398
Research Article · Public Access

Kernel Ridge Regression-Based Graph Dataset Distillation

Published: 04 August 2023

Abstract

The huge volume of emerging graph datasets has become a double-edged sword for graph machine learning. On the one hand, it empowers the success of a myriad of graph neural networks (GNNs) with strong empirical performance. On the other hand, training modern GNNs on huge graph data is computationally expensive. How to distill a given graph dataset while retaining most of the trained models' performance is a challenging problem. Existing efforts approach this problem by solving meta-learning-based bilevel optimization objectives. A major hurdle is that the exact solutions of these objectives are computationally intensive, so most, if not all, existing methods resort to approximate strategies, which in turn hurt distillation performance. In this paper, inspired by recent advances in neural network kernel methods, we adopt a kernel ridge regression-based meta-learning objective that admits a feasible exact solution. However, computing the graph neural tangent kernel is very expensive, especially in the context of dataset distillation. In response, we design a graph kernel, named LiteGNTK, tailored to the dataset distillation problem and closely related to the classic random walk graph kernel. Building on it, we propose an effective model named Kernel rIdge regression-based graph Dataset Distillation (KIDD), together with several variants. KIDD is efficient in both the forward and backward propagation processes, and it shows strong empirical performance on 7 real-world datasets compared with state-of-the-art distillation methods. Thanks to its ability to find the exact solution of the distillation objective, the training graphs learned by KIDD can sometimes even outperform the original whole training set while using as few as 1.65% of the training graphs.
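
To make the abstract's central idea concrete, the sketch below illustrates a kernel ridge regression-based distillation loss in the spirit of the KRR meta-learning objective of Nguyen et al. (ICLR 2021), which the paper builds on. It is a minimal illustration, not the authors' KIDD implementation: the kernel matrices are assumed to be precomputed by some graph kernel (in the paper, LiteGNTK would play that role), and the function and parameter names are our own.

```python
# Minimal sketch (NumPy) of a KRR-based distillation objective.
# Assumption: kernel matrices are precomputed by some graph kernel.
import numpy as np

def krr_distillation_loss(K_ss, K_ts, y_s, y_t, ridge=1e-3):
    """Fit KRR on the small synthetic set in closed form (the inner
    problem), then measure squared error on the real training set
    (the outer objective).

    K_ss : (m, m) kernel matrix among the m synthetic graphs
    K_ts : (n, m) kernel matrix between n real and m synthetic graphs
    y_s  : (m, c) one-hot labels of the synthetic graphs
    y_t  : (n, c) one-hot labels of the real graphs
    """
    m = K_ss.shape[0]
    # Exact inner solution: alpha = (K_ss + ridge * I)^{-1} y_s.
    alpha = np.linalg.solve(K_ss + ridge * np.eye(m), y_s)
    # Outer loss: how well the distilled model predicts the real labels.
    preds = K_ts @ alpha
    return 0.5 * np.mean((y_t - preds) ** 2)
```

In the distillation setting, the synthetic graphs behind K_ss and K_ts are the optimization variables, and this loss is differentiated through the linear solve to update them; because the inner problem is solved exactly, no approximate bilevel unrolling is needed. For context on the kernel side, the abstract relates LiteGNTK to the classic random walk graph kernel, which counts walks common to two graphs via their direct (Kronecker) product; a textbook truncated geometric version of that classic kernel (again, not LiteGNTK itself) looks like this:

```python
def random_walk_kernel(A1, A2, decay=0.1, steps=10):
    """Truncated geometric random walk kernel between two graphs given
    their adjacency matrices: sum_k decay^k * (# length-k common walks),
    counted on the direct (Kronecker) product graph."""
    Ax = np.kron(A1, A2)          # adjacency of the direct product graph
    walk = np.ones(Ax.shape[0])   # all-ones start vector over product nodes
    total = 0.0
    for k in range(1, steps + 1):
        walk = Ax @ walk          # propagate walks one more step
        total += decay ** k * walk.sum()  # 1^T Ax^k 1, geometrically decayed
    return total
```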

Supplementary Material

MP4 File (rtfp0477-2min-promo.mp4)
The promotional video of the KDD 2023 paper "Kernel Ridge Regression-Based Graph Dataset Distillation".



Information

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2023

Permissions

Request permissions for this article.


Author Tags

  1. graph dataset distillation
  2. graph machine learning

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • DARPA
  • ARO
  • DHS
  • NIFA

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 889
  • Downloads (last 6 weeks): 81
Reflects downloads up to 22 Sep 2024.


Cited By

  • (2024) Graph Condensation for Open-World Graph Learning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 851-862. DOI: 10.1145/3637528.3671917. Online publication date: 25-Aug-2024.
  • (2024) Graph Data Condensation via Self-expressive Graph Structure Reconstruction. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1992-2002. DOI: 10.1145/3637528.3671710. Online publication date: 25-Aug-2024.
  • (2024) Self-Supervised Learning for Graph Dataset Condensation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 3289-3298. DOI: 10.1145/3637528.3671682. Online publication date: 25-Aug-2024.
  • (2024) Masked Graph Transformer for Large-Scale Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2502-2506. DOI: 10.1145/3626772.3657971. Online publication date: 10-Jul-2024.
  • (2024) Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 106-115. DOI: 10.1145/3616855.3635832. Online publication date: 4-Mar-2024.
  • (2024) Fast Graph Condensation with Structure-based Neural Tangent Kernel. Proceedings of the ACM Web Conference 2024, 4439-4448. DOI: 10.1145/3589334.3645694. Online publication date: 13-May-2024.
  • (2024) Globally Interpretable Graph Learning via Distribution Matching. Proceedings of the ACM Web Conference 2024, 992-1002. DOI: 10.1145/3589334.3645674. Online publication date: 13-May-2024.
  • (2024) Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6296-6304. DOI: 10.1109/CVPR52733.2024.00602. Online publication date: 16-Jun-2024.
  • (2023) Hessian-aware Quantized Node Embeddings for Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems, 757-762. DOI: 10.1145/3604915.3608826. Online publication date: 14-Sep-2023.
