Abstract
In recent years, there has been a significant surge in machine learning techniques, particularly in the domain of deep learning, tailored for handling attributed graphs. Nevertheless, to work, these methods assume that the attributes values are fully known, which is not realistic in numerous real-world applications. This paper explores the potential of Optimal Transport (OT) to impute missing attributes on graphs. To proceed, we design a novel multi-view OT loss function that can encompass both node feature data and the underlying topological structure of the graph by utilizing multiple graph representations. We then utilize this novel loss to train efficiently a Graph Convolutional Neural Network (GCN) architecture capable of imputing all missing values over the graph at once. We evaluate the interest of our approach with experiments both on synthetic data and real-world graphs, including different missingness mechanisms and a wide range of missing data. These experiments demonstrate that our method is competitive with the state-of-the-art in all cases and of particular interest on weakly homophilic graphs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Code and additional material available at: github.com/RichardSrn/GRIOT.
References
Berg, R.V.D., Kipf, T.N., Welling, M.: Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017)
Brogat-Motte, L., Flamary, R., Brouard, C., Rousu, J., d’Alché-Buc, F.: Learning to predict graphs with Fused Gromov-Wasserstein barycenters. In: ICML, pp. 2321–2335 (2022)
Chen, X., Chen, S., Yao, J., Zheng, H., Zhang, Y., Tsang, I.W.: Learning on attribute-missing graphs. IEEE PAMI 44(2), 740–757 (2022)
Chung, F.R.: Spectral Graph Theory, vol. 92. American Mathematical Society (1997)
Craven, M., et al.: Learning to extract symbolic knowledge from the world wide web. AAAI/IAAI 3(3.6), 2 (1998)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Dwivedi, V.P., Bresson, X.: A generalization of transformer networks to graphs. CoRR abs/2012.09699 (2020). https://arxiv.org/abs/2012.09699
Flamary, R., et al.: POT: python optimal transport. J. Mach. Learn. Res. 22(78), 1–8 (2021)
Huisman, M.: Imputation of missing network data: some simple procedures. In: Encyclopedia of Social Network Analysis and Mining, pp. 707–715 (2014)
Jiang, B., Zhang, Z.: Incomplete graph representation and learning via partial graph neural networks. arXiv preprint arXiv:2003.10130 (2020)
Kantorovich, L.V.: Mathematical methods of organizing and planning production. Manage. Sci. 6(4), 366–422 (1960)
Kerdoncuff, T., Emonet, R., Perrot, M., Sebban, M.: Optimal tensor transport. In: AAAI Conference on Artificial Intelligence, vol. 36, pp. 7124–7132 (2022)
Khanam, K.Z., Srivastava, G., Mago, V.: The homophily principle in social network analysis: a survey. Multimedia Tools Appl. 82(6), 8811–8854 (2023)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2017)
Kossinets, G.: Effects of missing data in social networks. Soc. Netw. 247–268 (2006)
Laclau, C., Redko, I., Choudhary, M., Largeron, C.: All of the fairness for edge prediction with optimal transport. In: AISTATS, vol. 130, pp. 1774–1782 (2021)
Little, R.J., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 793. Wiley, Hoboken (2019)
McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol. 27(1), 415–444 (2001)
Mensch, A., Blondel, M., Peyré, G.: Geometric losses for distributional learning. In: International Conference on Machine Learning, pp. 4516–4525 (2019)
Monti, F., Bronstein, M., Bresson, X.: Geometric matrix completion with recurrent multi-graph neural networks. In: NIPS, vol. 30 (2017)
Muzellec, B., Josse, J., Boyer, C., Cuturi, M.: Missing data imputation using optimal transport. In: ICML, pp. 7130–7140 (2020)
Rossi, E., Kenlay, H., Gorinova, M.I., Chamberlain, B.P., Dong, X., Bronstein, M.M.: On the unreasonable effectiveness of feature propagation in learning on graphs with missing node features. In: Learning on Graphs Conference (2022)
Schafer, J.L., Graham, J.W.: Missing data: our view of the state of the art. Psychol. Methods 7(2), 147 (2002)
Stekhoven, D.J., Bühlmann, P.: MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2011)
Taguchi, H., Liu, X., Murata, T.: Graph convolutional networks for graphs containing missing features. Futur. Gener. Comput. Syst. 117, 155–168 (2021)
Van Buuren, S., Groothuis-Oudshoorn, K.: Mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–67 (2011)
Vayer, T., Chapel, L., Flamary, R., Tavenard, R., Courty, N.: Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties. Algorithms 13(9), 212 (2020)
Villani, C., et al.: Optimal Transport: Old and New, vol. 338. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-71050-9
Yang, Z., Cohen, W., Salakhudinov, R.: Revisiting semi-supervised learning with graph embeddings. In: ICML, pp. 40–48 (2016)
Yoon, J., Jordon, J., van der Schaar, M.: GAIN: missing data imputation using generative adversarial nets. In: Proceedings of ICML, vol. 80, pp. 5689–5698 (2018)
Zhang, J., Xiao, X., Huang, L.K., Rong, Y., Bian, Y.: Fine-tuning graph neural networks via graph topology induced optimal transport. In: IJCAI, pp. 3730–3736 (2022)
Zheng, X., Liu, Y., Pan, S., Zhang, M., Jin, D., Yu, P.S.: Graph neural networks for graphs with heterophily: a survey. arXiv preprint arXiv:2202.07082 (2022)
Acknowledgment
This work has been funded by a public grant from the French National Research Agency (ANR) under the “France 2030” investment plan, which has the reference EUR MANUTECH SLEIGHT - ANR-17-EURE-0026.
This work is openly licensed via CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Ethics declarations
Ethical Statement
Datasets used in this article are publicly accessible, and the code is public, making the results reproducible (Code and additional material available at: github.com/RichardSrn/GRIOT). This paper presents work in machine learning, a field that can have potential societal consequences, but none of which we feel must be specifically highlighted here.
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Serrano, R., Laclau, C., Jeudy, B., Largeron, C. (2024). Reconstructing the Unseen: GRIOT for Attributed Graph Imputation with Optimal Transport. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science(), vol 14946. Springer, Cham. https://doi.org/10.1007/978-3-031-70365-2_16
Download citation
DOI: https://doi.org/10.1007/978-3-031-70365-2_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70364-5
Online ISBN: 978-3-031-70365-2
eBook Packages: Computer ScienceComputer Science (R0)