Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3627673.3679669acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Data Imputation from the Perspective of Graph Dirichlet Energy

Published: 21 October 2024 Publication History

Abstract

Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: initially crafting a preliminary imputation (the "draft") and then refining it to produce the final missing data imputation result, commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.

References

[1]
Saeed Anwar and Nick Barnes. 2020. Densely residual laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, 3 (2020), 1192--1204.
[2]
Vincent Audigier, Franccois Husson, and Julie Josse. 2016. Multiple imputation for continuous variables using a Bayesian principal component analysis. Journal of statistical computation and simulation, Vol. 86, 11 (2016), 2140--2156.
[3]
Michal Bechny, Florian Sobieczky, Jürgen Zeindl, and Lisa Ehrlinger. 2021. Missing Data Patterns: From Theory to an Application in the Steel Industry. In 33rd International Conference on Scientific and Statistical Database Management. 214--219.
[4]
Dimitris Bertsimas, Colin Pawlowski, and Ying Daisy Zhuo. 2017. From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, Vol. 18, 1 (2017), 7133--7171.
[5]
Karsten M Borgwardt, Cheng Soon Ong, Stefan Schönauer, SVN Vishwanathan, Alex J Smola, and Hans-Peter Kriegel. 2005. Protein function prediction via graph kernels. Bioinformatics, Vol. 21, suppl_1 (2005), i47--i56.
[6]
Shilpi Bose, Chandra Das, Tamaghna Gangopadhyay, and Samiran Chattopadhyay. 2013. A modified local least squares-based missing value estimation method in microarray gene expression data. In 2013 2nd International Conference on Advanced Computing, Networking and Security. IEEE, 18--23.
[7]
Shilpi Bose, Chandra Das, Tamaghna Gangopadhyay, and Samiran Chattopadhyay. 2013. A Modified Local Least Squares-Based Missing Value Estimation Method in Microarray Gene Expression Data. In 2013 2nd International Conference on Advanced Computing, Networking and Security, Mangalore, India, December 15--17, 2013. IEEE, 18--23.
[8]
Peter J Burt and Edward H Adelson. 1987. The Laplacian pyramid as a compact image code. In Readings in computer vision. Elsevier, 671--679.
[9]
Chen Cai and Yusu Wang. 2020. A note on over-smoothing for graph neural networks. arXiv preprint arXiv:2006.13318 (2020).
[10]
Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. 2010. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, Vol. 20, 4 (2010), 1956--1982.
[11]
Zhiyong Cui, Kristian Henrickson, Ruimin Ke, and Yinhai Wang. 2019. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems (2019).
[12]
Zhiyong Cui, Ruimin Ke, and Yinhai Wang. 2018. Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction. arXiv preprint arXiv:1801.02143 (2018).
[13]
Hiroshi De Silva and A Shehan Perera. 2016. Missing data imputation using Evolutionary k-Nearest neighbor algorithm for gene expression data. In 2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, 141--146.
[14]
Arthur P Dempster, Nan M Laird, and Donald B Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), Vol. 39, 1 (1977), 1--22.
[15]
Tlamelo Emmanuel, Thabiso M. Maupong, Dimane Mpoeleng, Thabo Semong, Banyatsang Mphago, and Oteng Tabona. 2021. A survey on missing data in machine learning. J. Big Data, Vol. 8, 1 (2021), 140. https://doi.org/10.1186/s40537-021-00516--9
[16]
Hongyang Gao and Shuiwang Ji. 2019. Graph u-nets. In international conference on machine learning. PMLR, 2083--2092.
[17]
Ziqi Gao, Yifan Niu, Jiashun Cheng, Jianheng Tang, Lanqing Li, Tingyang Xu, Peilin Zhao, Fugee Tsung, and Jia Li. 2023. Handling missing data via max-entropy regularized graph autoencoder. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 7651--7659.
[18]
Pedro J García-Laencina, José-Luis Sancho-Gómez, and Aníbal R Figueiras-Vidal. 2010. Pattern classification with missing data: a review. Neural Computing and Applications, Vol. 19, 2 (2010), 263--282.
[19]
Alberto P García-Plaza, Víctor Fresno, Raquel Martínez Unanue, and Arkaitz Zubiaga. 2016. Using fuzzy logic to leverage HTML markup for web page representation. IEEE Transactions on Fuzzy Systems, Vol. 25, 4 (2016), 919--933.
[20]
Lovedeep Gondara and Ke Wang. 2018. Mida: Multiple imputation using denoising autoencoders. In Pacific-Asia conference on knowledge discovery and data mining. Springer, 260--272.
[21]
Trevor Hastie, Rahul Mazumder, Jason D Lee, and Reza Zadeh. 2015. Matrix completion and low-rank SVD via fast alternating least squares. The Journal of Machine Learning Research, Vol. 16, 1 (2015), 3367--3402.
[22]
Wenbing Huang, Yu Rong, Tingyang Xu, Fuchun Sun, and Junzhou Huang. 2020. Tackling over-smoothing for general graph convolutional networks. arXiv preprint arXiv:2008.09864 (2020).
[23]
O Ivanov, M Figurnov, and D Vetrov. 2019. Variational autoencoder with arbitrary conditioning. In 7th International Conference on Learning Representations, ICLR 2019.
[24]
Phimmarin Keerin and Tossapon Boongoen. 2021. Improved knn imputation for missing values in gene expression data. Computers, Materials and Continua, Vol. 70, 2 (2021), 4009--4025.
[25]
Ki-Yeol Kim, Byoung-Jin Kim, and Gwan-Su Yi. 2004. Reuse of imputed data in microarray analysis increases imputation efficiency. BMC bioinformatics, Vol. 5, 1 (2004), 1--9.
[26]
Thomas N Kipf and Max Welling. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).
[27]
Thomas N Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
[28]
Trent Kyono, Yao Zhang, Alexis Bellot, and Mihaela van der Schaar. 2021. MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[29]
Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. 2017. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition. 624--632.
[30]
Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, and Ming-Hsuan Yang. 2018. Fast and accurate image super-resolution with deep laplacian pyramid networks. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 11 (2018), 2599--2613.
[31]
Kamakshi Lakshminarayan, Steven A Harp, and Tariq Samad. 1999. Imputation of missing data in industrial databases. Applied intelligence, Vol. 11, 3 (1999), 259--275.
[32]
Marine Le Morvan, Julie Josse, Erwan Scornet, and Gaël Varoquaux. 2021. What'sa good imputation to predict with missing values? Advances in Neural Information Processing Systems, Vol. 34 (2021).
[33]
Junhyun Lee, Inyeop Lee, and Jaewoo Kang. 2019. Self-attention graph pooling. In International conference on machine learning. PMLR, 3734--3743.
[34]
Jia Li, Jiajin Li, Yang Liu, Jianwei Yu, Yueting Li, and Hong Cheng. 2021. Deconvolutional Networks on Graph Data. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[35]
Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, and Junzhou Huang. 2019. Semi-supervised graph classification: A hierarchical graph perspective. In The World Wide Web Conference. 972--982.
[36]
Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
[37]
Derek Lim, Felix Hohne, Xiuyu Li, Sijia Linda Huang, Vaishnavi Gupta, Omkar Bhalerao, and Ser Nam Lim. 2021. Large scale learning on non-homophilous graphs: New benchmarks and strong simple methods. Advances in Neural Information Processing Systems, Vol. 34 (2021), 20887--20902.
[38]
Sitao Luan, Chenqing Hua, Qincheng Lu, Jiaqi Zhu, Mingde Zhao, Shuyuan Zhang, Xiao-Wen Chang, and Doina Precup. 2022. Revisiting Heterophily For Graph Neural Networks. In Advances in Neural Information Processing Systems.
[39]
Pierre-Alexandre Mattei and Jes Frellsen. 2019. MIWAE: Deep generative modelling and imputation of incomplete data sets. In International conference on machine learning. PMLR, 4413--4423.
[40]
Rahul Mazumder, Trevor Hastie, and Robert Tibshirani. 2010. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, Vol. 11 (2010), 2287--2322.
[41]
Yimeng Min, Frederik Wenkel, and Guy Wolf. 2020. Scattering gcn: Overcoming oversmoothness in graph convolutional networks. Advances in Neural Information Processing Systems, Vol. 33 (2020), 14498--14508.
[42]
Stephen A Mistler and Craig K Enders. 2017. A comparison of joint model and fully conditional specification imputation for multilevel missing data. Journal of Educational and Behavioral Statistics, Vol. 42, 4 (2017), 432--466.
[43]
Karthika Mohan, Judea Pearl, and Jin Tian. 2013. Graphical models for inference with missing data. Advances in neural information processing systems, Vol. 26 (2013).
[44]
Christopher Morris, Nils M Kriege, Kristian Kersting, and Petra Mutzel. 2016. Faster kernels for graphs with continuous attributes via hashing. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 1095--1100.
[45]
Boris Muzellec, Julie Josse, Claire Boyer, and Marco Cuturi. 2020. Missing data imputation using optimal transport. In International Conference on Machine Learning. PMLR, 7130--7140.
[46]
Francesco Orsini, Paolo Frasconi, and Luc De Raedt. 2015. Graph invariant kernels. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[47]
Jiwoong Park, Minsik Lee, Hyung Jin Chang, Kyuewang Lee, and Jin Young Choi. 2019. Symmetric graph convolutional autoencoder for unsupervised graph representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6519--6528.
[48]
Neta Rabin and Dalia Fishelov. 2017. Missing data completion using diffusion maps and laplacian pyramids. In International Conference on Computational Science and Its Applications. Springer, 284--297.
[49]
Neta Rabin and Dalia Fishelov. 2019. Two directional Laplacian pyramids with application to data imputation. Advances in Computational Mathematics, Vol. 45, 4 (2019), 2123--2146.
[50]
Emanuele Rossi, Henry Kenlay, Maria I Gorinova, Benjamin Paul Chamberlain, Xiaowen Dong, and Michael Bronstein. 2021. On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features. arXiv preprint arXiv:2111.12128 (2021).
[51]
Donald B Rubin. 1976. Inference and missing data. Biometrika, Vol. 63, 3 (1976), 581--592.
[52]
Aude Sportisse, Claire Boyer, and Julie Josse. 2020. Estimation and imputation in probabilistic principal component analysis with missing not at random data. Advances in Neural Information Processing Systems, Vol. 33 (2020), 7067--7077.
[53]
Daniel J Stekhoven and Peter Bühlmann. 2012. MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics (2012).
[54]
Hibiki Taguchi, Xin Liu, and Tsuyoshi Murata. 2021. Graph convolutional networks for graphs containing missing features. Future Generation Computer Systems, Vol. 117 (2021), 155--168.
[55]
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B Altman. 2001 a. Missing value estimation methods for DNA microarrays. Bioinformatics, Vol. 17, 6 (2001), 520--525.
[56]
Olga G. Troyanskaya, Michael N. Cantor, Gavin Sherlock, Patrick O. Brown, Trevor Hastie, Robert Tibshirani, David Botstein, and Russ B. Altman. 2001 b. Missing value estimation methods for DNA microarrays. Bioinform., Vol. 17, 6 (2001), 520--525.
[57]
Stef Van Buuren. 2007. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical methods in medical research, Vol. 16, 3 (2007), 219--242.
[58]
Stef Van Buuren and Karin Groothuis-Oudshoorn. 2011. mice: Multivariate imputation by chained equations in R. Journal of statistical software, Vol. 45 (2011), 1--67.
[59]
Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861--6871.
[60]
Yangyang Wu, Jun Wang, Xiaoye Miao, Wenjia Wang, and Jianwei Yin. 2023. Differentiable and scalable generative adversarial models for data imputation. IEEE Transactions on Knowledge and Data Engineering (2023).
[61]
Yuankai Wu, Dingyi Zhuang, Aurelie Labbe, and Lijun Sun. 2021. Inductive Graph Neural Networks for Spatiotemporal Kriging. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4478--4485.
[62]
Jinsung Yoon, James Jordon, and Mihaela Schaar. 2018. Gain: Missing data imputation using generative adversarial nets. In International conference on machine learning. PMLR, 5689--5698.
[63]
Jiaxuan You, Xiaobai Ma, Yi Ding, Mykel J Kochenderfer, and Jure Leskovec. 2020. Handling missing data with graph representation learning. Advances in Neural Information Processing Systems, Vol. 33 (2020), 19075--19087.
[64]
Xin Zheng, Yixin Liu, Shirui Pan, Miao Zhang, Di Jin, and Philip S Yu. 2022. Graph neural networks for graphs with heterophily: A survey. arXiv preprint arXiv:2202.07082 (2022).
[65]
Kaixiong Zhou, Xiao Huang, Daochen Zha, Rui Chen, Li Li, Soo-Hyun Choi, and Xia Hu. 2021. Dirichlet energy constrained learning for deep graph neural networks. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[66]
Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. 2020. Beyond homophily in graph neural networks: Current limitations and effective designs. Advances in Neural Information Processing Systems, Vol. 33 (2020), 7793--7804.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
October 2024
5705 pages
ISBN:9798400704369
DOI:10.1145/3627673
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2024

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. dirichlet energy
  2. graph deep learning
  3. missing data

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China Grant
  • the Nansha Key Area Science and Technology Project
  • the Guangzhou Industrial Information and Intelligent Key Laboratory Project

Conference

CIKM '24
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 104
    Total Downloads
  • Downloads (Last 12 months)104
  • Downloads (Last 6 weeks)23
Reflects downloads up to 13 Feb 2025

Other Metrics

Citations

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media