research-article

Homogeneous network embedding for massive graphs via reweighted personalized PageRank

Authors:

Sourav S. BhowmickAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 13, Issue 5

Pages 670 - 683

https://doi.org/10.14778/3377369.3377376

Published: 01 January 2020 Publication History

Abstract

Given an input graph G and a node v ∈ G, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. On this scale, most existing approaches fail, as they incur either prohibitively high costs, or severely compromised result utility.

Our proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second and more importantly, we identify an inherent drawback of vanilla PPR, and address it in our main proposal NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility.

The proposed NRP approach overcomes the above deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information, and iteratively adjusts embedding vectors accordingly. Overall, NRP takes O(mlogn) time and O(m) space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments that compare NRP against 18 existing solutions over 7 real graphs demonstrate that NRP achieves higher result utility than all the solutions for link prediction, graph reconstruction and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours, using a single CPU core.

References

[1]

S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi. Watch your step: Learning node embeddings via graph attention. In NeurIPS, pages 9180--9190, 2018.

[2]

A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In WWW, pages 37--48, 2013.

Digital Library

[3]

L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, pages 635--644, 2011.

Digital Library

[4]

M. J. Brzozowski and D. M. Romero. Who should i follow? recommending people in directed social networks. In Fifth International AAAI Conference on Weblogs and Social Media, 2011.

[5]

H. Cai, V. W. Zheng, and K. C. Chang. A comprehensive survey of graph embedding: Problems, techniques, and applications. TKDE, 30(9):1616--1637, 2018.

Digital Library

[6]

S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891--900, 2015.

Digital Library

[7]

S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, 2016.

[8]

H. Chen, B. Perozzi, Y. Hu, and S. Skiena. HARP: hierarchical representation learning for networks. In AAAI, 2018.

[9]

H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li. Exploiting centrality information with graph convolutions for network representation learning. In ICDE, pages 590--601, 2019.

[10]

K. L. Clarkson and D. P. Woodruff. Low-rank approximation and regression in input sparsity time. STOC, pages 81--90, 2013.

Digital Library

[11]

P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. TKDE, 31(5):833--852, 2018.

[12]

Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI, 2018.

[13]

Q. Dai, X. Shen, L. Zhang, Q. Li, and D. Wang. Adversarial training methods for network embedding. In WWW, pages 329--339, 2019.

[14]

C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec. Learning structural node embeddings via diffusion wavelets. In KDD, pages 1320--1329, 2018.

Digital Library

[15]

H. Gao and H. Huang. Self-paced network embedding. In KDD, pages 1406--1415, 2018.

Digital Library

[16]

A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855--864, 2016.

Digital Library

[17]

Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.

[18]

Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.

[19]

N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217--288, 2011.

Digital Library

[20]

G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003.

Digital Library

[21]

Kaggle, 2012. https://www.kaggle.com/c7kddcup2012-track1.

[22]

T. N. Kipf and M. Welling. Variational graph auto-encoders. NeurIPS Workshop, 2016.

[23]

T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

[24]

H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010.

Digital Library

[25]

Y.-A. Lai, C.-C. Hsu, W. Chen, M.-Y. Yeh, and S.-D. Lin. Prune: Preserving proximity and global ranking for network embedding. In NeurIPS, pages 5257--5266, 2017.

[26]

A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. Pytorch-biggraph: A large-scale graph embedding system. In SysML, 2019.

[27]

X. Liu, T. Murata, K.-S. Kim, C. Kotarasu, and C. Zhuang. A general view for network embedding as matrix factorization. In WSDM, pages 375--383, 2019.

Digital Library

[28]

J. Ma, P. Cui, X. Wang, and W. Zhu. Hierarchical taxonomy aware network embedding. In KDD, pages 1920--1929, 2018.

Digital Library

[29]

C. Musco and C. Musco. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS, pages 1396--1404, 2015.

[30]

M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105--1114, 2016.

Digital Library

[31]

B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: online learning of social representations. In KDD, pages 701--710, 2014.

Digital Library

[32]

B. Perozzi, V. Kulkarni, H. Chen, and S. Skiena. Don't walk, skip!: Online learning of multi-scale network embeddings. In ASONAM, pages 258--265, 2017.

Digital Library

[33]

J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang. Netsmf: Large-scale network embedding as sparse matrix factorization. In WWW, pages 1509--1520, 2019.

[34]

J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459--467, 2018.

Digital Library

[35]

P. Radivojac, W. T. Cark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, K. Verspoor, and et. al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221, 2013.

[36]

L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo. struc2vec: Learning node representations from structural identity. In KDD, pages 385--394, 2017.

Digital Library

[37]

T. Sarlos. Improved approximation algorithms for large matrices via random projections. In FOCS, pages 143--152, 2006.

Digital Library

[38]

J. Shi, R. Yang, T. Jin, X. Xiao, and Y. Yang. Realtime top-k personalized pagerank over large graphs on gpus. PVLDB, 13(1):15--28, 2019.

Digital Library

[39]

J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: large-scale information network embedding. In WWW, pages 1067--1077, 2015.

Digital Library

[40]

L. Tang and H. Liu. Leveraging social media networks for classification. DMKD, 23(3):447--478, 2011.

Digital Library

[41]

R. Trivedi, B. Sisman, X. L. Dong, C. Faloutsos, J. Ma, and H. Zha. Linknbed: Multi-graph representation learning with entity linkage. In ACL, pages 252--262, 2018.

[42]

A. Tsitsulin, D. Mottin, P. Karras, and E. Müller. Verse: Versatile graph embeddings from similarity measures. In WWW, pages 539--548, 2018.

[43]

K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu. Deep recursive network embedding with regular equivalence. In KDD, pages 2357--2366, 2018.

Digital Library

[44]

D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In KDD, pages 1225--1234, 2016.

Digital Library

[45]

H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xing, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI, 2018.

[46]

J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, and D. L. Lee. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In KDD, pages 839--848, 2018.

Digital Library

[47]

Q. Wang, S. Wang, M. Gong, and Y. Wu. Feature hashing for network representation learning. In IJCAI, pages 2812--2818, 2018.

[48]

R. Wang, S. Wang, and X. Zhou. Parallelizing approximate single-source personalized pagerank queries on shared memory. VLDBJ, 28(6):923--940, 2019.

[49]

S. Wang, Y. Tang, X. Xiao, Y. Yang, and Z. Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016.

Digital Library

[50]

S. Wang, R. Yang, R. Wang, X. Xiao, Z. Wei, W. Lin, Y. Yang, and N. Tang. Efficient algorithms for approximate single-source personalized pagerank queries. TODS, 44(4):18, 2019.

Digital Library

[51]

S. Wang, R. Yang, X. Xiao, Z. Wei, and Y. Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017.

Digital Library

[52]

X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang. Community preserving network embedding. In AAAI, 2017.

Digital Library

[53]

Z. Wei, X. He, X. Xiao, S. Wang, S. Shang, and J. Wen. Topppr: Top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456, 2018.

Digital Library

[54]

S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 2015.

Digital Library

[55]

L. Y. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, and J. Weston. Starspace: Embed all the things! In AAAI, 2018.

[56]

C. Yang, M. Sun, Z. Liu, and C. Tu. Fast network embedding enhancement via high order proximity approximation. In IJCAI, pages 3894--3900, 2017.

[57]

J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. KAIS, 42(1):181--213, 2015.

Digital Library

[58]

R. Yang, J. Shi, X. Xiao, S. S. Bhowmick, and Y. Yang. Homogeneous network embedding for massive graphs via personalized pagerank. arXiv preprint, 2019.

[59]

Y. Yin and Z. Wei. Scalable graph embeddings via sparse transpose proximities. In KDD, 2019.

Digital Library

[60]

W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong, H. Chen, and W. Wang. Learning deep network representations with adversarially regularized autoencoders. In KDD, pages 2663--2671, 2018.

Digital Library

[61]

D. Zhang, J. Yin, X. Zhu, and C. Zhang. Network representation learning: A survey. IEEE Trans. Big Data, 2018.

[62]

J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding. Prone: Fast and scalable network representation learning. In IJCAI, pages 4278--4284, 2019.

[63]

Z. Zhang, P. Cui, H. Li, X. Wang, and W. Zhu. Billion-scale network embedding with iterative random projection. In ICDM, pages 787--796, 2018.

[64]

Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu. Arbitrary-order proximity preserved network embedding. In KDD, pages 2778--2786, 2018.

Digital Library

[65]

C. Zhou, Y. Liu, X. Liu, Z. Liu, and J. Gao. Scalable graph embedding for asymmetric proximity. In AAAI, 2017.

Digital Library

[66]

D. Zhu, P. Cui, D. Wang, and W. Zhu. Deep variational network embedding in wasserstein space. In KDD, pages 2827--2836, 2018.

Digital Library

[67]

Z. Zhu, S. Xu, M. Qu, and J. Tang. Graphvite: A high-performance cpu-gpu hybrid system for node embedding. In WWW, pages 2494--2504, 2019.

Cited By

Feng ZQiao MPiao CCheng H(2025)On Graph Representation for Attributed Hypergraph ClusteringProceedings of the ACM on Management of Data10.1145/37097413:1(1-26)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709741
Wu WLi SJiang MLuo CLi F(2025)Time- and Space-Efficiently Sketching Billion-Scale Attributed NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.350825637:2(966-978)Online publication date: 1-Feb-2025
https://dl.acm.org/doi/10.1109/TKDE.2024.3508256
Fang PLi ZKhan ALuo SWang FShi ZFeng D(2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1109/TKDE.2024.3424333
Show More Cited By

Homogeneous network embedding for massive graphs via reweighted personalized PageRank
1. Information systems
  1. Information systems applications

Recommendations

1-Homogeneous Graphs with Cocktail Party μ-Graphs

Let Γ be a graph with diameter d ≥ 2. Recall Γ is 1-homogeneous (in the sense of Nomura) whenever for every edge xy of Γ the distance partition
{{z ∈ V(Γ) | ∂(z, y) = i, ∂(x, z) = j} | 0 ≤ i, j ≤ d}
is equitable and its parameters do not depend on the edge ...
AT4 family and 2-homogeneous graphs
The 2000 Com²MaC conference on association schemes, codes and designs

Let @C denote an antipodal distance-regular graph of diameter four, with eigenvalues k=@q"0>@q"1>...>@q"4 and antipodal class size r. Then its Krein parameters satisfyq"1"1^2q"1"2^3q"1"3^4q"2"2^2q"2"2^4q"2"3^3q"2"4^4q"3"3^4>0,q"1"2^2=q"1"2^4=q"1"4^4=q"2"...
Pseudo 1-homogeneous distance-regular graphs

Let Γ be a distance-regular graph of diameter d 2 and a ₁ 0. Let be a real number. A pseudo cosine sequence for is a sequence of real numbers ₀, , _d such that ₀=1 and c _i _{i 1}+ a _i _i+ b _i _{i +1}=...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 13, Issue 5

January 2020

195 pages

ISSN:2150-8097

Editors:
Magdalena Balazinska
University of Washington
,
Xiaofang Zhou
University of Queensland, Australia

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 January 2020

Published in PVLDB Volume 13, Issue 5

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

55
Total Citations
View Citations
283
Total Downloads

Downloads (Last 12 months)12
Downloads (Last 6 weeks)0

Reflects downloads up to 13 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Feng ZQiao MPiao CCheng H(2025)On Graph Representation for Attributed Hypergraph ClusteringProceedings of the ACM on Management of Data10.1145/37097413:1(1-26)Online publication date: 11-Feb-2025
https://dl.acm.org/doi/10.1145/3709741
Wu WLi SJiang MLuo CLi F(2025)Time- and Space-Efficiently Sketching Billion-Scale Attributed NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.350825637:2(966-978)Online publication date: 1-Feb-2025
https://dl.acm.org/doi/10.1109/TKDE.2024.3508256
Fang PLi ZKhan ALuo SWang FShi ZFeng D(2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1109/TKDE.2024.3424333
Yang RShi J(2024)Efficient High-Quality Clustering for Large Bipartite GraphsProceedings of the ACM on Management of Data10.1145/36392782:1(1-27)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639278
Wang HYang RXiao XBaeza-Yates RBonchi F(2024)Effective Edge-wise Representation Learning in Edge-Attributed Bipartite GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671805(3081-3091)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671805
Lai YLin XYang RWang HBaeza-Yates RBonchi F(2024)Efficient Topology-aware Data Augmentation for High-Degree Graph Neural NetworksProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671765(1463-1473)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671765
Yang RWu YLin XWang QChan TShi JBaeza-Yates RBonchi F(2024)Effective Clustering on Large Attributed Bipartite GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671764(3782-3793)Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3671764
Lin LYu YWang ZWang ZZhao YZhao JJia TSerra ESpezzano F(2024)PSNE: Efficient Spectral Sparsification Algorithms for Scaling Network EmbeddingProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679540(1420-1429)Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3679540
Zhang XWeng ZWang SChua TNgo CKa-Wei Lee RKumar RLauw H(2024)Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological PerspectiveProceedings of the ACM Web Conference 202410.1145/3589334.3645663(969-979)Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589334.3645663
Xie KDong XZhang YZhang XGuo QWang S(2024)Learning-Based Attribute-Augmented Proximity Matrix Factorization for Attributed Network EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338584736:11(6517-6531)Online publication date: 25-Apr-2024
https://dl.acm.org/doi/10.1109/TKDE.2024.3385847
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents