Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Homogeneous network embedding for massive graphs via reweighted personalized PageRank

Published: 01 January 2020 Publication History

Abstract

Given an input graph G and a node vG, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. On this scale, most existing approaches fail, as they incur either prohibitively high costs, or severely compromised result utility.
Our proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second and more importantly, we identify an inherent drawback of vanilla PPR, and address it in our main proposal NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility.
The proposed NRP approach overcomes the above deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information, and iteratively adjusts embedding vectors accordingly. Overall, NRP takes O(mlogn) time and O(m) space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments that compare NRP against 18 existing solutions over 7 real graphs demonstrate that NRP achieves higher result utility than all the solutions for link prediction, graph reconstruction and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours, using a single CPU core.

References

[1]
S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi. Watch your step: Learning node embeddings via graph attention. In NeurIPS, pages 9180--9190, 2018.
[2]
A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In WWW, pages 37--48, 2013.
[3]
L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, pages 635--644, 2011.
[4]
M. J. Brzozowski and D. M. Romero. Who should i follow? recommending people in directed social networks. In Fifth International AAAI Conference on Weblogs and Social Media, 2011.
[5]
H. Cai, V. W. Zheng, and K. C. Chang. A comprehensive survey of graph embedding: Problems, techniques, and applications. TKDE, 30(9):1616--1637, 2018.
[6]
S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891--900, 2015.
[7]
S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, 2016.
[8]
H. Chen, B. Perozzi, Y. Hu, and S. Skiena. HARP: hierarchical representation learning for networks. In AAAI, 2018.
[9]
H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li. Exploiting centrality information with graph convolutions for network representation learning. In ICDE, pages 590--601, 2019.
[10]
K. L. Clarkson and D. P. Woodruff. Low-rank approximation and regression in input sparsity time. STOC, pages 81--90, 2013.
[11]
P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. TKDE, 31(5):833--852, 2018.
[12]
Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI, 2018.
[13]
Q. Dai, X. Shen, L. Zhang, Q. Li, and D. Wang. Adversarial training methods for network embedding. In WWW, pages 329--339, 2019.
[14]
C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec. Learning structural node embeddings via diffusion wavelets. In KDD, pages 1320--1329, 2018.
[15]
H. Gao and H. Huang. Self-paced network embedding. In KDD, pages 1406--1415, 2018.
[16]
A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855--864, 2016.
[17]
Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.
[18]
Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.
[19]
N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217--288, 2011.
[20]
G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003.
[21]
Kaggle, 2012. https://www.kaggle.com/c7kddcup2012-track1.
[22]
T. N. Kipf and M. Welling. Variational graph auto-encoders. NeurIPS Workshop, 2016.
[23]
T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[24]
H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010.
[25]
Y.-A. Lai, C.-C. Hsu, W. Chen, M.-Y. Yeh, and S.-D. Lin. Prune: Preserving proximity and global ranking for network embedding. In NeurIPS, pages 5257--5266, 2017.
[26]
A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. Pytorch-biggraph: A large-scale graph embedding system. In SysML, 2019.
[27]
X. Liu, T. Murata, K.-S. Kim, C. Kotarasu, and C. Zhuang. A general view for network embedding as matrix factorization. In WSDM, pages 375--383, 2019.
[28]
J. Ma, P. Cui, X. Wang, and W. Zhu. Hierarchical taxonomy aware network embedding. In KDD, pages 1920--1929, 2018.
[29]
C. Musco and C. Musco. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS, pages 1396--1404, 2015.
[30]
M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105--1114, 2016.
[31]
B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: online learning of social representations. In KDD, pages 701--710, 2014.
[32]
B. Perozzi, V. Kulkarni, H. Chen, and S. Skiena. Don't walk, skip!: Online learning of multi-scale network embeddings. In ASONAM, pages 258--265, 2017.
[33]
J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang. Netsmf: Large-scale network embedding as sparse matrix factorization. In WWW, pages 1509--1520, 2019.
[34]
J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459--467, 2018.
[35]
P. Radivojac, W. T. Cark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, K. Verspoor, and et. al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221, 2013.
[36]
L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo. struc2vec: Learning node representations from structural identity. In KDD, pages 385--394, 2017.
[37]
T. Sarlos. Improved approximation algorithms for large matrices via random projections. In FOCS, pages 143--152, 2006.
[38]
J. Shi, R. Yang, T. Jin, X. Xiao, and Y. Yang. Realtime top-k personalized pagerank over large graphs on gpus. PVLDB, 13(1):15--28, 2019.
[39]
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: large-scale information network embedding. In WWW, pages 1067--1077, 2015.
[40]
L. Tang and H. Liu. Leveraging social media networks for classification. DMKD, 23(3):447--478, 2011.
[41]
R. Trivedi, B. Sisman, X. L. Dong, C. Faloutsos, J. Ma, and H. Zha. Linknbed: Multi-graph representation learning with entity linkage. In ACL, pages 252--262, 2018.
[42]
A. Tsitsulin, D. Mottin, P. Karras, and E. Müller. Verse: Versatile graph embeddings from similarity measures. In WWW, pages 539--548, 2018.
[43]
K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu. Deep recursive network embedding with regular equivalence. In KDD, pages 2357--2366, 2018.
[44]
D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In KDD, pages 1225--1234, 2016.
[45]
H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xing, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI, 2018.
[46]
J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, and D. L. Lee. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In KDD, pages 839--848, 2018.
[47]
Q. Wang, S. Wang, M. Gong, and Y. Wu. Feature hashing for network representation learning. In IJCAI, pages 2812--2818, 2018.
[48]
R. Wang, S. Wang, and X. Zhou. Parallelizing approximate single-source personalized pagerank queries on shared memory. VLDBJ, 28(6):923--940, 2019.
[49]
S. Wang, Y. Tang, X. Xiao, Y. Yang, and Z. Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016.
[50]
S. Wang, R. Yang, R. Wang, X. Xiao, Z. Wei, W. Lin, Y. Yang, and N. Tang. Efficient algorithms for approximate single-source personalized pagerank queries. TODS, 44(4):18, 2019.
[51]
S. Wang, R. Yang, X. Xiao, Z. Wei, and Y. Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017.
[52]
X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang. Community preserving network embedding. In AAAI, 2017.
[53]
Z. Wei, X. He, X. Xiao, S. Wang, S. Shang, and J. Wen. Topppr: Top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456, 2018.
[54]
S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 2015.
[55]
L. Y. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, and J. Weston. Starspace: Embed all the things! In AAAI, 2018.
[56]
C. Yang, M. Sun, Z. Liu, and C. Tu. Fast network embedding enhancement via high order proximity approximation. In IJCAI, pages 3894--3900, 2017.
[57]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. KAIS, 42(1):181--213, 2015.
[58]
R. Yang, J. Shi, X. Xiao, S. S. Bhowmick, and Y. Yang. Homogeneous network embedding for massive graphs via personalized pagerank. arXiv preprint, 2019.
[59]
Y. Yin and Z. Wei. Scalable graph embeddings via sparse transpose proximities. In KDD, 2019.
[60]
W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong, H. Chen, and W. Wang. Learning deep network representations with adversarially regularized autoencoders. In KDD, pages 2663--2671, 2018.
[61]
D. Zhang, J. Yin, X. Zhu, and C. Zhang. Network representation learning: A survey. IEEE Trans. Big Data, 2018.
[62]
J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding. Prone: Fast and scalable network representation learning. In IJCAI, pages 4278--4284, 2019.
[63]
Z. Zhang, P. Cui, H. Li, X. Wang, and W. Zhu. Billion-scale network embedding with iterative random projection. In ICDM, pages 787--796, 2018.
[64]
Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu. Arbitrary-order proximity preserved network embedding. In KDD, pages 2778--2786, 2018.
[65]
C. Zhou, Y. Liu, X. Liu, Z. Liu, and J. Gao. Scalable graph embedding for asymmetric proximity. In AAAI, 2017.
[66]
D. Zhu, P. Cui, D. Wang, and W. Zhu. Deep variational network embedding in wasserstein space. In KDD, pages 2827--2836, 2018.
[67]
Z. Zhu, S. Xu, M. Qu, and J. Tang. Graphvite: A high-performance cpu-gpu hybrid system for node embedding. In WWW, pages 2494--2504, 2019.

Cited By

View all
  • (2025)On Graph Representation for Attributed Hypergraph ClusteringProceedings of the ACM on Management of Data10.1145/37097413:1(1-26)Online publication date: 11-Feb-2025
  • (2025)Time- and Space-Efficiently Sketching Billion-Scale Attributed NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.350825637:2(966-978)Online publication date: 1-Feb-2025
  • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: 1-Jan-2025
  • Show More Cited By
  1. Homogeneous network embedding for massive graphs via reweighted personalized PageRank

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 13, Issue 5
    January 2020
    195 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 January 2020
    Published in PVLDB Volume 13, Issue 5

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)On Graph Representation for Attributed Hypergraph ClusteringProceedings of the ACM on Management of Data10.1145/37097413:1(1-26)Online publication date: 11-Feb-2025
    • (2025)Time- and Space-Efficiently Sketching Billion-Scale Attributed NetworksIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.350825637:2(966-978)Online publication date: 1-Feb-2025
    • (2025)Information-Oriented Random Walks and Pipeline Optimization for Distributed Graph EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.342433337:1(408-422)Online publication date: 1-Jan-2025
    • (2024)Efficient High-Quality Clustering for Large Bipartite GraphsProceedings of the ACM on Management of Data10.1145/36392782:1(1-27)Online publication date: 26-Mar-2024
    • (2024)Effective Edge-wise Representation Learning in Edge-Attributed Bipartite GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671805(3081-3091)Online publication date: 25-Aug-2024
    • (2024)Efficient Topology-aware Data Augmentation for High-Degree Graph Neural NetworksProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671765(1463-1473)Online publication date: 25-Aug-2024
    • (2024)Effective Clustering on Large Attributed Bipartite GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671764(3782-3793)Online publication date: 25-Aug-2024
    • (2024)PSNE: Efficient Spectral Sparsification Algorithms for Scaling Network EmbeddingProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679540(1420-1429)Online publication date: 21-Oct-2024
    • (2024)Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological PerspectiveProceedings of the ACM Web Conference 202410.1145/3589334.3645663(969-979)Online publication date: 13-May-2024
    • (2024)Learning-Based Attribute-Augmented Proximity Matrix Factorization for Attributed Network EmbeddingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.338584736:11(6517-6531)Online publication date: 25-Apr-2024
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media