Abstract
Hypergraphs are instrumental in modeling complex relational systems that encompass a wide spectrum of high-order interactions among components. One prevalent analysis task is the properties estimation of large-scale hypergraphs, which involves selecting a subset of nodes and hyperedges while preserving the characteristics of the entire hypergraph. This paper aims to sample hypergraphs via random walks and is the first to perform unbiased random walks for sampling of nodes and hyperedges simultaneously in large-scale hypergraphs to the best of our knowledge. Initially, we analyze the stationary distributions of nodes and hyperedges for the simple random walk, and show that there is a high bias in both nodes and hyperedges. Subsequently, to eliminate the high bias of the simple random walk, we propose unbiased random walk strategies for nodes and hyperedges, respectively. Finally, a single joint walk schema is developed for sampling nodes and hyperedges simultaneously. To accelerate the convergence process, we employ delayed acceptance and history-aware techniques to assist our algorithm in achieving fast convergence. Extensive experimental results validate our theoretical findings, and the unbiased sampling algorithms for nodes and hyperedges have their complex hypergraph scenarios for which they are applicable. The joint random walk algorithm balanced the sampling applicable to both nodes and hyperedges.
Similar content being viewed by others
Data Availability
No datasets were generated or analysed during the current study.
References
Shao, Y., Huang, S., Li, Y., Miao, X., Cui, B., Chen, L.: Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs. VLDB J. 30(5), 769–797 (2021)
Ahmed, N.K., Duffield, N.G., Neville, J., Kompella, R.R.: Graph sample and hold: a framework for big-graph analytics. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, USA - August 24 - 27, 2014, pp. 1446–1455 (2014)
Cui, Y., Li, X., Li, J., Wang, H., Chen, X.: A survey of sampling method for social media embeddedness relationship. ACM Comput. Surv. 55(4), 74–17439 (2023)
Krishnamurthy, V., Faloutsos, M., Chrobak, M., Cui, J., Lao, L., Percus, A.G.: Sampling large internet topologies for simulation purposes. Comput. Networks 51(15), 4284–4302 (2007)
Tang, W., Luo, G., Wu, Y., Tian, L., Zheng, X., Cai, Z.: A second-order diffusion model for influence maximization in social networks. IEEE Trans. Comput. Soc. Syst. 6(4), 702–714 (2019)
Zhao, Y., Jiang, H., Chen, Q., Qin, Y., Xie, H., Wu, Y., Liu, S., Zhou, Z., Xia, J., Zhou, F.: Preserving minority structures in graph sampling. IEEE Trans. Vis. Comput. Graph. 27(2), 1698–1708 (2021)
Chen, J., Ma, T., Xiao, C.: Fastgcn: Fast learning with graph convolutional networks via importance sampling. In: 6th International Conference on Learning Representations, ICLR (2018)
Nakajima, K., Shudo, K.: Social graph restoration via random walk sampling. In: 38th IEEE International Conference on Data Engineering, ICDE, pp. 1–14 (2022)
Xu, X., Lee, C., Eun, D.Y.: Challenging the limits: Sampling online social networks with cost constraints. In: IEEE Conference on Computer Communications, INFOCOM, pp. 1–9 (2017)
Musciotto, F., Battiston, F., Mantegna, R.N.: Detecting informative higher-order interactions in statistically validated hypergraphs. Commun. Phys. 4(1), 1–9 (2021)
Veldt, N., Benson, A.R., Kleinberg, J.: Combinatorial characterizations and impossibilities for higher-order homophily. Sci. Adv. 9(1), 3200 (2023)
Ganmor, E., Segev, R., Schneidman, E.: Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proc. Natl. Acad. Sci. 108(23), 9679–9684 (2011)
Lee, G., Ko, J., Shin, K.: Hypergraph motifs: Concepts, algorithms, and discoveries. Proc. VLDB Endow. 13(11), 2256–2269 (2020)
Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: Clustering, classification, and embedding. In: Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, pp. 1601–1608 (2006)
Zeng, Y., Huang, Y., Ren, X.-L., Lu, L.: Identifying vital nodes through augmented random walks on higher-order networks (2023)
Choe, M., Yoo, J., Lee, G., Baek, W., Kang, U., Shin, K.: MiDaS: Representative Sampling from Real-world Hypergraphs. In: Proceedings of the ACM Web Conference 2022. WWW ’22, pp. 1080–1092 (2022)
Chitra, U., Raphael, B.: Random Walks on Hypergraphs with Edge-Dependent Vertex Weights. In: Proceedings of the 36th International Conference on Machine Learning, pp. 1172–1181 (2019)
Hayashi, K., Aksoy, S.G., Park, C.H., Park, H.: Hypergraph random walks, laplacians, and clustering. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM ’20, pp. 495–504, New York, NY, USA (2020)
Dyer, M., Greenhill, C., Kleer, P., Ross, J., Stougie, L.: Sampling hypergraphs with given degrees. Discrete Math. 344(11), 112566 (2021)
Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A.: Walking in facebook: A case study of unbiased sampling of osns. In: 29th IEEE International Conference on Computer Communications, INFOCOM, pp. 2498–2506 (2010)
Chitra, U., Raphael, B.J.: Random walks on hypergraphs with edge-dependent vertex weights. In: Proceedings of the 36th International Conference on Machine Learning, ICML, vol. 97, pp. 1172–1181 (2019)
Carletti, T., Battiston, F., Cencetti, G., Fanelli, D.: Random walks on hypergraphs. Phys. Rev. E 101(2), 022308 (2020)
Aksoy, S.G., Joslyn, C.A., Marrero, C.O., Praggastis, B., Purvine, E.: Hypernetwork science via high-order hypergraph walks. EPJ Data Sci. 9(1), 16 (2020)
Eriksson, A., Edler, D., Rojas, A., Domenico, M., Rosvall, M.: How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs. Commun. Phys. 4(1), 133 (2021)
Carletti, T., Fanelli, D., Lambiotte, R.: Random walks and community detection in hypergraphs. J. Phys. Complex. 2(1), 015011 (2021)
Xie, H., Yi, P., Li, Y., Lui, J.C.S.: Optimizing random walk based statistical estimation over graphs via bootstrapping. IEEE Trans. Knowl. Data Eng. 35(3), 2916–2929 (2023)
Hu, P., Lau, W.C.: A survey and taxonomy of graph sampling. CoRR abs/1308.5865 (2013)
Kirkland, S.: Two-mode networks exhibiting data loss. J. Complex Netw. 6(2), 297–316 (2018)
Hong, S., Lu, S.: Graph sampling methods for big complex networks integrating centrality, k-core, and spectral sparsification. In: SAC ’20: The 35th ACM/SIGAPP Symposium on Applied Computing, pp. 1843–1851 (2020)
Han, J.-D.J., Dupuy, D., Bertin, N., Cusick, M.E., Vidal, M.: Effect of sampling on topology predictions of protein-protein interaction networks. Nat. Biotechnol 23(7), 839–844 (2005)
Leskovec, J., Kleinberg, J.M., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 177–187 (2005)
Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636 (2006)
Yousuf, M.I., Anwer, I., Anwar, R.: Empirical characterization of graph sampling algorithms. CoRR abs/2102.07980 (2021)
Pearson, K.: The problem of the random walk. Nature 72(1865), 294–294 (1905)
Jin, L., Chen, Y., Hui, P., Ding, C., Wang, T., Vasilakos, A.V., Deng, B., Li, X.: Albatross sampling: robust and effective hybrid vertex sampling for social graphs. In: Proceedings of the 3rd ACM International Workshop on Hot Topics in Planet-scale Measurement, pp. 11–16 (2011)
Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A.: Walking in facebook: A case study of unbiased sampling of osns. In: INFOCOM 2010. 29th IEEE International Conference on Computer Communications, pp. 2498–2506 (2010)
Ribeiro, B.F., Towsley, D.F.: Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM Internet Measurement Conference, pp. 390–403 (2010)
Zhou, Z., Zhang, N., Das, G.: Leveraging history for faster sampling of online social networks. Proc. VLDB Endow. 8(10), 1034–1045 (2015)
Wang, R., Li, Y., Lin, S., Wu, W., Xie, H., Xu, Y., Lui, J.C.S.: Common neighbors matter: Fast random walk sampling with common neighbor awareness. IEEE Trans. Knowl. Data Eng. 35(5), 4570–4584 (2023)
Li, Y., Wu, Z., Lin, S., Xie, H., Lv, M., Xu, Y., Lui, J.C.S.: Walking with perception: Efficient random walk sampling via common neighbor awareness. In: 35th IEEE International Conference on Data Engineering, ICDE, pp. 962–973 (2019)
Liu, Q., Huang, Y., Metaxas, D.N.: Hypergraph with sampling for image retrieval. Pattern Recognit. 44(10–11), 2255–2262 (2011)
Joos, F., Kim, J., Kühn, D., Osthus, D.: Hypergraph regularity and random sampling. Random Struct. Algorithms 62(4), 956–1015 (2023)
Xie, X., Shi, S., Wang, H., Li, M.: SAT: sampling acceleration tree for adaptive database repartition. World Wide Web (WWW) 26(5), 3503–3533 (2023)
Cooper, C., Frieze, A., Radzik, T.: The cover times of random walks on random uniform hypergraphs. Theor. Comput. Sci. 509, 51–69 (2013)
Bermond, J., Heydemann, M., Sotteau, D.: Line graphs of hypergraphs I. Discret. Math. 18(3), 235–241 (1977)
Lu, L., Peng, X.: High-ordered random walks and generalized laplacians on hypergraphs. In: Algorithms and Models for the Web Graph - 8th International Workshop, WAW. Lecture Notes in Computer Science, vol. 6732, pp. 14–25 (2011)
Joslyn, C., Aksoy, S., Arendt, D., Jenkins, L., Praggastis, B., Purvine, E., Zalewski, M.: High performance hypergraph analytics of domain name system relationships. In: HICSS Symposium on Cybersecurity Big Data Analytics (2019)
Brooks, S., Gelman, A., Jones, G., Meng, X.-L.: Handbook of Markov Chain Monte Carlo, (2011)
Avin, C., Lando, Y., Lotker, Z.: Radio cover time in hyper-graphs. Ad Hoc Networks 12, 278–290 (2014)
Lee, C., Xu, X., Eun, D.Y.: Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling. In: ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS, pp. 319–330 (2012)
Hastings, W.K.: Monte carlo sampling methods using markov chains and their applications. Biometrika (1970)
Andrieu, C., Livingstone, S.: Peskun-tierney ordering for markovian monte carlo: Beyond the reversible scenario. Ann. Stat. 49(4), 1958–1981 (2021)
Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible markov chain sampler. Ann. Appl. Probab, 726–752 (2000)
Neal, R.M.: Improving asymptotic variance of mcmc estimators: Non-reversible chains are better. Preprint arXiv:math/0407281 (2004)
Katzir, L., Liberty, E., Somekh, O.: Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th International Conference on World Wide Web, WWW, pp. 597–606 (2011)
Funding
National Key Research and Development Program of China under Grant 2020YFB1005900, National Natural Science Foundation of China (NSFC) under Grant 62122042, Shandong University multidisciplinary research and innovation team of young scholars under Grant 2020QNQT017. Young Scientists Fund of the Natural Science Foundation of Shandong Province under Grant 61150005202301.
Author information
Authors and Affiliations
Contributions
Qi Luo and Zhenzhen Xie wrote the manuscript, Dongxiao Yu and Yu Liu collected the data, and Xiuzhen Cheng, Xiahua Jia and Xuemin Lin analyzed the results. All authors reviewed the results and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Ethical Approval
not applicable.
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Luo, Q., Xie, Z., Liu, Y. et al. Sampling hypergraphs via joint unbiased random walk. World Wide Web 27, 15 (2024). https://doi.org/10.1007/s11280-024-01253-8
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11280-024-01253-8