research-article

Sampling Graphlets of Multiplex Networks: A Restricted Random Walk Approach

Published: 14 June 2021

Abstract

Graphlets are induced subgraph patterns that are crucial to understanding the structure and function of a large network. Considerable effort has been devoted to calculating graphlet statistics, and random walk-based approaches are commonly used when a graph can be accessed only through restricted application programming interfaces (APIs). However, most of these approaches consider individual networks in isolation and overlook the strong coupling between different networks. In this article, we estimate the graphlet concentration in multiplex networks with real-world applications. An inter-layer edge connects two nodes in different layers if they correspond to the same underlying node. Access to a multiplex network is restricted in the sense that the upper layer allows random walk sampling, whereas the nodes of the lower layers can be accessed only through the inter-layer edges and support only random node or edge sampling. To cope with this new challenge, we define a suite of two-layer graphlets and propose novel random walk sampling algorithms to estimate the proportion of all the three-node graphlets. We prove an analytical bound on the number of sampling steps that guarantees the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimation accuracy of different graphlets when the sample budget is split across layers. Experimental evaluation on real-world and synthetic multiplex networks demonstrates the accuracy and high efficiency of our unbiased estimators.
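
As a rough illustration of the random-walk idea described in the abstract, the following Python sketch estimates the triangle concentration among connected three-node graphlets of a single-layer, undirected graph. It is not the two-layer algorithm proposed in the article. The function names, the `adj` dictionary, and the re-weighting scheme are assumptions chosen for this example: each non-backtracking triple of consecutive walk nodes is weighted by the degree of its middle node, then divided by the number of walk orderings that cover that graphlet (6 for a triangle, 2 for an open wedge).

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def estimate_triangle_concentration(adj, start, steps, seed=0):
    """Estimate the share of triangles among connected induced 3-node subgraphs.

    Hypothetical single-layer sketch: takes `steps` steps of a simple random
    walk from `start` and inspects every triple of consecutive nodes.
    Returns None if the walk never produced a usable triple.
    """
    rng = random.Random(seed)
    # Sorted neighbor lists keep the walk reproducible for a given seed.
    nbrs = {v: sorted(ns) for v, ns in adj.items()}

    prev = start
    cur = rng.choice(nbrs[prev])
    tri_weight = 0.0
    wedge_weight = 0.0

    for _ in range(steps):
        nxt = rng.choice(nbrs[cur])
        if nxt != prev:  # a backtracking step does not span three distinct nodes
            # In stationarity the triple (prev, cur, nxt) appears with
            # probability proportional to 1 / deg(cur), so weight by deg(cur).
            w = len(nbrs[cur])
            if nxt in adj[prev]:
                tri_weight += w / 6  # 6 ordered walks cover one triangle
            else:
                wedge_weight += w / 2  # 2 ordered walks cover one open wedge
        prev, cur = cur, nxt

    total = tri_weight + wedge_weight
    return tri_weight / total if total > 0 else None


# Example: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
paw = build_adjacency([(0, 1), (0, 2), (1, 2), (0, 3)])
estimate = estimate_triangle_concentration(paw, start=0, steps=100_000, seed=1)
```

The example graph contains one triangle and two open wedges, so a long walk should give an estimate near 1/3. The sketch omits burn-in and any convergence check, and it assumes that the neighbor list of every visited node can be queried, which the lower layers of the multiplex setting in the article do not allow.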

    Published In

    ACM Transactions on the Web  Volume 15, Issue 4
    November 2021
    152 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/3465465
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2021
    Accepted: 01 March 2021
    Revised: 01 December 2020
    Received: 01 May 2020
    Published in TWEB Volume 15, Issue 4

    Author Tags

    1. Graphlets
    2. multiplex network
    3. graph sampling
    4. random walk
    5. unbiased estimation

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key R&D Program of China
    • Natural Science Foundation of China
    • Shanghai-Hong Kong Collaborative Project
    • Key-Area Research and Development Program of Guangdong Province
