Abstract
Graph sampling is a widely used procedure in social network analysis, has attracted great interest in the scientific community and is considered as a very powerful and useful tool in several domains of network analysis. Apart from initial research in this area, which has proposed simple processes such as the classic Random Walk algorithm, Random Node and Random Edge sampling, during the last decade, more advanced graph sampling approaches have been emerged. In this paper, we extensively study the properties of a newly proposed method, the Rank Degree method, which leads to representative graph subgraphs. The Rank Degree is a novel graph exploration method which significantly differs from other existing methods in the literature. The novelty of the Rank Degree lies on the fact that its core methodology corresponds to a deterministic graph exploration; one specific variation corresponds to a number of parallel deterministic traverses that explore the graph. We perform extensive experiments on twelve real-world datasets of a different type, using a variety of measures and comparing our method with Forest Fire, Metropolis Hastings Random Walk and Metropolis Hastings. We provide strong evidence that our approach leads to highly efficient graph sampling; the generated samples preserve several graph properties, to a large extent.
Similar content being viewed by others
References
Ahmed NK, Neville J, Kompella RR (2012) Network sampling: from static to streaming graphs. CoRR abs/1211.3412
Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24:1–24:74
Çem E, Tozal ME, Saraç K (2013) Impact of sampling design in estimation of graph characteristics. In: IEEE 32nd international performance computing and communications conference, IPCCC 2013, San Diego, CA, USA, December 6–8, 2013, pp 1–10
Chen D, Lu L, Shang MS, Zhang YC, Zhou T (2012) Identifying influential nodes in complex networks. Phys A Stat Mech Appl 391(4):1777–1787
Chiericetti F, Dasgupta A, Kumar R, Lattanzi S, Sarlós T (2016) On sampling nodes in a network. In: Proceedings of the 25th international conference on world wide web, Republic and Canton of Geneva, Switzerland, WWW ’16, pp 471–481
Cho E, Myers SA, Leskovec J (2011) Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’11, pp 1082–1090
Choudhury MD, Lin Y, Sundaram H, Candan KS, Xie L, Kelliher A (2010) How does the data sampling strategy impact the discovery of information diffusion in social media? In: Proceedings of the fourth international conference on weblogs and social media, ICWSM 2010, Washington, DC, USA, May 23–26, 2010
Csardi G, Nepusz T (2006) The igraph software package for complex network research. Inter J Complex Syst 1695. http://igraph.org
Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239
Gabielkov M, Rao A, Legout A (2014) Sampling online social networks: an experimental study of twitter. In: ACM SIGCOMM 2014 conference, SIGCOMM’14, Chicago, IL, USA, August 17–22, 2014, pp 127–128
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of OSNs. In: IEEE Proceedings on INFOCOM, 2010, pp 1–9
Gjoka M, Kurant M, Butts CT, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892
Gu Y, McCallum A, Towsley D (2005) Detecting anomalies in network traffic using maximum entropy estimation. In: Proceedings of the 5th ACM SIGCOMM conference on internet measurement, USENIX Association, Berkeley, CA, USA, IMC ’05, pp 32–32
Haveliwala TH (2003) Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans Knowl Data Eng 15(4):784–796
Hu P, Lau WC (2013) A survey and taxonomy of graph sampling. CoRR abs/1308.5865
Hubler C, peter Kriegel H, Borgwardt K, Ghahramani Z (2008) Metropolis algorithms for representative subgraph sampling. In: Eighth IEEE international conference on data mining, 2008. ICDM08, pp 283–292
Kitsak M, Gallos LK, Havlin S, Liljerosand F, Muchnik L, Stanley HE, Makse HA (2010) Identification of influential spreaders in complex networks. Nat Phys 6:888–893. doi:10.1038/nphys1746
Krivitsky PN, Kolaczyk ED (2015) On the question of effective sample size in network modeling: an asymptotic inquiry. Stat Sci 30(2):184–198
Kurant M, Markopoulou A, Thiran P (2011) Towards unbiased BFS sampling. IEEE J Sel Areas Commun 29(9):1799–1809
Lee C, Xu X, Eun DY (2012) Beyond random walk and Metropolis–Hastings samplers: why you should not backtrack for unbiased graph sampling. In: ACM SIGMETRICS/PERFORMANCE joint international conference on measurement and modeling of computer systems, SIGMETRICS ’12, London, UK, June 11–15, 2012, pp 319–330
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, Philadelphia, PA, USA, August 20–23, 2006, pp 631–636
Leskovec J, Krevl A (2014) SNAP datasets: stanford large network dataset collection. http://snap.stanford.edu/data
Leskovec J, Kleinberg JM, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, IL, USA, August 21–24, 2005, pp 177–187
Leskovec J, Kleinberg JM, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. TKDD 1(1):1–40
Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM, New York, NY, USA, CHI ’10, pp 1361–1370
Levin DA, Peres Y, Wilmer EL (2008) Markov chains and mixing times. American Mathematical Society (AMS), Providence.
Li R, Yu JX, Qin L, Mao R, Jin T (2015) On random walk based graph sampling. In: 31st IEEE international conference on data engineering, ICDE 2015, Seoul, South Korea, April 13–17, 2015, pp 927–938
Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, CA, USA, August 21–24, 2011, pp 105–113
McAuley JJ, Leskovec J (2012) Learning to discover social circles in ego networks. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in Neural Information Processing Systems 25 (NIPS 2012). Curran Associates, Inc., pp 539–547
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. doi:10.1063/1.1699114
Potamias M, Bonchi F, Castillo C, Gionis A (2009) Fast shortest path distance estimation in large networks. In: Proceedings of the 18th ACM conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’09, pp 867–876
Ribeiro BF, Towsley DF (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM internet measurement conference, IMC 2010, Melbourne, Australia, November 1–3, 2010, pp 390–403
Richardson M, Agrawal R, Domingos P (2003) Trust management for the semantic web. Springer, Berlin
Ripeanu M, Foster IT, Iamnitchi A (2002) Mapping the gnutella network: properties of large-scale peer-to-peer systems and implications for system design. CoRR cs.DC/0209028
Robles-Granda P, Moreno S, Neville J (2016) Sampling of attributed networks from hierarchical generative models. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, August 13–17, 2016, pp 1155–1164
Salamanos N, Voudigari E, Yannakoudakis EJ (2016) Identifying influential spreaders by graph sampling. In: Proceedings of the 5th international workshop on complex networks and their applications, Milan, Italy, November 30–December 02, 2016
Stutzbach D, Rejaie R, Duffield N, Sen S, Willinger W (2009) On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans Netw 17(2):377–390
Vattani A, Chakrabarti D, Gurevich M (2011) Preserving personalized pagerank in subgraphs. In: Proceedings of the 28th international conference on machine learning, ICML 2011, Bellevue, Washington, USA, June 28–July 2, 2011, pp 793–800
Voudigari E, Salamanos N, Papageorgiou T, Yannakoudakis EJ (2016) Rank degree: an efficient algorithm for graph sampling. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2016, San Francisco, CA, USA, August 18–21, 2016, pp 120–129
Xu X, Lee C, Eun DY (2014) A general framework of hybrid graph sampling for complex network analysis. In: 2014 IEEE conference on computer communications, INFOCOM 2014, Toronto, Canada, April 27–May 2, 2014, pp 2795–2803
Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, ACM, New York, NY, USA, MDS ’12, pp 3:1–3:8
Zafarani R, Liu H (2009) Social computing data repository at ASU. http://socialcomputing.asu.edu
Zeng A, Zhang CJ (2013) Ranking spreaders by decomposing complex networks. Phys Lett A 377(14):1031–1035
Acknowledgements
We thank Kyriaki Chryssaki for her helpful comments on the final manuscript. We also wish to acknowledge the Athens University of Economics and Business for the technical equipment provision.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Salamanos, N., Voudigari, E. & Yannakoudakis, E.J. Deterministic graph exploration for efficient graph sampling. Soc. Netw. Anal. Min. 7, 24 (2017). https://doi.org/10.1007/s13278-017-0441-6
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-017-0441-6