Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2566486.2568019acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

On estimating the average degree

Published: 07 April 2014 Publication History

Abstract

Networks are characterized by nodes and edges. While there has been a spate of recent work on estimating the number of nodes in a network, the edge-estimation question appears to be largely unaddressed. In this work we consider the problem of estimating the average degree of a large network using efficient random sampling, where the number of nodes is not known to the algorithm. We propose a new estimator for this problem that relies on access to node samples under a prescribed distribution. Next, we show how to efficiently realize this ideal estimator in a random walk setting. Our estimator has a natural and simple implementation using random walks; we bound its performance in terms of the mixing time of the underlying graph. We then show that our estimators are both provably and practically better than many natural estimators for the problem. Our work contrasts with existing theoretical work on estimating average degree, which assume that a uniform random sample of nodes is available and the number of nodes is known.

References

[1]
Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. J. ACM, 55(5), 2008.
[2]
Z. Bar-Yossef and M. Gurevich. Efficient search engine measurements. TWEB, 5(4):18, 2011.
[3]
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public web search engines. Comput. Netw. ISDN Syst., 30(1--7):379--388, 1998.
[4]
C. Cooper, T. Radzik, and Y. Siantos. Estimating network parameters using random walks. In CASoN, pages 33--40, 2012.
[5]
A. Dasgupta, R. Kumar, and D. Sivakumar. Social sampling. In KDD, pages 235--243, 2012.
[6]
D. P. Dubhash and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
[7]
U. Feige. On sums of independent random variables with unbounded variance and estimating the average degree in a graph. SICOMP, 35(4):964--984, 2006.
[8]
M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In INFOCOM, pages 1--9, 2010.
[9]
O. Goldreich and D. Ron. Approximating average parameters of graphs. RS&A, 32(4):473--493, 2008.
[10]
S. J. Hardiman and L. Katzir. Estimating clustering coefficients and size of social networks via random walk. In WWW, pages 539--550, 2013.
[11]
L. Katzir, E. Liberty, and O. Somekh. Estimating sizes of social networks via biased sampling. In WWW, pages 597--606, 2011.
[12]
M. Kurant, C. T. Butts, and A. Markopoulou. Graph size estimation. CoRR, abs/1210.0460, 2012.
[13]
D. Levin, Y. Peres, and E. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2009.
[14]
P. Lezaud. Chernoff-type bound for finite Markov chains. AAP, pages 849--867, 1998.
[15]
T. H. McCormick, A. Moussa, J. Ruf, T. A. DiPrete, A. Gelman, J. Teitler, and T. Zheng. A practical guide to measuring social structure using indirectly observed network data. Journal of Statistical Theory and Practice, 7(1):120--132, 2013.
[16]
T. H. McCormick, M. J. Salganik, and T. Zheng. How many people do you know?: Efficiently estimating personal network size. JASA, 105(489):59--70, 2010.
[17]
T. H. McCormick and T. Zheng. A latent space representation of overdispersed relative propensity in "How many X's do you know" data. In Conf. Proc. Joint Stat. Meet., 2010.
[18]
R. Motwani, R. Panigrahy, and Y. Xu. Estimating sum by weighted sampling. In ICALP, pages 53--64, 2007.
[19]
A. Sinclair. Algorithms for Random Generation and Counting: A Markov Chain Approach. Springer, 1993.
[20]
S. Ye and F. Wu. Estimating the size of online social networks. In SocialCom, pages 169--176, 2010.

Cited By

View all
  • (2024)One-Stage Fair Multi-View Spectral ClusteringProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681162(1407-1416)Online publication date: 28-Oct-2024
  • (2024)Better Sum Estimation via Weighted SamplingACM Transactions on Algorithms10.1145/365003020:3(1-33)Online publication date: 21-Jun-2024
  • (2024)Sublinear-Time Opinion Estimation in the Friedkin--Johnsen ModelProceedings of the ACM Web Conference 202410.1145/3589334.3645572(2563-2571)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN:9781450327442
DOI:10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 April 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. average degree
  2. graph sampling
  3. random walks

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)48
  • Downloads (Last 6 weeks)6
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)One-Stage Fair Multi-View Spectral ClusteringProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681162(1407-1416)Online publication date: 28-Oct-2024
  • (2024)Better Sum Estimation via Weighted SamplingACM Transactions on Algorithms10.1145/365003020:3(1-33)Online publication date: 21-Jun-2024
  • (2024)Sublinear-Time Opinion Estimation in the Friedkin--Johnsen ModelProceedings of the ACM Web Conference 202410.1145/3589334.3645572(2563-2571)Online publication date: 13-May-2024
  • (2023)Theoretical bounds on the network community profile from low-rank semi-definite programmingProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618976(13976-13992)Online publication date: 23-Jul-2023
  • (2023)Random Walk Sampling in Social Networks Involving Private NodesACM Transactions on Knowledge Discovery from Data10.1145/356138817:4(1-28)Online publication date: 24-Feb-2023
  • (2023)DeMEtRIS: Counting (near)-Cliques by CrawlingProceedings of the Sixteenth ACM International Conference on Web Search and Data Mining10.1145/3539597.3570438(312-320)Online publication date: 27-Feb-2023
  • (2023)Interactive Activities Initiation through Retrieving Hidden Social Information Networks2023 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM58522.2023.00063(538-547)Online publication date: 1-Dec-2023
  • (2023)Exploring the topological characteristics of urban trip networks based on taxi trajectory dataPhysica A: Statistical Mechanics and its Applications10.1016/j.physa.2022.128391609(128391)Online publication date: Jan-2023
  • (2022)Edge sampling and graph parameter estimation via vertex neighborhood accessesProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing10.1145/3519935.3520059(1116-1129)Online publication date: 9-Jun-2022
  • (2022)Sampling Multiple Nodes in Large NetworksProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining10.1145/3488560.3498383(37-47)Online publication date: 11-Feb-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media