DOI: 10.1145/3646547.3688411
research-article

What's in the Dataset? Unboxing the APNIC per AS User Population Dataset

Published: 04 November 2024

Abstract

The research measurement community needs methods and datasets to identify user concentrations and to accurately weight ASes against each other for analyzing measurements' coverage. However, academic researchers traditionally lack visibility into how many users are in each network or how much traffic flows to each network and so often fall back on treating all IP addresses or networks equally. As an alternative, some recent studies have used the APNIC per AS Population Estimates dataset, but it is unvalidated and its methodology is not fully public.
In this work, we validate its use as a fairly reliable user population indicator. Our approach includes a detailed comparative analysis using a global CDN dataset, providing concrete evidence of the APNIC dataset's accuracy. We find that the APNIC per-AS user estimates closely align with the Content Delivery Network (CDN) per-AS user estimates in 51.2% of countries and correctly identify the largest networks in 93.9% of cases. When we investigate the agreement with CDN traffic volume, the APNIC dataset closely aligns in 36.5% of countries, increasing to 91.0% when focusing only on larger networks. We also evaluate the limitations of the APNIC dataset, particularly its inability to accurately identify user populations for ASes in certain countries. To address this, we introduce new methods to improve its usability by focusing on the statistical representativeness of the underlying data collection process and ensuring consistency across several public datasets.
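
The abstract does not say which agreement metric the authors use, so the following is only a minimal sketch of how a per-country comparison of this kind could be set up. It assumes two hypothetical inputs, `apnic` and `cdn`, each mapping an AS number to an estimated user count within one country. It reports a Pearson correlation over per-AS user shares and whether both sources name the same largest AS. The function names, the choice of Pearson correlation, and the sample numbers are all illustrative assumptions, not the paper's method or data.

```python
from math import sqrt


def shares(counts):
    """Normalize per-AS user counts to fractions of the country total."""
    total = sum(counts.values())
    return {asn: n / total for asn, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance
    (for example, a country with a single AS).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def compare_country(apnic, cdn):
    """Compare two per-AS user estimates for a single country."""
    a, c = shares(apnic), shares(cdn)
    # An AS missing from one source is treated as a zero share there.
    asns = sorted(set(a) | set(c))
    xs = [a.get(asn, 0.0) for asn in asns]
    ys = [c.get(asn, 0.0) for asn in asns]
    return {
        "correlation": pearson(xs, ys),
        "same_largest_as": max(a, key=a.get) == max(c, key=c.get),
    }


# Made-up numbers for one hypothetical country; the AS numbers come from
# the range reserved for documentation (64496-64511).
apnic = {64500: 500_000, 64501: 300_000, 64502: 200_000}
cdn = {64500: 450_000, 64501: 350_000, 64502: 150_000, 64503: 50_000}
result = compare_country(apnic, cdn)
```

Running `compare_country` once per country and counting how many countries exceed a chosen correlation threshold would yield a percentage of the same general form as those quoted above. The threshold itself would be another assumption.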

    Published In

    IMC '24: Proceedings of the 2024 ACM on Internet Measurement Conference
    November 2024
    812 pages
    ISBN:9798400705922
    DOI:10.1145/3646547
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. apnic user estimates
    2. datasets
    3. traffic volume

    Qualifiers

    • Research-article

    Conference

    IMC '24
    IMC '24: ACM Internet Measurement Conference
    November 4 - 6, 2024
    Madrid, Spain

    Acceptance Rates

    Overall Acceptance Rate 277 of 1,083 submissions, 26%
