DOI: 10.1145/3485447.3512124
research-article

Leveraging Google’s Publisher-Specific IDs to Detect Website Administration

Published: 25 April 2022

Abstract

Digital advertising is the most popular way to monetize content on the Internet. Publishers spawn new websites, and older ones change hands, with the sole purpose of monetizing user traffic. In this ever-evolving ecosystem, it is challenging to answer questions such as: Which entities monetize which websites? Which categories of websites does an average entity typically monetize, and how diverse are these websites? How has this website administration ecosystem changed over time?
In this paper, we propose a novel, graph-based methodology to detect the administration of websites on the Web by exploiting ad-related, publisher-specific IDs. We apply our methodology across the top 1 million websites and study the characteristics of the resulting graphs of website administration. Our findings show that approximately 90% of the websites are each associated with a single publisher, and that small publishers tend to manage less popular websites. We perform a historical analysis of up to 8 million websites and find a new, constantly rising number of (intermediary) publishers that control and monetize traffic from hundreds of websites, seeking a share of the ad-market pie. We also observe that, over time, websites tend to move from big to smaller administrators.
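
The idea of linking websites through shared publisher IDs can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression (an assumed AdSense-style ID of the form "pub-" followed by 16 digits), the function names, and the sample pages are all hypothetical, and a plain connected-components grouping stands in for whatever graph analysis the paper actually performs.

```python
import re
from collections import defaultdict

# Assumption: AdSense-style publisher IDs look like "pub-" plus 16 digits.
PUB_ID = re.compile(r"pub-\d{16}")


def extract_ids(html):
    """Return the set of publisher-ID-like strings found in a page."""
    return set(PUB_ID.findall(html))


def group_sites(pages):
    """Group sites that share at least one ID, directly or transitively.

    pages: dict mapping a site name to its HTML source (hypothetical input).
    Returns a list of sets of site names, one set per connected component
    of the bipartite site/ID graph.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    for site, html in pages.items():
        find(("site", site))  # register sites that carry no ID at all
        for pub_id in extract_ids(html):
            union(("site", site), ("id", pub_id))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "site":
            groups[find(node)].add(node[1])
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical sample data, for illustration only.
pages = {
    "a.example": '<script data-ad-client="ca-pub-1111111111111111"></script>',
    "b.example": "google_ad_client = 'ca-pub-1111111111111111';",
    "c.example": "google_ad_client = 'ca-pub-2222222222222222';",
    "d.example": "<p>no ads here</p>",
}
result = group_sites(pages)
```

In this toy input, a.example and b.example end up in the same group because they embed the same ID, while the other two sites each form a group of their own.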

Cited By

  • (2024) Ad Laundering: How Websites Deceive Advertisers into Rendering Ads Next to Illicit Content. Companion Proceedings of the ACM Web Conference 2024, 782-785. DOI: 10.1145/3589335.3651466. Online publication date: 13-May-2024
  • (2023) IDTracker: Discovering Illicit Website Communities via Third-party Service IDs. 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 459-469. DOI: 10.1109/DSN58367.2023.00050. Online publication date: Jun-2023
  • (2023) FNDaaS: Content-agnostic Detection of Websites Distributing Fake News. 2023 IEEE International Conference on Big Data (BigData), 1438-1449. DOI: 10.1109/BigData59044.2023.10386830. Online publication date: 15-Dec-2023

            Published In

            WWW '22: Proceedings of the ACM Web Conference 2022
            April 2022
            3764 pages
            ISBN: 9781450390965
            DOI: 10.1145/3485447

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Author Tags

            1. Advertising
            2. Identifiers
            3. Web Administration
            4. Web Monetization

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Funding Sources

            • EU H2020 Research and Innovation programme

            Conference

            WWW '22: The ACM Web Conference 2022
            April 25 - 29, 2022
            Virtual Event, Lyon, France

            Acceptance Rates

            Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
