DOI: 10.1145/3576915.3616586
research-article
Open access

CookieGraph: Understanding and Detecting First-Party Tracking Cookies

Published: 21 November 2023

Abstract

As third-party cookie blocking becomes the norm in mainstream web browsers, advertisers and trackers have started to use first-party cookies for tracking. To understand this phenomenon, we conduct a differential measurement study that compares website behavior with and without third-party cookies. We find that first-party cookies are used to store identifiers and exfiltrate them to known trackers even when third-party cookies are blocked.
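One simple way to flag the exfiltration described above is to check whether a first-party cookie's value shows up in the URL of a request sent to a known tracker. The sketch below is a minimal illustration under stated assumptions, not the authors' measurement pipeline: `TRACKER_DOMAINS`, `MIN_VALUE_LEN`, and the helper names are hypothetical, and it only inspects URL paths and query strings (not request bodies or encoded/hashed values).

```python
from urllib.parse import unquote, urlsplit

# Hypothetical tracker list for illustration; real measurements typically
# derive tracker domains from filter lists such as EasyList/EasyPrivacy.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}

# Hypothetical threshold: short values such as "1" or "en" match by accident.
MIN_VALUE_LEN = 8


def host_matches(host: str, domains: set) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def exfiltrated_cookies(cookies: dict, request_urls: list) -> dict:
    """Map each first-party cookie name to the tracker hosts whose URLs carry its value."""
    leaks = {}
    for url in request_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if not host_matches(host, TRACKER_DOMAINS):
            continue
        # Look for the raw cookie value in the percent-decoded path and query.
        haystack = unquote(parts.path + "?" + parts.query)
        for name, value in cookies.items():
            if len(value) >= MIN_VALUE_LEN and value in haystack:
                hosts = leaks.setdefault(name, [])
                if host not in hosts:
                    hosts.append(host)
    return leaks


cookies = {"_uid": "a1b2c3d4e5f6", "lang": "en"}
observed_requests = [
    "https://px.tracker.example/collect?id=a1b2c3d4e5f6&ev=pageview",
    "https://cdn.site.example/app.js",
]
print(exfiltrated_cookies(cookies, observed_requests))  # {'_uid': ['px.tracker.example']}
```

In a differential setup, a check like this would be run on each crawl configuration (third-party cookies allowed vs. blocked) so that leaks persisting under blocking can be compared.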
Unlike third-party cookie blocking, blocking all first-party cookies is not practical because it would cause major breakage of website functionality. We propose CookieGraph, a machine learning-based approach that accurately and robustly detects and blocks first-party tracking cookies. CookieGraph detects first-party tracking cookies with 90.18% accuracy, outperforming the state-of-the-art CookieBlock by 17.31%. We show that CookieGraph is robust against cookie name manipulation, while CookieBlock's accuracy drops by 15.87%. Blocking all first-party cookies causes major breakage on 32% of the sites with SSO logins, and CookieBlock reduces this to 10%; CookieGraph causes no major breakage on these sites.
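Why robustness to cookie name manipulation matters can be illustrated with a toy feature extractor: if every feature is computed from what a cookie does rather than what it is called, renaming the cookie cannot change the classifier's input. The sketch below is hypothetical and is not CookieGraph's actual feature set; `ObservedCookie`, `behavioral_features`, and the chosen features (value length, value entropy, whether a script set it, number of receiving hosts) are assumptions for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObservedCookie:
    name: str
    value: str
    set_by_script: bool        # written via document.cookie rather than an HTTP header
    receiver_hosts: frozenset  # hosts whose requests carried the value


def shannon_entropy(s: str) -> float:
    """Bits per character of s; random identifiers score high, flags like 'true' score low."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def behavioral_features(c: ObservedCookie) -> tuple:
    """Hypothetical feature vector that deliberately ignores c.name."""
    return (
        len(c.value),
        round(shannon_entropy(c.value), 3),
        int(c.set_by_script),
        len(c.receiver_hosts),
    )


original = ObservedCookie("_uid", "a1b2c3d4e5f6g7h8", True, frozenset({"px.tracker.example"}))
renamed = replace(original, name="x9q")  # a tracker renames its cookie
print(behavioral_features(original) == behavioral_features(renamed))  # True
```

A classifier that keys on the cookie name would treat `x9q` as an unseen cookie, whereas this name-agnostic vector is unchanged by the rename.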
Our deployment of CookieGraph shows that first-party tracking cookies are used on 89.86% of the top million websites. We find that 96.61% of these first-party tracking cookies are in fact ghostwritten, that is, set by third-party scripts embedded in the first-party context. We also find evidence that fingerprinting scripts set first-party tracking cookies. The most prevalent first-party tracking cookies are set by major advertising entities such as Google, Facebook, and TikTok.
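Deciding whether a first-party cookie is ghostwritten comes down to comparing the site of the page with the site of the script that wrote the cookie. The sketch below assumes the setter script's URL has already been attributed (for example, from a JavaScript stack trace recorded by an instrumented browser); `naive_site` and `is_ghostwritten` are hypothetical helpers, and `naive_site` deliberately simplifies eTLD+1 computation by keeping the last two host labels.

```python
from typing import Optional
from urllib.parse import urlsplit


def naive_site(host: str) -> str:
    """Approximate the registrable domain (eTLD+1) by keeping the last two labels.

    Simplification: wrong for multi-label public suffixes such as 'co.uk';
    a real tool should consult the Public Suffix List.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_ghostwritten(page_url: str, setter_script_url: Optional[str]) -> bool:
    """True if a first-party cookie was written by a script served from another site.

    setter_script_url is None when the cookie came from an HTTP Set-Cookie header
    or an inline script; this sketch attributes those cases to the first party.
    """
    if setter_script_url is None:
        return False
    page_site = naive_site(urlsplit(page_url).hostname or "")
    script_site = naive_site(urlsplit(setter_script_url).hostname or "")
    return script_site != page_site


print(is_ghostwritten("https://www.shop.example/cart", "https://cdn.tracker.example/pixel.js"))  # True
print(is_ghostwritten("https://www.shop.example/cart", "https://static.shop.example/app.js"))    # False
```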



Information & Contributors

Published In

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security
November 2023
3722 pages
ISBN: 9798400700507
DOI: 10.1145/3576915
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

  1. cookies
  2. machine learning
  3. privacy
  4. tracking
  5. web security


Conference

CCS '23

Acceptance Rates

Overall Acceptance Rate 1,261 of 6,999 submissions, 18%


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 2,250
  • Downloads (last 6 weeks): 279
Reflects downloads up to 19 Nov 2024

Cited By

  • (2024) Combating Web Tracking: Analyzing Web Tracking Technologies for User Privacy. Future Internet 16:10 (363). DOI: 10.3390/fi16100363. Online publication date: 5-Oct-2024
  • (2024) SINBAD: Saliency-informed detection of breakage caused by ad blocking. 2024 IEEE Symposium on Security and Privacy (SP), 258-276. DOI: 10.1109/SP54263.2024.00199. Online publication date: 19-May-2024
  • (2024) Disposable identities. Journal of Information Security and Applications 84:C. DOI: 10.1016/j.jisa.2024.103821. Online publication date: 25-Sep-2024
  • (2024) Information flow control for comparative privacy analyses. International Journal of Information Security 23:5, 3199-3216. DOI: 10.1007/s10207-024-00886-0. Online publication date: 14-Jul-2024
  • (2023) An Empirical Analysis of E-Governments' Cookie Interfaces in 50 Countries. Sustainability 15:2 (1231). DOI: 10.3390/su15021231. Online publication date: 9-Jan-2023
