DOI: 10.5555/3241094.3241172

Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016

Published: 10 August 2016

Abstract

Though web tracking and its privacy implications have received much attention in recent years, that attention has come relatively recently in the history of the web and lacks full historical context. In this paper, we present longitudinal measurements of third-party web tracking behaviors from 1996 to present (2016). Our tool, TrackingExcavator, leverages a key insight: that the Internet Archive's Wayback Machine opens the possibility for a retrospective analysis of tracking over time. We contribute an evaluation of the Wayback Machine's view of past third-party requests, which we find is imperfect; we evaluate its limitations and unearth lessons and strategies for overcoming them. Applying these strategies in our measurements, we discover (among other findings) that third-party tracking on the web has increased in prevalence and complexity since the first third-party tracker that we observe in 1996, and we see the spread of the most popular trackers to an increasing percentage of the most popular sites on the web. We argue that an understanding of the ecosystem's historical trends, which we provide for the first time at this scale in our work, is important to any technical and policy discussions surrounding tracking.
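
The retrospective idea in the abstract can be illustrated with a small sketch. It is not TrackingExcavator and makes no claim about how that tool works; it only shows, under stated assumptions, how requests observed while replaying an archived page might be mapped back to their original URLs and labelled third-party. The function names (`unwrap`, `naive_site`, `third_party_requests`) are hypothetical, the site comparison is a crude last-two-labels heuristic rather than a Public Suffix List lookup, and skipping requests that do not go through the archive is an assumption made here for simplicity.

```python
import re
from urllib.parse import urlsplit

# Wayback Machine replay URLs look like
#   https://web.archive.org/web/<timestamp>[modifier]/<original URL>
# where the optional modifier is a two-letter tag such as "im_" or "js_".
_WAYBACK = re.compile(
    r"^https?://web\.archive\.org/web/(\d{4,14})(?:[a-z]{2}_)?/(.+)$"
)


def unwrap(url):
    """Return (timestamp, original_url); timestamp is None if not a replay URL."""
    match = _WAYBACK.match(url)
    if match is None:
        return None, url
    original = match.group(2)
    if "://" not in original:
        # Some replay URLs omit the scheme of the original URL.
        original = "http://" + original
    return match.group(1), original


def naive_site(url):
    """Crude site key: the last two labels of the hostname (assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def third_party_requests(page_url, request_urls):
    """List original URLs of archived requests whose site differs from the page's."""
    _, page = unwrap(page_url)
    page_site = naive_site(page)
    found = []
    for request in request_urls:
        timestamp, original = unwrap(request)
        if timestamp is None:
            # Not served by the archive: assumed to reflect the live web, so skip.
            continue
        if naive_site(original) != page_site:
            found.append(original)
    return found


if __name__ == "__main__":
    page = "https://web.archive.org/web/19990427094034/http://www4.usnews.com/usnews/edu/home.htm"
    requests_seen = [
        "https://web.archive.org/web/19990427094034im_/http://www4.usnews.com/logo.gif",
        "https://web.archive.org/web/19990427094040js_/http://ads.example.net/track.js",
        "http://live.example.org/pixel.gif",
    ]
    print(third_party_requests(page, requests_seen))
```

A real measurement would also need to decide whether a third-party request actually constitutes tracking (for example, by inspecting cookies), which this sketch leaves out entirely.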


Published In

SEC'16: Proceedings of the 25th USENIX Conference on Security Symposium
August 2016
1240 pages
ISBN: 9781931971324

Sponsors

  • Google Inc.
  • NSF
  • Microsoft
  • Facebook
  • CISCO

Publisher

USENIX Association

United States
