DOI: 10.1145/3133956.3134042
research-article
Public Access

Rewriting History: Changing the Archived Web from the Present

Published: 30 October 2017

Abstract

The Internet Archive's Wayback Machine is the largest modern web archive, preserving web content since 1996. We discover and analyze several vulnerabilities in how the Wayback Machine archives data, and then leverage these vulnerabilities to create what are to our knowledge the first attacks against a user's view of the archived web. Our vulnerabilities are enabled by the unique interaction between the Wayback Machine's archives, other websites, and a user's browser, and attackers do not need to compromise the archives in order to compromise users' views of a stored page. We demonstrate the effectiveness of our attacks through proof-of-concept implementations. Then, we conduct a measurement study to quantify the prevalence of vulnerabilities in the archive. Finally, we explore defenses which might be deployed by archives, website publishers, and the users of archives, and present the prototype of a defense for clients of the Wayback Machine, ArchiveWatcher.
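
The abstract does not spell out the mechanics of the vulnerabilities, so the following is only a minimal sketch of one way a reader might inspect the "interaction between the Wayback Machine's archives, other websites, and a user's browser". It assumes that one observable symptom is a request, made while rendering an archived page, that is not served from the archive. The function names `parse_wayback_url` and `requests_outside_archive` are hypothetical, and the sketch is not the paper's measurement method or the ArchiveWatcher implementation; it relies only on the public replay URL form `https://web.archive.org/web/<timestamp>/<original URL>`.

```python
import re
from urllib.parse import urlparse

# Replay URLs look like:
#   https://web.archive.org/web/<timestamp>[modifier]/<original URL>
# where the optional modifier is two letters plus "_" (for example "js_").
_WAYBACK_PATH = re.compile(r"^/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$")


def parse_wayback_url(url):
    """Return (timestamp, original_url) for a replay URL, else None."""
    parts = urlparse(url)
    if parts.hostname != "web.archive.org":
        return None
    match = _WAYBACK_PATH.match(parts.path)
    if match is None:
        return None
    original = match.group(2)
    if parts.query:
        # The query string belongs to the original URL, so reattach it.
        original += "?" + parts.query
    return match.group(1), original


def requests_outside_archive(requested_urls):
    """List requests that were not served from the archive (assumed symptom)."""
    return [u for u in requested_urls if parse_wayback_url(u) is None]


if __name__ == "__main__":
    observed = [
        "https://web.archive.org/web/20170816000000/http://example.com/",
        "https://web.archive.org/web/20170816000000js_/http://example.com/a.js",
        "http://third-party.example/ad.js",  # hypothetical live-web request
    ]
    for url in requests_outside_archive(observed):
        print("not served from the archive:", url)
```

A list of observed requests could come from a browser's network log; any URL the helper flags was fetched from somewhere other than the archive and may deserve a closer look.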




Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN: 9781450349468
DOI: 10.1145/3133956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. web archives
  2. web security

Qualifiers

  • Research-article

Conference

CCS '17

Acceptance Rates

CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 181
  • Downloads (Last 6 weeks): 24
Reflects downloads up to 14 Feb 2025

Citations

Cited By

  • (2024) Understanding the Breakdown of Same-origin Policies in Web Services That Rehost Websites. Journal of Information Processing 32 (801-816). DOI: 10.2197/ipsjjip.32.801. Online publication date: 2024
  • (2024) Challenges in replaying archived Twitter pages. International Journal on Digital Libraries 25:2 (217-236). DOI: 10.1007/s00799-023-00379-w. Online publication date: 1-Jun-2024
  • (2023) Hashes are not suitable to verify fixity of the public archived web. PLOS ONE 18:6 (e0286879). DOI: 10.1371/journal.pone.0286879. Online publication date: 9-Jun-2023
  • (2023) Know(ing) Infrastructure: The Wayback Machine as object and instrument of digital research. Convergence: The International Journal of Research into New Media Technologies 30:1 (167-189). DOI: 10.1177/13548565231164759. Online publication date: 30-Mar-2023
  • (2023) To Re-experience the Web: A Framework for the Transformation and Replay of Archived Web Pages. ACM Transactions on the Web 17:4 (1-49). DOI: 10.1145/3589206. Online publication date: 11-Jul-2023
  • (2022) WARChain. Journal of Computer Security 30:3 (499-515). DOI: 10.3233/JCS-210040. Online publication date: 1-Jan-2022
  • (2022) "Way back then": A Data-driven View of 25+ years of Web Evolution. Proceedings of the ACM Web Conference 2022 (3471-3479). DOI: 10.1145/3485447.3512283. Online publication date: 25-Apr-2022
  • (2022) Caching HTTP 404 Responses Eliminates Unnecessary Archival Replay Requests. From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries (329-344). DOI: 10.1007/978-3-031-21756-2_26. Online publication date: 30-Nov-2022
  • (2022) A Chromium-Based Memento-Aware Web Browser. Linking Theory and Practice of Digital Libraries (147-160). DOI: 10.1007/978-3-031-16802-4_12. Online publication date: 20-Sep-2022
  • (2021) Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset. Proceedings of the Web Conference 2021 (2165-2176). DOI: 10.1145/3442381.3450048. Online publication date: 19-Apr-2021
