Internet Archive: Difference between revisions
Tantek.com (→Known: issue filed for apparent limitation)
Tantek.com (→Trigger an Archive: nowiki URLs inside code)
Revision as of 19:51, 7 April 2017
The Internet Archive is a non-profit organization that is building a digital library, including archival copies of much of the public web.
How to
Trigger an Archive
You can tell archive.org to crawl and archive a specific URL immediately.
$ curl -I -H "Accept: application/json" https://web.archive.org/save/{url to archive} | grep Content-Location
and you'll get a response like:
Content-Location: /web/20160715203015/http://indieweb.org
The response includes the path to the archived page on web.archive.org. Append this path to https://web.archive.org to build the final URL for the archived page.
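As a minimal sketch of that last step, the helper below turns a Content-Location header line (as printed by the grep above) into the full archived URL. The function name and the ARCHIVE_HOST constant are made up for illustration; only the host and the sample header come from this page.

```python
# Host to prepend to the returned path, as described above.
ARCHIVE_HOST = "https://web.archive.org"


def archived_url_from_header_line(line):
    """Turn a 'Content-Location: /web/...' header line into a full URL.

    Hypothetical helper: returns None when the line is not a
    Content-Location header.
    """
    # Split on the first colon only, since the path itself contains colons.
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-location":
        return None
    return ARCHIVE_HOST + value.strip()


print(archived_url_from_header_line(
    "Content-Location: /web/20160715203015/http://indieweb.org"
))
```

Splitting on the first colon only keeps the "http://" inside the archived path intact.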
Trigger Archive in PHP
PHP snippet to call the curl_exec() function with appropriate options/params to trigger an archive:
$options = array(
    CURLOPT_URL => ('https://web.archive.org/save/' . $url_to_save),
    CURLOPT_HEADER => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => array('Accept: application/json'),
    CURLOPT_USERAGENT => "YOUR_CMS_NAME_HERE"
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
You can then check $info['http_code'] for a numeric HTTP return code to see if there was an error (e.g. >= 400) and take action accordingly.
Trigger Archive in Python (with requests)
This method requires the 'requests' Python library.
pip install requests
import requests

# fire and forget archive.org call
try:
    verify = requests.get(
        '%s%s' % ('https://web.archive.org/save/', target),
        allow_redirects=False,
        timeout=30,
    )
except requests.RequestException:
    # ignore network errors: archiving is best-effort
    pass
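If you would rather not fire and forget, the response can be inspected in the way the PHP section suggests. The following is only a sketch: archive_outcome is a hypothetical helper, and it assumes the save response carries a Content-Location header as described under "Trigger an Archive", which may not hold for every response.

```python
def archive_outcome(status_code, headers):
    """Summarise a save response as (ok, archived_url_or_None).

    Hypothetical helper: `headers` is any mapping of header names to
    values, such as the `headers` attribute of a requests response.
    """
    # Treat 4xx/5xx as failure, mirroring the ">= 400" check above.
    if status_code >= 400:
        return (False, None)
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    path = lowered.get("content-location")
    if path is None:
        # Request succeeded, but no archived path was reported.
        return (True, None)
    return (True, "https://web.archive.org" + path)


print(archive_outcome(
    200, {"Content-Location": "/web/20160715203015/http://indieweb.org"}
))
```

With the snippet above it could be called as archive_outcome(verify.status_code, verify.headers), inside the try block so that verify is always defined.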
Trigger Archive in Ruby
This is a Ruby method that triggers archiving by the Internet Archive for a given URL. It returns the URL that you can visit to see the archived page. It uses the rest-client Ruby gem to make a GET request.
require 'rest-client'

def web_archive(url)
  archive_request_response = RestClient.get("https://web.archive.org/save/#{url}")
  "https://web.archive.org" + archive_request_response.headers[:content_location]
end
Known
(new!) There is a Known plugin to auto-archive your posts and edits to your posts, as well as pages that you bookmark:
- https://www.marcus-povey.co.uk/2017/02/02/archive-org-wayback-machine-support-for-known/
- source: https://github.com/mapkyca/KnownWaybackMachine
Limitation(?)
- Appears to NOT archive the links in your posts, except for bookmark posts.
- SHOULD: Archive all pages you link to, e.g. all pages that you send Webmentions to (or even attempt to, e.g. by doing endpoint discovery on them).
- GitHub issue: https://github.com/mapkyca/KnownWaybackMachine/issues/1
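To illustrate the "archive all pages you link to" behaviour, here is a rough Python sketch that collects the links in a post's HTML and builds the matching web.archive.org/save/ URLs. It is not how the Known plugin works; LinkCollector and save_urls_for_post are hypothetical names, and the sketch only looks at href attributes on a elements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

SAVE_PREFIX = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collect unique absolute http(s) links from <a href> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the post's own URL.
                absolute = urljoin(self.base_url, value)
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def save_urls_for_post(post_html, post_url):
    """Return one save URL per distinct link, in document order."""
    collector = LinkCollector(post_url)
    collector.feed(post_html)
    collector.close()
    return [SAVE_PREFIX + link for link in collector.links]


example = '<p><a href="https://example.com/a">a</a> <a href="/about">b</a></p>'
for save_url in save_urls_for_post(example, "https://example.org/2017/post"):
    print(save_url)
```

Each resulting URL could then be requested with any of the snippets in "How to"; non-web links such as mailto: are skipped and duplicates are requested only once.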
WordPress
There is a useful little plugin, Post Archival in the Internet Archive, which not only archives the user's post but also archives all the links within the post.
IndieWeb Examples
Jeremy Keith
Jeremy Keith has been pinging web.archive.org/save to archive pages that adactio.com posts link to since 2016-09-26.
Aaron Parecki
Aaron Parecki has been pinging web.archive.org/save with every URL that a Webmention is sent to since 2016-09-26.
Tantek
Tantek Çelik implemented (2016-11-13) pinging web.archive.org/save with every URL in a link/reply-to and has been doing it from tantek.com since 2016-11-15.
Chris Aldrich
Chris Aldrich implemented (2017-01-07) pinging the archive with URLs of both his own posts as well as links within posts using Post Archival in the Internet Archive.
svgur
Kevin Marks implemented pinging the archive with URLs of the SVG display posts, and thus indirectly the SVGs themselves.
mention-tech
Kevin Marks implemented (2017-01-24) pinging the archive with both the source and target URLs of the webmention so that they get preserved too.
Requests
- Telegraph support - https://github.com/aaronpk/Telegraph/issues/14 - should support auto-archiving for any sent Webmention