Internet Archive
The Internet Archive is a non-profit organization that is building a digital library, including archived copies of much of the public web.
How to
Trigger an Archive
You can tell archive.org to crawl and archive a specific URL immediately.
$ curl -I -H "Accept: application/json" http://web.archive.org/save/{url to archive} | grep Content-Location
This prints a line like:
Content-Location: /web/20160715203015/http://indieweb.org
The response includes the path to the archived page on web.archive.org. Append this path to http://web.archive.org to build the final URL for the archived page.
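As a minimal sketch of that last step, the shell function below reads HTTP response headers on standard input, extracts the Content-Location path, and prefixes it with http://web.archive.org. The function name `archived_url` is hypothetical and not part of any archive.org tooling. The sketch assumes the header value contains no spaces and uses only the first Content-Location header it finds.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# full archived URL built from the Content-Location header.
archived_url() {
  # Strip carriage returns (HTTP header lines end in CRLF), then take the
  # value of the first Content-Location header, matched case-insensitively.
  path=$(tr -d '\r' | awk 'tolower($1) == "content-location:" { print $2; exit }')
  # Return a non-zero status if the header was missing.
  [ -n "$path" ] || return 1
  printf 'http://web.archive.org%s\n' "$path"
}
```

It could be combined with the command above along the lines of `curl -s -I -H "Accept: application/json" "http://web.archive.org/save/http://indieweb.org" | archived_url`.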
PHP snippet that calls the curl_exec() function with the appropriate options:
// $url_to_save holds the URL you want archived
$options = array(
    CURLOPT_URL => 'http://web.archive.org/save/' . $url_to_save,
    CURLOPT_HEADER => true,          // include response headers in $response
    CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
    CURLOPT_HTTPHEADER => array('Accept: application/json'),
    CURLOPT_USERAGENT => "YOUR_CMS_NAME_HERE",
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
You can then check $info['http_code'] for the numeric HTTP status code to see whether there was an error (e.g. >= 400) and take action accordingly.
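The same status check can be sketched on the command line. The two function names below, `save_status` and `save_failed`, are hypothetical. `save_status` uses curl's `-w '%{http_code}'` option to print only the numeric status. Treating an empty, non-numeric, or `000` value (what curl reports when it gets no HTTP response) as a failure is an assumption of this sketch, not documented archive.org behavior.

```shell
#!/bin/sh
# Hypothetical helper: request a save and print only the numeric status code.
save_status() {
  curl -s -o /dev/null -w '%{http_code}' -H 'Accept: application/json' \
    "http://web.archive.org/save/$1"
}

# Hypothetical helper: succeed (exit status 0) when the given status code
# should be treated as a failed archive request.
save_failed() {
  case "$1" in
    # Empty or non-numeric input: assume the request failed.
    ''|*[!0-9]*) return 0 ;;
  esac
  # 400 and above are HTTP errors; 000 means curl got no HTTP response.
  [ "$1" -ge 400 ] || [ "$1" -eq 0 ]
}
```

A caller might then write `code=$(save_status "$url"); if save_failed "$code"; then echo "archive request failed: $code" >&2; fi` and decide whether to retry or log the error.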
IndieWeb Examples
- Jeremy Keith has been pinging web.archive.org/save to archive pages that adactio.com posts link to since 2016-??-??
- Aaron Parecki has been pinging web.archive.org/save with every URL that a Webmention is sent to since 2016-09-26
Requests
- Telegraph support - https://github.com/aaronpk/Telegraph/issues/14 - should support auto-archiving for any sent Webmention