Internet Archive

From IndieWeb
Revision as of 00:22, 14 November 2016 by Tantek.com (talk | contribs) (→Trigger an Archive: add php cnippet to call curl_exec)


The Internet Archive is a non-profit organization that is building a digital library, including archival copies of much of the public web.

How to

Trigger an Archive

You can tell archive.org to crawl and archive a specific URL immediately.

$ curl -I -H "Accept: application/json" http://web.archive.org/save/{url to archive} | grep Content-Location

and you'll get a response like:

Content-Location: /web/20160715203015/http://indieweb.org

The response includes the path to the archived page on web.archive.org. Append this path to http://web.archive.org to build the final URL for the archived page.
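As a minimal sketch of that last step in PHP, the following pulls the Content-Location value out of raw response headers and prepends the host. The function name archived_url_from_headers and its $raw_headers parameter are hypothetical, not part of any archive.org API, and the code assumes the header arrives in the form shown above.

```php
<?php
// Hypothetical helper: build the archived page URL from raw response headers.
// Assumes a "Content-Location: /web/..." header like the example above.
function archived_url_from_headers(string $raw_headers): ?string {
    // Header names are case-insensitive (i); match per header line (m).
    if (preg_match('/^Content-Location:\s*(\S+)/mi', $raw_headers, $m)) {
        return 'http://web.archive.org' . $m[1];
    }
    return null; // header absent, so there is no archived path to report
}

// Hand-written headers standing in for a real response.
$headers = "HTTP/1.1 200 OK\r\n"
         . "Content-Location: /web/20160715203015/http://indieweb.org\r\n\r\n";
echo archived_url_from_headers($headers), "\n";
```

The function returns null rather than a partial URL when the header is missing, so the caller can tell that nothing was archived.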

PHP snippet that calls curl_exec() with appropriate options:

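// $url_to_save is the URL you want archived, e.g. a permalink from your site.
$options = 
array(CURLOPT_URL => ('http://web.archive.org/save/' . $url_to_save),
      CURLOPT_HEADER => true,          // include response headers in the output
      CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
      CURLOPT_HTTPHEADER => array('Accept: application/json'),
      CURLOPT_USERAGENT => "YOUR_CMS_NAME_HERE");
$ch = curl_init();
curl_setopt_array($ch, $options);
$response = curl_exec($ch);  // headers and body as a string, or false on failure
$info = curl_getinfo($ch);   // request details, including 'http_code'
curl_close($ch);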

You can then check $info['http_code'], the numeric HTTP status code, to see whether there was an error (e.g. >= 400) and take action accordingly.
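One way to express that check, as a sketch: the function name archive_request_failed is hypothetical, and treating a code of 0 as a failure is an added assumption, based on curl_getinfo() reporting 0 when no HTTP response was received.

```php
<?php
// Hypothetical helper: decide whether the save request failed, given the
// array returned by curl_getinfo().
function archive_request_failed(array $info): bool {
    $code = (int) ($info['http_code'] ?? 0);
    // 0 means cURL never received an HTTP response (DNS failure, timeout, ...);
    // 400 and above are HTTP client or server errors.
    return $code === 0 || $code >= 400;
}

// Hand-written arrays standing in for real curl_getinfo() output.
var_dump(archive_request_failed(array('http_code' => 200))); // bool(false)
var_dump(archive_request_failed(array('http_code' => 503))); // bool(true)
```

With the snippet above, this would be called as archive_request_failed($info), for example to log the failure or queue the URL for a later retry.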

IndieWeb Examples

  • Jeremy Keith has been pinging web.archive.org/save to archive pages that adactio.com posts link to since 2016-??-??
  • Aaron Parecki has been pinging web.archive.org/save with every URL that a Webmention is sent to since 2016-09-26

Requests

See Also