After one too many arbitrary format changes on the MOH website, I've decided to stop updating the scraper and shut down the online API. There are alternative sources of both live statistics and case data (see the section below).
For me, this project was an object lesson in the futility of scraping hand-edited information. Open Data is necessary if the public is to process government-owned data automatically in any feasible way. It turns out that, in a crisis, Open Data is not a priority (indeed, as of 5 July 2020, the official NZ government portal has scant COVID-19 datasets).
Sorry for the inconvenience, and thank you for your interest.
On April 12, the MOH stopped publishing all COVID-19 case details in a single table and began reporting cases month by month. At this point I don't think it makes sense for this API to offer detailed case information. The last successfully scraped case data is now archived. I will leave the scraping code as is for those who want to use the CLI tool to download the current month's case data.
Similarly, the location (per-DHB) statistics, which were derived from the scraped cases, will now be incorrect, and MOH's own per-DHB case summary table also covers only the current month. Again, I will remove the API for /location/* and leave the CLI function in place, in case it is useful to anyone (unlikely, but who knows).
For those who are interested in obtaining a full snapshot of case information, the best source I know of is the arcgis.com dashboard linked from the MOH website.
Specifically, tables in the backend web service that appear to be obtained from, or maintained by, ESR can be dumped in JSON format with the right query strings.
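A minimal sketch of such a query, assuming the dashboard's tables are ordinary ArcGIS REST feature-service layers: the layer URL below is a placeholder (find the real one in the browser's network inspector), and buildQueryURL is a hypothetical helper. The where, outFields, returnGeometry and f parameters are standard ArcGIS REST query parameters that ask for every row and column as JSON.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildQueryURL returns an ArcGIS REST "query" URL for the given feature
// layer that requests every row (where=1=1), every column (outFields=*),
// no geometry, and JSON output (f=json).
func buildQueryURL(layerURL string) (string, error) {
	u, err := url.Parse(layerURL)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/query"
	q := url.Values{}
	q.Set("where", "1=1")
	q.Set("outFields", "*")
	q.Set("returnGeometry", "false")
	q.Set("f", "json")
	u.RawQuery = q.Encode() // Encode sorts the keys alphabetically
	return u.String(), nil
}

func main() {
	// Hypothetical layer URL, not the real ESR/MOH service address.
	s, err := buildQueryURL("https://services.arcgis.com/EXAMPLE/arcgis/rest/services/EXAMPLE/FeatureServer/0")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Feature services usually cap the number of rows per response; if a dump looks truncated, the resultOffset and resultRecordCount query parameters can be used to page through the table.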
ESR now provides a dashboard that (presumably) renders statistics directly from EpiSurv, the authoritative database that all NZ COVID-19 data comes from: https://nzcoviddashboard.esr.cri.nz/
Unfortunately, there is no usable API. As far as I can tell, R Shiny Server uses a baroque home-grown protocol that exchanges strangely encoded messages (mixed with JSON) over streaming XHR connections, for example:
Client: ["0#0|o|"]
Server: a["1#0|m|{\"busy\":\"busy\"}"]
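For anyone tempted to start, here is a minimal sketch that unpacks the two sample frames above. It assumes the outer layer is SockJS-style framing (Shiny Server appears to use SockJS as its transport, where a leading "a" marks an array of messages) and guesses that each inner string is "sequence#channel|type|payload". The shinyMsg type, its field names and parseFrame are hypothetical, inferred only from these two samples.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// shinyMsg is one message unpacked from a frame. Field meanings are
// guesses from the observed samples, not a documented format.
type shinyMsg struct {
	Seq     string // before '#', apparently a message counter
	Channel string // first '|' field, apparently a channel id
	Type    string // second '|' field, e.g. "o" or "m"
	Payload string // remainder, often a JSON object
}

// parseFrame decodes one frame. Server frames carry a leading 'a'
// (SockJS-style array frame); client frames are a bare JSON array.
// Other SockJS frame types (o, h, c) are not handled here.
func parseFrame(frame string) ([]shinyMsg, error) {
	frame = strings.TrimPrefix(frame, "a")
	var raw []string
	if err := json.Unmarshal([]byte(frame), &raw); err != nil {
		return nil, err
	}
	msgs := make([]shinyMsg, 0, len(raw))
	for _, r := range raw {
		seqRest := strings.SplitN(r, "#", 2)
		if len(seqRest) != 2 {
			return nil, fmt.Errorf("no '#' in %q", r)
		}
		// SplitN keeps any '|' inside the payload intact.
		parts := strings.SplitN(seqRest[1], "|", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("expected 3 '|' fields in %q", seqRest[1])
		}
		msgs = append(msgs, shinyMsg{seqRest[0], parts[0], parts[1], parts[2]})
	}
	return msgs, nil
}

func main() {
	frames := []string{
		`["0#0|o|"]`,                     // client sample
		`a["1#0|m|{\"busy\":\"busy\"}"]`, // server sample
	}
	for _, f := range frames {
		msgs, err := parseFrame(f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, m := range msgs {
			fmt.Printf("seq=%s channel=%s type=%s payload=%s\n", m.Seq, m.Channel, m.Type, m.Payload)
		}
	}
}
```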
If anyone feels there is significant value in reverse-engineering this, feel free to open an issue.
This code is intended to scrape the following sources of COVID-19 data in New Zealand, and render the data in various formats suitable for mapping, visualisation and analysis:
- Ministry of Health COVID-19 case page
- The government COVID-19 alert level page
Use this with caution: the NZ government may change its pages and break the scraper at any time.
This code was used as the core of an API service I ran: https://nzcovid19api.xerra.nz/
Courtesy of @gizmoguy, the exported metrics are scraped by a Prometheus server and visualised on a Grafana dashboard.
To build the utilities, you'll need a Go 1.13+ toolchain installed (check out https://golang.org/dl/ for details).
Running ./build.sh will build each tool in the cmd/ subdirectories.
If you don't want to futz with Go, a Dockerfile is provided. Use Docker to build an image:
$ docker build -t nzcovid19cases .
<snip>
Successfully tagged nzcovid19cases:latest
For now there is a CLI tool.
cmd/nzcovid19-cli$ ./nzcovid19-cli
Usage: ./cmd/nzcovid19-cli/nzcovid19-cli <action>
Where <action> is one of:
- cases/json
- cases/csv
- locations/json
- locations/csv
- alertlevel/json
- casestats/json
- clusters/json
- clusters/csv
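Every action follows a data-set/format pattern. The sketch below shows one way such an argument could be validated and split; parseAction and the validActions table are hypothetical illustrations built from the usage list above, not code taken from cmd/nzcovid19-cli.

```go
package main

import (
	"fmt"
	"strings"
)

// validActions mirrors the usage list above (hypothetical table).
var validActions = map[string]bool{
	"cases/json": true, "cases/csv": true,
	"locations/json": true, "locations/csv": true,
	"alertlevel/json": true, "casestats/json": true,
	"clusters/json": true, "clusters/csv": true,
}

// parseAction splits an action such as "cases/csv" into its data set
// ("cases") and output format ("csv"), rejecting unknown combinations
// such as "alertlevel/csv".
func parseAction(action string) (dataset, format string, err error) {
	if !validActions[action] {
		return "", "", fmt.Errorf("unknown action %q", action)
	}
	parts := strings.SplitN(action, "/", 2)
	return parts[0], parts[1], nil
}

func main() {
	for _, a := range []string{"cases/csv", "alertlevel/json", "alertlevel/csv"} {
		dataset, format, err := parseAction(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("dataset=%s format=%s\n", dataset, format)
	}
}
```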
$ docker run -ti --rm nzcovid19cases alertlevel/json
{
"Level": 4,
"LevelName": "Eliminate"
}
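Because the output is plain JSON, it can be consumed from another Go program. A minimal sketch, assuming the field names in the sample output above are stable; the AlertLevel type and decodeAlertLevel function are hypothetical and not part of this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AlertLevel matches the fields in the alertlevel/json sample output.
type AlertLevel struct {
	Level     int
	LevelName string
}

// decodeAlertLevel parses the JSON printed by the alertlevel/json action.
func decodeAlertLevel(data []byte) (AlertLevel, error) {
	var a AlertLevel
	err := json.Unmarshal(data, &a)
	return a, err
}

func main() {
	// Sample output copied from the docker run example above; in practice
	// this would be read from the CLI's stdout.
	out := []byte(`{"Level": 4, "LevelName": "Eliminate"}`)
	a, err := decodeAlertLevel(out)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Alert level %d (%s)\n", a.Level, a.LevelName)
}
```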
This code is published under the MIT license.
The data processed by this tool is published under:
- The Ministry of Health's copyright, which at the time of writing is the Creative Commons Attribution 4.0 International Licence, with some exceptions.
- The Crown Copyright.