Please publish text files containing the hash values of future entity dumps in https://dumps.wikimedia.org/wikidatawiki/entities/ so that data integrity can be checked. These files could be similar to the *sums.txt ones in https://dumps.wikimedia.org/wikidatawiki/latest/.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Add checksums for Wikidata entity dumps | operations/puppet | production | +12 -0
Event Timeline
Would we want one hash sum file per (dated) folder, or one for everything? Or both?
If one for everything, should it contain just the base file names (like wikidata-20180323-truthy-BETA.nt.bz2), or the relative paths (like 20180323/wikidata-20180323-truthy-BETA.nt.bz2)?
One per folder I suppose, so that as a particular run finishes up, the hash info is available.
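To make the per-folder option concrete, here is a minimal Python sketch of writing one checksum file for a single dated folder, listing base file names only. It is not the merged puppet change: the function names `file_digest` and `write_sums_file` are hypothetical, and the output file name pattern `wikidata-<date>-<algorithm>sums.txt` and the `md5sum`-style line layout are assumptions modelled on the existing *sums.txt files.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in 1 MiB chunks so large dumps need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sums_file(folder: Path, algorithm: str = "md5") -> Path:
    """Write one checksum file covering the files of one dated folder.

    Hypothetical helper: assumes the folder is named after the dump date
    (e.g. 20180323) and that entries use base file names, not relative paths.
    """
    date = folder.name
    sums_path = folder / f"wikidata-{date}-{algorithm}sums.txt"
    lines = []
    for path in sorted(folder.iterdir()):
        # Skip subdirectories and any checksum files already present.
        if not path.is_file() or path.name.endswith("sums.txt"):
            continue
        # "<hex digest>  <file name>" is the layout md5sum/sha1sum emit.
        lines.append(f"{file_digest(path, algorithm)}  {path.name}\n")
    sums_path.write_text("".join(lines))
    return sums_path
```

Calling `write_sums_file(Path("20180323"), "sha1")` after a run finishes would produce the SHA-1 counterpart for the same folder, so the hash info becomes available per run.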
Change 423353 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[operations/puppet@production] Add checksums for Wikidata entity dumps
Change 423353 merged by ArielGlenn:
[operations/puppet@production] Add checksums for Wikidata entity dumps
Mentioned in SAL (#wikimedia-operations) [2018-04-05T12:04:22Z] <hoo> Manually back-filled hashes for the Wikidata JSON dumps in https://dumps.wikimedia.org/wikidatawiki/entities/20180402/wikidata-20180402-*sums.txt (T190457)
First checksums are available: https://dumps.wikimedia.org/wikidatawiki/entities/20180402/wikidata-20180402-md5sums.txt and https://dumps.wikimedia.org/wikidatawiki/entities/20180402/wikidata-20180402-sha1sums.txt for https://dumps.wikimedia.org/wikidatawiki/entities/20180402/.
I added the JSON checksums manually, but the RDF ones were added automatically. I'll check next week to make sure this also works correctly for the JSON checksums, but I don't expect any surprises there.
JSON checksums look fine as well:
hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ md5sum -c wikidata-20180409-md5sums.txt
wikidata-20180409-all.json.gz: OK
hoo@snapshot1007:/mnt/dumpsdata/otherdumps/wikibase/wikidatawiki/20180409$ sha1sum -c wikidata-20180409-sha1sums.txt
wikidata-20180409-all.json.gz: OK
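For consumers who download a dump and its checksum file but do not have `md5sum` or `sha1sum` at hand, the same check can be approximated in Python. This is a rough sketch, not part of the dumps tooling: `verify_sums_file` is a hypothetical helper, and it assumes each line has the `<hex digest>  <file name>` layout and that the listed files sit next to the checksum file.

```python
import hashlib
from pathlib import Path


def verify_sums_file(sums_path: Path, algorithm: str) -> dict:
    """Return {file name: True/False} for every entry in a checksum file.

    Hypothetical helper; raises FileNotFoundError if a listed file is missing.
    """
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # md5sum/sha1sum prefix the name with '*' when run in binary mode.
        name = name.lstrip("*")
        digest = hashlib.new(algorithm)
        with (sums_path.parent / name).open("rb") as handle:
            # Read in 1 MiB chunks so large dumps need not fit in memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        results[name] = digest.hexdigest() == expected.lower()
    return results
```

For example, `verify_sums_file(Path("wikidata-20180409-md5sums.txt"), "md5")` would map each listed file name to `True` when its digest matches the recorded one.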