Australia for COH
Closed, Resolved, Public

Description

From @Mrjohncummings
Australia has a centralised website with data, openly licensed text descriptions and openly licensed images. It can be searched by state to produce long lists of sites. It has over 22,000 sites; many have at least one photo, often more.

http://www.environment.gov.au/cgi-bin/ahdb/search.pl

It seems simple to create a structured database from the web pages. I did a test with one of the states; the names link to the individual URLs for the sites.

https://docs.google.com/spreadsheets/d/1EyWhAAvZfcDgInGjLR5tEwwbFZ0oUfGZJ56OAg2xim0/edit?usp=sharing

We could copy the descriptions into Wikipedia to create missing articles, using Wikidata as an index, like I have done with the Biosphere Reserves (click on one of the links to see the Wikidata-created map).
https://meta.wikimedia.org/wiki/WikiProject_UNESCO/Create_Biosphere_Reserve_Wikipedia_articles_from_UNESCO_descriptions

Open questions:

  • Is this suitable data?
  • What data already exists on the WLM database for Australia?
  • How much work would it be to import this data and images?
  • Is scraping the images possible, or should we rather contact them to see if they could provide the content in another way that makes it easier to upload?

Event Timeline

Question 1:

  • What is the license of the data itself?
  • We would possibly want to filter the data on one or more of the Type / Register / Type of register fields but otherwise I think it might be suitable data

Question 2: Per https://commons.wikimedia.org/wiki/Commons:WLM, Australia has never participated in WLM and I couldn't find it in the monuments database. There might be Wikipedia lists which were not imported to the database though.

So at the bottom of e.g. http://www.environment.gov.au/cgi-bin/ahdb/search.pl?mode=place_detail;place_id=18843 I see that the license is "CC BY 3.0 AU" and a link to their copyrights page. From that it doesn't look like the data is free enough to import to Wikidata.

Now since we are not interested in all of the data the question is if there is a subset which can be released and used on Wikidata?

Is all of the data copyrighted at all? How strict is Wikidata about the creativity threshold? The long description and photos are not necessary for the import, I guess. Would it be OK if you'd stick to facts alone?

With regard to copyright, the biggest problem is probably something similar to the database protection rights in Europe.

Per the quote below (my highlights), it seems as though Australia has something similar, with creativity being interchangeable with sweat-of-the-brow.

A database will be protected under Australian law if it is a literary work; expressed in material form; meets the originality test; and has a relevant connection with Australia. Facts and data in themselves are not protected by copyright. However, a collection of data, a dataset, or a database may be protected by copyright if it is sufficiently original. Whether a work is sufficiently original to be protected by copyright depends on whether it has been produced by the application of independent intellectual effort by the author/s, which may involve the exercise of skill, judgement, or creativity in selecting, presenting, or arranging the information. This summary synthesises recent cases regarding originality in factual compilations.

Met Kerry today. The conclusion is to focus only on the federal lists (National Heritage List & Commonwealth Heritage List); those should be CC-BY. So if copyright is the blocker, how about using the old-fashioned lists for now and worrying about the transfer to Wikidata later?

I've attached the coordinates that @Yarl extracted from the National heritage list at

id;state;name;lon;lat
105758;"SA";"Adelaide Park Lands and City Layout";138.6101;-34.9145
105741;"ACT";"Australian Academy of Science Building";149.122654;-35.283686
105891;"VIC";"Australian Alps National Parks and Reserves - Alpine National Park";147.3060;-36.7320
105891;"VIC";"Australian Alps National Parks and Reserves - Avon Wilderness";146.7910;-37.6290
105891;"VIC";"Australian Alps National Parks and Reserves - Baw Baw National Park";146.2930;-37.8300
105891;"NSW";"Australian Alps National Parks and Reserves - Bimberi Nature Reserve";148.7560;-35.5960
105891;"NSW";"Australian Alps National Parks and Reserves - Brindabella National Park";148.7750;-35.2467
105891;"NSW";"Australian Alps National Parks and Reserves - Kosciuszko National Park";148.2630;-36.4560
105891;"VIC";"Australian Alps National Parks and Reserves - Mt Buffalo National Park";146.7670;-36.7760
105891;"ACT";"Australian Alps National Parks and Reserves - Namadgi National Park";148.9900;-35.5590
105891;"NSW";"Australian Alps National Parks and Reserves - Scabby Range Nature Reserve";148.8700;-35.7757
105891;"VIC";"Australian Alps National Parks and Reserves - Snowy River National Park";148.4432;-37.2555
105891;"ACT";"Australian Alps National Parks and Reserves - Tidbinbilla Nature Reserve";148.9050;-35.4568
106304;"SA";"Australian Cornish Mining Sites - Burra";138.9309;-33.6767
106096;"SA";"Australian Cornish Mining Sites - Moonta";137.6066;-34.0746
105692;"SA";"Australian Fossil Mammal Sites (Naracoorte)";140.809836;-37.059995
105691;"QLD";"Australian Fossil Mammal Sites (Riversleigh)";138.629856;-19.033295
105889;"ACT";"Australian War Memorial and the Memorial Parade";149.148707;-35.280815
105887;"WA";"Batavia Shipwreck Site and Survivor Camps Area 1629 - Houtman Abrolhos";113.746946;-28.469676
106009;"NSW";"Bondi Beach";151.276;-33.892
105845;"VIC";"Bonegilla Migrant Camp - Block 19";147.012;-36.13
105778;"NSW";"Brewarrina Aboriginal Fish Traps (Baiames Ngunnhu)";146.855035;-29.957926
105977;"TAS";"Brickendon Estate";147.133;-41.624
105673;"VIC";"Budj Bim National Heritage Landscape - Mt Eccles Lake Condah Area";141.88132;-38.079192
105678;"VIC";"Budj Bim National Heritage Landscape - Tyrendarra Area";141.764301;-38.193877
105932;"TAS";"Cascades Female Factory";147.299;-42.894
106060;"TAS";"Cascades Female Factory Yard 4 North";147.299;-42.894
105683;"VIC";"Castlemaine Diggings National Heritage Park";144.226919;-37.149994
105861;"NSW";"City of Broken Hill";141.462;-31.947
105931;"TAS";"Coal Mines Historic Site";147.714;-42.983
105928;"NSW";"Cockatoo Island";151.172;-33.848
105937;"NSW";"Cyprus Hellene Club - Australian Hall";151.209552;-33.877891
105727;"WA";"Dampier Archipelago (including Burrup Peninsula)";116.649;-20.603
105933;"TAS";"Darlington Probation Station";148.073;-42.578
105664;"QLD";"Dinosaur Stampede National Monument";142.409;-23.018
105808;"WA";"Dirk Hartog Landing Site 1616 - Cape Inscription Area";112.987569;-25.500382
105777;"VIC";"Echuca Wharf";144.747233;-36.121115
105880;"SA";"Ediacara Fossil Site - Nilpena";138.399617;-31.133091
105821;"QLD";"Elizabeth Springs";140.5830;-23.3415
105754;"VIC";"Eureka Stockade Gardens";143.884584;-37.564619
105761;"NSW";"First Government House Site";151.211642;-33.863525
105974;"WA";"Fitzgerald River National Park";119.6;-34
105922;"VIC";"Flemington Racecourse";144.912899;-37.789023
105851;"VIC";"Flora Fossil Site - Yea";145.449294;-37.221012
105992;"QLD";"Fraser Island";153.142719;-25.25983
105762;"WA";"Fremantle Prison (former)";115.753177;-32.055039
105815;"QLD";"Glass House Mountains National Landscape";152.943;-26.93
105729;"VIC";"Glenrowan Heritage Precinct";146.223931;-36.462709
105704;"NSW";"Gondwana Rainforests of Australia - Barrington Tops Area";151.53;-32.096
105704;"QLD";"Gondwana Rainforests of Australia - Focal Peak Group";152.648;-28.278
105704;"NSW";"Gondwana Rainforests of Australia - Focal Peak Group";152.468;-28.445
105704;"NSW";"Gondwana Rainforests of Australia - Hastings-Macleay Group";152.123;-30.94
105704;"NSW";"Gondwana Rainforests of Australia - Iluka Nature Reserve";153.361;-29.404
105704;"QLD";"Gondwana Rainforests of Australia - Main Range Group";152.405;-28.067
105704;"NSW";"Gondwana Rainforests of Australia - Main Range Group";152.42;-28.31
105704;"NSW";"Gondwana Rainforests of Australia - New England Group";152.476;-30.434
105704;"QLD";"Gondwana Rainforests of Australia - Shield Volcano Group";153.151;-28.219
105704;"NSW";"Gondwana Rainforests of Australia - Shield Volcano Group";153.074;-28.363
105704;"NSW";"Gondwana Rainforests of Australia - Washpool and Gibraltar Range";152.339;-29.409
105852;"VIC";"Grampians National Park (Gariwerd)";142.409475;-37.253771
105709;"QLD";"Great Barrier Reef";148.587;-19.266
105875;"VIC";"Great Ocean Road";143.391556;-38.680523
105999;"NSW";"Greater Blue Mountains";150.469825;-33.285614
106065;"WA";"HMAS Sydney II";111.2130;-26.2442
106167;"EXT";"HMS Sirius";167.955;-29.06
105764;"VIC";"HMVS Cerberus";145.008103;-37.967811
106065;"WA";"HSK Kormoran";111.0719;-26.0992
105707;"EXT";"Heard and McDonald Islands";73.516992;-53.09354
105767;"NT";"Hermannsburg Historic Precinct";132.775032;-23.945097
105745;"ACT";"High Court - National Gallery Precinct";149.136;-35.3
105896;"VIC";"High Court of Australia (former)";144.959;-37.814
105935;"NSW";"Hyde Park Barracks";151.21275;-33.86957
105747;"VIC";"ICI Building (former)";144.973522;-37.808991
106168;"TAS";"Jordan River Levee";147.266;-42.705
105688;"NT";"Kakadu National Park";132.519512;-13.005471
105962;"EXT";"Kingston and Arthurs Vale Historic Area";167.959;-29.054
106022;"SA";"Koonalda Cave";129.837;-31.403
105817;"NSW";"Ku-ring-gai Chase National Park, Lion, Long and Spectacle Island Nature Reserves";151.213244;-33.630803
105812;"NSW";"Kurnell Peninsula Headland";151.217;-34.005
105967;"WA";"Lesueur National Park";115.1;-30.1
105694;"NSW";"Lord Howe Island Group";159.079723;-31.556175
105698;"TAS";"Macquarie Island";158.864516;-54.628346
105713;"ANTA";"Mawsons Huts and Mawsons Huts Historic Site";142.668102;-67.009725
105885;"VIC";"Melbourne Cricket Ground";144.983447;-37.819943
106098;"NSW";"Moree Baths and swimming pool";149.846;-29.4743
105936;"VIC";"Mount William Stone Hatchet Quarry";144.81;-37.213
106149;"VIC";"Murtoa No 1 Grain Store";142.479;-36.623
105869;"NSW";"Myall Creek Massacre and Memorial Site";150.715;-29.779
105739;"VIC";"Newman College";144.963759;-37.795408
106025;"QLD";"Ngarrabullgan";144.832;-16.822
105759;"NSW";"North Head - Sydney";151.296074;-33.813868
105957;"NSW";"Old Government House and the Government Domain";150.995;-33.81
105961;"NSW";"Old Great North Road";150.995;-33.379
105774;"ACT";"Old Parliament House and Curtilage";149.129522;-35.302456
105671;"VIC";"Point Cook Air Base";144.751;-37.933
105680;"VIC";"Point Nepean Defence Sites and Quarantine Station Area";144.684021;-38.315516
105982;"WA";"Porongurup National Park";117.8866;-34.6853
105718;"TAS";"Port Arthur Historic Site";147.85493;-43.144065
105697;"WA";"Purnululu National Park";128.546055;-17.448789
106064;"QLD";"QANTAS hangar - Longreach";144.2711;-23.4393
105665;"TAS";"Recherche Bay (North East Peninsula) Area";146.922692;-43.533898
105724;"TAS";"Richmond Bridge";147.44002;-42.733874
105763;"VIC";"Rippon Lea House and Garden";144.998951;-37.879586
105708;"VIC";"Royal Exhibition Building and Carlton Gardens";144.971584;-37.804234
105893;"NSW";"Royal National Park and Garawarra State Conservation Area";151.059818;-34.121549
105686;"WA";"Shark Bay, Western Australia";113.604655;-25.92897
105743;"VIC";"Sidney Myer Music Bowl";144.974721;-37.824198
105919;"NSW";"Snowy Mountains Scheme";148.2632;-36.0015
105710;"SA";"South Australian Old and New Parliament Houses";138.598629;-34.921133
106305;"VIC";"St Kilda Road and Environs";144.97553;-37.83622
105818;"WA";"Stirling Range National Park";118.047177;-34.397718
105888;"NSW";"Sydney Harbour Bridge";151.210684;-33.852232
105738;"NSW";"Sydney Opera House";151.214798;-33.857484
105695;"TAS";"Tasmanian Wilderness";145.882;-42.445
106061;"QLD";"The Burke, Wills, King and Yandruwandha National Heritage Place";140.836;-27.7
106007;"WA";"The Goldfields Water Supply Scheme";121.164;-30.953
105881;"WA";"The Ningaloo Coast";113.628;-22.739
106063;"WA";"The West Kimberley";126;-16.2
105721;"QLD";"Tree of Knowledge and curtilage";145.28963;-23.552352
105687;"NT";"Uluru - Kata Tjuta National Park";130.985732;-25.322076
105853;"NSW";"Warrumbungle National Park";149.014006;-31.280359
105897;"NT";"Wave Hill Walk Off Route";130.947;-17.492
105751;"TAS";"Western Tasmania Aboriginal Cultural Landscape";144.7895;-41.3761
105689;"QLD";"Wet Tropics of Queensland";145.705;-17.671
105693;"NSW";"Willandra Lakes Region";143.101406;-33.618142
105819;"SA";"Witjira-Dalhousie Springs";135.4667;-26.5131
105976;"TAS";"Woolmers Estate";147.154;-41.624

@Lokal_Profil just to confirm: is there anything missing to get this rolling from your end?

LilyOfTheWest triaged this task as High priority.
LilyOfTheWest subscribed.

@Lokal_Profil I took the liberty of changing the priority of this task to High and assigning it to you. ;) Feel free to disagree on each front. We do need the list for Australia soon for Wiki Loves Monuments. Can we have a sense of when this can happen? :)

Sourcing: The list that we are basing all of this on is the one sent from @Gnangarra to @Effeietsanders, with 520 entries (P5911). To source the data on Wikidata I need info on where it comes from (either a dataset with an equivalent item on Wikidata or a URL) and when it was downloaded.

The same goes for the coordinates which @Yarl extracted (P5881).

Coordinates for the Commonwealth list (P5912) can be sourced as a URL but need an item if we want to source them more properly.

Mapping: I've mapped all of the source fields to Wikidata in d:User:André_Costa_(WMSE)/COH/Australia. Please take a look and make sure it makes sense. There are currently two unmapped fields which I'm happy to get input for.

Please also add the sourcing information (above) to this page.

Issues with source data:
Three id numbers (place_id) are repeated in the dataset from @Gnangarra: 105694, 105698 and 105707 (all on the National Heritage list). This needs to be resolved as they are otherwise not unique and cannot be mapped against the relevant coordinate.

In the coordinates dataset created by @Yarl, 105891, 105704 and 106065 are each used multiple times, making it unclear which coordinate to use.
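A minimal sketch of how repeated ids could be listed automatically, assuming the coordinates are available as semicolon-separated text with an `id;state;name;lon;lat` header. The `SAMPLE` excerpt and the function name `repeated_ids` are illustrative only and not part of any existing import code.

```python
import csv
import io
from collections import Counter

# Illustrative excerpt in the id;state;name;lon;lat layout (assumption:
# the real file uses the same header and ";" as the delimiter).
SAMPLE = '''id;state;name;lon;lat
105758;"SA";"Adelaide Park Lands and City Layout";138.6101;-34.9145
106065;"WA";"HMAS Sydney II";111.2130;-26.2442
106065;"WA";"HSK Kormoran";111.0719;-26.0992
105704;"NSW";"Gondwana Rainforests of Australia - Barrington Tops Area";151.53;-32.096
105704;"QLD";"Gondwana Rainforests of Australia - Focal Peak Group";152.648;-28.278
'''


def repeated_ids(text):
    """Return {place_id: count} for every id that occurs more than once."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    counts = Counter(row["id"] for row in reader)
    return {place_id: n for place_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    for place_id, n in repeated_ids(SAMPLE).items():
        print(f"{place_id} is used {n} times")
```

Ids reported here would need a human decision (merge, split or skip the coordinate) before the import.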

De-duplication:
Other than looking for Wikidata entries that already use P3008 (the place id property), are there any other ways of automatically identifying heritage sites already on Wikidata?
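For the P3008 part, a rough sketch of how the existing matches could be fetched, assuming the public Wikidata Query Service endpoint is used. The helper name `build_query_url` is hypothetical; the snippet only builds the request URL and does not contact the service.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Every item that already carries a P3008 (place id) statement.
QUERY = """
SELECT ?item ?placeId WHERE {
  ?item wdt:P3008 ?placeId .
}
"""


def build_query_url(query=QUERY):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urlencode({"query": query.strip(), "format": "json"})
    return f"{SPARQL_ENDPOINT}?{params}"


if __name__ == "__main__":
    print(build_query_url())
```

The resulting id-to-item mapping could then be compared against the place_id column before any new items are created.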

De-duplication could happen partially based on the coordinates, I guess? But this could only give suggestions - and will never be error free. Otherwise, it's probably manual work, I'd guess.

This is excitingly close to working :)

If someone creates a list of manual tasks for muggles to do I would be happy to do some of them.

Commons
In addition to the Wikidata side of things there are also things that need doing on Commons for Australia to be ready. Possibly @Romaine could help with these?

Specifically the following are needed:

  • An upload campaign
  • Any localised WLM templates
  • Categories for "Listed <something> in Australia"
  • A template holding the place_id and categorising these in "Listed <something> in Australia with known IDs" with the id as sort key.

Finding monuments
Finally, how are contestants going to find which images to upload? Monumental? Listeria-powered lists? Old-style lists? For both of the latter two, templates are needed on whatever wiki they will live on, and someone with knowledge of that wiki and its policies will have to decide where they should live.

Replies

  • I'll leave the question for Gnangarra for him.

I'm not sure (s)he is actually on Phabricator which is why I tagged you.

Thanks, I've added this to the document.

De-duplication could happen partially based on the coordinates, I guess? But this could only give suggestions - and will never be error free. Otherwise, it's probably manual work, I'd guess.

Coordinate based de-duplication (or suggesting duplicates for a human to review) is probably the best. This will not happen before WLM though (I'm also not saying I'm the best person to build this) so my guess is that extended de-duplication will have to happen afterwards.
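A rough sketch of what coordinate-based suggestions could look like, assuming a plain great-circle (haversine) distance and an arbitrary 1 km threshold. The existing item id `Q_HYPOTHETICAL` and its coordinates are placeholders, not real Wikidata data, and the output is only a list of candidates for a human to review.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def suggest_duplicates(new_sites, existing_items, max_km=1.0):
    """Pair each new site with existing items closer than max_km."""
    suggestions = []
    for name, lat, lon in new_sites:
        for qid, item_lat, item_lon in existing_items:
            distance = haversine_km(lat, lon, item_lat, item_lon)
            if distance <= max_km:
                suggestions.append((name, qid, round(distance, 3)))
    return suggestions


# (name, lat, lon) taken from the coordinate list in this task.
NEW_SITES = [
    ("Sydney Opera House", -33.857484, 151.214798),
    ("Bondi Beach", -33.892, 151.276),
]
# (qid, lat, lon); placeholder id and coordinates for illustration only.
EXISTING = [("Q_HYPOTHETICAL", -33.8568, 151.2153)]

if __name__ == "__main__":
    for suggestion in suggest_duplicates(NEW_SITES, EXISTING):
        print(suggestion)
```

A fixed radius will miss large sites such as national parks, where the listed point can be far from the coordinate on an existing item, so this can only ever produce suggestions.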

Also a ping that there are still two question marks with regard to the mapping. Feedback (on the talk page) is appreciated; otherwise I will make an uninformed guess ;)

(some kudos for Romaine's supertable: https://commons.wikimedia.org/wiki/User:Romaine/Wiki_Loves_Monuments/2017/table )

I'll poke Gnangarra for this thread.

Item 105694 is Lord Howe Island. The first instance (424) is the area of the main island (Q104784), and the second instance (425) appears to be the area of the additional islands in the group, which is covered by Q1869866 and includes Wolf Rock (Q2695142), Sail Rock (no id) and Mutton Bird Island (no id). Combine as one entry under Q1869866, which is the whole group of islands.

Item 105698 is Macquarie Island (Q46650). Same as before, multiple islands, though in this case there is no group. The first instance appears to be related to Judge and Clerk Islets (Q46439) and Bishop and Clerk Islets (Q46489), both of which are part of the Macquarie Island Nature Reserve (no WD item).

Item 105707: combine as Heard Island and McDonald Islands (Q131198). The first instance is McDonald Island (Q1915097) and the second one is Heard Island (there is no WD id).

For 105891, 105704 and 106065:

105891 - the Australian Alps NP; the different coordinates are for individual peaks within it
105704 - Gondwana Rainforests; different locations within the rainforest
106065 - two ships, HMAS Sydney (Q1031986) @ 26°14′31″S 111°12′48″E and Kormoran (Q708046) @ 26°05′46″S 111°04′33″E; both sank off Geraldton in WA after attacking each other in WWII
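Since the ship positions above are given in degrees, minutes and seconds while the coordinate list uses decimal degrees, here is a small conversion sketch. It assumes the exact `26°14′31″S` notation used above (with ′ and ″ as the minute and second marks); the function name `dms_to_decimal` is illustrative.

```python
import re

# Matches strings such as 26°14′31″S (degrees, minutes, seconds, hemisphere).
DMS_PATTERN = re.compile(r"(\d+)°(\d+)′(\d+(?:\.\d+)?)″([NSEW])")


def dms_to_decimal(text):
    """Convert one DMS value to signed decimal degrees (S and W negative)."""
    match = DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised DMS value: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


if __name__ == "__main__":
    for raw in ("26°14′31″S", "111°12′48″E", "26°05′46″S", "111°04′33″E"):
        print(raw, "->", round(dms_to_decimal(raw), 6))
```

The converted values can then be compared with the lon/lat columns to decide which row belongs to which ship.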

Thanks for the help in sorting this out

National Heritage list can be sourced to http://www.environment.gov.au/heritage/places/national-heritage-list
Commonwealth list can be sourced to http://www.environment.gov.au/heritage/places/commonwealth-heritage-list

I got the data by using the queries available at http://www.environment.gov.au/about-us/environmental-information-data/databases-applications, extracting that data, then filtering it down to only the necessary information and removing copyrighted material from the dataset, since raw data (anything that can be sourced multiple ways) can't be copyrighted in Australia.

Could you provide an approximate date for this and maybe some deeper links for the queries? Pointing the url to either of the links in T153221#3552463 doesn't give users much help if they want to verify the info.

  • For the identifier, I would suggest one single identifier, based on the 'entry in the Australian Heritage Database'. Possibly on top of that, we can create templates for the National Heritage List, Commonwealth Heritage List and register of the National Estate or even places under consideration. This would allow a single template for uploading, which can then automatically be replaced with the more specific template.

For the template :c:Template:Cultural Heritage Australia I would say start with a simple template taking place_id as the only parameter. It's fairly easy to add parameters later to make the text/category change based on type, and since the data will be in Wikidata one could write a bot to update anything which has already been uploaded.

(some kudos for Romaines supertable: https://commons.wikimedia.org/wiki/User:Romaine/Wiki_Loves_Monuments/2017/table )

I'm not sure that all items on the list are buildings. We might actually want a new category higher up in the Australian Cultural Heritage tree.

Item 105694 is Lord Howe Island. The first instance (424) is the area of the main island (Q104784), and the second instance (425) appears to be the area of the additional islands in the group, which is covered by Q1869866 and includes Wolf Rock (Q2695142), Sail Rock (no id) and Mutton Bird Island (no id). Combine as one entry under Q1869866, which is the whole group of islands.

Item 105698 is Macquarie Island (Q46650). Same as before, multiple islands, though in this case there is no group. The first instance appears to be related to Judge and Clerk Islets (Q46439) and Bishop and Clerk Islets (Q46489), both of which are part of the Macquarie Island Nature Reserve (no WD item).

Item 105707: combine as Heard Island and McDonald Islands (Q131198). The first instance is McDonald Island (Q1915097) and the second one is Heard Island (there is no WD id).

Thanks. I've ensured there is a group item for each and will simply not import area and coordinates for these automatically.

For 105891, 105704 and 106065:

105891 - the Australian Alps NP; the different coordinates are for individual peaks within it
105704 - Gondwana Rainforests; different locations within the rainforest
106065 - two ships, HMAS Sydney (Q1031986) @ 26°14′31″S 111°12′48″E and Kormoran (Q708046) @ 26°05′46″S 111°04′33″E; both sank off Geraldton in WA after attacking each other in WWII

Thanks. I'll simply drop automatically adding coordinates for these.

I have largely finished the code. You can see a preview of what the import would look like at https://www.wikidata.org/wiki/User:André_Costa_(WMSE)/COH/Australia/preview; it is still awaiting the final source components.

I sourced the data for both lists around 21 July 2017; that should be accurate to within a couple of days either way. The starting-point URL for the Australian Heritage list is http://data.gov.au/dataset/2016-soe-her-aus-national-heritage. According to the original CSV file, the data was last updated on 31 May 2017.

The Commonwealth list was created at the same time from http://data.gov.au/dataset/commonwealth-heritage-list

Thanks. I've now updated the sources and the preview example.

I created T174324: Allow Australia to participate in WLM2017 to more clearly separate Wiki-Loves-Monuments and Connected-Open-Heritage needs. I've also created a few subtasks there. For WLM related discussions I recommend continuing there and feeling free to unsubscribe here.