The current multitude of inspections in both the private and public sectors represents a significant administrative burden for farms in Switzerland. With over 5000 inspection points [1] and more than 20 different inspection programs, the system lacks user-friendly coordination. Existing processes are neither well digitized nor harmonized, leading to redundancies and inefficiencies for both farmers and authorities.
The goal of agricheck is twofold: first, to collect and harmonize inspection points from both the private and public agricultural sectors, and second, to provide a simple web application that lets farmers quickly search and navigate these inspection points.
The data from various sources is standardized and published freely in RDF format via LINDAS, the linked data service of the Swiss Federal Archives. Here is an example inspection point as a linked data object on LINDAS.
The data of agricheck is organized hierarchically. Here are the links to the top-level collections:
The data model was written in OWL, the Web Ontology Language. It serves not only as a map for writing queries, but also as input to an automatic reasoning process. You can inspect the data model here.
To run the data integration from Excel or XML files to standardized RDF Turtle files, run
sh scripts/pipeline.sh
This executes the R scripts for data conversion (acontrol.R, bioinspecta.R and mutterkuh.R) as well as the scripts for data validation, reasoning and merging (validate-syntax.py and reason.py).
In the source data, some inspection points contain a schema:description value that is nearly identical to their schema:name.
Unfortunately, this is often the case for one language but not another, which leads to unexpected fallback language behavior. To fix this, the similarity is measured using the normalized Levenshtein distance. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions) needed to transform one string into the other.
The raw distance is then divided by the maximum string length, which makes the metric length-agnostic and comparable across strings of different sizes:
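Written out, with a and b standing for the schema:name and schema:description strings (the symbols are chosen here for illustration and do not come from the scripts):

```latex
d_{\text{norm}}(a, b) = \frac{\operatorname{lev}(a, b)}{\max(|a|, |b|)}, \qquad 0 \le d_{\text{norm}}(a, b) \le 1
```

For example, two strings that differ by one character and whose longer string has 15 characters have a normalized distance of 1/15 ≈ 0.067.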
If the normalized distance between schema:name and schema:description is ≤ 0.1, the description is considered redundant and removed from the RDF graph.
This cleaning process is implemented in the Python script scripts/remove-redundancy.py, and the resulting graph is re-serialized to Turtle.
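The redundancy check can be sketched in plain Python as follows. This is a minimal illustration and not the contents of scripts/remove-redundancy.py: the function names (levenshtein, normalized_distance, is_redundant) are hypothetical, and the sketch works on plain strings rather than on an RDF graph.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(a, b) / longest


def is_redundant(name: str, description: str, threshold: float = 0.1) -> bool:
    """True if the description is close enough to the name to be dropped."""
    return normalized_distance(name, description) <= threshold
```

With the 0.1 threshold, is_redundant("Animal welfare", "Animal welfare.") is true (one edit over 15 characters), while a description that actually adds information, such as "Animals must have daily access to pasture", is kept.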
- Get all inspection points with labels, comments and codes
- Find inspection point groups with exactly one sub-item
- How many distinct inspection points are there under the public domain?
Footnotes
1. Here, an inspection point is a specific, verifiable criterion within an agricultural control program used to assess a farm's compliance with a particular regulation or standard.