AGRICHECK

The current multitude of inspections in both the private and public sectors places a significant administrative burden on farms in Switzerland. With over 5000 inspection points¹ and more than 20 different inspection programs, the system lacks user-friendly coordination. Existing processes are neither thoroughly digitized nor harmonized, leading to redundancies and inefficiencies for both farmers and authorities.

The goal of agricheck is to first collect and harmonize inspection points from both the private and public agricultural sector, and second to provide a simple web application for farmers to quickly search and navigate these inspection points.

The data

Data from the various sources is standardized and published freely in the RDF format via LINDAS, the linked data service of the Swiss Federal Archives. Here's an example inspection point as a linked data object on LINDAS.

The data of agricheck is organized hierarchically. Here are the links to the top-level collections:

The data model

The data model was written using OWL, the Web Ontology Language. It is used not only as a map for writing queries, but also for an automatic reasoning process. You can inspect the data model here.

Run the ETL pipeline

To run the data integration from Excel or XML files to standardized RDF Turtle files, run

sh scripts/pipeline.sh

This executes the R scripts for data conversion (acontrol.R, bioinspecta.R and mutterkuh.R) as well as the Python scripts for data validation, reasoning and merging (validate-syntax.py and reason.py).

Cleaning duplicate descriptions

In the source data, some inspection points contain a schema:description value that is nearly identical to their schema:name. Unfortunately, this is often the case for one language but not another, which leads to inconsistent fallback language behavior. To fix this, similarity is measured using the normalized Levenshtein distance. The Levenshtein distance counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to transform one string into the other. The raw distance is then divided by the length of the longer string, which makes the metric length-agnostic and comparable across strings of different sizes:

$$ \text{Normalized Levenshtein}(a, b) = \frac{\text{Levenshtein}(a, b)}{\max\left(\lvert a \rvert, \lvert b \rvert\right)} $$

If the normalized distance between schema:name and schema:description is ≤ 0.1, the description is considered redundant and removed from the RDF graph. This cleaning process is implemented in the Python script scripts/remove-redundancy.py, and the resulting graph is re-serialized to Turtle.
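The check described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of scripts/remove-redundancy.py: the function names `levenshtein`, `normalized_levenshtein` and `is_redundant` are hypothetical, the handling of two empty strings is an assumption, and the sample strings are invented for illustration. It leaves out reading and re-serializing the RDF graph.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings count as identical
    return levenshtein(a, b) / longest


THRESHOLD = 0.1  # threshold stated in the text above


def is_redundant(name: str, description: str, threshold: float = THRESHOLD) -> bool:
    """True if the description is close enough to the name to be dropped."""
    return normalized_levenshtein(name, description) <= threshold
```

For example, `is_redundant("Tierschutz Rinder", "Tierschutz Rinder.")` is true, because a single insertion over 18 characters gives a normalized distance of about 0.06. The classic pair "kitten" and "sitting" has a raw distance of 3 and a normalized distance of 3/7 ≈ 0.43, so it would not count as redundant.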

Example queries

¹ Here, an inspection point is a specific, verifiable criterion within an agricultural control program used to assess a farm's compliance with a particular regulation or standard.
