Data Quality Workflows using Akka

2016
John Wieczorek
Abstract
Data cleaning has the potential to improve the chances for people and computers to find and use relevant data. This is true for researchers as well as for large-scale data aggregators. In the biodiversity realm, Darwin Core provides a convenient scope and framework for data cleaning tools and vocabularies. One way to address data cleaning tasks is to use workflows that act on a combination of original data, controlled vocabularies, algorithms, and services to detect inconsistencies and errors, recommend changes, and augment the original data with improvements and additions. From the perspective of flexibility, there are advantages to constructing such workflows from specialized, reusable "actors" -- building blocks that perform specific tasks, such as providing a list of the distinct values of a field in a data set. The Kurator project uses Akka, a Java-based actor framework, to construct workflows whose actors are written in a variety of programming languages, and even in a combination of them. In this pres...
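As an illustration of the kind of task such a reusable actor might perform, the following is a minimal sketch in plain Java of the "distinct values of a field" example from the abstract. The Akka actor wiring and the Kurator APIs are omitted; the class name, method, and sample Darwin Core records are hypothetical, chosen only to show the actor's core logic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DistinctValues {

    // Each record is modeled as a map from Darwin Core term name to value.
    // Returns the sorted set of non-empty distinct values for one field.
    static Set<String> distinctValues(List<Map<String, String>> records, String field) {
        Set<String> values = new TreeSet<>();
        for (Map<String, String> record : records) {
            String v = record.get(field);
            if (v != null && !v.isEmpty()) {
                values.add(v);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("country", "Mexico", "basisOfRecord", "PreservedSpecimen"),
            Map.of("country", "mexico", "basisOfRecord", "PreservedSpecimen"),
            Map.of("country", "Brazil", "basisOfRecord", "HumanObservation")
        );
        // Case variants surface as separate values -- exactly the kind of
        // inconsistency a downstream vocabulary or validation actor could flag.
        System.out.println(distinctValues(records, "country"));
        // prints [Brazil, Mexico, mexico]
    }
}
```

In an actor-based workflow, a building block like this would receive a data set as a message and emit the distinct-value list to the next actor, keeping each task small, testable, and reusable across workflows.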
