Natural science museums are containers of information, in both quantity and quality, and it is therefore essential to extract the maximum value from this intangible essence. Until the mid-twentieth century, museums developed procedures for creating information archives in which to gather, systematically, the descriptive data of collection specimens. Catalogues and inventories based on paper index cards allowed linear ordering by a single criterion, but they did not support searches by general categories, nor combined queries. At the end of the twentieth century a new dimension appeared, the digital one, which brings together concepts such as the management and dissemination of information and knowledge. The capacity of collections to provide information, immaterial in essence but based on physical material rather than mere observations, has become a new demand for scientific and public service that adds to the traditional roles of museums. With the implementation of structured, relational databases on digital media, it became possible to offer easily searchable information. Museums first devoted themselves to computerizing their holdings. Before completing that first objective, museums redirected the time potentially dedicated to managing information and invested it in Internet publication projects. Today, museums face the challenge of becoming connected sources of information that are rigorous and interpretable by analysis tools, on an Internet that shuns ambiguity.
<p>Numbers indicate name combinations that showed one or more types of issues. Total number... more <p>Numbers indicate name combinations that showed one or more types of issues. Total number of name combinations assessed for issues = 991, total number of those name combinations with issues = 532, total number of those name combinations with errors (misspelling, conceptual or format error) = 341.</p
The Kurator project aims to facilitate the development, documentation, and efficient execution of scripts and workflows for cleaning biodiversity data. Kurator tools under development and available as prototypes in the Kurator GitHub repositories ( http://github.com/kurator-org/ ) support traditional scripting as well as high-performance, actor-oriented workflow approaches to validating, annotating, and cleaning data. The Kurator-Akka framework ( http://github.com/kurator-org/kurator-akka ) makes it easy to develop and run high-performance data cleaning workflows that employ the Akka actor toolkit by shielding actor developers and workflow users alike from the complexities of the Akka API (application programming interface). Kurator-Akka actors currently can be written either in Python or Java, and workflows may be specified using a language based on YAML (YAML Ain't Markup Language) that defines how data flows between the actors at run time. A workflow can be composed from e...
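As a rough illustration of the actor idea described above (this is a hypothetical Python sketch, not the actual Kurator-Akka API; function and field names are invented for the example), two small callables can be chained so that records flow from one to the next, much as a YAML workflow definition wires actors together:

```python
# Hypothetical sketch of actor-style data flow; the real Kurator-Akka API differs.
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]

def read_records(rows: Iterable[Record]) -> Iterator[Record]:
    """Source "actor": emits one record at a time."""
    for row in rows:
        yield row

def flag_missing_country(records: Iterable[Record]) -> Iterator[Record]:
    """Validation "actor": annotates records that lack a country value."""
    for record in records:
        if not record.get("country"):
            record = {**record, "qualityFlag": "MISSING_COUNTRY"}
        yield record

if __name__ == "__main__":
    data = [{"country": "Argentina"}, {"country": ""}]
    # Composing the generators mimics how a workflow routes data between actors.
    for cleaned in flag_missing_country(read_records(data)):
        print(cleaned)
```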
Data cleaning has the potential to improve the chances for people and computers to find and use relevant data. This is true for researchers as well as for large-scale data aggregators. In the biodiversity realm, Darwin Core provides a convenient scope and framework for data cleaning tools and vocabularies. One way to address data cleaning tasks is to use workflows that act on a combination of original data, controlled vocabularies, algorithms, and services to detect inconsistencies and errors, recommend changes, and augment the original data with improvements and additions. From the perspective of flexibility, there are advantages to constructing such workflows from specialized, reusable "actors" -- building blocks that do specific tasks, such as providing a list of distinct values of a field in a data set (see the sketch below). The Kurator project uses Akka, a Java-based framework, to construct workflows with actors written in a variety of programming languages, and even in a combination of them. In this pres...
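The "distinct values of a field" building block mentioned above can be sketched in a few lines of Python. This is a hypothetical stand-alone function, not code taken from the Kurator repositories; the file name and the Darwin Core field name in the usage comment are only examples:

```python
# Minimal sketch of a "distinct values" building block for a tabular data set.
import csv
from typing import Set

def distinct_values(csv_path: str, field: str) -> Set[str]:
    """Return the set of distinct values found in one column of a CSV file."""
    values: Set[str] = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            values.add(row.get(field, ""))
    return values

# Example: list every distinct value of the Darwin Core term basisOfRecord,
# a typical first step before comparing values against a controlled vocabulary.
# print(sorted(distinct_values("occurrences.csv", "basisOfRecord")))
```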
Camera trapping has revolutionized wildlife ecology and conservation by providing automated data acquisition, leading to the accumulation of massive amounts of camera trap data worldwide. Although management and processing of camera trap-derived Big Data are becoming increasingly solvable with the help of scalable cyber-infrastructures, harmonization and exchange of the data remain limited, hindering its full potential. We present a new data exchange format, the Camera Trap Data Package (Camtrap DP), designed to allow users to easily exchange, harmonize and archive camera trap data at local to global scales. Camtrap DP structures camera trap data in a simple yet flexible data model consisting of three tables (Deployments, Media, and Observations) that supports a wide range of camera deployment designs, classification techniques (e.g., human and AI, media-based and event-based) and analytical use cases, from compiling species occurrence data through distribution, occupancy and activity modeling to density estimation. The format further achieves interoperability by building upon existing standards, the Frictionless Data Package in particular, which is supported by a suite of open software tools to read and validate data. Camtrap DP is the consensus of a long, in-depth consultation and outreach process with standard and software developers, the main existing camera trap data management platforms, major players in the field of camera trapping, and the Global Biodiversity Information Facility (GBIF). Under the umbrella of Biodiversity Information Standards (TDWG), Camtrap DP has been developed openly, collaboratively, and with version control from the start, and we encourage camera trapping users and developers to join the discussion and contribute to the further development and adoption of this standard.
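Because Camtrap DP builds on the Frictionless Data Package specification, a dataset can be opened and checked with generic Frictionless tooling. The sketch below assumes the `frictionless` Python library is installed and that a local `datapackage.json` describes the three tables; the descriptor path and the resource name "observations" are illustrative rather than prescribed here:

```python
# Sketch using the generic `frictionless` Python library (assumed installed);
# "datapackage.json" and the resource name "observations" are illustrative.
from frictionless import Package, validate

# Check the whole package (descriptor plus data tables) against its schema.
report = validate("datapackage.json")
print("valid:", report.valid)

# Open the package and read rows from one of the Camtrap DP tables.
package = Package("datapackage.json")
print("tables:", package.resource_names)  # e.g. deployments, media, observations
for row in package.get_resource("observations").read_rows():
    print(row)
```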
Version v0.5 of the EBV Metadata Profile XSD schema. Version v0.5 is focused on species-level EBVs, especially on the species populations EBV class, and may be equally applicable to the species traits EBV class. However, this version does not cover other EBV classes related to ecosystem function/structure, community composition, or genetic composition. These classes will require some additional terms, and will have no need for some current terms (e.g., taxonomic coverage). For instance, ecosystem function and ecosystem structure EBVs will need a dimension other than taxonomy, e.g. fields describing what types of ecosystems and habitats are captured.
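To check an EBV metadata record against an XSD schema such as this one, a short validation script can be used. The sketch below assumes the `lxml` Python library and uses hypothetical local file names; it is not part of the profile itself:

```python
# Sketch of XSD validation with lxml (assumed installed); file names are hypothetical.
from lxml import etree

schema = etree.XMLSchema(etree.parse("EBV_metadata_profile_v0.5.xsd"))
metadata = etree.parse("ebv_dataset_metadata.xml")

if schema.validate(metadata):
    print("Metadata document conforms to the schema.")
else:
    # error_log lists each violation with line numbers from the XML document.
    for error in schema.error_log:
        print(error.line, error.message)
```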