Natural science museums are containers of information, in both quantity and quality, and it is therefore essential to extract the maximum value from this intangible essence. Until the mid-twentieth century, museums developed procedures for creating information archives in which to gather, systematically, the descriptive data of collection specimens. Catalogues and inventories based on paper index cards allowed linear ordering by a single criterion, but they did not support searches by general categories, nor combined queries. At the end of the twentieth century a new dimension appeared, the digital one, which brings together concepts such as the management and dissemination of information and knowledge. The capacity of collections to provide information, immaterial in essence but based on physical material rather than mere observations, has become a new demand for scientific and public service that adds to the traditional roles of museums. With the implementation of structured, relational databases on digital media, it became possible to offer easily searchable information. Museums first devoted themselves to computerizing their holdings. Before completing that first objective, museums redirected the time potentially dedicated to managing information and invested it in Internet publication projects. Today, museums face the challenge of becoming connected sources of information that are rigorous and interpretable by analysis tools, on an Internet that shuns ambiguity.
<p>Numbers indicate name combinations that showed one or more types of issues. Total number... more <p>Numbers indicate name combinations that showed one or more types of issues. Total number of name combinations assessed for issues = 991, total number of those name combinations with issues = 532, total number of those name combinations with errors (misspelling, conceptual or format error) = 341.</p
The Kurator project aims to facilitate the development, documentation, and efficient execution of scripts and workflows for cleaning biodiversity data. Kurator tools under development and available as prototypes in the Kurator GitHub repositories ( http://github.com/kurator-org/ ) support traditional scripting as well as high-performance, actor-oriented workflow approaches to validating, annotating, and cleaning data. The Kurator-Akka framework ( http://github.com/kurator-org/kurator-akka ) makes it easy to develop and run high-performance data cleaning workflows that employ the Akka actor toolkit by shielding actor developers and workflow users alike from the complexities of the Akka API (application programming interface). Kurator-Akka actors currently can be written either in Python or Java, and workflows may be specified using a language based on YAML (YAML Ain't Markup Language) that defines how data flows between the actors at run time. A workflow can be composed from e...
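As a rough illustration of the actor idea described above (this is a hypothetical Python sketch, not the actual Kurator-Akka API; function and field names are invented for the example), two small callables can be chained so that records flow from one to the next, much as a YAML workflow definition wires actors together:

```python
# Hypothetical sketch of actor-style data flow; the real Kurator-Akka API differs.
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]

def read_records(rows: Iterable[Record]) -> Iterator[Record]:
    """Source "actor": emits one record at a time."""
    for row in rows:
        yield row

def flag_missing_country(records: Iterable[Record]) -> Iterator[Record]:
    """Validation "actor": annotates records that lack a country value."""
    for record in records:
        if not record.get("country"):
            record = {**record, "qualityFlag": "MISSING_COUNTRY"}
        yield record

if __name__ == "__main__":
    data = [{"country": "Argentina"}, {"country": ""}]
    # Composing the generators mimics how a workflow routes data between actors.
    for cleaned in flag_missing_country(read_records(data)):
        print(cleaned)
```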
Data cleaning has the potential to improve the chances for people and computers to find and use relevant data. This is true for researchers as well as for large-scale data aggregators. In the biodiversity realm, Darwin Core provides a convenient scope and framework for data cleaning tools and vocabularies. One way to address data cleaning tasks is to use workflows that act on a combination of original data, controlled vocabularies, algorithms, and services to detect inconsistencies and errors, recommend changes, and augment the original data with improvements and additions. From the perspective of flexibility, there are advantages to constructing such workflows from specialized, reusable "actors" -- building blocks that do specific tasks, such as providing a list of distinct values of a field in a data set (see the sketch below). The Kurator project uses Akka, a Java-based framework, to construct workflows with actors written in a variety of programming languages, and even in a combination of them. In this pres...
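The "distinct values of a field" building block mentioned above can be sketched in a few lines of Python. This is a hypothetical stand-alone function, not code taken from the Kurator repositories; the file name and the Darwin Core field name in the usage comment are only examples:

```python
# Minimal sketch of a "distinct values" building block for a tabular data set.
import csv
from typing import Set

def distinct_values(csv_path: str, field: str) -> Set[str]:
    """Return the set of distinct values found in one column of a CSV file."""
    values: Set[str] = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            values.add(row.get(field, ""))
    return values

# Example: list every distinct value of the Darwin Core term basisOfRecord,
# a typical first step before comparing values against a controlled vocabulary.
# print(sorted(distinct_values("occurrences.csv", "basisOfRecord")))
```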
Camera trapping has revolutionized wildlife ecology and conservation by providing automated data acquisition, leading to the accumulation of massive amounts of camera trap data worldwide. Although management and processing of camera trap-derived Big Data are becoming increasingly solvable with the help of scalable cyber-infrastructures, harmonization and exchange of the data remain limited, hindering its full potential. We present a new data exchange format, the Camera Trap Data Package (Camtrap DP), designed to allow users to easily exchange, harmonize and archive camera trap data at local to global scales. Camtrap DP structures camera trap data in a simple yet flexible data model consisting of three tables (Deployments, Media, and Observations) that supports a wide range of camera deployment designs, classification techniques (e.g., human and AI, media-based and event-based) and analytical use cases, from compiling species occurrence data through distribution, occupancy and activity modeling to density estimation. The format further achieves interoperability by building upon existing standards, the Frictionless Data Package in particular, which is supported by a suite of open software tools to read and validate data. Camtrap DP is the consensus of a long, in-depth consultation and outreach process with standard and software developers, the main existing camera trap data management platforms, major players in the field of camera trapping, and the Global Biodiversity Information Facility (GBIF). Under the umbrella of Biodiversity Information Standards (TDWG), Camtrap DP has been developed openly, collaboratively, and with version control from the start, and we encourage camera trapping users and developers to join the discussion and contribute to the further development and adoption of this standard.
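Because Camtrap DP builds on the Frictionless Data Package specification, a dataset can be opened and checked with generic Frictionless tooling. The sketch below assumes the `frictionless` Python library is installed and that a local `datapackage.json` describes the three tables; the descriptor path and the resource name "observations" are illustrative rather than prescribed here:

```python
# Sketch using the generic `frictionless` Python library (assumed installed);
# "datapackage.json" and the resource name "observations" are illustrative.
from frictionless import Package, validate

# Check the whole package (descriptor plus data tables) against its schema.
report = validate("datapackage.json")
print("valid:", report.valid)

# Open the package and read rows from one of the Camtrap DP tables.
package = Package("datapackage.json")
print("tables:", package.resource_names)  # e.g. deployments, media, observations
for row in package.get_resource("observations").read_rows():
    print(row)
```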
Version v0.5 of the EBV Metadata Profile XSD schema. Version v0.5 is focused on species-level EBVs, especially on the species populations EBV class, and may be equally applicable to the species traits EBV class. However, this version does not cover other EBV classes related to ecosystem function/structure, community composition, or genetic composition. These classes will require some additional terms, and will have no need for some current terms (e.g., taxonomic coverage). For instance, ecosystem function and ecosystem structure EBVs will need a dimension other than taxonomy, e.g. fields describing what types of ecosystems and habitats are captured.
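To check an EBV metadata record against an XSD schema such as this one, a short validation script can be used. The sketch below assumes the `lxml` Python library and uses hypothetical local file names; it is not part of the profile itself:

```python
# Sketch of XSD validation with lxml (assumed installed); file names are hypothetical.
from lxml import etree

schema = etree.XMLSchema(etree.parse("EBV_metadata_profile_v0.5.xsd"))
metadata = etree.parse("ebv_dataset_metadata.xml")

if schema.validate(metadata):
    print("Metadata document conforms to the schema.")
else:
    # error_log lists each violation with line numbers from the XML document.
    for error in schema.error_log:
        print(error.line, error.message)
```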