tBiodivL: Larger Semantic Table Annotations Benchmark for Biodiversity Domain

1. Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Germany
2. City, University of London, UK
3. IBM Research, USA

tBiodivL is a dataset for matching tabular data to knowledge graphs. It is derived from the biodiversity domain and contains two types of tables: Horizontal Relational Tables, where each table represents a collection of entities, and Entity Tables, where each table represents a single entity. Ground truth data is provided with Wikidata as the target knowledge graph (KG).

tBiodivL is generated by KG2Tables using 10 levels of a recursive hierarchy of related concepts in Wikidata. It is the successor to tBiodiv.

tBiodivL contains 222,353 entity and horizontal tables. This repository contains only a 1% sample of the generated tables, together with the corresponding ground truth (gt) data. The full dataset is 312 GB. We will update this repository with the full dataset in the future.

Please get in touch if you are interested in the full dataset.

The supported tasks for semantic table annotations are:

Topic Detection (TD) links the entire table to an entity or a class from the target KG.
Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.
Column Type Annotation (CTA) links individual table columns to classes from the target KG.
Column Property Annotation (CPA) links the relation between a pair of table columns to a property from the target KG.
Row Annotation (RA) links an entire table row to an entity or property from the target KG.
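
The five tasks differ in which part of a table they annotate and which kind of KG resource they link to. The following is a minimal sketch that restates the list above as a small lookup structure. The `Task` class, the `TASKS` tuple, and the `tasks_for` helper are hypothetical names introduced for illustration only; they are not part of the dataset or of KG2Tables, and they say nothing about the layout of the ground truth files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One semantic table annotation task (illustrative only)."""
    code: str
    name: str
    table_element: str  # part of the table that is annotated
    kg_targets: tuple   # kinds of KG resource the element may be linked to


# Restates the task list from the description; "relation" is rendered
# here as "property", following the task name (an assumption).
TASKS = (
    Task("TD", "Topic Detection", "table", ("entity", "class")),
    Task("CEA", "Cell Entity Annotation", "cell", ("entity",)),
    Task("CTA", "Column Type Annotation", "column", ("class",)),
    Task("CPA", "Column Property Annotation", "column pair", ("property",)),
    Task("RA", "Row Annotation", "row", ("entity", "property")),
)


def tasks_for(table_element):
    """Return the codes of all tasks that annotate the given table element."""
    return [task.code for task in TASKS if task.table_element == table_element]
```

For example, `tasks_for("column")` returns the code of the one task in this sketch that annotates single columns.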

Files

Name	Size
tbiodiv10-0.01-sample.zip (md5:51a084f34b80f45af2fe0172b18b28c3)	40.3 MB
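
The MD5 checksum listed for the sample archive can be used to confirm that a download is intact. The sketch below uses only the Python standard library; the function names `md5_of_file` and `matches_expected` are hypothetical helpers, and the local file path passed to them is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed for tbiodiv10-0.01-sample.zip in this record.
EXPECTED_MD5 = "51a084f34b80f45af2fe0172b18b28c3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("tbiodiv10-0.01-sample.zip")` on the downloaded archive should return `True` if the file matches the listed checksum. MD5 is suitable here only as an integrity check against accidental corruption, not as a security guarantee.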

Additional details

Is derived from: Software: https://github.com/fusion-jena/KG2Tables (URL)
Is variant form of: Dataset: 10.5281/zenodo.10283015 (DOI)


Resource type: Dataset
Publisher: Zenodo
Languages: English

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: December 7, 2023
Modified: December 7, 2023
