Published January 27, 2021 | Version v1.0.0
Software Open

collab-uniba/KGTorrent: First release (v. 1.0.0) of KGTorrent

  • 1. University of Bari, Italy

Description

Given their growing popularity among data scientists, computational notebooks, and Jupyter notebooks in particular, are increasingly studied by researchers worldwide. Generally, the aim is to understand how they are typically used, identify possible flaws, and inform the design of extensions and updates to the tool. To ease this kind of research, we collected and shared a large dataset of 248,761 Jupyter notebooks from Kaggle, named KGTorrent.

Kaggle is a web platform hosting machine learning competitions that enables the creation and execution of Jupyter notebooks in a containerized computational environment. By leveraging Meta Kaggle, a dataset that is publicly available on the platform, we also built a companion MySQL database containing metadata on the notebooks in our dataset.

This repository hosts the Python scripts we developed to create KGTorrent. By leveraging the latest version of Meta Kaggle, the same scripts can also be used to refresh the collection.

For further details, please visit the full documentation of KGTorrent and the official KGTorrent GitHub repository.

Files

collab-uniba/KGTorrent-v1.0.0.zip

Files (12.9 MB)

Name: collab-uniba/KGTorrent-v1.0.0.zip
Size: 12.9 MB
md5:e44b057633fbcacb12a2bcec76ad0a66
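The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch, not part of KGTorrent itself: the helper names `md5_of_file` and `matches_expected` are hypothetical, and the local file name assumes the archive was saved as `KGTorrent-v1.0.0.zip` in the working directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record page.
EXPECTED_MD5 = "e44b057633fbcacb12a2bcec76ad0a66"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("KGTorrent-v1.0.0.zip")
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.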

Additional details

Related works