Authors:
Yanis Ouakrim 1,2; Hannah Bull 1; Michèle Gouiffès 1; Denis Beautemps 2; Thomas Hueber 2 and Annelies Braffort 1
Affiliations:
1 LISN, Univ. Paris-Saclay, CNRS, 91405 Orsay, France; 2 Univ. Grenoble Alpes, GIPSA-Lab, CNRS, F-38000 Grenoble, France
Keyword(s):
Sign Language Processing, Sign Language Translation, Sign Language Corpora, French Sign Language, LSF.
Abstract:
We introduce Mediapi-RGB, a new dataset of French Sign Language (LSF), along with the first LSF-to-French machine translation model. With 86 hours of video, it is the largest LSF corpus with translations. The corpus consists of original content in French Sign Language produced by deaf journalists, with subtitles in written French aligned to the signing. The current release of Mediapi-RGB is available at the Ortolang corpus repository (https://www.ortolang.fr/workspaces/mediapi-rgb) and can be used for academic research purposes. The test and validation sets contain 13 and 7 hours of video respectively. The training set contains 66 hours of video that will be released progressively until December 2024. The current release also contains skeleton keypoints, sign temporal segmentation, spatio-temporal features and subtitles for all the videos in the train, validation and test sets, as well as a suggested vocabulary of nouns for evaluation purposes. We also present the results obtained on this corpus with the first LSF-to-French translation baseline, to give an overview of the possibilities offered by a corpus of this scale, which is unprecedented for LSF. Finally, we suggest potential technological and linguistic applications for this new video-text dataset.