Building Sign Language Datasets

Achraf Othman

Abstract

This chapter focuses on the methodologies and frameworks essential for building comprehensive sign language datasets. It outlines the critical steps in data collection, including participant recruitment, video recording, and annotation, that are needed to obtain high-quality, representative data. The chapter discusses the different types of sign language datasets, such as lexical databases, conversational corpora, and annotated video corpora, and highlights their importance for research and technological applications. It also addresses the challenges and best practices in dataset creation, emphasizing ethical considerations and community involvement. By providing a detailed guide to constructing robust sign language datasets, the chapter aims to support researchers and developers in advancing sign language technologies and promoting inclusive linguistic research.

Notes

1. About SignStream®: https://www.bu.edu/asllrp/SignStream/3/


Author information

Authors and Affiliations

Mada Qatar Assistive Technology Center, Doha, Qatar
Achraf Othman

About this chapter

Cite this chapter

Othman, A. (2024). Building Sign Language Datasets. In: Sign Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-031-68763-1_7

DOI: https://doi.org/10.1007/978-3-031-68763-1_7
Published: 01 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68762-4
Online ISBN: 978-3-031-68763-1
eBook Packages: Computer Science, Computer Science (R0)
