Abstract
This chapter covers the methodologies and frameworks needed to build comprehensive sign language datasets. It outlines the critical steps in data collection, namely participant recruitment, video recording, and annotation, and explains how each step contributes to high-quality, representative data. The chapter discusses the main types of sign language datasets, such as lexical databases, conversational corpora, and annotated video corpora, and highlights their importance for research and technological applications. It also addresses the challenges and best practices in dataset creation, emphasizing ethical considerations and community involvement. By providing a detailed guide to constructing robust sign language datasets, the chapter aims to support researchers and developers in advancing sign language technologies and promoting inclusive linguistic research.
Notes
1. About SignStream®: https://www.bu.edu/asllrp/SignStream/3/
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Othman, A. (2024). Building Sign Language Datasets. In: Sign Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-031-68763-1_7
DOI: https://doi.org/10.1007/978-3-031-68763-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68762-4
Online ISBN: 978-3-031-68763-1
eBook Packages: Computer Science (R0)