DOI: 10.1145/3428658.3430980

A Two-Stream Model Based on 3D Convolutional Neural Networks for the Recognition of Brazilian Sign Language in the Health Context

Published: 30 November 2020

Abstract

Deaf people are a considerable part of the world population and communicate naturally using sign languages. However, although many countries adopt their sign language as an official language, linguistic barriers still restrict access to fundamental rights, especially health services, a situation made even more critical in the midst of the COVID-19 crisis. This problem has been the focus of government policies that oblige essential service providers to offer sign language interpreters to assist Deaf people. However, this type of solution has high operating costs, mainly to serve the entire Deaf community in all environments. These setbacks motivate the investigation of methodologies and automated tools to support this type of problem. In this paper, we address this problem by proposing a two-stream model for the recognition of Brazilian Sign Language (Libras) in the health context. The proposed solution does not require any additional capture sensor or hardware, being based entirely on images or sequences of images (videos). The results show that the solution recognizes the Libras signs in the test dataset reasonably well, achieving an average accuracy of approximately 96.12% in a scenario where the interpreter used in the test set was not used in the training set, which is good evidence that it can assist in the communication process with Deaf people. As an additional contribution, this paper introduces a new Libras dataset containing 5,000 videos of 50 signs in the health context, which may support the development of other solutions and further research.
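The abstract names the high-level architecture (two streams of 3D convolutions over video) without giving its configuration, so the following Keras sketch is a hypothetical illustration only: one stream takes raw RGB clips, a second takes a motion representation (here assumed to be stacked optical-flow fields), and their pooled features are fused before a 50-way softmax over the health-context signs. Clip length, frame size, filter counts, and the choice of optical flow as the second stream are assumptions, not the authors' settings.

```python
# Hypothetical two-stream 3D-CNN sketch in Keras; layer sizes, clip length,
# and the optical-flow second stream are assumptions, not the paper's setup.
from tensorflow.keras import layers, Model

NUM_CLASSES = 50             # 50 health-context Libras signs (from the paper's dataset)
FRAMES, H, W = 16, 112, 112  # assumed clip length and frame size

def make_stream(channels: int, name: str) -> Model:
    """Small 3D-CNN feature extractor for one input modality."""
    inp = layers.Input(shape=(FRAMES, H, W, channels))
    x = inp
    for filters in (32, 64, 128):
        x = layers.Conv3D(filters, kernel_size=3, padding="same", activation="relu")(x)
        x = layers.MaxPooling3D(pool_size=2)(x)   # halve time and space
    x = layers.GlobalAveragePooling3D()(x)        # one feature vector per clip
    return Model(inp, x, name=name)

rgb_stream = make_stream(3, "rgb_stream")    # raw RGB frames
flow_stream = make_stream(2, "flow_stream")  # assumed: x/y optical-flow fields

# Late fusion: concatenate the per-stream features, then classify.
fused = layers.concatenate([rgb_stream.output, flow_stream.output])
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(fused)
model = Model([rgb_stream.input, flow_stream.input], outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training such a model would pair each RGB clip with its precomputed flow clip and one-hot sign labels, e.g. model.fit([rgb_clips, flow_clips], labels).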


Cited By

• (2024) Connecting Silent Worlds: Requirements for Automatic Oral-Sign Language Translation. In Proceedings of the XXIII Brazilian Symposium on Human Factors in Computing Systems, pp. 1-14. DOI: 10.1145/3702038.3702066. Online publication date: 7-Oct-2024.



    Published In

    WebMedia '20: Proceedings of the Brazilian Symposium on Multimedia and the Web
    November 2020
    364 pages
    ISBN:9781450381963
    DOI:10.1145/3428658

In-Cooperation

• SBC: Brazilian Computer Society
• CNPq: Conselho Nacional de Desenvolvimento Científico e Tecnológico
• CGIBR: Comitê Gestor da Internet no Brasil
• CAPES: Brazilian Higher Education Funding Council

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Datasets
    2. Deep Learning
    3. Libras
    4. Neural Networks
    5. Sign Language

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

WebMedia '20: Brazilian Symposium on Multimedia and the Web
November 30 - December 4, 2020
São Luís, Brazil

    Acceptance Rates

WebMedia '20 paper acceptance rate: 34 of 87 submissions (39%)
Overall acceptance rate: 270 of 873 submissions (31%)

