Abstract
Sign language is the linguistic system used by Deaf communities to communicate. The lack of fully fledged Automatic Sign Language Recognition (ASLR) technologies contributes to the many difficulties deaf individuals face in the absence of an interpreter, such as in private health appointments or emergency situations. A challenging problem in developing reliable ASLR systems is that sign languages rely not only on manual gestures but also on facial expressions and other non-manual markers. This paper proposes adopting the Facial Action Coding System (FACS) to encode sign language facial expressions. However, state-of-the-art Action Unit (AU) recognition models are mostly designed to classify about two dozen AUs, typically those related to the expression of emotions. We adopted Brazilian Sign Language (Libras) as our case study and identified more than one hundred AUs (with substantial overlap with other sign languages). We then implemented and evaluated a novel AU recognition architecture that combines SqueezeNet with geometric features. Our model achieved 88% accuracy over 119 classes. Combined with state-of-the-art gesture recognition, our model is ready to improve sign disambiguation and advance ASLR.
Notes
Code available at https://github.com/SrtaEmely/AUdetectionForLibras.
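As a rough illustration of the architecture summarized in the abstract, the sketch below fuses a SqueezeNet-style convolutional branch over the face image with a dense branch over geometric landmark features, ahead of a 119-way softmax. This is a minimal sketch, not the published implementation: the input resolution, layer sizes, and the 136-dimensional landmark vector (68 (x, y) points) are illustrative assumptions; the repository linked above holds the actual code.

```python
# Hypothetical two-branch AU classifier: a SqueezeNet-style appearance
# branch plus a dense geometric (landmark) branch, fused before a
# 119-way softmax. All sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

def fire_module(x, squeeze, expand):
    # SqueezeNet "fire" block: 1x1 squeeze, then parallel 1x1/3x3 expands.
    s = layers.Conv2D(squeeze, 1, activation="relu")(x)
    e1 = layers.Conv2D(expand, 1, padding="same", activation="relu")(s)
    e3 = layers.Conv2D(expand, 3, padding="same", activation="relu")(s)
    return layers.Concatenate()([e1, e3])

# Appearance branch: cropped face image (96x96 assumed).
img_in = layers.Input(shape=(96, 96, 3))
x = layers.Conv2D(64, 3, strides=2, activation="relu")(img_in)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 16, 64)
x = fire_module(x, 32, 128)
x = layers.GlobalAveragePooling2D()(x)

# Geometric branch: flattened facial-landmark coordinates
# (68 points x 2 coordinates = 136, assumed).
geo_in = layers.Input(shape=(136,))
g = layers.Dense(64, activation="relu")(geo_in)

# Late fusion and 119-class AU prediction.
fused = layers.Concatenate()([x, g])
out = layers.Dense(119, activation="softmax")(fused)

model = Model([img_in, geo_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The two-branch, late-fusion layout mirrors the abstract's description of combining SqueezeNet with geometric features; training details and the exact configuration should be taken from the repository.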
Acknowledgements
The research for this paper was financially supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil.
Cite this article
da Silva, E.P., Costa, P.D.P., Kumada, K.M.O. et al. Facial action unit detection methodology with application in Brazilian sign language recognition. Pattern Anal Applic 25, 549–565 (2022). https://doi.org/10.1007/s10044-021-01024-5