
A shapelet-based framework for large-scale word-level sign language database auto-construction

Published: 20 November 2022

Abstract

Sign language recognition is a challenging and often underestimated problem that involves the asynchronous integration of multimodal articulators. Learning powerful statistical models requires large amounts of training data. However, well-labelled sign language databases are a scarce resource because of the high cost of manually performing and labelling signs. On the other hand, a large number of sign language-interpreted videos are available on the Internet. This work proposes a framework to automatically construct a large-scale word-level sign language database from sign language-interpreted videos. We achieve this by exploiting the correspondence between subtitles and motion, discovering shapelets, i.e. the most discriminative subsequences within the data sequences. In this paper, two modified shapelet methods, one based on brute-force search and one on parameter learning, were used to identify the target signs for 1000 words in 89 sign language-interpreted videos (96 h, 8 naive signers). An augmented (3–5 times larger) large-scale word-level sign database was then constructed using an adaptive sample augmentation strategy that collects all video clips similar to the target sign as valid samples. Experiments on a subset of 100 words show a considerable speedup and a 14% improvement in recall rate. An evaluation with three state-of-the-art sign language classifiers demonstrates the good discrimination of the database, and the sample augmentation strategy increases the recognition accuracy of all classifiers by 10–33% by improving the number, variety, and balance of the samples.
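To make the shapelet idea mentioned above concrete, the sketch below shows a generic brute-force shapelet search in Python. It is a minimal illustration in the spirit of the classic time-series shapelet literature, not the authors' pipeline: it assumes 1-D NumPy feature sequences with binary labels, and the function names, candidate lengths, and information-gain scoring are illustrative assumptions.

import numpy as np

def min_subsequence_distance(series, candidate):
    # Smallest z-normalised Euclidean distance between the candidate and
    # any same-length subsequence of the series.
    m = len(candidate)
    c = (candidate - candidate.mean()) / (candidate.std() + 1e-8)
    best = np.inf
    for start in range(len(series) - m + 1):
        w = series[start:start + m]
        w = (w - w.mean()) / (w.std() + 1e-8)
        best = min(best, float(np.linalg.norm(w - c)))
    return best

def information_gain(distances, labels, threshold):
    # Entropy reduction obtained by splitting the sequences into
    # "close to the candidate" vs. "far from the candidate".
    def entropy(y):
        if y.size == 0:
            return 0.0
        p = np.bincount(y, minlength=2) / y.size
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())
    left = labels[distances <= threshold]
    right = labels[distances > threshold]
    n = labels.size
    return entropy(labels) - (left.size / n) * entropy(left) - (right.size / n) * entropy(right)

def find_shapelet(series_list, labels, min_len=10, max_len=30):
    # Exhaustively score every subsequence of every sequence and return the
    # one whose distance profile best separates positive from negative sequences.
    labels = np.asarray(labels, dtype=int)
    best_gain, best_shapelet = -1.0, None
    for series in series_list:
        for m in range(min_len, min(max_len, len(series)) + 1):
            for start in range(len(series) - m + 1):
                cand = series[start:start + m]
                dists = np.array([min_subsequence_distance(s, cand) for s in series_list])
                for thr in np.unique(dists):
                    gain = information_gain(dists, labels, thr)
                    if gain > best_gain:
                        best_gain, best_shapelet = gain, cand
    return best_shapelet, best_gain

# Hypothetical usage: each sequence could be a per-frame motion feature for a
# video clip whose subtitle does (label 1) or does not (label 0) contain a word.
# shapelet, gain = find_shapelet([np.random.randn(120) for _ in range(20)],
#                                labels=np.random.randint(0, 2, 20))

A real system would operate on multi-dimensional pose and hand-trajectory features and would derive the weak positive/negative labels from subtitle timestamps, but the nested scoring loop above captures the quadratic cost of naive brute-force search that motivates faster shapelet methods.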





Published In

Neural Computing and Applications, Volume 35, Issue 1
Jan 2023
1023 pages
ISSN: 0941-0643
EISSN: 1433-3058

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 20 November 2022
Accepted: 26 October 2022
Received: 01 March 2022

Author Tags

  1. Sign language
  2. Shapelet
  3. Self-learning
  4. Big data computing

Qualifiers

  • Review-article

