Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network
<p>Figure 1. Overall framework of the detection method. The core algorithm, ST-GCN, consists of BN (batch normalization), GCN (graph convolutional network), TCN (temporal convolutional network), POOL (pooling), and FC (fully connected) layers.</p>
<p>Figure 2. System architecture of AlphaPose.</p>
<p>Figure 3. Label order for human body pose keypoints: (<b>a</b>) original 17-keypoint 2D model from the human keypoint annotations of the COCO dataset; (<b>b</b>) new 13-keypoint 2D model without eyes and ears.</p>
<p>Figure 4. Transfer learning strategy for the human skeleton extraction network.</p>
<p>Figure 5. Schematic diagram of multi-feature skeleton data: (<b>a</b>) position features; (<b>b</b>) motion features; (<b>c</b>) bone features. Red marks the features used in each subplot; blue marks the features not used.</p>
<p>Figure 6. Representation of a 2D skeleton in different coordinate systems: (<b>a</b>) Cartesian coordinates; (<b>b</b>) polar coordinates.</p>
<p>Figure 7. Overall workflow of the fall detection method.</p>
<p>Figure 8. Partitioning strategies for constructing adjacency matrices: (<b>a</b>) single label: green marks one subset of nodes in the graph; (<b>b</b>) distance: green nodes form one category (label 0) and blue nodes another (label 1); (<b>c</b>) spatial: taking the green nodes as an example, yellow nodes are added as a third category (label 2); (<b>d</b>) expanded spatial: taking the green nodes as an example, additional nodes are added to each category.</p>
<p>Figure 9. Spatial-temporal graph convolutional unit: (<b>a</b>) GCN unit; (<b>b</b>) multi-scale TCN unit.</p>
<p>Figure 10. Examples of extracted frames and skeleton keypoints from partial videos: (<b>a</b>) thermal infrared data; (<b>b</b>) near-infrared data. Dots mark the skeleton keypoints; boxes mark the rectangular regions of the human body.</p>
<p>Figure 11. Example of image data contained in a complete action sample from the IR-Fall dataset: (<b>a</b>) example of TIR fall data; (<b>b</b>) example of NIR fall data.</p>
<p>Figure 12. Comparison of test results of the pre-trained and fine-tuned models: (<b>a</b>) NIR image test results using the pre-trained model; (<b>b</b>) NIR image test results using the fine-tuned model; (<b>c</b>) TIR image test results using the pre-trained model; (<b>d</b>) TIR image test results using the fine-tuned model.</p>
<p>Figure 13. Confusion matrix comparison of the two-stream model on two datasets: (<b>a</b>) results on the NTU-RGB+D-120 dataset; (<b>b</b>) results on the IR-Fall dataset.</p>
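The adjacency-matrix partitioning strategies described in the captions above (single label, distance, spatial, expanded spatial) can be sketched for a small joint graph. The following illustrative example shows the distance partitioning only; it is not the authors' implementation, and the function name and example graph are assumptions:

```python
# Illustrative sketch, not the authors' code: "distance" partitioning splits
# a joint's neighborhood into the root joint itself (hop 0) and its 1-hop
# neighbors, yielding one adjacency matrix per subset, as in ST-GCN.
import numpy as np

def distance_partition(edges, num_nodes):
    """Return a (2, V, V) stack: [normalized self-links, normalized 1-hop links]."""
    adj = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0

    def row_normalize(mat):
        # Divide each row by its degree so neighbor contributions average out.
        deg = mat.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0
        return mat / deg

    return np.stack([row_normalize(np.eye(num_nodes)), row_normalize(adj)])
```

Following the caption's description, the spatial strategy would further split the 1-hop subset by distance to the body center, and the expanded spatial strategy would add more nodes to each category.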
Abstract
1. Introduction
- We use the AlphaPose skeleton extraction algorithm to process low-resolution IR videos and significantly reduce computational cost by downsampling through keyframe extraction. To enhance recognition accuracy, we employ a transfer learning strategy and propose a human keypoint model tailored to fall analysis, ultimately achieving fast and accurate extraction of skeleton keypoints from IR videos;
- To address the challenge of low fall detection accuracy, we propose an enhanced representation of skeletal features and introduce an optimized two-stream ST-GCN. This approach incorporates expanded partitioning strategies and multi-scale temporal convolutions to capture the relationships between human posture and motion patterns more effectively, thereby enabling accurate recognition of fall actions;
- To strengthen the fall detection system's privacy protection, stability, and generalization, the proposed method is applicable to both NIR and TIR video sources. It meets the demands of 24-hour monitoring and privacy protection while remaining robust and generalizable.
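The enhanced skeletal feature representation mentioned in the second contribution combines joint positions with derived motion, bone, and polar-coordinate features. Below is a minimal sketch of how such features could be derived from 2D keypoints; the function name and the choice of reference joint are illustrative assumptions, not the paper's exact formulation:

```python
# Hedged sketch of multi-feature skeleton data: given 2D keypoints of shape
# (T frames, V joints, 2), derive motion (temporal difference), bone
# (child-minus-parent vectors), and polar (radius/angle about a reference
# joint) features.
import numpy as np

def skeleton_features(pos, bone_pairs, center=0):
    # Motion features: per-joint displacement between consecutive frames.
    motion = np.zeros_like(pos)
    motion[1:] = pos[1:] - pos[:-1]

    # Bone features: vector from each parent joint to its child joint.
    bone = np.zeros_like(pos)
    for child, parent in bone_pairs:
        bone[:, child] = pos[:, child] - pos[:, parent]

    # Polar representation: (radius, angle) of each joint relative to 'center'.
    rel = pos - pos[:, center:center + 1]
    radius = np.linalg.norm(rel, axis=-1)
    angle = np.arctan2(rel[..., 1], rel[..., 0])
    polar = np.stack([radius, angle], axis=-1)
    return motion, bone, polar
```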
2. Methods
2.1. Human Skeleton Extraction Methods
2.2. Data Preprocessing
2.2.1. Data Augmentation
2.2.2. Data Normalization
2.3. ST-GCN-Based Fall Detection
2.3.1. Graph and Graph Convolutional Networks
2.3.2. ST-GCN-Based Fall-Detection Algorithm
3. Experiments
3.1. Dataset
3.2. Experimental Setup
3.3. Evaluation Criteria
3.4. Human Skeleton Extraction Network
3.5. ST-GCN-Based Fall-Detection Network
3.5.1. Effects of Different Skeleton Data Representations on ST-GCN Performance
3.5.2. Effects of GCN and TCN on ST-GCN Performance
3.5.3. Effects of Network Layers and Temporal Windows on ST-GCN Performance
3.5.4. Effects of Different Infrared Datasets on ST-GCN Performance
3.5.5. Comparison with Other Fall Detection Methods
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Category | Image Resolution | Train | Test |
---|---|---|---|
Near infrared | 512 × 424 | 720 | 160 |
Thermal infrared | 160 × 120 | 320 | 60 |
Category | Description | Train | Test
---|---|---|---
FALL | Fall down | 630 | 316
ADLs | Stand up | 620 | 316
 | Sit down | 626 | 315
 | Squat down | 638 | 320
 | Jump up | 632 | 316
 | Hopping | 631 | 316
 | Drop | 626 | 316
 | Hand waving | 628 | 316
 | Kicking something | 630 | 316
 | Arm swings | 640 | 320
 | Run on the foot | 640 | 320
Category | Description | Train (TIR) | Train (NIR) | Test (TIR) | Test (NIR)
---|---|---|---|---|---
FALL | Fall down | 600 | 105 | 130 | 25
ADLs | Sit down | 600 | 77 | 240 | 18
 | Stand up | 650 | 84 | 170 | 22
 | Lying | 586 | 82 | 129 | 16
 | Sit | 747 | 67 | 176 | 20
 | Walk | 518 | 199 | 180 | 36
Name | Configuration Information
---|---
Processor | 2 × Intel Xeon 4110 (Intel Corporation, Santa Clara, CA, USA)
Memory | 32.0 GB
Operating system | CentOS 7
GPU support | NVIDIA GV100, CUDA 9.0, cuDNN 7.0
Branch | Params/MB | FPS | FLOPS/G | mAP |
---|---|---|---|---|
AlphaPose (FastRes50) | 38.70 | 33.79 | 20.87 | 0.77 |
AlphaPose (Res50) | 34.00 | 33.17 | 34.16 | 0.83 |
AlphaPose (FastRes152) | 79.30 | 19.59 | 56.43 | 0.87 |
Branch | NIR | TIR
---|---|---
AlphaPose (FastRes50) | 0.8317 | 0.7173
AlphaPose (Res50) | 0.8525 | 0.7469
AlphaPose (FastRes152) | 0.8776 | 0.7649
Coordinates | Features | NTU-RGB+D-120 | IR-Fall
---|---|---|---
One-Stream (Multi-scale TCN unit/Expanded spatial) | Positions | 0.8996 | 0.9369
 | Motions | 0.9182 | 0.6723
 | Bones | 0.9081 | 0.9595
 | Polar | 0.9168 | 0.9535
Two-Stream (Multi-scale TCN unit/Expanded spatial) | Positions & Motions | 0.9464 | 0.9468
 | Positions & Bones | 0.9085 | 0.9502
 | Motions & Bones | 0.9268 | 0.9476
 | Polar & Positions | 0.9217 | 0.9552
 | Polar & Motions | 0.9188 | 0.9527
 | Polar & Bones | 0.9257 | 0.9637
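Two-stream entries such as "Polar & Bones" in the table above are commonly obtained by late fusion of the two streams' class scores. A minimal sketch under that assumption (the weighting scheme and function name are hypothetical, not the authors' exact method):

```python
# Hypothetical late-fusion sketch: each stream (e.g. a polar-feature ST-GCN
# and a bone-feature ST-GCN) outputs class logits; the fused prediction is a
# weighted average of their softmax scores.
import numpy as np

def fuse_streams(logits_a, logits_b, alpha=0.5):
    def softmax(x):
        # Subtract the max for numerical stability before exponentiating.
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return alpha * softmax(logits_a) + (1 - alpha) * softmax(logits_b)
```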
Strategies | NTU-RGB+D-120 | IR-Fall |
---|---|---|
Single label | 0.9271 | 0.9257 |
Distance | 0.9289 | 0.9527 |
Spatial | 0.9312 | 0.9556 |
Expanded spatial | 0.9464 | 0.9637 |
Branch | NTU-RGB+D-120 | IR-Fall |
---|---|---|
Baseline | 0.9134 | 0.9468 |
Conv-B | 0.9409 | 0.9510 |
Conv-B & Pool-B | 0.9369 | 0.9519 |
Conv-B & Org-B | 0.9320 | 0.9578 |
Conv-B & Pool-B & Org-B | 0.9464 | 0.9637 |
Configuration | NTU-RGB+D-120 | IR-Fall | Params/MB
---|---|---|---
Layer = 10, depth = 256 | 0.9340 | 0.9502 | 2.410
Layer = 9, depth = 256 | 0.9464 | 0.9637 | 1.766
Layer = 8, depth = 256 | 0.9335 | 0.9586 | 1.121
Layer = 7, depth = 128 | 0.9395 | 0.9527 | 0.5987
Layer = 6, depth = 128 | 0.9266 | 0.9525 | 0.5577
Layer = 5, depth = 128 | 0.9200 | 0.9578 | 0.2745
Layer = 4, depth = 64 | 0.9332 | 0.9611 | 0.1422
Layer = 3, depth = 64 | 0.9180 | 0.9535 | 0.1012
Layer = 2, depth = 64 | 0.9005 | 0.9459 | 0.0602
Frame Window | Accuracy of ST-GCN (NTU-RGB+D-120) | Accuracy of ST-GCN (IR-Fall) | Speed of the Entire Algorithm (FPS)
---|---|---|---
9 | 0.8535 | 0.9358 | 35.3
12 | 0.9289 | 0.9181 | 32.3
15 | 0.9349 | 0.9527 | 29.9
18 | 0.9464 | 0.9637 | 25.5
24 | 0.9498 | | 18.8
30 | 0.9552 | | 11.2
Coordinates | Features | NIR | TIR
---|---|---|---
One-Stream (Multi-scale TCN unit/Expanded spatial) | Positions | 0.9431 | 0.8812
 | Motions | 0.6219 | 0.7188
 | Bones | 0.9403 | 0.8125
 | Polar | 0.9521 | 0.8875
Two-Stream (Multi-scale TCN unit/Expanded spatial) | Positions & Motions | 0.9343 | 0.8313
 | Positions & Bones | 0.9421 | 0.7875
 | Motions & Bones | 0.9510 | 0.8313
 | Polar & Positions | 0.9609 | 0.8625
 | Polar & Motions | 0.9589 | 0.8375
 | Polar & Bones | 0.9628 | 0.9187
Model Structure | Action | NIR | TIR
---|---|---|---
Polar & Bones/Multi-scale TCN unit/Expanded spatial | Fall down | 0.9845 | 1.0000
 | Lying | 0.9845 | 1.0000
 | Sit | 0.9489 | 0.8500
 | Walk | 0.9333 | 0.9167
 | Sit down | 0.9536 | 0.9524
 | Stand up | 0.9826 | 0.9534
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yang, J.; He, Y.; Zhu, J.; Lv, Z.; Jin, W. Fall Detection Method for Infrared Videos Based on Spatial-Temporal Graph Convolutional Network. Sensors 2024, 24, 4647. https://doi.org/10.3390/s24144647