Sleep Action Recognition Based on Segmentation Strategy
Figure 1. Self-attention mechanism module.
Figure 2. Clip-level features stacked into video-level features. x_1–x_k are the segment-level features of segments 1–k output by the segmentation module; s_1–s_k are the frame-level features of the images in each segment.
Figure 3. Overall structure of the video classification network in this paper; x_1–x_n represent fragment-level features (LSTM: long short-term memory).
Figure 4. Image collection effect of the laboratory monitoring camera.
Figure 5. Video data set construction process.
Figure 6. Schematic diagram of image sequences of sleeping behavior and daily behavior.
Figure 7. Training result curves of the benchmark network.
Figure 8. Training result curves of the improved network.
Figure 9. Feature heat maps before and after the improvement. Subgraph (a) is the heat map of the original network and subgraph (b) is the heat map of the proposed method; subgraphs (c–e) compare the effect of the original method with ours (red circles).
Figure 10. Sleep video detection results, where ①, ②, and ③ are the target persons to be detected (red squares).
Abstract
1. Introduction
- An overview of the methods, models, and algorithms used in deep-learning-based personnel behavior recognition;
- A sleeping-at-post data set, built from surveillance footage, for training the sleeping behavior recognition network model;
- A sleeping behavior recognition algorithm for monitoring data, based on a self-attention temporal convolutional network (a minimal sketch of the segment sampling step follows this list): (1) in the feature extraction stage of the CNN, a self-attention module is added to capture fine-grained image features, so that the feature extraction network attends to the motion cues around people and alleviates the difficulty of extracting fine-grained features of sleeping behavior; (2) in the feature fusion stage, a video segmentation strategy is proposed that stacks segment-level features into video-level features, effectively handling the long-range dependence of spatiotemporal information when fusing features of long videos.
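As a concrete illustration of the segmentation strategy, the sketch below is our own, not the authors' released code; the names `sample_segment_frames`, `num_segments`, and `frames_per_segment` are assumptions. It splits a long frame sequence into k equal segments and samples a fixed number of frame indices from each, which is the per-segment input the feature extractor would then process.

```python
import numpy as np

def sample_segment_frames(num_frames, num_segments, frames_per_segment, rng=None):
    """Split a video of num_frames frames into num_segments equal spans and
    pick frames_per_segment frame indices from each span.

    Returns an int array of shape (num_segments, frames_per_segment).
    """
    rng = rng or np.random.default_rng()
    # Boundaries of the k equal segments along the frame axis.
    bounds = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        span = max(end - start, 1)
        if span >= frames_per_segment:
            # Evenly spaced indices inside the segment (deterministic variant);
            # random per-position offsets could be used instead during training.
            idx = start + np.linspace(0, span - 1, frames_per_segment, dtype=int)
        else:
            # Segment shorter than the sample count: repeat indices to keep the shape fixed.
            idx = start + np.sort(rng.integers(0, span, size=frames_per_segment))
        segments.append(idx)
    return np.stack(segments)

# Example: a 240-frame clip split into 8 segments with 4 sampled frames each.
print(sample_segment_frames(240, num_segments=8, frames_per_segment=4))
```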
2. Related Work
3. Method
3.1. Local Feature under Self-Attention Mechanism
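No reference implementation accompanies this section, so the following is a minimal PyTorch sketch of a spatial self-attention block of the kind Figure 1 depicts: 1×1 convolutions produce queries, keys, and values over the positions of a CNN feature map, and a learnable residual weight lets the block start as an identity mapping. The layer sizes, the reduction factor, and the class name `SpatialSelfAttention` follow common practice and are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSelfAttention(nn.Module):
    """Self-attention over the spatial positions of a CNN feature map.

    Input and output shape: (batch, channels, height, width).
    """
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        inner = max(channels // reduction, 1)
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable residual weight: the block starts as an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)            # (b, h*w, inner)
        k = self.key(x).flatten(2)                               # (b, inner, h*w)
        v = self.value(x).flatten(2)                             # (b, c, h*w)
        attn = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)   # (b, h*w, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

# Example: refine a 256-channel 14x14 feature map from a backbone stage.
feat = torch.randn(2, 256, 14, 14)
print(SpatialSelfAttention(256)(feat).shape)  # torch.Size([2, 256, 14, 14])
```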
3.2. Fragment-Level Feature Fusion Module
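A hedged sketch of the fragment-level fusion idea from Figures 2 and 3: frame-level features within each segment are pooled into one segment-level feature, the k segment features are stacked into a video-level sequence, and an LSTM followed by a linear classifier produces the prediction. The feature dimension, the mean pooling, and the name `SegmentFusionHead` are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SegmentFusionHead(nn.Module):
    """Stack k segment-level features into a video-level sequence and fuse them with an LSTM."""
    def __init__(self, feat_dim: int = 2048, hidden: int = 512, num_classes: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, k segments, frames per segment, feat_dim)
        # Pool the frame-level features s_1..s_k of each segment into one x_i,
        # then treat the k segment-level features as a temporal sequence.
        segment_feats = frame_feats.mean(dim=2)      # (batch, k, feat_dim)
        fused, _ = self.lstm(segment_feats)          # (batch, k, hidden)
        return self.fc(fused[:, -1])                 # (batch, num_classes)

# Example: 2 videos, 8 segments, 4 frames per segment, 2048-D backbone features.
feats = torch.randn(2, 8, 4, 2048)
print(SegmentFusionHead()(feats).shape)  # torch.Size([2, 2])
```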
3.3. Network
4. Experiment
4.1. Experimental Platform
4.2. Data Set Creation
4.3. Description of Evaluation Indicators
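The tables that follow report accuracy (%), GFLOPs, and parameters (M). As a minimal sketch, the helpers below show how the first and last of these are typically computed for a PyTorch model (GFLOPs needs a separate profiler and is omitted here); the helper names are ours, not the authors'.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def accuracy_percent(model: nn.Module, loader) -> float:
    """Top-1 accuracy (%) over a DataLoader yielding (clips, labels)."""
    model.eval()
    correct, total = 0, 0
    for clips, labels in loader:
        preds = model(clips).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / max(total, 1)

def params_millions(model: nn.Module) -> float:
    """Number of trainable parameters in millions (the 'Params (M)' column)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Example with a placeholder classifier head.
print(f"{params_millions(nn.Linear(2048, 2)):.4f} M")
```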
4.4. Results and Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Frame Interval | Accuracy (%) |
|---|---|
| 5 frames | 89.20 |
| 30 frames | 84.82 |
| 5 and 30 frames | 85.45 |

| Samples | Sleep | Other Sitting Behavior |
|---|---|---|
| Training set | 1036 | 952 |
| Validation set | 222 | 204 |
| Test set | 222 | 204 |
| Total samples | 1480 | 1360 |

| Frame Interval | Accuracy (%) | GFLOPs | Params (M) |
|---|---|---|---|
| Four frames | 93.65 | 64.84 | 38.37 |
| Eight frames | 94.71 | | |
| Sixteen frames | 93.06 | | |

| Net | Accuracy (%) | GFLOPs | Params (M) |
|---|---|---|---|
| CNN-LSTM (backbone) | 88.02 | 64.38 | 25.78 |
| + Segmentation strategy | 92.08 | 64.30 | 25.78 |
| + Attention mechanism | 90.67 | 64.91 | 38.37 |
| + Both | 94.71 | 64.84 | 38.37 |

| Net | Accuracy (%) | GFLOPs | Params (M) |
|---|---|---|---|
| SlowOnly | 75.25 | 82.25 | 32.45 |
| C3D | 92.86 | - | 78.00 |
| SlowFast | 93.38 | - | 34.48 |
| TSN | 92.50 | - | 24.74 |
| Ours | 94.71 | 64.84 | 38.37 |