Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition
Figure 1. The structure of the spatiotemporal interaction residual network with pseudo3D (STINP). The STINP consists of two branches: the spatial branch and the temporal branch. The spatial branch aims to obtain the features of the scene and objects in the individual frames of the video, where the green arrows represent introducing the pseudo3D structure to extract the interactive relationships among consecutive frames. The temporal branch employs the optical flow frames as input to obtain the dynamic information of the video.
Figure 2. The different structures of the spatial branch developed for the STINP: (a) the spatial branch in STINP-1, and (b) the spatial branch in STINP-2. The yellow blocks represent the 2D convolutional filter, and the blue blocks represent the 1D convolutional filter.
Figure 3. The structure of the temporal branch of the STINP. The yellow block denotes the 2D spatial convolutional filter, and the blue block represents the 1D temporal convolutional filter.
Figure 4. The structure of the proposed STINP: (a) STINP-1 and (b) STINP-2.
Figure 5. Examples of videos from the UCF101 dataset.
Figure 6. Examples of videos from the HMDB51 dataset.
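The pseudo3D structure referenced in Figures 1–3 factorizes a full 3D convolution into a 2D spatial filter followed by a 1D temporal filter inside a residual block. The sketch below illustrates this factorization in PyTorch; the channel sizes, the serial ordering of the two filters, and the use of batch normalization are illustrative assumptions, not the exact STINP configuration.

```python
# Minimal sketch of a pseudo-3D residual block: a 2D spatial convolution
# (k x k x 1) followed by a 1D temporal convolution (1 x 1 x k), with a
# residual connection. Illustrative only; not the authors' exact design.
import torch
import torch.nn as nn


class Pseudo3DBlock(nn.Module):
    """Residual block that factorizes 3D convolution into 2D spatial + 1D temporal."""

    def __init__(self, channels: int):
        super().__init__()
        # 2D spatial filter (yellow blocks in Figures 2 and 3): kernel (T=1, H=3, W=3).
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # 1D temporal filter (blue blocks in Figures 2 and 3): kernel (T=3, H=1, W=1).
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        out = self.relu(self.bn1(self.spatial(x)))
        out = self.relu(self.bn2(self.temporal(out)))
        return self.relu(out + identity)  # residual connection


if __name__ == "__main__":
    # A clip of 8 frames with 64 channels at 112 x 112: (batch, channels, frames, H, W).
    clip = torch.randn(2, 64, 8, 112, 112)
    print(Pseudo3DBlock(64)(clip).shape)  # torch.Size([2, 64, 8, 112, 112])
```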
Abstract
1. Introduction
2. Related Work
2.1. Hand-Crafted Feature-Based Methods
2.2. Deep Learning Architecture-Based Methods
3. Method
3.1. Residual Network
3.2. Spatial Branch
3.3. Temporal Branch
3.4. Combination of the Spatial and Temporal Branches
4. Experiments
4.1. Datasets
4.2. Experimental Setup
4.3. Experimental Results and Analysis
4.3.1. Analyzing the Performances of STINP-1 and STINP-2
4.3.2. Comparing STINP with the State-of-the-Art
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Fernando, B.; Gavves, E.; Oramas, M.J.O.; Ghodrati, A.; Tuytelaars, T. Rank Pooling for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 773–787. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhu, H.; Vial, R.; Lu, S. TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5814–5822. [Google Scholar]
- Papadopoulos, G.T.; Axenopoulos, A.; Daras, P. Real-Time Skeleton-Tracking-Based Human Action Recognition Using Kinect Data. In Proceedings of the International Conference on Multimedia Modeling, Dublin, Ireland, 8–10 January 2014; Volume 8325, pp. 473–483. [Google Scholar]
- Ziaeefard, M.; Bergevin, R. Semantic human activity recognition: A literature review. Pattern Recognit. 2015, 48, 2329–2345. [Google Scholar] [CrossRef]
- Kong, Y.; Fu, Y. Human Action Recognition and Prediction: A Survey. arXiv 2018, arXiv:1806.11230. [Google Scholar]
- Papadopoulos, K.; Demisse, G.; Ghorbel, E.; Antunes, M.; Aouada, D.; Ottersten, B. Localized Trajectories for 2D and 3D Action Recognition. Sensors 2019, 19, 3503. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Qiu, Z.; Yao, T.; Mei, T. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5534–5542. [Google Scholar]
- Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Li, F.-F. Large-Scale Video Classification with Convolutional Neural Networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 1725–1732. [Google Scholar]
- Nazir, S.; Yousaf, M.H.; Nebel, J.-C.; Velastin, S.A. Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition. Sensors 2019, 19, 2790. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wei, H.; Jafari, R.; Kehtarnavaz, N. Fusion of Video and Inertial Sensing for Deep Learning–Based Human Action Recognition. Sensors 2019, 19, 3680. [Google Scholar] [CrossRef] [Green Version]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Schneiderman, H.; Kanade, T. Object Detection Using the Statistics of Parts. Int. J. Comput. Vis. 2004, 56, 151–177. [Google Scholar] [CrossRef] [Green Version]
- Li, C.; Wang, P.; Wang, S.; Hou, Y.; Li, W. Skeleton-based action recognition using LSTM and CNN. In Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China, 10–14 July 2017; pp. 585–590. [Google Scholar]
- Park, E.; Han, X.; Berg, T.L.; Berg, A.C. Combining multiple sources of knowledge in deep CNNs for action recognition. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–9 March 2016; pp. 1–8. [Google Scholar]
- Feichtenhofer, C.; Pinz, A.; Wildes, R.P. Temporal Residual Networks for Dynamic Scene Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7435–7444. [Google Scholar]
- Simonyan, K.; Zisserman, A. Two-Stream convolutional networks for action recognition in videos. In Proceedings of the Advances in Neural Information Processing Systems, Cambridge, MA, USA, 8–13 December 2014; pp. 568–576. [Google Scholar]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231. [Google Scholar] [CrossRef] [Green Version]
- Feichtenhofer, C.; Pinz, A.; Wildes, R.P. Spatiotemporal Multiplier Networks for Video Action Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7445–7454. [Google Scholar]
- Baccouche, M.; Mamalet, F.; Wolf, C.; Garcia, C.; Baskurt, A. Sequential Deep Learning for Human Action Recognition. In Proceedings of the International Workshop on Human Behavior Understanding, Amsterdam, The Netherlands, 16 November 2011; Volume 7065, pp. 29–39. [Google Scholar]
- Chen, Y.; Kalantidis, Y.; Li, J.; Yan, S.; Feng, J. Multi-fiber Networks for Video Recognition. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 364–380. [Google Scholar]
- Zhang, S.; Wei, Z.; Nie, J.; Huang, L.; Wang, S.; Li, Z. A Review on Human Activity Recognition Using Vision-Based Method. J. Healthc. Eng. 2017, 2017, 1–31. [Google Scholar] [CrossRef]
- Ali, S.; Basharat, A.; Shah, M. Chaotic Invariants for Human Action Recognition. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–20 October 2007; pp. 1–8. [Google Scholar]
- Bobick, A.F.; Davis, J.W. The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 257–267. [Google Scholar] [CrossRef] [Green Version]
- Gorelick, L.; Blank, M.; Shechtman, E.; Irani, M.; Basri, R. Actions as Space-Time Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Laptev, I. On Space-Time Interest Points. Int. J. Comput. Vis. 2005, 64, 107–123. [Google Scholar] [CrossRef]
- Willems, G.; Tuytelaars, T.; Van Gool, L. An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Volume 5303, pp. 650–663. [Google Scholar]
- Dollár, P.; Rabaud, V.; Cottrell, G.; Belongie, S. Behavior Recognition via Sparse Spatio-Temporal Features. In Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, 15–16 October 2005; pp. 65–72. [Google Scholar] [CrossRef] [Green Version]
- Rodriguez, M.D.; Ahmed, J.; Shah, M. Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
- Niebles, J.C.; Li, F.-F. A Hierarchical Model of Shape and Appearance for Human Action Classification. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. In Proceedings of the Advances in Neural Information Processing Systems, Cambridge, MA, USA, 8–13 December 2014; Volume 3, pp. 2672–2680. [Google Scholar]
- Lv, F.; Nevatia, R. Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Volume 3954, pp. 359–372. [Google Scholar]
- Savarese, S.; Delpozo, A.; Niebles, J.C.; Li, F.-F. Spatial-Temporal correlatons for unsupervised action classification. In Proceedings of the 2008 IEEE Workshop on Motion and Video Computing, Copper Mountain, CO, USA, 8–9 January 2008; pp. 1–8. [Google Scholar] [CrossRef]
- Ghojogh, B.; Mohammadzade, H.; Mokari, M. Fisherposes for Human Action Recognition Using Kinect Sensor Data. IEEE Sens. J. 2018, 18, 1612–1627. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Lee, C.-Y.; Gallagher, P.W.; Tu, Z. Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree. In Proceedings of the Artificial intelligence and statistics, Cadiz, Spain, 9–11 May 2016; pp. 464–472. [Google Scholar]
- Xu, Z.; Yang, Y.; Hauptmann, A.G. A discriminative CNN video representation for event detection. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1798–1807. [Google Scholar]
- Girdhar, R.; Ramanan, D.; Gupta, A.; Sivic, J.; Russell, B. ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3165–3174. [Google Scholar]
- Schuster, M.; Paliwal, K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef] [Green Version]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar]
- Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4724–4733. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
- Feichtenhofer, C.; Pinz, A.; Zisserman, A. Convolutional Two-Stream Network Fusion for Video Action Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1933–1941. [Google Scholar]
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild. arXiv 2012, arXiv:1212.0402. [Google Scholar]
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Wang, X.; Farhadi, A.; Gupta, A. Actions ~ Transformations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2658–2667. [Google Scholar]
- Sun, L.; Jia, K.; Yeung, D.-Y.; Shi, B.E. Human Action Recognition Using Factorized Spatio-Temporal Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4597–4605. [Google Scholar]
- Wang, H.; Schmid, C. Action Recognition with Improved Trajectories. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 3551–3558. [Google Scholar]
- Donahue, J.; Hendricks, L.A.; Rohrbach, M.; Venugopalan, S.; Guadarrama, S.; Saenko, K.; Darrell, T. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 677–691. [Google Scholar] [CrossRef] [PubMed]
- Srivastava, N.; Mansimov, E.; Salakhudinov, R. Unsupervised learning of video representations using lstms. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 843–852. [Google Scholar]
- Ng, J.Y.-H.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4694–4702. [Google Scholar]
- Tran, D.; Ray, J.; Shou, Z.; Chang, S.-F.; Paluri, M. Convnet architecture search for spatiotemporal feature learning. arXiv 2017, arXiv:1708.05038. [Google Scholar]
- Bilen, H.; Fernando, B.; Gavves, E.; Vedaldi, A.; Gould, S. Dynamic Image Networks for Action Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3034–3042. [Google Scholar]
- Yang, H.; Yuan, C.; Li, B.; Du, Y.; Xing, J.; Hu, W.; Maybank, S.J. Asymmetric 3d convolutional neural networks for action recognition. Pattern Recognit. 2019, 85, 1–12. [Google Scholar] [CrossRef] [Green Version]
- Diba, A.; Fayyaz, M.; Sharma, V.; Karami, A.H.; Arzani, M.M.; Yousefzadeh, R.; Van Gool, L. Temporal 3d Convnets: New Architecture and Transfer Learning for Video Classification. arXiv 2017, arXiv:1711.08200. [Google Scholar]
- Wang, L.; Qiao, Y.; Tang, X. Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 4305–4314. [Google Scholar]
- Li, Z.; Gavrilyuk, K.; Gavves, E.; Jain, M.; Snoek, C. VideoLSTM convolves, attends and flows for action recognition. Comput. Vis. Image Underst. 2018, 166, 41–50. [Google Scholar] [CrossRef] [Green Version]
- Wang, Y.; Wang, S.; Tang, J.; O’Hare, N.; Chang, Y.; Li, B. Hierarchical Attention Network for Action Recognition in Videos. arXiv 2016, arXiv:1607.06416. [Google Scholar]
- Yuan, Y.; Zhao, Y.; Wang, Q. Action recognition using spatial-optical data organization and sequential learning framework. Neurocomputing 2018, 315, 221–233. [Google Scholar] [CrossRef]
- Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Van Gool, L. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Volume 9912, pp. 20–36. [Google Scholar]
- Chen, E.; Bai, X.; Gao, L.; Tinega, H.C.; Ding, Y. A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition. IEEE Access 2019, 7, 57267–57275. [Google Scholar] [CrossRef]
- Sun, S.; Kuang, Z.; Sheng, L.; Ouyang, W.; Zhang, W. Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1390–1399. [Google Scholar]
Layer Name | Blocks |
---|---|
conv1 | 7 × 7 × 1, 64 |
pool1 | 3 × 3 × 1 max, stride 2 |
conv2_i | |
conv3_i | |
conv4_i | |
conv5_i | |
pool5 | 7 × 7 × 1 average |
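The stem and pooling layers listed in the table above can be read directly into code. The sketch below assumes the "k × k × 1" notation denotes a k × k spatial kernel with temporal extent 1, that the input is a clip tensor of shape (batch, 3, frames, height, width), and that conv1 uses spatial stride 2; the residual stages conv2_i–conv5_i are omitted because their block specifications are not reproduced in this excerpt.

```python
# Sketch of the tabulated stem and pooling layers under the assumptions above.
import torch
import torch.nn as nn

stem = nn.Sequential(
    # conv1: 7 x 7 x 1 kernel, 64 output channels (spatial stride 2 assumed).
    nn.Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3)),
    nn.BatchNorm3d(64),
    nn.ReLU(inplace=True),
    # pool1: 3 x 3 x 1 max pooling, stride 2 (spatial only).
    nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
)

# pool5: 7 x 7 x 1 average pooling, applied after the (omitted) residual stages.
pool5 = nn.AvgPool3d(kernel_size=(1, 7, 7))

clip = torch.randn(1, 3, 8, 224, 224)  # (batch, channels, frames, height, width)
print(stem(clip).shape)  # torch.Size([1, 64, 8, 56, 56])
```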
Model | Branch | UCF101 | HMDB51 |
---|---|---|---|
STINP-1 | Spatial branch-1 | 84.00% | 53.20% |
STINP-1 | Temporal branch | 86.00% | 62.10% |
STINP-1 | Fusion | 93.40% | 66.70% |
STINP-2 | Spatial branch-2 | 83.20% | 53.00% |
STINP-2 | Temporal branch | 86.00% | 62.10% |
STINP-2 | Fusion | 93.00% | 67.10% |
Model | Branch | UCF101 | HMDB51 |
---|---|---|---|
STINP-1 | Spatial branch-1 | 89.80% | 61.60% |
STINP-1 | Temporal branch | 86.40% | 60.80% |
STINP-1 | Fusion | 94.40% | 69.60% |
STINP-2 | Spatial branch-2 | 87.50% | 59.00% |
STINP-2 | Temporal branch | 86.60% | 60.20% |
STINP-2 | Fusion | 94.00% | 69.00% |
Model | Branch | UCF101 | HMDB51 |
---|---|---|---|
STINP-1 | Spatial branch-1 | 86.30% | 54.10% |
STINP-1 | Temporal branch | 85.00% | 61.80% |
STINP-1 | Fusion | 93.60% | 68.70% |
STINP-2 | Spatial branch-2 | 85.80% | 53.80% |
STINP-2 | Temporal branch | 85.00% | 61.20% |
STINP-2 | Fusion | 93.50% | 68.50% |
Model | Branch | UCF101 | HMDB51 |
---|---|---|---|
STINP-1 | Spatial branch-1 | 85.80% | 56.60% |
STINP-1 | Temporal branch | 86.10% | 60.00% |
STINP-1 | Fusion | 93.70% | 67.80% |
STINP-2 | Spatial branch-2 | 86.20% | 55.80% |
STINP-2 | Temporal branch | 84.50% | 58.80% |
STINP-2 | Fusion | 93.30% | 68.00% |
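In the tables above, the Fusion rows combine the predictions of the spatial and temporal branches and consistently outperform either branch alone. The exact combination scheme is defined in Section 3.4 and is not reproduced in this excerpt; a common choice for two-stream models, and the one assumed in the sketch below, is a weighted average of the per-class scores produced by the two branches.

```python
# Hedged sketch of late score fusion for a two-branch model (assumed scheme,
# not necessarily the STINP fusion described in Section 3.4).
import torch


def fuse_scores(spatial_logits: torch.Tensor,
                temporal_logits: torch.Tensor,
                w_spatial: float = 0.5) -> torch.Tensor:
    """Weighted average of class probabilities from the spatial and temporal branches."""
    spatial_prob = torch.softmax(spatial_logits, dim=1)
    temporal_prob = torch.softmax(temporal_logits, dim=1)
    return w_spatial * spatial_prob + (1.0 - w_spatial) * temporal_prob


# Example: scores for a batch of 4 videos over the 101 UCF101 classes.
fused = fuse_scores(torch.randn(4, 101), torch.randn(4, 101))
predictions = fused.argmax(dim=1)  # predicted action class per video
```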
Model | UCF101 | HMDB51 |
---|---|---|
STINP-1 | 99.50% | 91.60% |
STINP-2 | 98.80% | 91.00% |
Methods | UCF101 | HMDB51 |
---|---|---|
IDT [53] | 86.40% | 61.70% |
Spatiotemporal ConvNet [8] | 65.40% | — |
Long-term recurrent ConvNet [54] | 82.90% | — |
Composite LSTM Model [55] | 84.30% | 44.00% |
Two-Stream ConvNet [17] | 88.00% | 59.40% |
P3D ResNets (Without IDT) [7] | 88.60% | — |
Two-Stream+LSTM [56] | 88.60% | — |
C3D [42] | 85.20% | — |
Res3D [57] | 85.80% | 54.90% |
Dynamic Image Networks [58] | 76.90% | 42.80% |
Dynamic Image Networks + IDT [58] | 89.10% | 65.20% |
Asymmetric 3D-CNN (RGB+RGBF+IDT) [59] | 92.60% | 65.40% |
T3D [60] | 93.20% | 63.50% |
TDD+IDT [61] | 91.50% | 65.90% |
Conv Fusion (Without IDT) [47] | 92.50% | 65.40% |
Transformations [51] | 92.40% | 62.00% |
VideoLSTM + IDT [62] | 92.20% | 64.90% |
Hierarchical Attention Networks [63] | 92.70% | 64.30% |
Spatiotemporal Multiplier ConvNet [19] | 94.20% | 68.90% |
Sequential Learning Framework [64] | 90.90% | 65.70% |
T-ResNets (Without IDT) [16] | 93.90% | 67.20% |
TSN (2 modalities) [65] | 94.00% | 68.50% |
Spatiotemporal Heterogeneous Two-stream Network [66] | 94.40% | 67.20% |
Our proposed STINP | 94.40% | 69.60% |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, J.; Kong, J.; Sun, H.; Xu, H.; Liu, X.; Lu, Y.; Zheng, C. Spatiotemporal Interaction Residual Networks with Pseudo3D for Video Action Recognition. Sensors 2020, 20, 3126. https://doi.org/10.3390/s20113126