A Comprehensive Review on Handcrafted and Learning-Based Action Representation Approaches for Human Activity Recognition
Figure 1: Categorization of different levels of activities.
Figure 2: Example of a handcrafted representation-based approach.
Figure 3: Example of a learning-based representation approach.
Figure 4: Traditional action representation and recognition approach.
Figure 5: Components of space-time-based approaches. ESURF: Enhanced Speeded-Up Robust Features; HOG: histogram of oriented gradients; HOF: histogram of optical flow; MBH: motion boundary histogram; BOW: bag-of-words; BoVW: Bag-of-Visual-Words; FV: Fisher Vector; SFV: Stacked Fisher Vector; HMM: Hidden Markov Model; DBAN: Dynamic Bayesian Action Network; SVM: support vector machine; ANN: Artificial Neural Network; LDA: Latent Dirichlet Allocation; SOM: Self-Organizing Map; VQ: vector quantization.
Figure 6: Example of a fuzzy view estimation framework.
Figure 7: Learning-based action representation approaches.
Figure 8: Convolutional Neural Network layers (source [146]).
Figure 9: Two-stream Convolutional Neural Network (CNN) architecture (source [148]).
Figure 10: An example of stratified pooling with a CNN.
Figure 11: One frame example of each action in the Weizmann dataset.
Figure 12: One frame example of each action from four different scenarios in the KTH dataset.
Figure 13: One frame example of each action from five different camera views in the IXMAS (INRIA Xmas Motion Acquisition Sequences) dataset.
Figure 14: (a) Exemplar frames for actions 1–28 from the HMDB-51 (Human Motion Database) action dataset; (b) exemplar frames for actions 29–51 from the HMDB-51 action dataset.
Figure 15: Exemplar frames from the Hollywood2 dataset.
Figure 16: (a) Exemplar frames for actions 1–57 from the UCF-101 dataset; (b) exemplar frames for actions 58–101 from the UCF-101 dataset.
Figure 17: Exemplar frames from the UCF sports action dataset.
Figure 18: Exemplar frames of 11 sports actions from the YouTube action dataset.
Figure 19: Exemplar frames from the ActivityNet dataset.
Abstract
1. Introduction
2. Handcrafted Representation-Based Approach
2.1. Space-Time-Based Approaches
2.1.1. Space-Time Volumes (STVs)
2.1.2. Space-Time Trajectory
2.1.3. Space-Time Features
2.1.4. Discussion
2.2. Appearance-Based Approaches
2.2.1. Shape-Based Approaches
2.2.2. Motion-Based Approaches
2.2.3. Hybrid Approaches
2.3. Other Approaches
2.3.1. Local Binary Pattern (LBP)-Based Approaches
2.3.2. Fuzzy Logic-Based Approaches
2.3.3. Discussion
3. Learning-Based Action Representation Approach
3.1. Non-Deep Learning-Based Approaches
3.1.1. Dictionary Learning-Based Approaches
3.1.2. Genetic Programming
3.2. Deep Learning-Based Approaches
3.2.1. Generative/Unsupervised Models
3.2.2. Discriminative/Supervised Models
3.2.3. Discussion
4. Datasets
4.1. Weizmann Human Action Dataset
4.2. KTH Human Action Dataset
4.3. IXMAS Dataset
4.4. HMDB-51
4.5. Hollywood2
4.6. UCF-101 Action Recognition Dataset
4.7. UCF Sports Action Dataset
4.8. YouTube Action Dataset
4.9. ActivityNet Dataset
5. Applications
5.1. Intelligent Video Surveillance
5.2. Ambient Assisted Living
5.3. Human-Robot Interaction
5.4. Entertainment
5.5. Intelligent Driving
6. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Aggarwal, J.K.; Ryoo, M.S. Human Activity Analysis: A Review. ACM Comput. Surv. (CSUR) 2011, 43, 16. [Google Scholar] [CrossRef]
- Bouwmans, T. Traditional and recent approaches in background modeling for foreground detection: An overview. Comput. Sci. Rev. 2014, 11, 31–66. [Google Scholar] [CrossRef]
- Ke, S.-R.; Thuc, H.L.U.; Lee, Y.-J.; Hwang, J.-N.; Yoo, J.-H.; Choi, K.-H. A review on video-based human activity recognition. Computers 2013, 2, 88–131. [Google Scholar] [CrossRef]
- Ramanathan, M.; Yau, W.-Y.; Teoh, E.K. Human action recognition with video data: Research and evaluation challenges. IEEE Trans. Hum. Mach. Syst. 2014, 44, 650–663. [Google Scholar] [CrossRef]
- Poppe, R. A survey on vision-based human action recognition. Image Vis. Comput. 2010, 28, 976–990. [Google Scholar] [CrossRef]
- Weinland, D.; Ronfard, R.; Boyer, E. A survey of vision-based methods for action representation, segmentation and recognition. Comput. Vis. Image Underst. 2011, 115, 224–241. [Google Scholar] [CrossRef]
- Ziaeefard, M.; Bergevin, R. Semantic human activity recognition: A literature review. Pattern Recognit. 2015, 48, 2329–2345. [Google Scholar] [CrossRef]
- Maravelakis, E.; Konstantaras, A.; Kilty, J.; Karapidakis, E.; Katsifarakis, E. Automatic building identification and features extraction from aerial images: Application on the historic 1866 square of Chania Greece. In Proceedings of the 2014 International Symposium on Fundamentals of Electrical Engineering (ISFEE), Bucharest, Romania, 28–29 November 2014.
- Jalal, A.; Kamal, S.; Kim, D. A depth video sensor-based life-logging human activity recognition system for elderly care in smart indoor environments. Sensors 2014, 14, 11735–11759. [Google Scholar] [CrossRef] [PubMed]
- Jalal, A.; Sarif, N.; Kim, J.T.; Kim, T.S. Human activity recognition via recognized body parts of human depth silhouettes for residents monitoring services at smart home. Indoor Built Environ. 2013, 22, 271–279. [Google Scholar] [CrossRef]
- Li, J.; Allinson, N. Building recognition using local oriented features. IEEE Trans. Ind. Inform. 2013, 9, 1697–1704. [Google Scholar] [CrossRef]
- Jalal, A.; Kamal, S.; Kim, D. Shape and motion features approach for activity tracking and recognition from kinect video camera. In Proceedings of the 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops (WAINA), Gwangju, Korea, 25–27 March 2015.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Yuan, R.; Hui, W. Object identification and recognition using multiple contours based moment invariants. In Proceedings of the 2008 International Symposium on Information Science and Engineering, Shanghai, China, 20–22 December 2008.
- Jalal, A.; Rasheed, Y.A. Collaboration achievement along with performance maintenance in video streaming. In Proceedings of the IEEE Conference on Interactive Computer Aided Learning, Villach, Austria, 26–28 September 2007.
- Kamal, S.; Azurdia-Meza, C.A.; Lee, K. Subsiding OOB Emission and ICI Power Using iPOWER Pulse in OFDM Systems. Adv. Electr. Comput. Eng. 2016, 16, 79–86. [Google Scholar] [CrossRef]
- Farooq, A.; Jalal, A.; Kamal, S. Dense RGB-D map-based human tracking and activity recognition using skin joints features and self-organizing map. KSII Trans. Internet Inf. Syst. 2015, 9, 1856–1869. [Google Scholar]
- Jalal, A.; Kim, S. The mechanism of edge detection using the block matching criteria for the motion estimation. In Proceedings of the Conference on Human Computer Interaction, Daegu, Korea, 1–4 February 2005.
- Kamal, S.; Jalal, A. A Hybrid Feature Extraction Approach for Human Detection, Tracking and Activity Recognition Using Depth Sensors. Arab. J. Sci. Eng. 2016, 41, 1043–1051. [Google Scholar] [CrossRef]
- Azurdia-Meza, C.A.; Falchetti, A.; Arrano, H.F. Evaluation of the improved parametric linear combination pulse in digital baseband communication systems. In Proceedings of the 2015 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Korea, 28–30 October 2015.
- Bongale, P.; Ranjan, A.; Anand, S. Implementation of 3D object recognition and tracking. In Proceedings of the 2012 International Conference on Recent Advances in Computing and Software Systems (RACSS), Chennai, India, 25–27 April 2012.
- Kamal, S.; Jalal, A.; Kim, D. Depth Images-based Human Detection, Tracking and Activity Recognition Using Spatiotemporal Features and Modified HMM. J. Electr. Eng. Technol. 2016, 11, 1921–1926. [Google Scholar]
- Lai, K.; Bo, L.; Ren, X.; Fox, D. Sparse distance learning for object recognition combining RGB and depth information. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011.
- Jalal, A.; Kim, J.T.; Kim, T.-S. Development of a life logging system via depth imaging-based human activity recognition for smart homes. In Proceedings of the International Symposium on Sustainable Healthy Buildings, Seoul, Korea, 19 September 2012.
- Chang, J.-Y.; Shyu, J.-J.; Cho, C.-W. Fuzzy rule inference based human activity recognition. In Proceedings of the 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC), St. Petersburg, Russia, 8–10 July 2009.
- Holte, M.B.; Tran, C.; Trivedi, M.M. Human pose estimation and activity recognition from multi-view videos: Comparative explorations of recent developments. IEEE J. Sel. Top. Signal Process. 2012, 6, 538–552. [Google Scholar] [CrossRef]
- Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 27. [Google Scholar] [CrossRef]
- Dawn, D.D.; Shaikh, S.H. A comprehensive survey of human action recognition with spatio-temporal interest point (STIP) detector. Vis. Comput. 2016, 32, 289–306. [Google Scholar] [CrossRef]
- Sipiran, I.; Bustos, B. Harris 3D: A robust extension of the Harris operator for interest point detection on 3D meshes. Vis. Comput. 2011, 27, 963–976. [Google Scholar] [CrossRef]
- Laptev, I. On space-time interest points. Int. J. Comput. Vis. 2005, 64, 107–123. [Google Scholar] [CrossRef]
- Gilbert, A.; Illingworth, J.; Bowden, R. Scale invariant action recognition using compound features mined from dense spatio-temporal corners. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008.
- Bobick, A.F.; Davis, J.W. The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 257–267. [Google Scholar] [CrossRef]
- Hu, Y.; Cao, L.; Lv, F.; Yan, S.; Gong, Y. Action detection in complex scenes with spatial and temporal ambiguities. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009.
- Roh, M.-C.; Shin, H.-K.; Lee, S.-W. View-independent human action recognition with volume motion template on single stereo camera. Pattern Recognit. Lett. 2010, 31, 639–647. [Google Scholar] [CrossRef]
- Schuldt, C.; Laptev, I.; Caputo, B. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, 23–26 August 2004.
- Sadanand, S.; Corso, J.J. Action bank: A high-level representation of activity in video. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012.
- Wu, X.; Xu, D.; Duan, L.; Luo, J. Action recognition using context and appearance distribution features. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
- Ikizler, N.; Duygulu, P. Histogram of oriented rectangles: A new pose descriptor for human action recognition. Image Vis. Comput. 2009, 27, 1515–1526. [Google Scholar] [CrossRef]
- Peng, X.; Qiao, Y.; Peng, Q.; Qi, X. Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition. In Proceedings of the British Machine Vision Conference (BMVC), Bristol, UK, 9–13 September 2013.
- Liu, J.; Kuipers, B.; Savarese, S. Recognizing human actions by attributes. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
- Chen, M.; Gong, L.; Wang, T.; Feng, Q. Action recognition using Lie algebrized Gaussians over dense local spatio-temporal features. Multimed. Tools Appl. 2015, 74, 2127–2142. [Google Scholar] [CrossRef]
- Wang, H.; Kläser, A.; Schmid, C. Action recognition by dense trajectories. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
- Rodriguez, M. Spatio-Temporal Maximum Average Correlation Height Templates in Action Recognition and Video Summarization. Ph.D. Thesis, University of Central Florida, Orlando, FL, USA, 2010. [Google Scholar]
- Soomro, K.; Zamir, A.R. Action recognition in realistic sports videos. In Computer Vision in Sports; Springer: Berlin, Germany, 2014; pp. 181–208. [Google Scholar]
- Ma, S.; Sigal, L.; Sclaroff, S. Space-time tree ensemble for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Wang, C.; Wang, Y.; Yuille, A.L. An approach to pose-based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013.
- Kuehne, H.; Jhuang, H.; Garrote, E. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
- Wang, H.; Schmid, C. Action recognition with improved trajectories. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013.
- Jiang, Y.-G.; Dai, Q.; Xue, X.; Liu, W.; Ngo, C.W. Trajectory-based modeling of human actions with motion reference points. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- Kliper-Gross, O.; Gurovich, Y.; Hassner, T. Motion interchange patterns for action recognition in unconstrained videos. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- Wang, L.; Qiao, Y.; Tang, X. Motionlets: Mid-level 3D parts for human motion recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013.
- Peng, X.; Zou, C.; Qiao, Y.; Peng, Q. Action recognition with stacked fisher vectors. In European Conference on Computer Vision; Springer: Berlin, Germany, 2014. [Google Scholar]
- Jain, M.; Jegou, H.; Bouthemy, P. Better exploiting motion for better action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013.
- Fernando, B.; Gavves, E.; Oramas, J.M. Modeling video evolution for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Hoai, M.; Zisserman, A. Improving human action recognition using score distribution and ranking. In Asian Conference on Computer Vision; Springer: Berlin, Germany, 2014. [Google Scholar]
- Marszalek, M.; Laptev, I.; Schmid, C. Actions in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–26 June 2009.
- Vig, E.; Dorr, M.; Cox, D. Space-variant descriptor sampling for action recognition based on saliency and eye movements. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- Mathe, S.; Sminchisescu, C. Dynamic eye movement datasets and learnt saliency models for visual action recognition. In Computer Vision–ECCV 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 842–856. [Google Scholar]
- Kihl, O.; Picard, D.; Gosselin, P.-H. Local polynomial space-time descriptors for action classification. Mach. Vis. Appl. 2016, 27, 351–361. [Google Scholar] [CrossRef] [Green Version]
- Lan, T.; Zhu, Y.; Zamir, A.R.; Savarese, S. Action recognition by hierarchical mid-level action elements. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Yuan, J.; Liu, Z.; Wu, Y. Discriminative subvolume search for efficient action detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, 20–26 June 2009.
- Amor, B.B.; Su, J.; Srivastava, A. Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Zanfir, M.; Leordeanu, M.; Sminchisescu, C. The moving pose: An efficient 3D kinematics descriptor for low-latency action recognition and detection. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013.
- Liu, J.; Luo, J.; Shah, M. Recognizing realistic actions from videos “in the wild”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–26 June 2009.
- Yilmaz, A.; Shah, M. Actions sketch: A novel action representation. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005.
- Sheikh, Y.; Sheikh, M.; Shah, M. Exploring the space of a human action. In Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–20 October 2005; Volume 1.
- Yang, J.; Shi, Z.; Wu, Z. Vision-based action recognition of construction workers using dense trajectories. Adv. Eng. Inform. 2016, 30, 327–336. [Google Scholar] [CrossRef]
- Jiang, Y.-G.; Dai, Q.; Liu, W.; Xue, X. Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling. IEEE Trans. Image Process. 2015, 24, 3781–3795. [Google Scholar] [CrossRef] [PubMed]
- Dollár, P.; Rabaud, V.; Cottrell, G. Behavior recognition via sparse spatio-temporal features. In Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, 15–16 October 2005.
- Thi, T.H.; Zhang, J.; Cheng, L.; Wang, L. Human action recognition and localization in video using structured learning of local space-time features. In Proceedings of the 2010 Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Boston, MA, USA, 29 August–1 September 2010.
- Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 14–17 October 2003.
- Peng, X.; Wang, L.; Wang, X.; Qiao, Y. Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. Comput. Vis. Image Underst. 2016, 150, 109–125. [Google Scholar] [CrossRef]
- Liu, L.; Wang, L.; Liu, X. In defense of soft-assignment coding. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
- Perronnin, F.; Sánchez, J.; Mensink, T. Improving the fisher kernel for large-scale image classification. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
- Wang, H.; Kläser, A.; Schmid, C.; Liu, C.L. Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 2013, 103, 60–79. [Google Scholar] [CrossRef]
- Li, H.; Greenspan, M. Multi-scale gesture recognition from time-varying contours. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–20 October 2005; Volume 1.
- Thurau, C.; Hlavác, V. Pose primitive based human action recognition in videos or still images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Anchorage, AK, USA, 24–26 June 2008.
- Efros, A.A.; Berg, A.C.; Mori, G. Recognizing action at a distance. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 14–17 October 2003.
- Fathi, A.; Mori, G. Action recognition by learning mid-level motion features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Anchorage, AK, USA, 24–26 June 2008.
- Jiang, Z.; Lin, Z.; Davis, L. Recognizing human actions by learning and matching shape-motion prototype trees. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 533–547. [Google Scholar] [CrossRef] [PubMed]
- Holte, M.B.; Moeslund, T.B.; Nikolaidis, N. 3D human action recognition for multi-view camera systems. In Proceedings of the 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Hangzhou, China, 16–19 May 2011.
- Huang, P.; Hilton, A.; Starck, J. Shape similarity for 3D video sequences of people. Int. J. Comput. Vis. 2010, 89, 362–381. [Google Scholar] [CrossRef]
- Weinland, D.; Ronfard, R.; Boyer, E. Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 2006, 104, 249–257. [Google Scholar] [CrossRef]
- Slama, R.; Wannous, H.; Daoudi, M.; Srivastava, A. Accurate 3D action recognition using learning on the Grassmann manifold. Pattern Recognit. 2015, 48, 556–567. [Google Scholar] [CrossRef]
- Wang, L.; Suter, D. Recognizing human activities from silhouettes: Motion subspace and factorial discriminative graphical model. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 18–23 June 2007.
- Rahman, S.A.; Cho, S.-Y.; Leung, M.K. Recognising human actions by analysing negative spaces. IET Comput. Vis. 2012, 6, 197–213. [Google Scholar] [CrossRef]
- Vishwakarma, D.; Kapoor, R. Hybrid classifier based human activity recognition using the silhouette and cells. Expert Syst. Appl. 2015, 42, 6957–6965. [Google Scholar] [CrossRef]
- Junejo, I.N.; Junejo, K.N.; Al Aghbari, Z. Silhouette-based human action recognition using SAX-Shapes. Vis. Comput. 2014, 30, 259–269. [Google Scholar] [CrossRef]
- Chaaraoui, A.A.; Climent-Pérez, P.; Flórez-Revuelta, F. Silhouette-based human action recognition using sequences of key poses. Pattern Recognit. Lett. 2013, 34, 1799–1807. [Google Scholar] [CrossRef] [Green Version]
- Chaaraoui, A.A.; Flórez-Revuelta, F. A Low-Dimensional Radial Silhouette-Based Feature for Fast Human Action Recognition Fusing Multiple Views. Int. Sch. Res. Not. 2014, 2014, 547069. [Google Scholar] [CrossRef] [PubMed]
- Rahman, S.A.; Song, I.; Song, I.; Leung, M.K.H.; Lee, I. Fast action recognition using negative space features. Expert Syst. Appl. 2014, 41, 574–587. [Google Scholar] [CrossRef]
- Cheema, S.; Eweiwi, A.; Thurau, C. Action recognition by learning discriminative key poses. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011.
- Chun, S.; Lee, C.-S. Human action recognition using histogram of motion intensity and direction from multiple views. IET Comput. Vis. 2016, 10, 250–257. [Google Scholar] [CrossRef]
- Murtaza, F.; Yousaf, M.H.; Velastin, S. Multi-view Human Action Recognition using 2D Motion Templates based on MHIs and their HOG Description. IET Comput. Vis. 2016, 10, 758–767. [Google Scholar] [CrossRef]
- Ahmad, M.; Lee, S.-W. HMM-based human action recognition using multiview image sequences. In Proceedings of the 18th International Conference on Pattern Recognition, ICPR 2006, Hong Kong, China, 20–24 August 2006.
- Vishwakarma, D.K.; Kapoor, R.; Dhiman, A. A proposed unified framework for the recognition of human activity by exploiting the characteristics of action dynamics. Robot. Auton. Syst. 2016, 77, 25–38. [Google Scholar] [CrossRef]
- Pehlivan, S.; Forsyth, D.A. Recognizing activities in multiple views with fusion of frame judgments. Image Vis. Comput. 2014, 32, 237–249. [Google Scholar] [CrossRef]
- Eweiwi, A.; Cheema, S.; Thurau, C. Temporal key poses for human action recognition. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011.
- Ojala, T.; Pietikainen, M.; Harwood, D. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, Jerusalem, Israel, 9–13 October 1994.
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Pietikäinen, M.; Hadid, A.; Zhao, G.; Ahonen, T. Computer Vision Using Local Binary Patterns; Springer Science & Business Media: London, UK, 2011; Volume 40. [Google Scholar]
- Zhao, G.; Pietikainen, M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 915–928. [Google Scholar] [CrossRef] [PubMed]
- Yeffet, L.; Wolf, L. Local trinary patterns for human action recognition. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009.
- Kellokumpu, V.; Zhao, G.; Pietikäinen, M. Human activity recognition using a dynamic texture based method. In Proceedings of the British Machine Vision Conference (BMVC 2008), Leeds, UK, 1–4 September 2008.
- Kushwaha, A.K.S.; Srivastava, S.; Srivastava, R. Multi-view human activity recognition based on silhouette and uniform rotation invariant local binary patterns. Multimed. Syst. 2016. [Google Scholar] [CrossRef]
- Baumann, F.; Ehlers, A.; Rosenhahn, B.; Liao, J. Recognizing human actions using novel space-time volume binary patterns. Neurocomputing 2016, 173, 54–63. [Google Scholar] [CrossRef]
- Sadek, S.; Al-Hamadi, A.; Michaelis, B. An action recognition scheme using fuzzy log-polar histogram and temporal self-similarity. EURASIP J. Adv. Signal Process. 2011, 2011, 540375. [Google Scholar] [CrossRef]
- Yao, B.; Alhaddad, M.J.; Alghazzawi, D. A fuzzy logic-based system for the automation of human behavior recognition using machine vision in intelligent environments. Soft Comput. 2015, 19, 499–506. [Google Scholar] [CrossRef]
- Lim, C.H.; Chan, C.S. Fuzzy qualitative human model for viewpoint identification. Neural Comput. Appl. 2016, 27, 845–856. [Google Scholar] [CrossRef]
- Obo, T.; Loo, C.K.; Seera, M.; Kubota, N. Hybrid evolutionary neuro-fuzzy approach based on mutual adaptation for human gesture recognition. Appl. Soft Comput. 2016, 42, 377–389. [Google Scholar] [CrossRef]
- Yousefi, B.; Loo, C.K. Bio-Inspired Human Action Recognition using Hybrid Max-Product Neuro-Fuzzy Classifier and Quantum-Behaved PSO. arXiv, 2015; arXiv:1509.03789. [Google Scholar]
- Iglesias, J.A.; Angelov, P.; Ledezma, A. Creating evolving user behavior profiles automatically. IEEE Trans. Knowl. Data Eng. 2012, 24, 854–867. [Google Scholar] [CrossRef] [Green Version]
- Iglesias, J.A.; Angelov, P.; Ledezma, A. Evolving classification of agents’ behaviors: A general approach. Evol. Syst. 2010, 1, 161–171. [Google Scholar] [CrossRef] [Green Version]
- Gorelick, L.; Blank, M.; Shechtman, E. Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [Google Scholar] [CrossRef] [PubMed]
- Kellokumpu, V.; Zhao, G.; Pietikäinen, M. Recognition of human actions using texture descriptors. Mach. Vis. Appl. 2011, 22, 767–780. [Google Scholar] [CrossRef]
- Sadek, S.; Al-Hamadi, A.; Michaelis, B. Human action recognition via affine moment invariants. In Proceedings of the 2012 21st International Conference on Pattern Recognition (ICPR), Tsukuba, Japan, 11–15 November 2012.
- Mattivi, R.; Shao, L. Human action recognition using LBP-TOP as sparse spatio-temporal feature descriptor. In Computer Analysis of Images and Patterns; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Weinland, D.; Boyer, E.; Ronfard, R. Action recognition from arbitrary views using 3D exemplars. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–20 October 2007.
- Sargano, A.B.; Angelov, P.; Habib, Z. Human Action Recognition from Multiple Views Based on View-Invariant Feature Descriptor Using Support Vector Machines. Appl. Sci. 2016, 6, 309. [Google Scholar] [CrossRef]
- Holte, M.B.; Chakraborty, B.; Gonzalez, J. A local 3-D motion descriptor for multi-view human action recognition from 4-D spatio-temporal interest points. IEEE J. Sel. Top. Signal Process. 2012, 6, 553–565. [Google Scholar] [CrossRef]
- Turaga, P.; Veeraraghavan, A.; Chellappa, R. Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Anchorage, AK, USA, 24–26 June 2008.
- Pehlivan, S.; Duygulu, P. A new pose-based representation for recognizing actions from multiple cameras. Comput. Vis. Image Underst. 2011, 115, 140–151. [Google Scholar] [CrossRef]
- Zhu, F.; Shao, L.; Xie, J.; Fang, Y. From handcrafted to learned representations for human action recognition: A survey. Image Vis. Comput. 2016, 55, 42–52. [Google Scholar] [CrossRef]
- Guha, T.; Ward, R.K. Learning sparse representations for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1576–1588. [Google Scholar] [CrossRef] [PubMed]
- Wang, H.; Yuan, C.; Hu, W.; Sun, C. Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recognit. 2012, 45, 3902–3911. [Google Scholar] [CrossRef]
- Zheng, J.; Jiang, Z.; Phillips, P.J.; Chellappa, R. Cross-View Action Recognition via a Transferable Dictionary Pair. In Proceedings of the 2012 British Machine Vision Conference, BMVC 2012, Guildford, UK, 3–7 September 2012.
- Zheng, J.; Jiang, Z.; Chellappa, R. Cross-View Action Recognition via Transferable Dictionary Learning. IEEE Trans. Image Process. 2016, 25, 2542–2556. [Google Scholar] [CrossRef] [PubMed]
- Zhu, F.; Shao, L. Weakly-supervised cross-domain dictionary learning for visual recognition. Int. J. Comput. Vis. 2014, 109, 42–59. [Google Scholar] [CrossRef]
- Zhu, F.; Shao, L. Correspondence-Free Dictionary Learning for Cross-View Action Recognition. In Proceedings of the 2014 International Conference on Pattern Recognition (ICPR 2014), Stockholm, Sweden, 24–28 August 2014.
- Wang, J.; Yang, J.; Yu, K.; Lv, F.; Huang, T. Locality-constrained linear coding for image classification. In Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010.
- Liu, L.; Shao, L.; Li, X.; Lu, K. Learning spatio-temporal representations for action recognition: A genetic programming approach. IEEE Trans. Cybern. 2016, 46, 158–170. [Google Scholar] [CrossRef] [PubMed]
- Deng, L.; Yu, D. Deep Learning. Signal Process. 2014, 7, 3–4. [Google Scholar]
- Ivakhnenko, A. Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. 1971, 1, 364–378. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Hinton, G.E.; Osindero, S.; Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
- Smolensky, P. Information Processing in Dynamical Systems: Foundations of Harmony Theory; DTIC Document; University of Colorado Boulder Computer Science Department: Boulder, CO, USA, 1986. [Google Scholar]
- Le, Q.V.; Zou, W.Y.; Yeung, S.Y. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
- Foggia, P.; Saggese, A.; Strisciuglio, N. Exploiting the deep learning paradigm for recognizing human actions. In Proceedings of the 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Seoul, Korea, 26–29 August 2014.
- Hasan, M.; Roy-Chowdhury, A.K. Continuous learning of human activity models using deep nets. In European Conference on Computer Vision; Springer: Berlin, Germany, 2014. [Google Scholar]
- Ballan, L.; Bertini, M.; Del Bimbo, A.; Seidenari, L.; Serra, G. Effective codebooks for human action representation and classification in unconstrained videos. IEEE Trans. Multimed. 2012, 14, 1234–1245. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In European Conference on Computer Vision; Springer: Berlin, Germany, 2014. [Google Scholar]
- Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011.
- Karpathy, A.; Li, F.; Johnson, J. CS231n Convolutional Neural Network for Visual Recognition. Available online: http://cs231n.github.io/ (accessed on 10 August 2016).
- Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T. Large-scale video classification with convolutional neural networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014.
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2014. [Google Scholar]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231. [Google Scholar] [CrossRef] [PubMed]
- Wiskott, L.; Sejnowski, T.J. Slow feature analysis: Unsupervised learning of invariances. Neural Comput. 2002, 14, 715–770. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Tao, D. Slow feature analysis for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 436–450. [Google Scholar] [CrossRef] [PubMed]
- Sun, L.; Jia, K.; Chan, T.-H.; Fang, Y.; Wang, G.; Yan, S. DL-SFA: Deeply-learned slow feature analysis for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014.
- Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Sun, L.; Jia, K.; Yeung, D.-Y.; Shi, B.E. Human action recognition using factorized spatio-temporal convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015.
- Park, E.; Han, X.; Berg, T.L.; Berg, A.C. Combining multiple sources of knowledge in deep CNNs for action recognition. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–9 March 2016.
- Yu, S.; Cheng, Y.; Su, S.; Cai, G.; Li, S. Stratified pooling based deep convolutional neural networks for human action recognition. Multimed. Tools Appl. 2016, 1–16. [Google Scholar] [CrossRef]
- Ijjina, E.P.; Mohan, C.K. Human action recognition based on motion capture information using fuzzy convolution neural networks. In Proceedings of the 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), Kolkata, India, 4–7 January 2015.
- Chéron, G.; Laptev, I.; Schmid, C. P-CNN: Pose-based CNN features for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Gkioxari, G.; Girshick, R.; Malik, J. Contextual action recognition with R*CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014.
- Rahmani, H.; Mian, A. 3D action recognition from novel viewpoints. In Proceedings of the 2016 Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Alfaro, A.; Mery, D.; Soto, A. Action Recognition in Video Using Sparse Coding and Relative Features. arXiv, 2016; arXiv:1605.03222. [Google Scholar]
- Luo, Y.; Cheong, L.-F.; Tran, A. Actionness-assisted recognition of actions. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Wang, L.; Qiao, Y.; Tang, X. Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Lan, Z.; Lin, M.; Li, X.; Hauptmann, A.G.; Raj, B. Beyond gaussian pyramid: Multi-skip feature stacking for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Bilen, H.; Fernando, B.; Gavves, E.; Vedaldi, A.; Gould, S. Dynamic image networks for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Mahasseni, B.; Todorovic, S. Regularizing Long Short Term Memory with 3D Human-Skeleton Sequences for Action Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
- Fernando, B.; Gavves, E.; Oramas, J.; Ghodrati, A.; Tuytelaars, T. Rank pooling for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016. [Google Scholar] [CrossRef]
- Zhu, W.; Hu, J.; Sun, G.; Cao, X.; Qiao, Y. A key volume mining deep framework for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Wang, C.; Wang, Y.; Yuille, A.L. Mining 3D key-pose-motifs for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Veeriah, V.; Zhuang, N.; Qi, G.-J. Differential recurrent neural networks for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv, 2012; arXiv:1212.0402. [Google Scholar]
- Yue-Hei Ng, J.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Weinzaepfel, P.; Harchaoui, Z.; Schmid, C. Learning to track for spatio-temporal action localization. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015.
- Caba Heilbron, F.; Escorcia, V.; Ghanem, B.; Carlos Niebles, J. Activitynet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015.
- Reddy, K.K.; Shah, M. Recognizing 50 human action categories of web videos. Mach. Vis. Appl. 2013, 24, 971–981. [Google Scholar] [CrossRef]
- Lizhong, L.; Zhiguo, L.; Yubin, Z. Research on Detection and Tracking of Moving Target in Intelligent Video Surveillance. In Proceedings of the 2012 International Conference on Computer Science and Electronics Engineering (ICCSEE), Hangzhou, China, 23–25 March 2012.
- Kratz, L.; Nishino, K. Tracking pedestrians using local spatio-temporal motion patterns in extremely crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 987–1002. [Google Scholar] [CrossRef] [PubMed]
- Xiang, T.; Gong, S. Video behavior profiling for anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 893–908. [Google Scholar] [CrossRef] [PubMed]
- Sadeghi-Tehran, P.; Angelov, P. A real-time approach for novelty detection and trajectories analysis for anomaly recognition in video surveillance systems. In Proceedings of the 2012 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Madrid, Spain, 17–18 May 2012.
- Hu, W.; Tan, T.; Wang, L.; Maybank, S. A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2004, 34, 334–352. [Google Scholar] [CrossRef]
- Paul, M.; Haque, S.M.; Chakraborty, S. Human detection in surveillance videos and its applications—A review. EURASIP J. Adv. Signal Process. 2013, 2013, 176. [Google Scholar] [CrossRef]
- Foroughi, H.; Naseri, A.; Saberi, A.; Yazdi, H.S. An eigenspace-based approach for human fall detection using integrated time motion image and neural network. In Proceedings of the 9th International Conference on Signal Processing, ICSP 2008, Leipzig, Germany, 10–11 May 2008.
- Rougier, C.; Meunier, J.; St-Arnaud, A.; Rousseau, J. Robust video surveillance for fall detection based on human shape deformation. IEEE Trans. Circuits Syst. Video Technol. 2011, 21, 611–622. [Google Scholar] [CrossRef]
- Mubashir, M.; Shao, L.; Seed, L. A survey on fall detection: Principles and approaches. Neurocomputing 2013, 100, 144–152. [Google Scholar] [CrossRef]
- Benmansour, A.; Bouchachia, A.; Feham, M. Multioccupant activity recognition in pervasive smart home environments. ACM Comput. Surv. (CSUR) 2016, 48, 34. [Google Scholar] [CrossRef]
- Jurek, A.; Nugent, C.; Bi, Y.; Wu, S. Clustering-based ensemble learning for activity recognition in smart homes. Sensors 2014, 14, 12285–12304. [Google Scholar] [CrossRef] [PubMed]
- Fatima, I.; Fahim, M.; Lee, Y.-K.; Lee, S. Classifier ensemble optimization for human activity recognition in smart homes. In Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, Kota Kinabalu, Malaysia, 17–19 January 2013.
- Zhang, L.; Jiangb, M.; Faridc, D.; Hossaina, M.A. Intelligent facial emotion recognition and semantic-based topic detection for a humanoid robot. Expert Syst. Appl. 2013, 40, 5160–5168. [Google Scholar] [CrossRef]
- Roitberg, A.; Perzylo, A.; Somani, N.; Giuliani, M.; Rickert, M.; Knoll, A. Human activity recognition in the context of industrial human-robot interaction. In Proceedings of the 2014 Annual Summit and Conference (APSIPA) Asia-Pacific Signal and Information Processing Association, Chiang Mai, Thailand, 9–12 December 2014.
- Ryoo, M.; Fuchs, T.J.; Xia, L.; Aggarwal, J.K.; Matthies, L. Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me? In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, Portland, OR, USA, 2–5 March 2015.
- Xia, L.; Gori, I.; Aggarwal, J.K.; Ryoo, M.S. Robot-centric Activity Recognition from First-Person RGB-D Videos. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Beach, HI, USA, 6–9 January 2015.
- Luo, Y.; Wu, T.-D.; Hwang, J.-N. Object-based analysis and interpretation of human motion in sports video sequences by dynamic Bayesian networks. Comput. Vis. Image Underst. 2003, 92, 196–216. [Google Scholar] [CrossRef]
- Vallim, R.M.; Filho, J.A.A.; De Mello, R.F.; De Carvalho, A.C.P.L.F. Online behavior change detection in computer games. Expert Syst. Appl. 2013, 40, 6258–6265. [Google Scholar] [CrossRef]
- Klauer, S.G.; Guo, F.; Sudweeks, J.; Dingus, T.A. An Analysis of Driver Inattention Using a Case-Crossover Approach on 100-Car Data: Final Report; National Highway Traffic Safety Administration: Washington, DC, USA, 2010.
- Tison, J.; Chaudhary, N.; Cosgrove, L. National Phone Survey on Distracted Driving Attitudes and Behaviors; National Highway Traffic Safety Administration: Washington, DC, USA, 2011.
- Ohn-Bar, E.; Martin, S.; Tawari, A.; Trivedi, M. Head, eye, and hand patterns for driver activity recognition. In Proceedings of the 2014 IEEE International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014.
- Braunagel, C.; Kasneci, E.; Stolzmann, W.; Rosenstiel, W. Driver-activity recognition in the context of conditionally autonomous driving. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Canary Islands, Spain, 15–18 September 2015.
Method | Feature Type | Performance (%) |
---|---|---|
KTH [35] | ||
Sadanand and Corso 2012 [36] | Space-time volumes | 98.2 |
Wu et al. 2011 [37] | Space-time volumes | 94.5 |
Ikizler and Duygulu 2009 [38] | Space-time volumes | 89.4 |
Peng et al. 2013 [39] | Features | 95.6 |
Liu et al. 2011 [40] | Features (Attributes) | 91.59 |
Chen et al. 2015 [41] | Features (mid-level) | 97.41 |
Wang et al. 2011 [42] | Dense trajectory | 95.0 |
UCF (University of Central Florida) Sports [43,44] | ||
Sadanand and Corso 2012 [36] | Space-time volumes | 95.0 |
Wu et al. 2011 [37] | Space-time volumes | 91.30 |
Ma et al. 2015 [45] | Space-time volumes | 89.4 |
Chen et al. 2015 [41] | Features (mid-level) | 92.67 |
Wang et al. 2013 [46] | Features (Pose-based) | 90 |
HMDB (Human Motion Database)-51 [47] | ||
Wang and Schmid 2013 [48] | Dense trajectory | 57.2 |
Jiang et al. 2012 [49] | Trajectory | 40.7 |
Wang et al. 2011 [42] | Dense trajectory | 46.6 |
Kliper et al. 2012 [50] | Space-time volumes, bag-of-visual-words | 29.2 |
Sadanand and Corso 2012 [36] | Space-time volumes | 26.9 |
Kuehne et al. 2011 [47] | Features | 23.0 |
Wang et al. 2013 [51] | Features (mid-level) | 33.7 |
Peng et al. 2014 [52] | Fisher vector and Stacked Fisher Vector | 66.79 |
Jain et al. 2013 [53] | Features | 52.1 |
Fernando et al. 2015 [54] | Features (Video Darwin) | 63.7 |
Hoai and Zisserman 2014 [55] | Features | 65.9 |
Hollywood2 [56] | ||
Wang and Schmid 2013 [48] | Dense trajectory | 64.3 |
Jain et al. 2013 [53] | Trajectory | 62.5 |
Jiang et al. 2012 [49] | Trajectory | 59.5 |
Vig et al. 2012 [57] | Trajectory | 59.4 |
Mathe and Sminchisescu 2012 [58] | Space-time volumes | 61.0 |
Kihl et al. 2016 [59] | Features | 58.6 |
Lan et al. 2015 [60] | Features (mid-level) | 66.3 |
Fernando et al. 2015 [54] | Features (Video Darwin) | 73.7 |
Hoai and Zisserman 2014 [55] | Features | 73.6 |
Microsoft Research Action3D [61] | ||
Wang et al. 2013 [46] | Features (pose-based) | 90.22 |
Amor et al. 2016 [62] | Trajectory | 89 |
Zanfir et al. 2013 [63] | 3D Pose | 91.7 |
YouTube action dataset [64] | ||
Wang et al. 2011 [42] | Dense trajectory | 84.1 |
Peng et al. 2014 [52] | Features (FV + SFV) | 93.38 |
Method | Feature Type | Performance (%) |
---|---|---|
Weizmann [114] | ||
Rahman et al. 2012 [86] | Shape Features | 100 |
Vishwakarma and Kapoor 2015 [87] | Shape Features | 100 |
Rahman et al. 2014 [91] | Shape-motion | 95.56 |
Chaaraoui et al. 2013 [89] | Shape Features | 92.8 |
Vishwakarma et al. 2016 [96] | Shape Features | 100 |
Jiang et al. 2012 [80] | Shape-motion | 100 |
Eweiwi et al. 2011 [98] | Shape-motion | 100 |
Yeffet and Wolf 2009 [103] | LBP | 100 |
Kellokumpu et al. 2008 [104] | LBP (LBP-TOP) | 98.7 |
Kellokumpu et al. 2011 [115] | LBP | 100 |
Sadek et al. 2011 [107] | Fuzzy features | 97.8 |
Yao et al. 2015 [108] | Fuzzy features | 94.03 |
KTH [35] | ||
Rahman et al. 2012 [86] | Shape Features | 94.67 |
Vishwakarma and Kapoor 2015 [87] | Shape Features | 96.4 |
Rahman et al. 2014 [91] | Shape-motion | 94.49 |
Vishwakarma et al. 2016 [96] | Shape Features | 95.5 |
Sadek et al. 2012 [116] | Shape Features | 93.30 |
Jiang et al. 2012 [80] | Shape-motion | 95.77 |
Yeffet and Wolf 2009 [103] | LBP | 90.1 |
Mattivi and Shao 2009 [117] | LBP (LBP-TOP) | 91.25 |
Kellokumpu et al. 2011 [115] | LBP | 93.8 |
Sadek et al. 2011 [107] | Fuzzy Features | 93.6 |
IXMAS (INRIA Xmas Motion Acquisition Sequences) [118] | ||
Junejo et al. 2014 [88] | Shape Features | 89.0 |
Sargano et al. 2016 [119] | Shape features | 89.75 |
Lin et al. 2009 [80] | Shape-motion | 88.89 |
Chaaraoui et al. 2013 [89] | Shape Features | 85.9 |
Chun and Lee 2016 [93] | Motion Features | 83.03 |
Vishwakarma et al. 2016 [96] | Shape Features | 85.80 |
Holte et al. 2012 [120] | Motion Feature (3D) | 100 |
Weinland et al. 2006 [83] | Motion Features (3D) | 93.33 |
Turaga et al. 2008 [121] | Shape-motion (3D) | 98.78 |
Pehlivan and Duygulu 2011 [122] | Shape Features (3D) | 90.91 |
Baumann et al. 2016 [106] | LBP | 80.55 |
Method | Feature Type | Performance (%) |
---|---|---|
KTH [35] | ||
Wang et al. 2012 [125] | Dictionary Learning | 94.17 |
Liu et al. 2016 [131] | Genetic Programming | 95.0 |
Le et al. 2011 [138] | Subspace analysis | 93.9 |
Ballan et al. 2012 [141] | Codebook | 92.66 |
Hasan and Chowdhury 2014 [140] | DBNs (Deep Belief Networks) | 96.6 |
Ji et al. 2013 [149] | 3D CNN (Convolutional Neural Networks) | 90.2 |
Zhang and Tao 2012 [151] | Slow Feature Analysis (SFA) | 93.50 |
Sun et al. 2014 [152] | Deeply-Learned Slow Feature Analysis (D-SFA) | 93.1 |
Alfaro et al. 2016 [163] | Sparse coding | 97.5 |
HMDB-51 [47] | ||
Liu et al. 2016 [131] | Genetic Programming | 48.4 |
Simonyan and Zisserman 2014 [148] | CNN | 59.4 |
Luo et al. 2015 [164] | Actionness | 56.38 |
Wang et al. 2015 [165] | Trajectory-pooled deep-convolutional descriptors | 65.9 |
Lan et al. 2015 [166] | Multi-skip Feature Stacking | 65.1 |
Sun et al. 2015 [154] | Spatio-Temporal CNN | 59.1 |
Park et al. 2016 [156] | Deep CNN | 54.9 |
Yu et al. 2016 [157] | SP(stratified pooling)-CNN | 74.7 |
Bilen et al. 2016 [167] | Multiple Dynamic Images (MDI), trajectory | 65.2 |
Mahasseni and Todorovic 2016 [168] | Long Short-Term Memory Convolutional Neural Network (LSTM-CNN) | 55.3 |
Fernando et al. 2016 [169] | Rank pooling + CNN | 65.8 |
Zhu et al. 2016 [170] | Key volume mining | 63.3 |
Hollywood2 [56] | ||
Liu et al. 2016 [131] | Genetic Programming | 46.8 |
Le et al. 2011 [138] | Subspace analysis | 53.3 |
Ballan et al. 2012 [141] | Codebook | 45.0 |
Sun et al. 2014 [152] | DL-SFA | 48.1 |
Fernando et al. 2016 [169] | Rank pooling + CNN | 75.2 |
MSR Action3D [61] | ||
Du et al. 2015 [153] | RNN (Recurrent Neural Network) | 94.49 |
Wang et al. 2016 [171] | 3D Key-Pose-Motifs | 99.36 |
Veeriah et al. 2015 [172] | Differential RNN | 92.03 |
University of Central Florida (UCF-101) [173] | ||
Simonyan and Zisserman 2014 [148] | Two-stream CNN | 88.0 |
Ng et al. 2015 [174] | CNN | 88.6 |
Wang et al. 2015 [165] | Trajectory-pooled deep-convolutional descriptors | 91.5 |
Lan et al. 2015 [166] | Multi-skip Feature Stacking | 89.1 |
Sun et al. 2015 [154] | Spatio-Temporal CNN | 88.1 |
Tran et al. 2015 [155] | 3D CNN | 90.4 |
Park et al. 2016 [156] | Deep CNN | 89.1 |
Yu et al. 2016 [157] | SP-CNN | 91.6 |
Bilen et al. 2016 [167] | MDI and trajectory | 89.1 |
Mahasseni and Todorovic 2016 [168] | LSTM-CNN | 86.9 |
Zhu et al. 2016 [170] | Key volume mining | 93.1 |
UCF Sports [43,44] | ||
Sun et al. 2014 [152] | DL-SFA | 86.6 |
Weinzaepfel et al. 2015 [175] | Spatio-temporal | 91.9 |
ActivityNet Dataset [176] | ||
Heilbron et al. 2015 [176] | Deep Features, Motion Features, and Static Features | 42.2 (Untrimmed) |
Heilbron et al. 2015 [176] | Deep Features, Motion Features, and Static Features | 50.2 (Trimmed) |
Dataset | Year | No. of Actions | Method | Highest Accuracy |
---|---|---|---|---|
KTH | 2004 | 6 | [36] | 98.2% |
Weizmann | 2005 | 9 | [87] | 100% |
IXMAS | 2006 | 13 | [120] | 100% |
UCF Sports | 2008 | 10 | [36] | 95.0% |
Hollywood2 | 2009 | 12 | [169] | 75.2% |
YouTube | 2009 | 11 | [52] | 93.38% |
HMDB-51 | 2011 | 51 | [157] | 74.7% |
UCF-101 | 2012 | 101 | [157] | 91.6% |
ActivityNet (Untrimmed) | 2015 | 200 | [176] | 42.2% (baseline) |
ActivityNet (Trimmed) | 2015 | 200 | [176] | 50.2% (baseline) |
© 2017 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sargano, A.B.; Angelov, P.; Habib, Z. A Comprehensive Review on Handcrafted and Learning-Based Action Representation Approaches for Human Activity Recognition. Appl. Sci. 2017, 7, 110. https://doi.org/10.3390/app7010110