Abstract
Human activity recognition (HAR) has a wide range of applications, and this widespread use has motivated many studies aimed at improving HAR performance. In this study, HAR is carried out on the widely used KTH and Weizmann datasets, as well as on a dataset that we created. Speeded up robust features (SURF) are used to extract features from these datasets, and the features are then represented with a bag of visual words (BoVW) model. Unlike studies in the literature that use similar methods, SURF descriptors are extracted from binary images as well as from grayscale images. Four machine learning (ML) methods, namely k-nearest neighbors, decision tree, support vector machine and naive Bayes, are used to classify the BoVW features, and their hyperparameters are set by hyperparameter optimization. The ML methods are then compared with each other in terms of the activity recognition performance obtained with binary and grayscale image features. The results show that if the contrast of the environment decreases when a human enters the frame, SURF descriptors from the binary image are more effective for HAR than SURF descriptors from the grayscale image.
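The pipeline summarized in the abstract (local descriptors, a BoVW vocabulary, a classifier with tuned hyperparameters) can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the descriptors are synthetic 8-dimensional vectors standing in for SURF descriptors (real SURF descriptors have 64 or 128 dimensions), the vocabulary has 4 words, only k-nearest neighbors is shown, and a plain grid search over `k` on a hold-out split stands in for whatever hyperparameter optimization the study used. All function and variable names are hypothetical.

```python
import random
from collections import Counter

rng = random.Random(0)
DIM, WORDS = 8, 4  # assumed toy sizes; real SURF descriptors are 64- or 128-dimensional


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def kmeans(points, k, iters=10):
    """Plain k-means; the resulting centers act as the visual vocabulary."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def bovw_histogram(descriptors, vocab):
    """Normalized count of how often each visual word is the nearest one."""
    counts = Counter(nearest(d, vocab) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(vocab))]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training histograms closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def fake_frame(label, n_desc=15):
    """Synthetic stand-in for the descriptors of one frame of a given action class."""
    center = 0.0 if label == 0 else 5.0
    return [[rng.gauss(center, 1.0) for _ in range(DIM)] for _ in range(n_desc)]


# Toy data: 24 frames, two action classes, alternating labels.
labels = [i % 2 for i in range(24)]
frames = [fake_frame(y) for y in labels]
train_idx, val_idx = range(0, 16), range(16, 24)

# Build the vocabulary from training descriptors only, then encode every frame.
vocab = kmeans([d for i in train_idx for d in frames[i]], WORDS)
hists = [bovw_histogram(f, vocab) for f in frames]
train_x = [hists[i] for i in train_idx]
train_y = [labels[i] for i in train_idx]


def accuracy(k):
    """Hold-out accuracy of k-NN on the validation frames."""
    hits = sum(knn_predict(train_x, train_y, hists[i], k) == labels[i] for i in val_idx)
    return hits / len(val_idx)


# Grid search over the single k-NN hyperparameter.
candidates = (1, 3, 5)
scores = {k: accuracy(k) for k in candidates}
best_k = max(scores, key=scores.get)
print("best k:", best_k, "validation accuracy:", scores[best_k])
```

In a real setting, `fake_frame` would be replaced by descriptors extracted from grayscale or binarized video frames, and the same histogram encoding could feed any of the four classifiers named in the abstract.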
Acknowledgements
The authors are thankful to RAC-LAB (www.rac-lab.com) for providing the trial version of their commercial software for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Aslan, M.F., Durdu, A. & Sabanci, K. Human action recognition with bag of visual words using different machine learning methods and hyperparameter optimization. Neural Comput & Applic 32, 8585–8597 (2020). https://doi.org/10.1007/s00521-019-04365-9
DOI: https://doi.org/10.1007/s00521-019-04365-9