Abstract
Manipulative action recognition is one of the most important and challenging topics in the field of image processing. In this paper, three kinds of sensor modules are used to capture motion, force and object information during manipulative actions. Two fusion methods are proposed, and the recognition accuracy is further improved by using the object as context. In the feature-level fusion method, significant features are selected first, and Hidden Markov Models (HMMs) are then built on these selected features to characterize the temporal sequences. In the decision-level fusion method, an HMM is built for each feature group, and the individual decisions are then fused. On top of these two fusion methods, the object/action context is modeled with a Bayesian network. Assembly tasks are used to evaluate the algorithms. The experimental results show that the proposed approach is effective for manipulative action recognition: the recognition accuracies of the decision-level fusion, feature-level fusion and Bayesian context models are 72%, 80% and 90%, respectively.
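For illustration, the sketch below shows one way the decision-level fusion and object/action context described above could be realized: per-action, per-modality Gaussian HMMs are trained, their log-likelihoods are summed at test time, and an object-conditioned action prior is added in log space. This is a minimal sketch built on the hmmlearn library, with hypothetical action labels, feature shapes and state counts; it is not the authors' implementation.

```python
# Minimal sketch of decision-level HMM fusion with an object/action context prior.
# Hypothetical action labels, data shapes, and state counts; the paper's actual
# features, HMM topologies, and Bayesian network structure are not reproduced here.
import numpy as np
from hmmlearn import hmm

ACTIONS = ["pick", "place", "screw"]        # hypothetical action labels
MODALITIES = ["motion", "force", "object"]  # the three sensor groups used in the paper

def train_models(train_data, n_states=4):
    """train_data[action][modality] -> list of (T_i, D) feature sequences."""
    models = {}
    for a in ACTIONS:
        for m in MODALITIES:
            seqs = train_data[a][m]
            X = np.concatenate(seqs)              # stack all training sequences
            lengths = [len(s) for s in seqs]      # per-sequence lengths for hmmlearn
            model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
            model.fit(X, lengths)
            models[(a, m)] = model
    return models

def classify(models, test_seqs, context_prior=None):
    """test_seqs[modality] -> (T, D) sequence; context_prior -> P(action | object)."""
    scores = {}
    for a in ACTIONS:
        # Decision-level fusion: sum the per-modality log-likelihoods.
        ll = sum(models[(a, m)].score(test_seqs[m]) for m in MODALITIES)
        if context_prior is not None:             # object/action context as a Bayesian prior
            ll += np.log(context_prior.get(a, 1e-6))
        scores[a] = ll
    return max(scores, key=scores.get)
```

A feature-level variant would instead concatenate the selected features from all modalities into a single observation vector and train one HMM per action on that joint sequence.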
Acknowledgements
This project is supported by the National Natural Science Foundation of China (Nos. 61906123, U1713216 and 61976070), the Fundamental Research Funds for Shenzhen Technology University, the Shenzhen Overseas High Level Talent (Peacock Plan) Program (No. KQTD20140630154026047), the Shenzhen Basic Research Projects (JCYJ20160429161539298), and the Scientific Research Platforms and Projects in Universities in Guangdong Province under Grant 2019KTSCX204.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gu, Y., Liu, M., Sheng, W. et al. Sensor fusion based manipulative action recognition. Auton Robot 45, 1–13 (2021). https://doi.org/10.1007/s10514-020-09943-8