Bayesian Covariance Representation with Global Informative Prior for 3D Action Recognition

Published: 12 November 2021

Abstract

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by empirical statistics computed over all of its pose samples. Covariance has two major problems: (1) it is prone to singularity, so actions fail to be represented properly, and (2) it lacks global action/pose-aware information, which limits its expressive and discriminative power. In this article, we propose a novel Bayesian covariance representation that uses prior regularization to solve these problems. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over the local poses of an independent action. A Global Informative Prior (GIP) is then generated over global poses with sufficient statistics to regularize the covariance. In this way, (1) singularity is greatly relieved because of the sufficient statistics, and (2) the global pose information in the GIP makes the Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so the discriminative characteristics of actions are represented more clearly. Experimental results show that our Bayesian covariance with GIP effectively improves the performance of action recognition. On some databases, it outperforms state-of-the-art variant methods that are based on kernels, temporal-order structures, and saliency-weighting attention, among others.
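The regularization idea in the abstract can be illustrated with a minimal sketch. It is not the article's formulation: it assumes a simple shrinkage form in which the per-action maximum likelihood covariance is blended with a covariance estimated from poses pooled over all actions, in the spirit of a conjugate prior on a Gaussian covariance. The function names, the `prior_strength` parameter, and the random stand-in data are all hypothetical.

```python
import numpy as np

def sample_covariance(poses):
    """Maximum likelihood covariance; poses has shape (n_frames, dim)."""
    centered = poses - poses.mean(axis=0, keepdims=True)
    return centered.T @ centered / poses.shape[0]

def regularized_covariance(poses, prior_cov, prior_strength):
    """Blend the per-action covariance with a prior covariance.

    Hypothetical shrinkage form: a weighted average in which the prior
    counts as `prior_strength` pseudo-observations.
    """
    n = poses.shape[0]
    local = sample_covariance(poses)
    return (n * local + prior_strength * prior_cov) / (n + prior_strength)

rng = np.random.default_rng(0)
dim = 12
all_poses = rng.normal(size=(500, dim))  # stand-in for poses pooled over all actions
action = all_poses[:5]                   # a short action: fewer frames than dimensions

global_prior = sample_covariance(all_poses)
local_cov = sample_covariance(action)
bayes_cov = regularized_covariance(action, global_prior, prior_strength=10.0)

# The local estimate is rank deficient; the regularized one is positive definite.
rank_local = np.linalg.matrix_rank(local_cov)
min_eig_bayes = np.linalg.eigvalsh(bayes_cov).min()
```

With fewer frames than feature dimensions, the local estimate cannot be full rank, whereas the blended matrix inherits positive definiteness from the well-conditioned global estimate. This is the singularity relief the abstract describes, shown here only under the assumed shrinkage form.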

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
November 2021
529 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3492437

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2021
Accepted: 01 April 2021
Revised: 01 January 2021
Received: 01 July 2019
Published in TOMM Volume 17, Issue 4

Author Tags

  1. Covariance matrix
  2. 3D action recognition
  3. Riemannian manifold
  4. Bayesian regularization

Qualifiers

  • Research-article
  • Refereed
