Research article
DOI: 10.1145/3240508.3240675

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

Published: 15 October 2018

Abstract

Current research mainly focuses on single-view and multi-view human action recognition, which can hardly satisfy the requirement of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of suitable databases poses a further barrier. In this paper, we collect a new large-scale RGB-D action database for arbitrary-view action analysis, including RGB videos, depth sequences, and skeleton sequences. The database contains action samples captured from 8 fixed viewpoints, as well as varying-view sequences that cover the entire 360° range of view angles. In total, 118 persons were invited to perform 40 action categories, and 25,600 video samples were collected. Our database involves more participants, more viewpoints, and a larger number of samples than existing databases. More importantly, it is the first database containing entire 360° varying-view sequences. The database provides sufficient data for cross-view and arbitrary-view action analysis. In addition, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experimental results show that the VS-CNN achieves superior performance.





Published In

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. arbitrary-view recognition
  2. cross-view recognition
  3. HRI
  4. human action recognition
  5. RGB-D action database

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China (NSFC)

Conference

MM '18
Sponsor:
MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions (28%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)


Cited By

  • (2024) GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation. IEEE Transactions on Visualization and Computer Graphics 30, 12 (Dec. 2024), 7518-7530. DOI: 10.1109/TVCG.2024.3352002
  • (2024) Hypergraph-Based Multi-View Action Recognition Using Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 10 (Oct. 2024), 6610-6622. DOI: 10.1109/TPAMI.2024.3382117
  • (2024) MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 6 (Jun. 2024), 4115-4128. DOI: 10.1109/TPAMI.2024.3355414
  • (2024) Human Motion Generation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 4 (Apr. 2024), 2430-2449. DOI: 10.1109/TPAMI.2023.3330935
  • (2024) Diffusion-Based Graph Generative Methods. IEEE Transactions on Knowledge and Data Engineering 36, 12 (Dec. 2024), 7954-7972. DOI: 10.1109/TKDE.2024.3466301
  • (2024) 3D Human Animation Synthesis based on a Temporal Diffusion Generative Model. In 2024 2nd International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA), 108-116. DOI: 10.1109/PRMVIA63497.2024.00028
  • (2024) Language-Free Compositional Action Generation via Decoupling Refinement. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2910-2914. DOI: 10.1109/ICASSP48485.2024.10448207
  • (2024) RE-STNet: relational enhancement spatio-temporal networks based on skeleton action recognition. Multimedia Tools and Applications (Mar. 2024). DOI: 10.1007/s11042-024-18864-y
  • (2024) Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation. In Computer Vision – ECCV 2024, 445-463. DOI: 10.1007/978-3-031-73383-3_26
  • (2024) Bridging the Gap Between Human Motion and Action Semantics via Kinematic Phrases. In Computer Vision – ECCV 2024, 223-240. DOI: 10.1007/978-3-031-73242-3_13
