research-article

Dynamic texture and scene classification by transferring deep image features

Published: 01 January 2016

Abstract

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches are sensitive to varying illumination, viewpoint changes, or even camera motion, and/or lack spatial information. Inspired by the success of deep structures in image classification, we leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges of training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. Specifically, we apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics of the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two implementations of the TCoF scheme: the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are the inputs to the ConvNet, whereas in the temporal TCoF, the differences between adjacent frames are the inputs. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
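
The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: `toy_extractor` is a hypothetical stand-in (a fixed random projection followed by a ReLU) for the pretrained ConvNet, "mean-removed" is taken to mean subtracting the video's own mean frame, and the second-order statistic is taken to be the per-dimension variance. All function names are illustrative.

```python
import numpy as np


def spatial_inputs(frames):
    """Spatial variant: remove the mean frame (assumed: the video's own mean)."""
    return frames - frames.mean(axis=0, keepdims=True)


def temporal_inputs(frames):
    """Temporal variant: differences between adjacent frames."""
    return frames[1:] - frames[:-1]


def toy_extractor(inputs, dim=8, seed=0):
    """Hypothetical stand-in for a pretrained ConvNet.

    Flattens each input and applies a fixed random projection plus ReLU,
    returning one mid-level feature vector per input frame.
    """
    rng = np.random.default_rng(seed)
    flat = inputs.reshape(inputs.shape[0], -1)
    weights = rng.standard_normal((flat.shape[1], dim))
    return np.maximum(flat @ weights, 0.0)


def video_descriptor(features):
    """Concatenate first-order (mean) and second-order (variance, assumed)
    statistics of the per-frame features into one video-level vector."""
    return np.concatenate([features.mean(axis=0), features.var(axis=0)])


# Synthetic video: 10 grayscale frames of 16 x 16 pixels.
video = np.random.default_rng(1).random((10, 16, 16))

spatial_tcof = video_descriptor(toy_extractor(spatial_inputs(video)))
temporal_tcof = video_descriptor(toy_extractor(temporal_inputs(video)))
print(spatial_tcof.shape, temporal_tcof.shape)
```

In this sketch both descriptors have twice the dimension of the per-frame feature, and the temporal variant sees one fewer input than the number of frames. A real pipeline would replace `toy_extractor` with activations from a trained network and pass the descriptors to a classifier.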

    Published In

    Neurocomputing  Volume 171, Issue C
    January 2016
    1693 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Author Tags

    1. Convolutional neural network
    2. Dynamic scene classification
    3. Dynamic texture classification
    4. Transferred ConvNet feature

    Qualifiers

    • Research-article

    Cited By

    • (2024) Adequately hierarchical patterns based on pairwise regions, Multimedia Systems, 30:1, DOI: 10.1007/s00530-023-01217-4, Online publication date: 28-Jan-2024
    • (2023) Representing dynamic textures based on polarized gradient features, Machine Vision and Applications, 34:5, DOI: 10.1007/s00138-023-01438-7, Online publication date: 28-Aug-2023
    • (2022) A Novel Multi-Modal Network-Based Dynamic Scene Understanding, ACM Transactions on Multimedia Computing, Communications, and Applications, 18:1 (1-19), DOI: 10.1145/3462218, Online publication date: 27-Jan-2022
    • (2022) Dynamic texture description using adapted bipolar-invariant and blurred features, Multidimensional Systems and Signal Processing, 33:3 (945-979), DOI: 10.1007/s11045-022-00826-y, Online publication date: 1-Sep-2022
    • (2021) Digital Audio Scene Recognition Method Based on Machine Learning Technology, Scientific Programming, 2021, DOI: 10.1155/2021/2388697, Online publication date: 1-Jan-2021
    • (2021) A Comprehensive Taxonomy of Dynamic Texture Representation, ACM Computing Surveys, 55:1 (1-39), DOI: 10.1145/3487892, Online publication date: 23-Nov-2021
    • (2021) Prominent Local Representation for Dynamic Textures Based on High-Order Gaussian-Gradients, IEEE Transactions on Multimedia, 23 (1367-1382), DOI: 10.1109/TMM.2020.2997202, Online publication date: 1-Jan-2021
    • (2021) A part-based spatial and temporal aggregation method for dynamic scene recognition, Neural Computing and Applications, 33:13 (7353-7370), DOI: 10.1007/s00521-020-05415-3, Online publication date: 1-Jul-2021
    • (2020) Directional dense-trajectory-based patterns for dynamic texture recognition, IET Computer Vision, 14:4 (162-176), DOI: 10.1049/iet-cvi.2019.0455, Online publication date: 15-Apr-2020
    • (2020) A comprehensive system for image scene classification, Multimedia Tools and Applications, 79:25-26 (18033-18058), DOI: 10.1007/s11042-019-08264-y, Online publication date: 1-Jul-2020