research-article

Dynamic texture and scene classification by transferring deep image features

Authors:

Matti Pietikäinen

Neurocomputing, Volume 171, Issue C

Pages 1230 - 1241

https://doi.org/10.1016/j.neucom.2015.07.071

Published: 01 January 2016

Abstract

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches are sensitive to varying illumination, viewpoint changes, or even camera motion, and/or lack spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges in training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. More specifically, we propose to apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics over the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between two adjacent frames are used as the inputs of the ConvNet. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
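
The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: `toy_frame_features` is a hypothetical stand-in for the pre-trained ConvNet (a random linear projection followed by a ReLU), the mean removed in the spatial variant is taken to be the clip's own average frame, and the second-order statistic is taken to be the per-dimension variance. The abstract does not specify any of these details.

```python
import numpy as np


def spatial_inputs(frames):
    """Spatial variant: mean-removed frames.

    Assumption: the mean is the average frame of the clip itself.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames - frames.mean(axis=0, keepdims=True)


def temporal_inputs(frames):
    """Temporal variant: differences between adjacent frames (T-1 inputs)."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames[1:] - frames[:-1]


def toy_frame_features(inputs, weights):
    """Hypothetical stand-in for a pre-trained ConvNet feature extractor.

    Flattens each input and applies a fixed linear map plus ReLU.
    A real pipeline would take activations from a trained network instead.
    """
    flat = inputs.reshape(inputs.shape[0], -1)
    return np.maximum(flat @ weights, 0.0)


def video_descriptor(frame_features):
    """Concatenate first-order (mean) and second-order statistics.

    Assumption: per-dimension variance is used as the second-order statistic.
    """
    mean = frame_features.mean(axis=0)
    var = frame_features.var(axis=0)
    return np.concatenate([mean, var])


rng = np.random.default_rng(0)
frames = rng.random((8, 16, 16))              # toy clip: 8 grayscale 16x16 frames
weights = rng.standard_normal((16 * 16, 32))  # toy "network" with 32 output features

spatial_tcof = video_descriptor(toy_frame_features(spatial_inputs(frames), weights))
temporal_tcof = video_descriptor(toy_frame_features(temporal_inputs(frames), weights))
print(spatial_tcof.shape, temporal_tcof.shape)
```

Under these assumptions, each video is mapped to a fixed-length vector whose size is twice the per-frame feature dimension, regardless of the number of frames, so it can be passed directly to a standard classifier.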

Index terms: Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks

Published In

Neurocomputing Volume 171, Issue C

January 2016

1693 pages

ISSN: 0925-2312

Copyright © Elsevier B.V.

Publisher

Elsevier Science Publishers B. V.

Netherlands
