research-article

Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging

Authors:

Marcus Liwicki,

Thomas M. Breuel

Pattern Recognition Letters, Volume 63, Issue C

Pages 23 - 29

https://doi.org/10.1016/j.patrec.2015.06.003

Published: 01 October 2015

Abstract

Efficient 2D LSTM attribute learning without pre- or post-processing of the data.
2D LSTM networks with only a small number of parameters.
Raw, noisy web images for training without manual annotation.
Automatic web-image analysis (unknown number of attribute classes and scene types).
Further evaluations on a public attribute dataset (SceneAtt).

This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images. Since training data for such applications is not available in large quantities, we describe an approach to generating artificial training data and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including convolutional neural networks (ConvNets), while using two orders of magnitude fewer parameters. We further report experiments on a recently published outdoor scene attribute dataset, chosen for a fair comparison of scene attribute learning, on which our approach achieved a significant performance improvement (ca. 21%). Finally, our approach is successfully applied to a real-world application, automatic web-image tagging.
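To make the idea of a 2D LSTM producing per-pixel classifications more concrete, the following is a minimal NumPy sketch of a single-direction 2D LSTM scan followed by a per-pixel softmax. It is not the authors' implementation: it assumes the standard multi-dimensional LSTM recurrence (one forget gate per spatial dimension), scans only from the top-left corner, and uses hypothetical names (`init_params`, `lstm2d_scan`, `per_pixel_probs`) and arbitrary layer sizes. Training is omitted entirely.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Random weights for five gate blocks (illustrative initialisation only):
    input gate, forget gate (up), forget gate (left), output gate, candidate."""
    scale = 0.1
    return {
        "W": rng.normal(0.0, scale, (5 * n_hidden, n_in)),
        "U_up": rng.normal(0.0, scale, (5 * n_hidden, n_hidden)),
        "U_left": rng.normal(0.0, scale, (5 * n_hidden, n_hidden)),
        "b": np.zeros(5 * n_hidden),
    }


def lstm2d_scan(image, params):
    """Scan an (H, W, C) image from the top-left corner.

    Each position receives the hidden and cell states of its upper and
    left neighbours; positions outside the image contribute zeros.
    Returns hidden states of shape (H, W, n_hidden).
    """
    height, width, _ = image.shape
    n = params["b"].size // 5
    # Index 0 along each spatial axis is a zero border for out-of-image states.
    h = np.zeros((height + 1, width + 1, n))
    c = np.zeros((height + 1, width + 1, n))
    for i in range(height):
        for j in range(width):
            h_up, c_up = h[i, j + 1], c[i, j + 1]        # pixel (i-1, j)
            h_left, c_left = h[i + 1, j], c[i + 1, j]    # pixel (i, j-1)
            z = (params["W"] @ image[i, j]
                 + params["U_up"] @ h_up
                 + params["U_left"] @ h_left
                 + params["b"])
            in_gate = sigmoid(z[0:n])
            forget_up = sigmoid(z[n:2 * n])
            forget_left = sigmoid(z[2 * n:3 * n])
            out_gate = sigmoid(z[3 * n:4 * n])
            candidate = np.tanh(z[4 * n:5 * n])
            c[i + 1, j + 1] = (in_gate * candidate
                               + forget_up * c_up
                               + forget_left * c_left)
            h[i + 1, j + 1] = out_gate * np.tanh(c[i + 1, j + 1])
    return h[1:, 1:]


def per_pixel_probs(hidden, V):
    """Softmax over classes at every pixel; V has shape (n_classes, n_hidden)."""
    logits = hidden @ V.T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)
```

In this sketch each output pixel depends only on pixels above and to the left of it. Multi-dimensional LSTM networks commonly run one such scan from each of the four image corners and combine the results so that every pixel sees the whole image; that combination step is left out here for brevity.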

Cited By

Y. Guan, Z. Yuan, G. Sun, J. Cong (2017). FPGA-based accelerator for long short-term memory recurrent neural networks. 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 629-634. doi:10.1109/ASPDAC.2017.7858394. Online publication date: 16-Jan-2017.
https://dl.acm.org/doi/10.1109/ASPDAC.2017.7858394

Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
  2. Machine learning
    1. Machine learning approaches

Information & Contributors

Information

Published In

Pattern Recognition Letters Volume 63, Issue C

October 2015

78 pages

ISSN:0167-8655

Copyright © Elsevier B.V.

Publisher

Elsevier Science Inc.

United States
