research-article

Scene analysis by mid-level attribute learning using 2D LSTM networks and an application to web-image tagging

Published: 01 October 2015

Abstract

• Efficient 2D LSTM attribute learning without pre- or post-processing of the data.
• 2D LSTM networks with only a small number of parameters.
• Raw, noisy web images used for training without manual annotation.
• Automatic web-image analysis (unknown number of attribute classes and scene types).
• Further evaluations on a public attribute dataset (SceneAtt).

This paper describes an approach to scene analysis based on supervised training of 2D Long Short-Term Memory recurrent neural networks (LSTM networks). Unlike previous methods, our approach requires no manual construction of feature hierarchies or incorporation of other prior knowledge. Rather, like deep learning approaches using convolutional networks, our recognition networks are trained directly on raw pixel values. However, in contrast to convolutional neural networks, our approach uses 2D LSTM networks at all levels. Our networks yield per-pixel mid-level classifications of input images. Since training data for such applications is not available in large quantities, we describe an approach to generating artificial training data and then evaluate the trained networks on real-world images. Our approach performed significantly better than other methods, including convolutional neural networks (ConvNets), while using two orders of magnitude fewer parameters. For a fair comparison of scene attribute learning, we further report experiments on a recently published outdoor scene attribute dataset, where our approach achieved a significant performance improvement (ca. 21%). Finally, our approach is successfully applied to a real-world application, automatic web-image tagging.
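To make the idea of a 2D LSTM producing per-pixel classifications from raw pixel values more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: it assumes a single scan direction (top-left to bottom-right), one recurrent layer, a softmax output per pixel, and random untrained weights. The names `init_params` and `scan_2d_lstm`, the gate layout, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, n_classes, seed=0):
    """Random, untrained weights (illustrative sizes only)."""
    rng = np.random.default_rng(seed)
    # Each cell sees the pixel plus the hidden states of its upper and left neighbours.
    n_cat = n_in + 2 * n_hidden
    return {
        # Five gate blocks: input, forget-up, forget-left, output, candidate.
        "W": rng.normal(0.0, 0.1, size=(5 * n_hidden, n_cat)),
        "b": np.zeros(5 * n_hidden),
        "W_out": rng.normal(0.0, 0.1, size=(n_classes, n_hidden)),
        "b_out": np.zeros(n_classes),
    }


def scan_2d_lstm(image, params):
    """Scan an (H, W, C) image once; return (H, W, K) per-pixel class probabilities."""
    H, W, _ = image.shape
    n_hidden = params["b"].size // 5
    # Pad with one zero row and column so border pixels read a zero state.
    # Pixel (y, x) is stored at index [y + 1, x + 1].
    h = np.zeros((H + 1, W + 1, n_hidden))
    c = np.zeros((H + 1, W + 1, n_hidden))
    for y in range(H):
        for x in range(W):
            # Raw pixel, hidden state from above, hidden state from the left.
            z = np.concatenate([image[y, x], h[y, x + 1], h[y + 1, x]])
            a = params["W"] @ z + params["b"]
            i_g, f_up, f_left, o_g = sigmoid(a[:4 * n_hidden]).reshape(4, n_hidden)
            g = np.tanh(a[4 * n_hidden:])
            # One forget gate per spatial dimension.
            c[y + 1, x + 1] = i_g * g + f_up * c[y, x + 1] + f_left * c[y + 1, x]
            h[y + 1, x + 1] = o_g * np.tanh(c[y + 1, x + 1])
    # Per-pixel softmax over classes.
    logits = h[1:, 1:] @ params["W_out"].T + params["b_out"]
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


# Example: an 8x6 RGB image, 4 hidden units, 5 hypothetical attribute classes.
image = np.random.default_rng(1).random((8, 6, 3))
params = init_params(n_in=3, n_hidden=4, n_classes=5)
probs = scan_2d_lstm(image, params)
n_params = sum(p.size for p in params.values())
```

Because the same small weight matrices are reused at every pixel position, the parameter count depends only on the channel, hidden, and class sizes and not on the image size, which is one way a recurrent model of this kind can stay compact.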


Cited By

  • (2017) FPGA-based accelerator for long short-term memory recurrent neural networks, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 629-634, doi:10.1109/ASPDAC.2017.7858394. Online publication date: 16-Jan-2017

Published In

Pattern Recognition Letters, Volume 63, Issue C
October 2015
78 pages

Publisher

Elsevier Science Inc.
United States

Author Tags

1. LSTM
2. Mid-level attribute learning
3. Recurrent neural network
4. Scene analysis
5. Web-image tagging

Qualifiers

• Research-article
