Abstract
Deep convolutional neural networks have set remarkable milestones in computer vision, especially in image classification. However, training a deep network depends heavily on massive labeled data and expensive computational resources. A number of studies have shown that using a pre-trained model for deep feature extraction can achieve excellent performance. While most of these methods consider only the features from fully connected layers, we delve into the intermediate convolutional layers. We propose the Selected Multi-Scale Convolution feature (SMSC) for compact deep representations. We propose a method for convolutional feature map selection and deep descriptor aggregation, and introduce a method for fusing multi-layer features into a compact representation. Experimental results on the well-known MIT-Indoor dataset demonstrate the effectiveness and efficiency of the proposed method.
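The three stages named in the abstract (feature map selection, descriptor aggregation, multi-layer fusion) can be illustrated with a small NumPy sketch. The abstract does not state the selection criterion, the pooling scheme, or the fusion rule, so everything concrete here is an assumption made for illustration: variance-based channel selection, sum-pooling with L2 normalization, and fusion by concatenation. The function names, the layer shapes, and the random stand-in activations are hypothetical and are not the authors' SMSC implementation.

```python
import numpy as np


def select_feature_maps(fmap, keep):
    """Keep the `keep` channels with the highest activation variance.

    fmap has shape (C, H, W). Variance is an assumed, illustrative
    selection criterion; the abstract does not state the actual one.
    """
    c = fmap.shape[0]
    variances = fmap.reshape(c, -1).var(axis=1)
    # Indices of the top-`keep` channels, restored to ascending channel order.
    top = np.sort(np.argsort(variances)[::-1][:keep])
    return fmap[top]


def aggregate_descriptors(fmap):
    """Sum-pool the H*W local descriptors into one C-dim vector, then L2-normalize."""
    pooled = fmap.reshape(fmap.shape[0], -1).sum(axis=1)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def fuse_layers(fmaps, keep):
    """Concatenate the per-layer vectors and L2-normalize the result (assumed fusion rule)."""
    parts = [aggregate_descriptors(select_feature_maps(f, keep)) for f in fmaps]
    fused = np.concatenate(parts)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused


rng = np.random.default_rng(0)
# Random stand-ins for the activations of two convolutional layers (not real CNN output).
conv4 = rng.random((512, 28, 28))
conv5 = rng.random((512, 14, 14))
representation = fuse_layers([conv4, conv5], keep=128)
print(representation.shape)  # (256,)
```

With 128 channels kept per layer, the two layers yield a 256-dimensional vector regardless of their spatial resolution, which is what makes this kind of representation compact compared with the raw activations.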
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under grant number 61401154, by the Natural Science Foundation of Hebei Province under grant number F2016502101, and by the Fundamental Research Funds for the Central Universities under grant number 2015ZD20.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhao, Z., Xu, G., Qi, Y. (2017). Multi-Scale Hierarchy Deep Feature Aggregation for Compact Image Representations. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_41
DOI: https://doi.org/10.1007/978-3-319-54526-4_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54525-7
Online ISBN: 978-3-319-54526-4
eBook Packages: Computer Science, Computer Science (R0)