Abstract
In this study we compare three fine-tuning strategies to investigate the best way to transfer the parameters of popular deep convolutional neural networks (DCNNs), trained for a visual annotation task on one dataset, to a new, considerably different dataset. We focus on the concept-based image/video annotation problem, using ImageNet as the source dataset and the TRECVID SIN 2013 and PASCAL VOC-2012 classification datasets as the targets. An extensive set of experiments examines the effectiveness of the three fine-tuning strategies on each of three pre-trained DCNNs and each target dataset. The reported results give rise to guidelines for effectively fine-tuning a DCNN for concept-based visual annotation.
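The setting described above, transferring the parameters of an ImageNet-trained DCNN to a different target dataset, can be illustrated with a minimal fine-tuning sketch. The snippet below is an assumption for illustration only, not the paper's method: it uses PyTorch/torchvision (the paper's experiments were carried out with Caffe), a hypothetical NUM_TARGET_CONCEPTS constant, and the common transfer-learning heuristic of replacing the final classification layer and re-training the transferred layers with a smaller learning rate.

```python
# Minimal sketch (not the paper's method): fine-tuning an ImageNet-pretrained
# DCNN for a new concept-annotation dataset, e.g. PASCAL VOC-2012.
import torch
import torch.nn as nn
from torchvision import models

NUM_TARGET_CONCEPTS = 20  # hypothetical: e.g. the 20 PASCAL VOC classes

# Load a network pre-trained on the source dataset (ImageNet).
net = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Replace the 1000-way ImageNet classifier with a layer whose output
# dimension matches the target dataset's concepts.
net.classifier[6] = nn.Linear(net.classifier[6].in_features,
                              NUM_TARGET_CONCEPTS)

# Fine-tune transferred layers with a small learning rate and the new
# layer with a larger one, a common heuristic when the target dataset
# differs considerably from the source.
optimizer = torch.optim.SGD([
    {"params": net.features.parameters(), "lr": 1e-4},
    {"params": net.classifier.parameters(), "lr": 1e-3},
], momentum=0.9)

# Multi-label loss suits concept-based annotation, where several
# concepts may appear in the same image or video frame.
criterion = nn.BCEWithLogitsLoss()
```

Variations of this recipe, e.g. freezing more or fewer of the transferred layers, or extending the network with additional layers before re-training, correspond to the kind of strategy choices compared in the paper.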
Acknowledgements
This work was supported by the European Commission under contract H2020-687786 InVID.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pittaras, N., Markatopoulou, F., Mezaris, V., Patras, I. (2017). Comparison of Fine-Tuning and Extension Strategies for Deep Convolutional Neural Networks. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_9
DOI: https://doi.org/10.1007/978-3-319-51811-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4