Fully Convolutional Networks for Semantic Segmentation

Published: 01 April 2017

Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional networks achieve improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
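The abstract's key insight is that fully connected layers are a special case of convolution, so a classification network can be "convolutionalized" to accept inputs of arbitrary size and emit a spatial output map. A minimal NumPy sketch of this equivalence follows; the layer sizes, array names, and the naive `conv2d_valid` helper are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def conv2d_valid(x, w):
    # Naive "valid" cross-correlation: x is (C, H, W), w is (F, C, kh, kw);
    # returns an (F, H-kh+1, W-kw+1) output map.
    C, H, W = x.shape
    F, _, kh, kw = w.shape
    out = np.zeros((F, H - kh + 1, W - kw + 1))
    for f in range(F):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[f])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))           # a 7x7 feature map with 3 channels
W_fc = rng.standard_normal((10, 3 * 7 * 7))  # FC layer expecting exactly a 7x7 input

# The fully connected layer applied to the flattened map:
y_fc = W_fc @ x.ravel()

# The same weights, viewed as a bank of 7x7 convolution kernels:
W_conv = W_fc.reshape(10, 3, 7, 7)
y_conv = conv2d_valid(x, W_conv)             # (10, 1, 1) on a 7x7 input

assert np.allclose(y_fc, y_conv.ravel())     # identical outputs on the original size

# Unlike the FC form, the convolutional form also accepts larger inputs,
# producing a spatial grid of predictions rather than a single vector:
x_big = rng.standard_normal((3, 12, 12))
print(conv2d_valid(x_big, W_conv).shape)     # (10, 6, 6)
```

On the original 7x7 input the two forms agree exactly; on a larger input the convolutional form yields a coarse output map, which the paper then upsamples (and fuses with shallower layers via skip connections) to produce dense per-pixel predictions.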

References

[1]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Inf. Process. Syst., 2012, pp. 1106–1114.
[2]
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent., 2015.
[3]
C. Szegedy, et al., “Going deeper with convolutions,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 1–9.
[4]
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” in Proc. Int. Conf. Learn. Represent., 2014.
[5]
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, 2015.
[6]
K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 346–361.
[7]
N. Zhang, J. Donahue, R. Girshick, and T. Darrell, “Part-based R-CNNs for fine-grained category detection,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 834–849.
[8]
J. Long, N. Zhang, and T. Darrell, “Do convnets learn correspondence?” in Proc. Neural Inf. Process. Syst, 2014, pp. 1601–1609.
[9]
P. Fischer, A. Dosovitskiy, and T. Brox, “Descriptor matching with convolutional neural networks: A comparison to SIFT,” arXiv preprint arXiv:1405.5769, 2014.
[10]
F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, and P. E. Barbano, “Toward automatic phenotyping of developing embryos from videos,” IEEE Trans. Image Process., vol. 14, no. 9, pp. 1360–1371, 2005.
[11]
D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Proc. Neural Inf. Process. Syst., 2012, pp. 2852–2860.
[12]
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013.
[13]
P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 82–90.
[14]
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Simultaneous detection and segmentation,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 297–312.
[15]
S. Gupta, R. Girshick, P. Arbelaez, and J. Malik, “Learning rich features from RGB-D images for object detection and segmentation,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 345–360.
[16]
Y. Ganin and V. Lempitsky, “N⁴-fields: Neural network nearest neighbor fields for image transforms,” in Proc. Asian Conf. Comput. Vis., 2014, pp. 536–551.
[17]
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. Comput. Vis. Pattern Recognit., 2015.
[18]
J. Donahue, et al., “DeCAF: A deep convolutional activation feature for generic visual recognition,” in Proc. Int. Conf. Mach. Learn., 2014, pp. 647–655.
[19]
M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[20]
O. Matan, C. J. Burges, Y. LeCun, and J. S. Denker, “Multi-digit recognition using a space displacement neural network,” in Proc. Neural Inf. Process. Syst., 1991, pp. 488–495.
[21]
Y. LeCun, et al., “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[22]
R. Wolf and J. C. Platt, “Postal address block location using a convolutional locator network,” in Proc. Neural Inf. Process. Syst., 1994, pp. 745–745.
[23]
D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 633–640.
[24]
J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation,” in Proc. Neural Inf. Process. Syst., 2014, pp. 1799–1807.
[25]
D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Proc. Neural Inf. Process. Syst., 2014, pp. 2366–2374.
[26]
P. Burt and E. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, 1983.
[27]
J. J. Koenderink and A. J. van Doorn, “Representation of local geometry in the visual system,” Biol. Cybern., vol. 55, no. 6, pp. 367–375, 1987.
[28]
P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian detection with unsupervised multi-stage feature learning,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 3626–3633.
[29]
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 447–456.
[30]
M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, “Feedforward semantic segmentation with zoom-out features,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 3376–3385.
[31]
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Neural Inf. Process. Syst., 2015, pp. 91–99.
[32]
S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1395–1403.
[33]
F. Liu, C. Shen, G. Lin, and I. Reid, “Learning depth from single monocular images using deep convolutional neural fields,” IEEE Trans. Pattern Anal. Mach. Intell., 2015.
[34]
P. Fischer, et al., “Learning optical flow with convolutional networks,” in Proc. Int. Conf. Comput. Vis., 2015.
[35]
D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional neural networks for weakly supervised segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1796–1804.
[36]
G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille, “Weakly-and semi-supervised learning of a DCNN for semantic image segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1742–1750.
[37]
J. Dai, K. He, and J. Sun, “Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1635–1643.
[38]
S. Hong, H. Noh, and B. Han, “Decoupled deep neural network for semi-supervised semantic segmentation,” in Proc. Neural Inf. Process. Syst., 2015, pp. 1495–1503.
[39]
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” in Proc. Int. Conf. Learn. Represent., 2015.
[40]
S. Zheng, et al., “Conditional random fields as recurrent neural networks,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1529–1537.
[41]
W. Liu, A. Rabinovich, and A. C. Berg, “ParseNet: Looking wider to see better,” arXiv preprint arXiv:1506.04579, 2015.
[42]
H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1520–1528.
[43]
O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assist. Intervention, 2015, pp. 234–241.
[44]
F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. Int. Conf. Learn. Represent., 2016.
[45]
A. Giusti, D. C. Cireşan, J. Masci, L. M. Gambardella, and J. Schmidhuber, “Fast image scanning with deep max-pooling convolutional neural networks,” in Proc. Int. Conf. Image Process., 2013, pp. 4034–4038.
[46]
M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Proc. Int. Conf. Time-Freq. Methods Phase Space, 1989, pp. 286–297.
[47]
S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. New York, NY, USA: Academic, 1999.
[48]
P. P. Vaidyanathan, “Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial,” Proc. IEEE, vol. 78, no. 1, pp. 56–93, 1990.
[49]
L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus, “Regularization of neural networks using DropConnect,” in Proc. Int. Conf. Mach. Learn., 2013, pp. 1058–1066.
[50]
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2011 Results. [Online]. Available: http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html
[51]
C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006, p. 229.
[52]
B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik, “Semantic contours from inverse detectors,” in Proc. Int. Conf. Comput. Vis., 2011, pp. 991–998.
[53]
Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 1998, pp. 9–48.
[54]
Y. Jia, et al., “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[55]
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 746–760.
[56]
S. Gupta, P. Arbelaez, and J. Malik, “Perceptual organization and recognition of indoor scenes from RGB-D images,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 564–571.
[57]
C. Liu, J. Yuen, and A. Torralba, “SIFT flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978–994, 2011.
[58]
J. Tighe and S. Lazebnik, “Superparsing: scalable nonparametric image parsing with superpixels,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 352–365.
[59]
J. Tighe and S. Lazebnik, “Finding things: Image parsing with regions and per-exemplar detectors,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 3001–3008.
[60]
J. Dai, K. He, and J. Sun, “Convolutional feature masking for joint object and stuff segmentation,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 3992–4000.
[61]
J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with second-order pooling,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 430–443.
[62]
R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille, “The role of context for object detection and semantic segmentation in the wild,” in Proc. Comput. Vis. Pattern Recognit., 2014, pp. 891–898.


Information & Contributors

Information

Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 39, Issue 4
April 2017
208 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article


