Fully Convolutional Networks for Semantic Segmentation

Published: 01 April 2017


Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional networks achieve improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
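The arbitrary-size property described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-channel input and a single output class, and the names `dense_as_conv`, `kernel`, and `bias` are hypothetical. The idea shown is that a classifier which scores one fixed-size window, when applied as a sliding filter, produces a map of scores whose size follows the input size.

```python
def dense_as_conv(image, kernel, bias):
    """Apply a fixed-window linear classifier at every valid position.

    image:  2-D list of numbers, any size at least as large as the kernel
    kernel: 2-D list of weights (the fixed-window classifier, reshaped)
    bias:   scalar offset
    Returns a 2-D list of scores, one per window position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            bias + sum(kernel[a][b] * image[i + a][j + b]
                       for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical weights for a classifier over a 2x2 window.
kernel = [[1, 0], [0, -1]]
bias = 2

# Input exactly the size of the window: a single score, as a dense layer gives.
fixed = [[3, 1], [4, 5]]
fixed_scores = dense_as_conv(fixed, kernel, bias)    # 1x1 output

# Larger input: the same weights now yield a 2x3 grid of scores.
larger = [[3, 1, 2, 0], [4, 5, 1, 1], [0, 2, 3, 6]]
larger_scores = dense_as_conv(larger, kernel, bias)  # 2x3 output
```

The top-left entry of `larger_scores` equals the single entry of `fixed_scores`, because both are computed from the same 2x2 window with the same weights. The sketch covers only this sliding-window view. Learned upsampling and the skip architecture are not shown.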




Information & Contributors


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 39, Issue 4
April 2017
208 pages


IEEE Computer Society

United States



  • Research-article

