Fully Convolutional Networks for Semantic Segmentation

Published: 01 April 2017

Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional networks achieve improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
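The abstract's key insight is that fully connected layers are a special case of convolution, so a classification network can be "convolutionalized" to accept inputs of arbitrary size and emit a spatial output map. A minimal NumPy sketch of this equivalence follows; the layer sizes, array names, and the naive `conv2d_valid` helper are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def conv2d_valid(x, w):
    # Naive "valid" cross-correlation: x is (C, H, W), w is (F, C, kh, kw);
    # returns an (F, H-kh+1, W-kw+1) output map.
    C, H, W = x.shape
    F, _, kh, kw = w.shape
    out = np.zeros((F, H - kh + 1, W - kw + 1))
    for f in range(F):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[f])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))           # a 7x7 feature map with 3 channels
W_fc = rng.standard_normal((10, 3 * 7 * 7))  # FC layer expecting exactly a 7x7 input

# The fully connected layer applied to the flattened map:
y_fc = W_fc @ x.ravel()

# The same weights, viewed as a bank of 7x7 convolution kernels:
W_conv = W_fc.reshape(10, 3, 7, 7)
y_conv = conv2d_valid(x, W_conv)             # (10, 1, 1) on a 7x7 input

assert np.allclose(y_fc, y_conv.ravel())     # identical outputs on the original size

# Unlike the FC form, the convolutional form also accepts larger inputs,
# producing a spatial grid of predictions rather than a single vector:
x_big = rng.standard_normal((3, 12, 12))
print(conv2d_valid(x_big, W_conv).shape)     # (10, 6, 6)
```

On the original 7x7 input the two forms agree exactly; on a larger input the convolutional form yields a coarse output map, which the paper then upsamples (and fuses with shallower layers via skip connections) to produce dense per-pixel predictions.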

References

[1]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Neural Inf. Process. Syst., 2012, pp. 1106–1114.
[2]
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent., 2015.
[3]
C. Szegedy, et al., “Going deeper with convolutions,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 1–9.
[4]
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “OverFeat: Integrated recognition, localization and detection using convolutional networks,” in Proc. Int. Conf. Learn. Represent., 2014.
[5]
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, 2015.
[6]
K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 346–361.
[7]
N. Zhang, J. Donahue, R. Girshick, and T. Darrell, “Part-based R-CNNs for fine-grained category detection,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 834–849.
[8]
J. Long, N. Zhang, and T. Darrell, “Do convnets learn correspondence?” in Proc. Neural Inf. Process. Syst, 2014, pp. 1601–1609.
[9]
P. Fischer, A. Dosovitskiy, and T. Brox, “Descriptor matching with convolutional neural networks: A comparison to SIFT,” arXiv preprint arXiv:1405.5769, 2014.
[10]
F. Ning, D. Delhomme, Y. LeCun, F. Piano, L. Bottou, and P. E. Barbano, “Toward automatic phenotyping of developing embryos from videos,” IEEE Trans. Image Process., vol. 14, no. 9, pp. 1360–1371, 2005.
[11]
D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” in Proc. Neural Inf. Process. Syst., 2012, pp. 2852–2860.
[12]
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, 2013.
[13]
P. H. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 82–90.
[14]
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Simultaneous detection and segmentation,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 297–312.
[15]
S. Gupta, R. Girshick, P. Arbelaez, and J. Malik, “Learning rich features from RGB-D images for object detection and segmentation,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 345–360.
[16]
Y. Ganin and V. Lempitsky, “N⁴-fields: Neural network nearest neighbor fields for image transforms,” in Proc. Asian Conf. Comput. Vis., 2014, pp. 536–551.
[17]
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. Comput. Vis. Pattern Recognit., 2015.
[18]
J. Donahue, et al., “DeCAF: A deep convolutional activation feature for generic visual recognition,” in Proc. Int. Conf. Mach. Learn., 2014, pp. 647–655.
[19]
M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[20]
O. Matan, C. J. Burges, Y. LeCun, and J. S. Denker, “Multi-digit recognition using a space displacement neural network,” in Proc. Neural Inf. Process. Syst., 1991, pp. 488–495.
[21]
Y. LeCun, et al., “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[22]
R. Wolf and J. C. Platt, “Postal address block location using a convolutional locator network,” in Proc. Neural Inf. Process. Syst., 1994, pp. 745–745.
[23]
D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in Proc. Int. Conf. Comput. Vis., 2013, pp. 633–640.
[24]
J. Tompson, A. Jain, Y. LeCun, and C. Bregler, “Joint training of a convolutional network and a graphical model for human pose estimation,” in Proc. Neural Inf. Process. Syst., 2014, pp. 1799–1807.
[25]
D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Proc. Neural Inf. Process. Syst., 2014, pp. 2366–2374.
[26]
P. Burt and E. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, 1983.
[27]
J. J. Koenderink and A. J. van Doorn, “Representation of local geometry in the visual system,” Biol. Cybern., vol. 55, no. 6, pp. 367–375, 1987.
[28]
P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian detection with unsupervised multi-stage feature learning,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 3626–3633.
[29]
B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, “Hypercolumns for object segmentation and fine-grained localization,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 447–456.
[30]
M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich, “Feedforward semantic segmentation with zoom-out features,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 3376–3385.
[31]
S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Neural Inf. Process. Syst., 2015, pp. 91–99.
[32]
S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1395–1403.
[33]
F. Liu, C. Shen, G. Lin, and I. Reid, “Learning depth from single monocular images using deep convolutional neural fields,” IEEE Trans. Pattern Anal. Mach. Intell., 2015.
[34]
P. Fischer, et al., “Learning optical flow with convolutional networks,” in Proc. Int. Conf. Comput. Vis., 2015.
[35]
D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional neural networks for weakly supervised segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1796–1804.
[36]
G. Papandreou, L.-C. Chen, K. Murphy, and A. L. Yuille, “Weakly-and semi-supervised learning of a DCNN for semantic image segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1742–1750.
[37]
J. Dai, K. He, and J. Sun, “Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1635–1643.
[38]
S. Hong, H. Noh, and B. Han, “Decoupled deep neural network for semi-supervised semantic segmentation,” in Proc. Neural Inf. Process. Syst., 2015, pp. 1495–1503.
[39]
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” in Proc. Int. Conf. Learn. Represent., 2015.
[40]
S. Zheng, et al., “Conditional random fields as recurrent neural networks,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1529–1537.
[41]
W. Liu, A. Rabinovich, and A. C. Berg, “ParseNet: Looking wider to see better,” arXiv preprint arXiv:1506.04579, 2015.
[42]
H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proc. Int. Conf. Comput. Vis., 2015, pp. 1520–1528.
[43]
O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assist. Intervention, 2015, pp. 234–241.
[44]
F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. Int. Conf. Learn. Represent., 2016.
[45]
A. Giusti, D. C. Cireşan, J. Masci, L. M. Gambardella, and J. Schmidhuber, “Fast image scanning with deep max-pooling convolutional neural networks,” in Proc. Int. Conf. Image Process., 2013, pp. 4034–4038.
[46]
M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Proc. Int. Conf. Time-Freq. Methods Phase Space, 1989, pp. 286–297.
[47]
S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed. New York, NY, USA: Academic, 1999.
[48]
P. P. Vaidyanathan, “Multirate digital filters, filter banks, polyphase networks, and applications: A tutorial,” Proc. IEEE, vol. 78, no. 1, pp. 56–93, 1990.
[49]
L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus, “Regularization of neural networks using DropConnect,” in Proc. Int. Conf. Mach. Learn., 2013, pp. 1058–1066.
[50]
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes Challenge 2011 Results. [Online]. Available: http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html
[51]
C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006, p. 229.
[52]
B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik, “Semantic contours from inverse detectors,” in Proc. Int. Conf. Comput. Vis., 2011, pp. 991–998.
[53]
Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 1998, pp. 9–48.
[54]
Y. Jia, et al., “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[55]
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 746–760.
[56]
S. Gupta, P. Arbelaez, and J. Malik, “Perceptual organization and recognition of indoor scenes from RGB-D images,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 564–571.
[57]
C. Liu, J. Yuen, and A. Torralba, “SIFT flow: Dense correspondence across scenes and its applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 978–994, 2011.
[58]
J. Tighe and S. Lazebnik, “Superparsing: scalable nonparametric image parsing with superpixels,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 352–365.
[59]
J. Tighe and S. Lazebnik, “Finding things: Image parsing with regions and per-exemplar detectors,” in Proc. Comput. Vis. Pattern Recognit., 2013, pp. 3001–3008.
[60]
J. Dai, K. He, and J. Sun, “Convolutional feature masking for joint object and stuff segmentation,” in Proc. Comput. Vis. Pattern Recognit., 2015, pp. 3992–4000.
[61]
J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu, “Semantic segmentation with second-order pooling,” in Proc. Eur. Conf. Comput. Vis., 2012, pp. 430–443.
[62]
R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R. Urtasun, and A. Yuille, “The role of context for object detection and semantic segmentation in the wild,” in Proc. Comput. Vis. Pattern Recognit., 2014, pp. 891–898.


Information & Contributors

Information

Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence  Volume 39, Issue 4
April 2017
208 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article


