Abstract
Parsing images into fine-grained semantic parts is difficult for off-the-shelf semantic segmentation networks, because complex elements make it hard for them to exploit the contextual information of fine-grained parts. In this paper, we propose a progressive decomposition method that parses images in a coarse-to-fine manner with progressively refined semantic classes. The method has two components: stacked networks and progressive supervision. The stacked network is built by stacking several segmentation layers within a segmentation network. Each earlier segmentation module parses the image at a coarser-grained level, and its result is fed to the following module to provide contextual clues for finer-grained parsing. Skip connections from shallow layers of the network to the fine-grained parsing modules are also added to recover the details of small structures. To train the stacked network, which produces coarse-to-fine outputs, we propose a progressive supervision strategy: classes in the ground truth are merged to obtain coarse-to-fine label maps, and the stacked network is then trained end-to-end with these hierarchical supervisions. The proposed framework can be integrated into many advanced neural networks to improve their parsing results. Extensive evaluations on several public datasets, covering face parsing and human parsing, demonstrate the superiority of our method.
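The progressive supervision idea in the abstract (merging ground-truth classes into coarse-to-fine label maps, then supervising every stage) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class names, the three-level hierarchy, the function names `progressive_label_maps` and `hierarchical_loss`, and the equal weighting of the per-level losses are all assumptions made for the example.

```python
import numpy as np

# Hypothetical fine-grained classes (illustrative only, not the paper's label set).
FINE_CLASSES = ["background", "skin", "left_eye", "right_eye",
                "nose", "upper_lip", "lower_lip", "hair"]

# Hypothetical hierarchy: each row maps a fine class index to a merged class index.
# Rows are ordered coarse to fine; the last row is the identity mapping.
MERGE_LEVELS = [
    [0, 1, 1, 1, 1, 1, 1, 1],  # background vs. everything else
    [0, 1, 2, 2, 3, 4, 4, 5],  # background, skin, eyes, nose, mouth, hair
    [0, 1, 2, 3, 4, 5, 6, 7],  # original fine-grained labels
]


def progressive_label_maps(fine_labels, merge_levels=MERGE_LEVELS):
    """Merge classes in a fine label map; return one map per level, coarse to fine."""
    maps = []
    for mapping in merge_levels:
        lookup = np.asarray(mapping, dtype=fine_labels.dtype)
        maps.append(lookup[fine_labels])  # per-pixel table lookup
    return maps


def hierarchical_loss(logits_per_level, label_maps):
    """Sum of mean pixel-wise cross-entropy over all levels (equal weights assumed)."""
    total = 0.0
    for logits, labels in zip(logits_per_level, label_maps):
        # logits: (H, W, C) scores from one stage; labels: (H, W) integer map.
        shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
        picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
        total += -picked.mean()
    return total


# Usage: a tiny 2x3 "image" of fine labels and dummy all-zero stage outputs.
fine = np.array([[0, 1, 2], [3, 5, 7]])
label_maps = progressive_label_maps(fine)
num_classes = [max(m) + 1 for m in MERGE_LEVELS]  # 2, 6, 8
dummy_logits = [np.zeros(fine.shape + (c,)) for c in num_classes]
loss = hierarchical_loss(dummy_logits, label_maps)
```

In a stacked network, each entry of `dummy_logits` would be replaced by the output of the corresponding segmentation stage, so that the coarser stages are supervised by the merged label maps and the final stage by the original labels.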
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Nos. 2018YFC0309100 and 2018YFC0309104), the National Natural Science Foundation of China (Nos. 61321491 and 61272219), the National High Technology Research and Development Program of China (No. 2007AA01Z334), the Program for New Century Excellent Talents in University of China (NCET-04-04605), the China Postdoctoral Science Foundation (Grant No. 2017M621700), and the Innovation Fund of State Key Laboratory for Novel Software Technology (Nos. ZZKT2013A12, ZZKT2016A11 and ZZKT2018A09).
Cite this article
Sun, Y., Hu, J., Shi, J. et al. Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks. Multimed Tools Appl 79, 13379–13402 (2020). https://doi.org/10.1007/s11042-019-08288-4
DOI: https://doi.org/10.1007/s11042-019-08288-4