Abstract
Real-time semantic segmentation is a challenging task as both segmentation accuracy and inference speed need to be considered at the same time. In this paper, a Dual Context Network (DCNet) is presented to address this challenge. It contains two independent sub-networks: Region Context Network and Pixel Context Network. Region Context Network is main network with low-resolution input and features re-weighting module to achieve sufficient receptive field. Meanwhile, Pixel Context Network with location attention module is to capture the location dependencies of each pixel for assisting the main network to recover spatial detail. A contextual feature fusion is introduced to combine output features of these two sub-networks. The experiments show that DCNet can achieve high-quality segmentation while keeping a high speed. Specifically, for Cityscapes test dataset, it can achieve 76.1% Mean IOU with the speed of 82 FPS on a single GTX 2080Ti GPU when using ResNet50 as backbone and 71.2% Mean IOU with the speed of 142 FPS when using ResNet18 as backbone.
Similar content being viewed by others
References
Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12), 2481–2495 (2017)
Paszke, A., Chaurasia, A., Kim,S., Culurciello, E.: Enet: a deep neural network architecture for real-time semantic segmentation (2016)
Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., Hajishirzi, H.: Espnet: efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 552–568 (2018)
Yu, C., Wang, J., Peng, C. Gao, C., Yu, G., Sang, N.: Bisenet: bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 325–341 (2018)
Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: Icnet for real-time semantic segmentation on high-resolution images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 405–420 (2018)
Li, H., Xiong, P., Fan, H., Sun, J.: Dfanet: deep feature aggregation for real-time semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9522–9531 (2019).
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Cordts, M. et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223 (2016)
Brostow, G.J., Fauqueur, J., Cipolla, R.J.: Semantic object classes in video: a high-definition ground truth database. Pattern Recognit. Lett. 30(2), 88–97 (2009)
Wu, Z., Shen, C., Hengel, A.: Real-time semantic image segmentation via spatial sparsity (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
Liu, W., Rabinovich, A., Berg, A. C.: Parsenet: Looking wider to see better (2015)
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017).
Yang, M., Yu, K., Zhang, C., Li, Z.,. Yang, K: Denseaspp for semantic segmentation in street scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3684–3692 (2018)
Yuan, Y., Wang, J.: Ocnet: Object context network for scene parsing (2018)
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: Learning a discriminative feature network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1857–1866 (2018)
Orsic, M., Kreso, I., Bevandic, P., Segvic, S.: In defense of pre-trained imagenet architectures for real-time semantic segmentation of road-driving images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12607–12616 (2019)
Lin, Z. et al.: A structured self-attentive sentence embedding (2017)
Romera, E., Alvarez, J.M., Bergasa, L.M., Arroyo, R.: "Erfnet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 19(1), 263–272 (2017)
Howard A. et al.: Searching for mobilenetv3. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1314–1324
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258 (2017)
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp. 801–818 (2018)
Fu, J. et al.: Dual attention network for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2019)
Zhang, H. et al.: Context encoding for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160 (2018)
Sturgess, P., Alahari, K., Ladicky, L., Torr, P. H.: Combining appearance and structure from motion features for road scene understanding (2009)
Caesar, H., Uijlings, J., Ferrari, V.: Coco-stuff: Thing and stuff classes in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1209–1218 (2018)
Lin, T.-Y. et al.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440 (2015)
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFS. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Wang, P. et al.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460, IEEE (2018)
Ding, H., Jiang, X., Shuai, B., Qun Liu, A., Wang, G.: Context contrasted feature and gated multi-scale aggregation for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2393–2402 (2018)
Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: Shufflenet v2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
Pohlen, T., Hermans, A., Mathias, M., Leibe, B.: Full-resolution residual networks for semantic segmentation in street scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4151–4160 (2017)
Pan, X., Shi, J., Luo, P., Wang, X., Tang, X.: Spatial as deep: Spatial CNN for traffic scene understanding. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 603–612 (2019)
Li, H., Xiong, P. An, J., Wang, L.: Pyramid attention network for semantic segmentation. arXiv 2018
Zheng, S. et al.: Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537 (2015)
Liu, Y., Cao, S., Lasang, P., Shen, S.: Modular lightweight network for road object detection using a feature fusion approach. IEEE Trans. Syst. Man Cybern. Syst. pp 1–13 (2019)
Khelifi, L., Mignotte, M.: A novel fusion approach based on the global consistency criterion to fusing multiple segmentations. IEEE Trans. Syst. Man Cybern. Syst. 47(9), 2489–2502 (2017)
Yuan, X., Cao, X., Hao, X., Chen, H., Wei, X.: Vehicle detection by a context-aware multichannel feature pyramid. IEEE Trans. Syst. Man Cybern. Syst. 47(7), 1348–1357 (2017)
Li, Y., Guo, Y., Kao, Y., He, R.: Image piece learning for weakly supervised semantic segmentation. IEEE Trans. Syst. Man Cybern. Syst. 47(4), 648–659 (2017)
Si, J., Zhang, H., Li, C., Guo, J.: Spatial pyramid-based statistical features for person re-identification: a comprehensive evaluation. IEEE Trans. Syst. Man Cybern. Syst. 48(7), 1140–1154 (2018)
Chen, B., Collins, M., Zhu, Y., Liu, T., Huang, T., Adam, H., Chen, L.: Panoptic-deeplab: a simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)
Fang, J., Sun, Y., Zhang, Q., Peng, K., Li, Y., Liu, W., Wang, X.: FNA++: fast network adaptation via parameter remapping and architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 43(9), 2990–3004 (2021)
Li, X., You, A., Zhu, Z., Zhao, H., Yang, M., Yang, K., Tan, S., Tong, Y.: Semantic flow for fast and accurate scene parsing. In: Proceedings of the European Conference on Computer Vision (ECCV) (2020)
Fang, C., Tian., H., Zhang, D, Zhang, Q., Han, J., Han, J.: Densely nested top-down flows for salient object detection. Sci. China Inf. Sci. (2022)
Liu, N., Han, J.: DHSNet: deep hierarchical saliency network for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Liu, N., Han, J., Yang, M.: PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
Zhao, J., Liu, J.-J., Fan, D.-P., Cao, Y., Yang, J., Cheng, M.-M.: EGNet: Edge guidance network for salient object detection. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8779–8788 (2019)
Liu, N., Zhao, W., Shao, L., Han, J.: SCG: saliency and contour guided salient instance segmentation. IEEE Trans. Image Process. 30, 5862–5874 (2021)
Zhang, D., Zeng, W., Guo, G., Fang, C., Cheng, L., Han. J.: Weakly supervised semantic segmentation via alternative self-dual teaching. arXiv preprint arXiv:2112.09459 (2021)
Zhang, D., Zeng, W., Yao, J., Han, J.: Weakly supervised object detection using proposal-and semantic-level relationships. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 3349–3363 (2022)
Zhang, D., Han, J., Yang, L., Xu, D.: SPFTN: a joint learning framework for localizing and segmenting objects in weakly labeled videos. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 475–489 (2020)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yin, H., Xie, W., Zhang, J. et al. Dual Context Network for real-time semantic segmentation. Machine Vision and Applications 34, 22 (2023). https://doi.org/10.1007/s00138-023-01373-7
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00138-023-01373-7