Abstract
Semantic segmentation is an active field of computer vision that provides semantic information for many applications. In segmentation tasks, spatial information, context information, and high-level semantic information all play an important role in improving accuracy. This paper proposes a semantic segmentation network with a multi-path structure, attention reweighting, and a multi-scale encoding structure. First, three parallel paths are designed: a pyramid spatial path with a pyramid image input, a context path composed of a lightweight backbone network, and a semantic graph path composed of spatial graph convolutional layers. Second, a feature fusion module is designed to perform a weighted fusion of the output features of the different paths based on a channel attention mechanism. The network is then trained on the semantic segmentation datasets CamVid and Cityscapes. Finally, ablation experiments are carried out to verify the effectiveness of the proposed network components and to analyze the computational efficiency and segmentation accuracy of the model. The experimental results show that, by combining multi-scale information, high-level semantic information, and global context information, the network improves segmentation accuracy while maintaining high computational efficiency.
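The channel-attention-based weighted fusion described above can be illustrated with a minimal sketch. It is not the paper's fusion module, whose layers are not given in the abstract. It assumes a squeeze-and-excitation style gate in which the learned layers are replaced by a hypothetical scalar `scale` and `bias`, and it assumes the three path outputs have already been resized to the same spatial size. All function and parameter names are illustrative.

```python
import math


def global_avg_pool(feature):
    """Return one mean value per channel; `feature` is a list of H x W grids."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fuse(path_features, scale=1.0, bias=0.0):
    """Concatenate path outputs along channels and reweight each channel.

    `scale` and `bias` are hypothetical stand-ins for learned parameters.
    """
    # Concatenate the outputs of all paths along the channel axis.
    fused = [ch for feat in path_features for ch in feat]
    # Squeeze: summarise every channel by its spatial mean.
    descriptors = global_avg_pool(fused)
    # Excite: map each descriptor to a weight in (0, 1).
    weights = [sigmoid(scale * d + bias) for d in descriptors]
    # Reweight: multiply every value in a channel by that channel's weight.
    reweighted = [
        [[w * v for v in row] for row in ch] for w, ch in zip(weights, fused)
    ]
    return reweighted, weights


# Toy single-channel 2 x 2 outputs standing in for the three paths.
spatial = [[[1.0, 1.0], [1.0, 1.0]]]
context = [[[0.0, 0.0], [0.0, 0.0]]]
graph = [[[-1.0, -1.0], [-1.0, -1.0]]]
fused, weights = channel_attention_fuse([spatial, context, graph])
```

In this toy input the channel with the largest mean activation receives the largest weight, which is the reweighting behaviour a channel attention gate is meant to provide. A trained module would learn the mapping from descriptors to weights instead of using a fixed sigmoid.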
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 51874217), the Foundation of Hubei Provincial Education Department (No. B2020011), and the WUST National Defense Pre-research Foundation (No. GF202008).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lin, Z., Sun, W., Tang, B. et al. Semantic segmentation network with multi-path structure, attention reweighting and multi-scale encoding. Vis Comput 39, 597–608 (2023). https://doi.org/10.1007/s00371-021-02360-7
DOI: https://doi.org/10.1007/s00371-021-02360-7