Abstract
The semantic segmentation of human body images has huge application potential in many fields, such as autonomous driving, artificial intelligence (AI) face changing, and virtual try-on. Nowadays, many researchers use additional human body posture information to generate multi-level human body analysis images. However, the existing method has limitations when faced with multiple poses and overlapping targets. In this paper, a novel algorithm based on Mask RCNN which has pixel-level accuracy is proposed. In the feature extraction process, a multi-scale feature fusion module applying dilated convolution is proposed to obtain richer semantic information from different perceptual fields. We added a small residual module to the original residual unit structure to increase the size of the receptive field of each layer to capture details and global characteristics. Three convolution kernels with different ratios are designed to obtain receptive fields of different scales. The experimental results show that our method has better performance while considering both object positioning and target classification.
Similar content being viewed by others
References
Arsalan M, Kim DS, Lee MB, Owais M, Park KR (2019) FRED-Net: fully residual encoder–decoder network for accurate iris segmentation. Expert Syst Appl 122:217–241. https://doi.org/10.1016/j.eswa.2019.01.010
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
Chen Y, Hu H (2020) Multi-layer adaptive feature fusion for semantic segmentation. Neural Process Lett 51(2):1081–1092. https://doi.org/10.1007/s11063-019-10129-2
Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv Prepr. arXiv1412.7062. https://doi.org/10.48550/arXiv.1412.7062
Gao S, Cheng M-M, Zhao K, Zhang X-Y, Yang M-H, Torr PHS (2019) Res2net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2019.2938758
Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65. https://doi.org/10.1016/j.asoc.2018.05.018
Gong K, Liang X, Zhang D, Shen X, Lin L (2017) Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017-Janua, pp 6757–6765. https://doi.org/10.1109/CVPR.2017.715
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Gkioxari G, Dollár P, Girshick R (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397. https://doi.org/10.1109/TPAMI.2018.2844175
Kwak J, Sung Y (2021) DeepLabV3-Refiner-based semantic segmentation model for dense 3D point clouds. Remote Sens 13(8):1565. https://doi.org/10.3390/rs13081565
Li S, Zhao X, Zhou G (2019) Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput Civ Infrastruct Eng 34(7):616–634. https://doi.org/10.1111/mice.12433
Liu S et al (2013) Fashion parsing with weak color-category labels. IEEE Trans Multimed 16(1):253–265. https://doi.org/10.1109/TMM.2013.2285526
Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Mehmood S, Shahzad M, Fraz MM (2020) Deep context aware recurrent neural network for semantic segmentation of large scale unstructured 3D point cloud. Neural Process Lett. https://doi.org/10.1007/s11063-020-10368-8
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv Prepr. arXiv1606.02147. https://doi.org/10.48550/arXiv.1606.02147
Pavoni G, Corsini M, Pedersen N, Petrovic V, Cignoni P (2021) Challenges in the deep learning-based semantic segmentation of benthic communities from Ortho-images. Appl Geomat 13(1):131–146. https://doi.org/10.1007/s12518-020-00331-6
Razzaghi P, Samavi S (2015) Image retargeting using nonparametric semantic segmentation. Multimed Tools Appl 74(24):11517–11536. https://doi.org/10.1007/s11042-014-2249-y
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv Prepr. arXiv1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Xia F, Wang P, Chen X, Yuille A (2017) Joint multi-person pose estimation and semantic part segmentation. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017-Janua, pp 6080–6089. https://doi.org/10.1109/CVPR.2017.644
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv Prepr. arXiv1511.07122. https://doi.org/10.48550/arXiv.1511.07122
Zhang Q, Yang M, Kpalma K, Zheng Q, Zhang X (2018) Segmentation of hand posture against complex backgrounds based on saliency and skin colour detection. IAENG Int J Comput Sci 45(3):435–444
Zhang X, Yang Y, Li Z, Ning X, Qin Y, Cai W (2021) An improved encoder-decoder network based on strip pool method applied to segmentation of farmland vacancy field. Entropy 23(4):435. https://doi.org/10.3390/e23040435
Zhu B, Chen Y, Tang M, Wang J (2018) Progressive cognitive human parsing. 32nd AAAI Conf. Artif. Intell. AAAI 2018, pp 7607–7614. https://doi.org/10.1609/aaai.v32i1.12336
Funding
This work was supported by National Natural Science Foundation of China (No. 61976105) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX22_2342).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Data sharing
Data sharing is not applicable to this article, as no new data were created or analyzed in this study.
Conflict of interest
Wentao He, Jing’an Wang, Lei Wang, Ruru Pan* and Weidong Gao declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, W., Wang, J., Wang, L. et al. A semantic segmentation algorithm for fashion images based on modified mask RCNN. Multimed Tools Appl 82, 28427–28444 (2023). https://doi.org/10.1007/s11042-023-14958-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-14958-1