A semantic segmentation algorithm for fashion images based on modified mask RCNN

Wentao He¹ (ORCID: orcid.org/0000-0003-3928-4170), Jing’an Wang¹, Lei Wang¹, Ruru Pan¹ & Weidong Gao¹


Abstract

Semantic segmentation of human body images has considerable application potential in fields such as autonomous driving, artificial intelligence (AI) face swapping, and virtual try-on. Many researchers currently use additional human pose information to generate multi-level human body parsing images. However, existing methods have limitations when faced with multiple poses and overlapping targets. In this paper, a novel algorithm based on Mask RCNN, which offers pixel-level accuracy, is proposed. In the feature extraction process, a multi-scale feature fusion module that applies dilated convolution is proposed to obtain richer semantic information from different receptive fields. A small residual module is added to the original residual unit structure to enlarge the receptive field of each layer, so that both fine details and global characteristics are captured. Three convolution kernels with different dilation rates are designed to obtain receptive fields of different scales. The experimental results show that the proposed method achieves better performance when both object localization and target classification are considered.
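To make the link between dilation and receptive field concrete, the short sketch below computes how many input pixels a single 3×3 kernel spans at several dilation rates, using the standard relation k_eff = k + (k − 1)(d − 1). The rates (1, 2, 3) and the helper names are assumptions chosen for illustration only; the abstract does not state the values used in the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size unchanged at stride 1 (odd kernels)."""
    return dilation * (kernel_size - 1) // 2


# Hypothetical dilation rates; the actual values are not given in the abstract.
ASSUMED_RATES = (1, 2, 3)

# Map each rate to (effective span, padding) for a 3x3 kernel.
branches = {
    rate: (effective_kernel_size(3, rate), same_padding(3, rate))
    for rate in ASSUMED_RATES
}

for rate, (span, pad) in branches.items():
    print(f"dilation={rate}: one 3x3 kernel spans {span}x{span} pixels, padding={pad}")
```

Under these assumed rates, three parallel branches would see 3×3, 5×5, and 7×7 neighbourhoods while keeping the same nine weights per kernel and the same output resolution, which is what allows their outputs to be fused channel-wise.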

Funding

This work was supported by the National Natural Science Foundation of China (No. 61976105) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX22_2342).

Author information

Authors and Affiliations

Key Laboratory of Eco-textiles, Ministry of Education, Jiangnan University, No. 1800, Lihu Avenue, Wuxi, 214122, Jiangsu, China
Wentao He, Jing’an Wang, Lei Wang, Ruru Pan & Weidong Gao

Corresponding author

Correspondence to Ruru Pan.

Ethics declarations

Data sharing

Data sharing is not applicable to this article, as no new data were created or analyzed in this study.

Conflict of interest

Wentao He, Jing’an Wang, Lei Wang, Ruru Pan, and Weidong Gao declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, W., Wang, J., Wang, L. et al. A semantic segmentation algorithm for fashion images based on modified mask RCNN. Multimed Tools Appl 82, 28427–28444 (2023). https://doi.org/10.1007/s11042-023-14958-1

Received: 05 April 2022
Revised: 31 October 2022
Accepted: 22 February 2023
Published: 13 March 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s11042-023-14958-1
