Exploring the Robustness of Human Parsers Toward Common Corruptions

Published: 18 September 2023 · IEEE Transactions on Image Processing, Volume 32, 2023 (IEEE Press)

Abstract

Human parsing aims to assign a fine-grained semantic category to each pixel of a human image. However, current human parsers trained on clean data are easily confused by common image corruptions such as blur and noise. To evaluate and improve the robustness of human parsers, in this paper we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, which allow us to assess the risk tolerance of human parsing models. Inspired by data augmentation strategies, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions. Specifically, two types of data augmentation from different views, i.e., image-aware augmentation and model-aware image-to-image transformation, are integrated sequentially to adapt to unforeseen image corruptions. The image-aware augmentation enriches the diversity of training images through common image operations, while the model-aware augmentation increases the diversity of the input data by exploiting the model's randomness. The proposed method is model-agnostic and can be plugged into arbitrary state-of-the-art human parsing frameworks. Experimental results show that the proposed method generalizes well: it improves the robustness of human parsing models, and even semantic segmentation models, against various common image corruptions, while maintaining comparable performance on clean data.
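The mechanism described above amounts to a sequential composition of two augmentation views. The Python sketch below illustrates that idea only; the `StochasticImageToImage` network, the `heterogeneous_augment` helper, and the particular image operations are hypothetical stand-ins, since the abstract does not specify the authors' actual image-aware operations or model-aware transformation.

```python
# Minimal sketch of the heterogeneous augmentation pipeline sketched in the
# abstract: an image-aware stage (common image operations) followed by a
# model-aware image-to-image stage. The tiny stochastic network below is a
# hypothetical stand-in, NOT the authors' actual transformation model.
import random
import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageEnhance, ImageOps

def image_aware_augment(img: Image.Image) -> Image.Image:
    """Stage 1: apply one randomly chosen common image operation."""
    ops = [
        lambda im: ImageOps.autocontrast(im),
        lambda im: ImageOps.posterize(im, bits=random.randint(4, 7)),
        lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
        lambda im: im.rotate(random.uniform(-15, 15)),
    ]
    return random.choice(ops)(img)

class StochasticImageToImage(nn.Module):
    """Placeholder model-aware transform; dropout is kept active so the
    output varies across passes, injecting model randomness (an assumption)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p=0.2),             # source of model randomness
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def heterogeneous_augment(img: Image.Image, model: nn.Module) -> torch.Tensor:
    """Sequentially compose the two augmentation views."""
    img = image_aware_augment(img)                       # stage 1: image-aware
    x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
    x = x.permute(2, 0, 1).unsqueeze(0)                  # HWC -> NCHW
    model.train()                                        # keep dropout stochastic
    with torch.no_grad():
        return model(x)                                  # stage 2: model-aware
```

Keeping dropout active while transforming the input is one simple way to realize "model randomness" at augmentation time; the paper's actual mechanism may differ, and the transformed images would then be fed to any human parsing framework during training.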
