Exploring the Robustness of Human Parsers Toward Common Corruptions

Published: 18 September 2023 · IEEE Transactions on Image Processing, Volume 32, 2023 (IEEE Press)

Abstract

Human parsing aims to assign a fine-grained semantic category to each pixel of a human image. However, current human parsers trained on clean data are easily confused by common image corruptions such as blur and noise. To evaluate and improve the robustness of human parsers, in this paper we construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, which allow us to assess the risk tolerance of human parsing models. Inspired by data augmentation strategies, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions. Specifically, two types of data augmentation from different views, i.e., image-aware augmentation and model-aware image-to-image transformation, are integrated sequentially to adapt to unforeseen image corruptions. The image-aware augmentation enriches the diversity of training images through common image operations, while the model-aware augmentation increases the diversity of the input data by exploiting the model's randomness. The proposed method is model-agnostic and can be plugged into arbitrary state-of-the-art human parsing frameworks. Experimental results show that the proposed method generalizes well: it improves the robustness of human parsing models, and even semantic segmentation models, against various common image corruptions, while maintaining comparable performance on clean data.
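The mechanism described above amounts to a sequential composition of two augmentation views. The Python sketch below illustrates that idea only; the `StochasticImageToImage` network, the `heterogeneous_augment` helper, and the particular image operations are hypothetical stand-ins, since the abstract does not specify the authors' actual image-aware operations or model-aware transformation.

```python
# Minimal sketch of the heterogeneous augmentation pipeline sketched in the
# abstract: an image-aware stage (common image operations) followed by a
# model-aware image-to-image stage. The tiny stochastic network below is a
# hypothetical stand-in, NOT the authors' actual transformation model.
import random
import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageEnhance, ImageOps

def image_aware_augment(img: Image.Image) -> Image.Image:
    """Stage 1: apply one randomly chosen common image operation."""
    ops = [
        lambda im: ImageOps.autocontrast(im),
        lambda im: ImageOps.posterize(im, bits=random.randint(4, 7)),
        lambda im: ImageEnhance.Color(im).enhance(random.uniform(0.5, 1.5)),
        lambda im: im.rotate(random.uniform(-15, 15)),
    ]
    return random.choice(ops)(img)

class StochasticImageToImage(nn.Module):
    """Placeholder model-aware transform; dropout is kept active so the
    output varies across passes, injecting model randomness (an assumption)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p=0.2),             # source of model randomness
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def heterogeneous_augment(img: Image.Image, model: nn.Module) -> torch.Tensor:
    """Sequentially compose the two augmentation views."""
    img = image_aware_augment(img)                       # stage 1: image-aware
    x = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
    x = x.permute(2, 0, 1).unsqueeze(0)                  # HWC -> NCHW
    model.train()                                        # keep dropout stochastic
    with torch.no_grad():
        return model(x)                                  # stage 2: model-aware
```

Keeping dropout active while transforming the input is one simple way to realize "model randomness" at augmentation time; the paper's actual mechanism may differ, and the transformed images would then be fed to any human parsing framework during training.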
