Mask-Guided Deformation Adaptive Network for Human Parsing

Published: 14 March 2022


Because of densely packed body parts, nonrigid clothing items, and severe overlap in crowd scenes, human parsing must attend more closely to multilevel feature representations than general scene parsing does. Based on this observation, we propose introducing the auxiliary tasks of human mask and edge detection to facilitate human parsing. Unlike human parsing, which exploits the discriminative features of each category, human mask and edge detection emphasize the boundaries of semantic parsing regions and the difference between foreground humans and background clutter, which benefits parsing predictions for crowd scenes and small human parts. Specifically, we extract human mask and edge labels from the human parsing annotations and train a shared encoder with three independent decoders for the three mutually beneficial tasks. Furthermore, the decoder feature maps of the human mask prediction branch are exploited as attention maps that indicate human regions, facilitating the decoding process of human parsing and human edge detection. In addition to these auxiliary tasks, we alleviate the problem of deformed clothing items under various human poses by tracking the deformation patterns with deformable convolution. Extensive experiments show that the proposed method achieves superior performance against state-of-the-art methods on both single and multiple human parsing datasets. Code and trained models are available
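The label-extraction step mentioned in the abstract (deriving human mask and edge labels from the parsing annotations) can be illustrated with a minimal sketch. This is not the authors' code: the function name `mask_and_edge_labels`, the convention that label 0 is background, and the rule that a pixel is an edge when any 4-connected neighbour carries a different label are all assumptions made for illustration. Plain Python lists are used to keep the sketch dependency-free; a real pipeline would more likely use an array library.

```python
def mask_and_edge_labels(parsing, background=0):
    """Derive binary human-mask and edge maps from a parsing label map.

    parsing: list of rows, each a list of integer part labels.
    Assumptions (not from the paper): `background` marks non-human pixels,
    and a pixel is an edge if any 4-connected neighbour has a different label.
    """
    height, width = len(parsing), len(parsing[0])

    # Human mask: 1 wherever the pixel belongs to any human part.
    mask = [[1 if parsing[y][x] != background else 0 for x in range(width)]
            for y in range(height)]

    # Edge map: 1 wherever the label changes between 4-connected neighbours.
    edge = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and parsing[ny][nx] != parsing[y][x]:
                    edge[y][x] = 1
                    break
    return mask, edge


# Toy 4x4 annotation: 0 = background, 1 and 2 = two hypothetical body parts.
toy_parsing = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
]
toy_mask, toy_edge = mask_and_edge_labels(toy_parsing)
```

Under these assumptions, both sides of every label boundary are marked as edge, so the edge map also covers the human/background contour. A thinner or thicker edge band would be an equally plausible design choice.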


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
January 2022
517 pages


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2022
Accepted: 01 May 2021
Revised: 01 March 2021
Received: 01 August 2020
Published in TOMM Volume 18, Issue 1


Author Tags

  Human parsing
  multi-task learning
  deformable convolution


  • Research-article
  • Refereed

Funding Sources

  National Natural Science Foundation of China
  Guangdong International Science and Technology Cooperation Project
  Guangdong Natural Science Foundation
  Guangzhou Basic and Applied Research Project
  Fundamental Research Funds for the Central Universities
  Social Science Research Base of Guangdong Province-Research Center of Network Civilization in New Era of SCUT
  CCF-Tencent Open Research fund


