Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation

Published: 01 January 2023

Abstract

Weakly supervised semantic segmentation with only image-level labels aims to reduce the annotation cost of the segmentation task. Existing approaches generally leverage class activation maps (CAMs) to locate object regions for pseudo-label generation. However, CAMs discover only the most discriminative parts of objects, leading to inferior pixel-level pseudo labels. To address this issue, we propose a saliency guided Inter- and Intra-Class Relation Constrained (I²CRC) framework to assist the expansion of the activated object regions in CAMs. Specifically, we propose a saliency guided class-agnostic distance module to pull intra-category features closer by aligning them to their class prototypes. Further, we propose a class-specific distance module to push inter-class features apart and encourage object regions to have higher activation than the background. Besides strengthening the capability of the classification network to activate more integral object regions in CAMs, we also introduce an object guided label refinement module that makes full use of both the segmentation prediction and the initial labels to obtain superior pseudo labels. Extensive experiments on the PASCAL VOC 2012 and COCO datasets demonstrate the effectiveness of I²CRC over state-of-the-art counterparts.
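The modules described above lend themselves to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of saliency-guided relation constraints and agreement-based label refinement; the function names, tensor shapes, cosine-distance loss forms, and margin value are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def relation_constraint_losses(features, saliency, margin=0.5):
    """Hypothetical intra-/inter-class relation losses guided by saliency.

    features: (B, D, H, W) feature maps from the classification network
    saliency: (B, 1, H, W) off-the-shelf saliency maps in [0, 1]
    """
    # Saliency-masked average pooling yields class-agnostic foreground
    # and background prototypes (one pair per image).
    bg = 1.0 - saliency
    fg_proto = (features * saliency).sum(dim=(2, 3)) / saliency.sum(dim=(2, 3)).clamp(min=1e-6)
    bg_proto = (features * bg).sum(dim=(2, 3)) / bg.sum(dim=(2, 3)).clamp(min=1e-6)

    # Intra-class term: pull each pixel feature toward the prototype of
    # the region (foreground or background) it belongs to.
    fg_dist = 1.0 - F.cosine_similarity(features, fg_proto[:, :, None, None], dim=1)  # (B, H, W)
    bg_dist = 1.0 - F.cosine_similarity(features, bg_proto[:, :, None, None], dim=1)
    intra = (saliency.squeeze(1) * fg_dist + bg.squeeze(1) * bg_dist).mean()

    # Inter-class term: push the two prototypes apart with a cosine margin,
    # so object regions activate more strongly than the background.
    inter = F.relu(F.cosine_similarity(fg_proto, bg_proto, dim=1) - margin).mean()
    return intra, inter

def refine_pseudo_labels(seg_logits, init_labels, ignore_index=255):
    """Keep a pixel's label only where the segmentation prediction agrees
    with the initial CAM-derived label; mark conflicting pixels as ignore."""
    pred = seg_logits.argmax(dim=1)  # (B, H, W)
    refined = init_labels.clone()
    conflict = (pred != init_labels) & (init_labels != ignore_index)
    refined[conflict] = ignore_index
    return refined
```

In this reading, the intra term plays the role of the class-agnostic distance module, the inter term that of the class-specific one, and `refine_pseudo_labels` approximates the object guided label refinement by keeping only pixels where the two label sources agree.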

Information

Published In

IEEE Transactions on Multimedia, Volume 25, 2023 (8932 pages)

Publisher

IEEE Press

