Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation

Published: 01 January 2023

Abstract

Weakly supervised semantic segmentation with only image-level labels aims to reduce the annotation cost of the segmentation task. Existing approaches generally leverage class activation maps (CAMs) to locate object regions for pseudo-label generation. However, CAMs discover only the most discriminative parts of objects, leading to inferior pixel-level pseudo labels. To address this issue, we propose a saliency guided Inter- and Intra-Class Relation Constrained (I²CRC) framework to assist the expansion of the activated object regions in CAMs. Specifically, we propose a saliency guided class-agnostic distance module to pull intra-category features closer by aligning them to their class prototypes. Further, we propose a class-specific distance module to push inter-class features apart and encourage object regions to have higher activation than the background. Besides strengthening the capability of the classification network to activate more integral object regions in CAMs, we also introduce an object guided label refinement module that makes full use of both the segmentation prediction and the initial labels to obtain superior pseudo labels. Extensive experiments on the PASCAL VOC 2012 and COCO datasets demonstrate the effectiveness of I²CRC over state-of-the-art counterparts.
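The modules described above lend themselves to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of saliency-guided relation constraints and agreement-based label refinement; the function names, tensor shapes, cosine-distance loss forms, and margin value are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def relation_constraint_losses(features, saliency, margin=0.5):
    """Hypothetical intra-/inter-class relation losses guided by saliency.

    features: (B, D, H, W) feature maps from the classification network
    saliency: (B, 1, H, W) off-the-shelf saliency maps in [0, 1]
    """
    # Saliency-masked average pooling yields class-agnostic foreground
    # and background prototypes (one pair per image).
    bg = 1.0 - saliency
    fg_proto = (features * saliency).sum(dim=(2, 3)) / saliency.sum(dim=(2, 3)).clamp(min=1e-6)
    bg_proto = (features * bg).sum(dim=(2, 3)) / bg.sum(dim=(2, 3)).clamp(min=1e-6)

    # Intra-class term: pull each pixel feature toward the prototype of
    # the region (foreground or background) it belongs to.
    fg_dist = 1.0 - F.cosine_similarity(features, fg_proto[:, :, None, None], dim=1)  # (B, H, W)
    bg_dist = 1.0 - F.cosine_similarity(features, bg_proto[:, :, None, None], dim=1)
    intra = (saliency.squeeze(1) * fg_dist + bg.squeeze(1) * bg_dist).mean()

    # Inter-class term: push the two prototypes apart with a cosine margin,
    # so object regions activate more strongly than the background.
    inter = F.relu(F.cosine_similarity(fg_proto, bg_proto, dim=1) - margin).mean()
    return intra, inter

def refine_pseudo_labels(seg_logits, init_labels, ignore_index=255):
    """Keep a pixel's label only where the segmentation prediction agrees
    with the initial CAM-derived label; mark conflicting pixels as ignore."""
    pred = seg_logits.argmax(dim=1)  # (B, H, W)
    refined = init_labels.clone()
    conflict = (pred != init_labels) & (init_labels != ignore_index)
    refined[conflict] = ignore_index
    return refined
```

In this reading, the intra term plays the role of the class-agnostic distance module, the inter term that of the class-specific one, and `refine_pseudo_labels` approximates the object guided label refinement by keeping only pixels where the two label sources agree.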

Information

Published In

IEEE Transactions on Multimedia, Volume 25, 2023 (8932 pages)

Publisher

IEEE Press

