
S³Net: Self-Supervised Self-Ensembling Network for Semi-Supervised RGB-D Salient Object Detection

Published: 01 January 2023

Abstract

RGB-D salient object detection aims to detect visually distinctive objects or regions from a pair of RGB and depth images. State-of-the-art RGB-D saliency detectors are mainly based on convolutional neural networks, but almost all of them suffer from an intrinsic reliance on labeled data, which degrades detection accuracy in complex cases. In this work, we present a self-supervised self-ensembling network (S³Net) for semi-supervised RGB-D salient object detection that leverages unlabeled data and explores a self-supervised learning mechanism. Specifically, we first build a self-guided convolutional neural network (SG-CNN) as a baseline model by developing a series of three-layer cross-model feature fusion (TCF) modules to leverage complementary information between the depth and RGB modalities, and by formulating an auxiliary task that predicts a self-supervised image rotation angle. After that, to further exploit the knowledge in the unlabeled data, we instantiate SG-CNN as both a student network and a teacher network, and encourage the saliency predictions and self-supervised rotation predictions of these two networks to be consistent on the unlabeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our network quantitatively and qualitatively outperforms state-of-the-art methods.
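
To make the training scheme concrete, below is a minimal PyTorch sketch of one semi-supervised training step under the mean-teacher paradigm the abstract describes. It assumes a hypothetical model interface in which `student` and `teacher` are SG-CNN-style networks that each return a saliency logit map and 4-way rotation logits; the names `training_step`, `ema_update`, and `lambda_cons` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher weight update: teacher <- EMA of the student's weights."""
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.data.mul_(alpha).add_(s.data, alpha=1.0 - alpha)

def training_step(student, teacher, labeled, unlabeled, optimizer, lambda_cons=1.0):
    rgb_l, depth_l, gt_l = labeled      # labeled RGB-D pair + saliency ground truth
    rgb_u, depth_u = unlabeled          # unlabeled RGB-D pair

    # Self-supervised pretext task: rotate the inputs by a random multiple of
    # 90 degrees and ask the network to predict the angle (4-way classification).
    k = int(torch.randint(0, 4, (1,)))
    rgb_u = torch.rot90(rgb_u, k, dims=(2, 3))
    depth_u = torch.rot90(depth_u, k, dims=(2, 3))
    rot_target = torch.full((rgb_u.size(0),), k, dtype=torch.long,
                            device=rgb_u.device)

    # Assumed interface: each forward pass returns (saliency logits, rotation logits).
    sal_l, _ = student(rgb_l, depth_l)
    sal_u, rot_u = student(rgb_u, depth_u)
    with torch.no_grad():               # the teacher receives no gradients
        sal_t, rot_t = teacher(rgb_u, depth_u)

    # Supervised saliency loss on the labeled pair.
    loss_sup = F.binary_cross_entropy_with_logits(sal_l, gt_l)
    # Self-supervised rotation loss on the unlabeled pair.
    loss_rot = F.cross_entropy(rot_u, rot_target)
    # Consistency: student and teacher should agree on both predictions.
    loss_cons = (F.mse_loss(torch.sigmoid(sal_u), torch.sigmoid(sal_t))
                 + F.mse_loss(F.softmax(rot_u, dim=1), F.softmax(rot_t, dim=1)))

    loss = loss_sup + loss_rot + lambda_cons * loss_cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return float(loss)
```

In this setup the teacher is a structural copy of the student (e.g. created with `copy.deepcopy(student)`) that is updated only through the exponential moving average of the student's weights, so gradients never flow through it; the consistency term is what lets unlabeled RGB-D pairs shape the student's saliency predictions.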




Information

Published In

IEEE Transactions on Multimedia, Volume 25, 2023
8932 pages

Publisher

IEEE Press

Publication History

Published: 01 January 2023

Qualifiers

  • Research-article

Contributors


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Dec 2024


Cited By

  • (2024) UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection. IEEE Transactions on Multimedia, vol. 26, pp. 7622–7635. DOI: 10.1109/TMM.2024.3369922. Online publication date: 26-Feb-2024.
  • (2024) Depth-Assisted Semi-Supervised RGB-D Rail Surface Defect Inspection. IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 7, pp. 8042–8052. DOI: 10.1109/TITS.2024.3387949. Online publication date: 1-Jul-2024.
  • (2024) A Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, part 1, pp. 10917–10929. DOI: 10.1109/TCSVT.2024.3412949. Online publication date: 11-Jun-2024.
  • (2023) Semi-supervised Learning with Easy Labeled Data via Impartial Labeled Set Extension. In Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 29–39. DOI: 10.1145/3607541.3616815. Online publication date: 29-Oct-2023.
  • (2022) Spatial-temporal Fusion Network for Fast Video Shadow Detection. In Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry, pp. 1–5. DOI: 10.1145/3574131.3574455. Online publication date: 27-Dec-2022.
