Joint Cross-Modal and Unimodal Features for RGB-D Salient Object Detection

Published: 01 January 2021

Abstract

RGB-D salient object detection is one of the fundamental tasks in computer vision. Most existing models focus on finding efficient ways to fuse the complementary information from RGB and depth images for better saliency detection. However, in many real-life cases, where one of the input images has poor visual quality or one modality alone already contains abundant saliency cues, fusing cross-modal features does not improve detection accuracy compared to using unimodal features only. In view of this, a novel RGB-D salient object detection model is proposed that simultaneously exploits the cross-modal features of the RGB-D image pair and the unimodal features of the individual RGB and depth images. To this end, a Multi-branch Feature Fusion Module is presented to capture the cross-level and cross-modal complementary information between the RGB and depth images, as well as the cross-level unimodal features of the RGB images and the depth images separately. On top of that, a Feature Selection Module is designed to adaptively select the most discriminative features for the final saliency prediction from the fused cross-modal features and the unimodal features. Extensive evaluations on four benchmark datasets demonstrate that the proposed model outperforms state-of-the-art approaches by a large margin.
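The abstract only outlines the Multi-branch Feature Fusion Module and the Feature Selection Module at a high level, so the PyTorch snippet below is a minimal illustrative sketch of one plausible reading of that design, not the authors' implementation: the module names MultiBranchFeatureFusion and FeatureSelection, the channel widths, the three-branch layout (RGB-only, depth-only, cross-modal), and the SE-style channel gating are all assumptions introduced for exposition.

```python
# Illustrative sketch only: exact MFFM/FSM designs are not given in the abstract,
# so the branch layout, channel sizes, and gating below are assumptions.
import torch
import torch.nn as nn


class MultiBranchFeatureFusion(nn.Module):
    """Fuses two adjacent-level features per branch: an RGB-only branch,
    a depth-only branch, and a cross-modal RGB-D branch."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.depth_branch = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.cross_branch = nn.Sequential(
            nn.Conv2d(4 * channels, channels, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, rgb_low, rgb_high, dep_low, dep_high):
        f_rgb = self.rgb_branch(torch.cat([rgb_low, rgb_high], dim=1))
        f_dep = self.depth_branch(torch.cat([dep_low, dep_high], dim=1))
        f_cross = self.cross_branch(
            torch.cat([rgb_low, rgb_high, dep_low, dep_high], dim=1))
        return f_rgb, f_dep, f_cross


class FeatureSelection(nn.Module):
    """Channel-attention gate that re-weights the concatenated unimodal and
    cross-modal features before saliency prediction (assumed SE-style gating)."""

    def __init__(self, channels: int = 64, branches: int = 3):
        super().__init__()
        total = branches * channels
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(total, total // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(total // 4, total, 1), nn.Sigmoid())
        self.predict = nn.Conv2d(total, 1, 1)  # single-channel saliency map

    def forward(self, f_rgb, f_dep, f_cross):
        fused = torch.cat([f_rgb, f_dep, f_cross], dim=1)
        selected = fused * self.gate(fused)           # adaptive feature selection
        return torch.sigmoid(self.predict(selected))  # saliency in [0, 1]


if __name__ == "__main__":
    mffm, fsm = MultiBranchFeatureFusion(64), FeatureSelection(64)
    feats = [torch.randn(1, 64, 56, 56) for _ in range(4)]  # toy feature maps
    saliency = fsm(*mffm(*feats))
    print(saliency.shape)  # torch.Size([1, 1, 56, 56])
```

The gating step mirrors the stated motivation: by re-weighting the unimodal and cross-modal branches per channel, the contribution of an unreliable modality can be suppressed before the final saliency prediction.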




Published In

IEEE Transactions on Multimedia, Volume 23, 2021 (1967 pages)

Publisher

IEEE Press

Publication History

Published: 01 January 2021

Qualifiers

  • Research-article


Cited By

  • (2024) UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection. IEEE Transactions on Multimedia, vol. 26, pp. 7622-7635. DOI: 10.1109/TMM.2024.3369922. Online publication date: 26-Feb-2024.
  • (2024) A Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11 (Part 1), pp. 10917-10929. DOI: 10.1109/TCSVT.2024.3412949. Online publication date: 11-Jun-2024.
  • (2024) Feature Calibrating and Fusing Network for RGB-D Salient Object Detection. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 3, pp. 1493-1507. DOI: 10.1109/TCSVT.2023.3296581. Online publication date: 1-Mar-2024.
  • (2023) Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection. Advances in Multimedia, vol. 2023. DOI: 10.1155/2023/9921988. Online publication date: 1-Jan-2023.
  • (2023) C2DFNet: Criss-Cross Dynamic Filter Network for RGB-D Salient Object Detection. IEEE Transactions on Multimedia, vol. 25, pp. 5142-5154. DOI: 10.1109/TMM.2022.3187856. Online publication date: 1-Jan-2023.
  • (2023) PGDENet: Progressive Guided Fusion and Depth Enhancement Network for RGB-D Indoor Scene Parsing. IEEE Transactions on Multimedia, vol. 25, pp. 3483-3494. DOI: 10.1109/TMM.2022.3161852. Online publication date: 1-Jan-2023.
  • (2023) Radio-Assisted Human Detection. IEEE Transactions on Multimedia, vol. 25, pp. 2613-2623. DOI: 10.1109/TMM.2022.3149129. Online publication date: 1-Jan-2023.
  • (2023) A Feature Divide-and-Conquer Network for RGB-T Semantic Segmentation. IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 6, pp. 2892-2905. DOI: 10.1109/TCSVT.2022.3229359. Online publication date: 1-Jun-2023.
  • (2023) RGB-D saliency detection via complementary and selective learning. Applied Intelligence, vol. 53, no. 7, pp. 7957-7969. DOI: 10.1007/s10489-022-03612-2. Online publication date: 1-Apr-2023.
  • (2023) Research on Improved Algorithm of Significance Object Detection Based on ATSA Model. Advances in Brain Inspired Cognitive Systems, pp. 154-165. DOI: 10.1007/978-981-97-1417-9_15. Online publication date: 5-Aug-2023.
