Multi-level progressive parallel attention guided salient object detection for RGB-D images

Published: 01 March 2021

Abstract

Detecting salient objects in RGB-D images has attracted increasing attention in recent years. It benefits from the widespread use of depth sensors and supports comprehensive understanding of RGB-D scenes. Existing models rely on double-stream networks that transfer knowledge from the color stream to the depth stream, but a single-channel depth stream cannot learn the same features as a three-channel color stream, even when the HHA representation is adopted. In our work, a four-channel RGB-D input is used instead, and progressive parallel spatial and channel attention mechanisms are applied to improve the feature representation. Spatial and channel attention emphasize the positions and channels in the image that respond most strongly to salient objects. Both attentive features are refined by the attentive features from the higher layer and are fed in parallel into a recurrent convolutional layer to generate side-output saliency maps, each guided by the saliency map from the higher layer. Finally, the multi-level saliency maps are fused from a multi-scale perspective. Experiments on benchmark datasets demonstrate that the parallel attention mechanism and the progressive optimization operation play an important role in improving the accuracy of salient object detection, and that our model outperforms state-of-the-art models on the evaluation metrics.
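To make the parallel attention idea concrete, below is a minimal PyTorch sketch of channel and spatial attention applied in parallel to features extracted from a four-channel RGB-D input. The module names (ChannelAttention, SpatialAttention, ParallelAttention), layer sizes, and the 4-channel stem are illustrative assumptions for this sketch, not the authors' released implementation; the paper's higher-layer guidance and recurrent convolutional layers are omitted.

```python
# Minimal sketch of parallel spatial/channel attention on 4-channel RGB-D
# features. All module names and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weight feature channels that respond strongly to salient objects."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))


class SpatialAttention(nn.Module):
    """Re-weight spatial positions that respond strongly to salient objects."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)     # channel-wise max map
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask


class ParallelAttention(nn.Module):
    """Apply channel and spatial attention in parallel and fuse the results."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.ca(x), self.sa(x)], dim=1))


if __name__ == "__main__":
    # Four-channel RGB-D input (RGB + depth), as described in the abstract.
    rgbd = torch.randn(1, 4, 224, 224)
    stem = nn.Conv2d(4, 64, kernel_size=3, padding=1)  # illustrative stem
    feats = stem(rgbd)
    out = ParallelAttention(64)(feats)
    print(out.shape)  # torch.Size([1, 64, 224, 224])
```

In the full model described in the abstract, such a block would be applied at multiple levels of the backbone, with the attentive features and saliency maps from higher layers progressively guiding the lower ones.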


Information

Published In

The Visual Computer: International Journal of Computer Graphics, Volume 37, Issue 3 (March 2021), 221 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2021

Author Tags

  1. Salient object detection
  2. RGB-D image
  3. Attention mechanism
  4. Recurrent convolutional layer

Qualifiers

  • Research-article


Cited By

  • (2024) LCH: fast RGB-D salient object detection on CPU via lightweight convolutional network with hybrid knowledge distillation. The Visual Computer 40(3), 1997–2014. doi:10.1007/s00371-023-02898-8 (online 1 Mar 2024)
  • (2024) CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection. The Visual Computer 40(3), 1805–1823. doi:10.1007/s00371-023-02887-x (online 1 Mar 2024)
  • (2024) UMINet: a unified multi-modality interaction network for RGB-D and RGB-T salient object detection. The Visual Computer 40(3), 1565–1582. doi:10.1007/s00371-023-02870-6 (online 1 Mar 2024)
  • (2023) Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection. Advances in Multimedia 2023. doi:10.1155/2023/9921988 (online 1 Jan 2023)
  • (2023) Cross-modal and multi-level feature refinement network for RGB-D salient object detection. The Visual Computer 39(9), 3979–3994. doi:10.1007/s00371-022-02543-w (online 1 Sep 2023)
  • (2023) MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation. The Visual Computer 39(5), 2005–2019. doi:10.1007/s00371-022-02460-y (online 1 May 2023)
  • (2022) A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. The Visual Computer 38(8), 2939–2970. doi:10.1007/s00371-021-02166-7 (online 1 Aug 2022)
  • (2022) Attention Unet++ for lightweight depth estimation from sparse depth samples and a single RGB image. The Visual Computer 38(5), 1619–1630. doi:10.1007/s00371-021-02092-8 (online 1 May 2022)
  • (2020) Attention-Based Asymmetric Fusion Network for Saliency Prediction in 3D Images. Artificial Intelligence and Mobile Services – AIMS 2020, 93–105. doi:10.1007/978-3-030-59605-7_8 (online 18 Sep 2020)
