Multi-level progressive parallel attention guided salient object detection for RGB-D images

Published: 01 March 2021

Abstract

Detecting salient objects in RGB-D images has attracted increasing attention in recent years. It benefits from the widespread use of depth sensors and supports comprehensive understanding of RGB-D scenes. Existing models rely on double-stream networks that transfer knowledge from the color stream to the depth stream, but a single-channel depth stream cannot learn the same features as a three-channel color stream, even when the HHA representation is adopted. In our work, a four-channel RGB-D input is used instead, and progressive parallel spatial and channel attention mechanisms are applied to improve the feature representation. Spatial and channel attention emphasize the positions and channels in the image that respond most strongly to salient objects. Both attentive features are refined by the attentive features from the higher layer and are fed in parallel into a recurrent convolutional layer to generate side-output saliency maps, each guided by the saliency map from the higher layer. Finally, the multi-level saliency maps are fused from a multi-scale perspective. Experiments on benchmark datasets demonstrate that the parallel attention mechanism and the progressive optimization operation play an important role in improving the accuracy of salient object detection, and that our model outperforms state-of-the-art models on the evaluation metrics.
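To make the parallel attention idea concrete, below is a minimal PyTorch sketch of channel and spatial attention applied in parallel to features extracted from a four-channel RGB-D input. The module names (ChannelAttention, SpatialAttention, ParallelAttention), layer sizes, and the 4-channel stem are illustrative assumptions for this sketch, not the authors' released implementation; the paper's higher-layer guidance and recurrent convolutional layers are omitted.

```python
# Minimal sketch of parallel spatial/channel attention on 4-channel RGB-D
# features. All module names and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weight feature channels that respond strongly to salient objects."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))


class SpatialAttention(nn.Module):
    """Re-weight spatial positions that respond strongly to salient objects."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)      # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)     # channel-wise max map
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask


class ParallelAttention(nn.Module):
    """Apply channel and spatial attention in parallel and fuse the results."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.ca(x), self.sa(x)], dim=1))


if __name__ == "__main__":
    # Four-channel RGB-D input (RGB + depth), as described in the abstract.
    rgbd = torch.randn(1, 4, 224, 224)
    stem = nn.Conv2d(4, 64, kernel_size=3, padding=1)  # illustrative stem
    feats = stem(rgbd)
    out = ParallelAttention(64)(feats)
    print(out.shape)  # torch.Size([1, 64, 224, 224])
```

In the full model described in the abstract, such a block would be applied at multiple levels of the backbone, with the attentive features and saliency maps from higher layers progressively guiding the lower ones.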


Information

Published In

The Visual Computer: International Journal of Computer Graphics, Volume 37, Issue 3 (March 2021), 221 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2021

Author Tags

  1. Salient object detection
  2. RGB-D image
  3. Attention mechanism
  4. Recurrent convolutional layer

Qualifiers

  • Research-article


Cited By

  • (2024) LCH: fast RGB-D salient object detection on CPU via lightweight convolutional network with hybrid knowledge distillation. The Visual Computer 40(3), 1997–2014. doi:10.1007/s00371-023-02898-8 (online 1 Mar 2024)
  • (2024) CSNet: a ConvNeXt-based Siamese network for RGB-D salient object detection. The Visual Computer 40(3), 1805–1823. doi:10.1007/s00371-023-02887-x (online 1 Mar 2024)
  • (2024) UMINet: a unified multi-modality interaction network for RGB-D and RGB-T salient object detection. The Visual Computer 40(3), 1565–1582. doi:10.1007/s00371-023-02870-6 (online 1 Mar 2024)
  • (2023) Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection. Advances in Multimedia 2023. doi:10.1155/2023/9921988 (online 1 Jan 2023)
  • (2023) Cross-modal and multi-level feature refinement network for RGB-D salient object detection. The Visual Computer 39(9), 3979–3994. doi:10.1007/s00371-022-02543-w (online 1 Sep 2023)
  • (2023) MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation. The Visual Computer 39(5), 2005–2019. doi:10.1007/s00371-022-02460-y (online 1 May 2023)
  • (2022) A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. The Visual Computer 38(8), 2939–2970. doi:10.1007/s00371-021-02166-7 (online 1 Aug 2022)
  • (2022) Attention Unet++ for lightweight depth estimation from sparse depth samples and a single RGB image. The Visual Computer 38(5), 1619–1630. doi:10.1007/s00371-021-02092-8 (online 1 May 2022)
  • (2020) Attention-Based Asymmetric Fusion Network for Saliency Prediction in 3D Images. Artificial Intelligence and Mobile Services – AIMS 2020, 93–105. doi:10.1007/978-3-030-59605-7_8 (online 18 Sep 2020)
