
Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection

Published: 01 January 2023

Abstract

Existing RGB-depth (RGB-D) salient object detection methods mainly focus on better integrating the cross-modal features of RGB images and depth maps. Many methods apply the same feature interaction module to both modalities, ignoring their inherent differences. In contrast to previous methods, this paper proposes a novel RGB-D salient object detection method that uses a depth-feature guide cross-modal fusion module designed around the distinct properties of RGB images and depth maps. First, the depth-feature guide cross-modal fusion module is built on coordinate attention to exploit the simple data representation of depth maps effectively. Second, a dense decoder guidance module is proposed to recover the spatial details of salient objects. Third, a context-aware content module is proposed to extract rich contextual information, allowing multiple objects to be predicted more completely. Experimental results on six public benchmark datasets show that, compared with 15 mainstream convolutional neural network detection methods, the saliency maps produced by the proposed model have more continuous edge contours and clearer spatial structure details, and the model achieves excellent scores on four quantitative evaluation metrics. Furthermore, the effectiveness of the three proposed modules is verified through ablation experiments.
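As a rough illustration of the fusion idea the abstract describes, below is a minimal PyTorch sketch of coordinate attention (Hou et al., CVPR 2021) used to let depth features re-weight RGB features. The `DepthGuidedFusion` arrangement, its residual gating, and all module names are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch only: coordinate attention filtering depth features, which then gate
# RGB features. The fusion wiring is a hypothetical reading of the abstract.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention (Hou et al., CVPR 2021): factorizes spatial
    attention into one map per axis, preserving positional information."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over W -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over H -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                                 # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)             # (B, C, W, 1)
        # Encode both axes jointly, then split back into per-axis maps.
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # (B, C, 1, W)
        return x * a_h * a_w                                 # broadcast over H and W

class DepthGuidedFusion(nn.Module):
    """Hypothetical fusion block: coordinate-attention-filtered depth features
    produce a gate that re-weights the RGB features, with a residual path."""
    def __init__(self, channels: int):
        super().__init__()
        self.depth_att = CoordinateAttention(channels)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        d = self.depth_att(depth)        # emphasize salient regions in depth
        return rgb * self.gate(d) + rgb  # depth-guided re-weighting of RGB

# Usage on one mid-level feature stage:
# fuse = DepthGuidedFusion(64)
# out = fuse(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```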




Published In

Advances in Multimedia, Volume 2023
533 pages
ISSN: 1687-5680
EISSN: 1687-5699
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom

