UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection

Published: 26 February 2024 in IEEE Transactions on Multimedia, Volume 26, 2024 (IEEE Press).

Abstract

Recent years have witnessed growing interest in co-object segmentation and multi-modal salient object detection. Many efforts are devoted to segmenting objects that co-exist across a group of images or to detecting salient objects from different modalities. Despite the appreciable performance achieved on their respective benchmarks, each of these methods is limited to a specific task and cannot be generalized to other tasks. In this paper, we develop a Unified TRansformer-based framework, namely UniTR, aiming to tackle the above tasks individually with a unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency of relevant objects within a group or the complementarity of different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art approaches.
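The abstract only sketches the architecture, so the following is a minimal, hypothetical PyTorch illustration of the kind of bidirectional cross-attention a CoFormer-style module could use to exchange consistent (co-object) or complementary (multi-modal) cues between two feature streams before decoding. The class name CrossStreamBlock and all parameter choices are assumptions made for illustration only; this is not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of cross-attention between two
# feature streams -- either two images of a group (co-object setting) or two
# modalities such as RGB and depth/thermal (multi-modal setting).
import torch
import torch.nn as nn


class CrossStreamBlock(nn.Module):
    """Hypothetical bidirectional cross-attention over two flattened feature maps."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # Stream A queries stream B and vice versa, so consistent or
        # complementary cues can flow in both directions.
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a, feat_b: (batch, tokens, dim) backbone features flattened
        # over the spatial grid.
        qa, qb = self.norm_a(feat_a), self.norm_b(feat_b)
        # Residual connections keep each stream's original information.
        feat_a = feat_a + self.attn_a(qa, qb, qb, need_weights=False)[0]
        feat_b = feat_b + self.attn_b(qb, qa, qa, need_weights=False)[0]
        feat_a = feat_a + self.mlp(feat_a)
        feat_b = feat_b + self.mlp(feat_b)
        return feat_a, feat_b


if __name__ == "__main__":
    # Toy usage: two 32x32 feature maps with 256 channels each.
    a = torch.randn(2, 32 * 32, 256)
    b = torch.randn(2, 32 * 32, 256)
    out_a, out_b = CrossStreamBlock()(a, b)
    print(out_a.shape, out_b.shape)  # torch.Size([2, 1024, 256]) for both
```

The enriched features from each stream could then feed the dual-stream decoder described in the abstract, with one stream guiding mask prediction for each image or modality.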

