ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

Published: 04 June 2024

Abstract

With the rapid development of depth sensors, more and more RGB-D videos can be acquired. Identifying the foreground in RGB-D videos is a fundamental and important task. However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring the collaboration of RGB-D and video information. In this paper, we first collect a new annotated RGB-D video SOD dataset (ViDSOD-100), which contains 100 videos with a total of 9,362 frames acquired from diverse natural scenes. All frames in each video are manually annotated with a high-quality saliency mask. Moreover, we propose a new baseline model, named attentive triple-fusion network (ATF-Net), for RGB-D video salient object detection. Our method aggregates appearance information from the input RGB image, spatio-temporal information from an estimated motion map, and geometric information from the depth map by devising three modality-specific branches and a multi-modality integration branch. The modality-specific branches extract representations of the different inputs, while the multi-modality integration branch combines the multi-level modality-specific features by introducing encoder feature aggregation (MEA) modules and decoder feature aggregation (MDA) modules. Experimental findings on both our newly introduced ViDSOD-100 dataset and the well-established DAVSOD dataset highlight the superior performance of the proposed ATF-Net. This performance enhancement is demonstrated both quantitatively and qualitatively, surpassing current state-of-the-art techniques across various domains, including RGB-D saliency detection, video saliency detection, and video object segmentation. We shall release our data, results, and code upon publication of this work.
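The attentive fusion described above — weighting and combining features from the RGB, motion, and depth branches — can be sketched at a high level as follows. This is a toy illustration only: the function name `mea_fuse`, the scalar-attention scheme, and the feature shapes are assumptions for exposition, not the authors' released MEA/MDA modules.

```python
import numpy as np

def mea_fuse(rgb, motion, depth):
    """Toy sketch of attentive multi-modality fusion: weight the RGB,
    motion, and depth feature maps by a softmax over their global
    average responses, then take the weighted sum. Illustrative only;
    not the paper's exact encoder feature aggregation (MEA) module."""
    feats = np.stack([rgb, motion, depth])       # (3, H, W, C)
    gap = feats.mean(axis=(1, 2, 3))             # one scalar response per modality
    w = np.exp(gap - gap.max())
    w = w / w.sum()                              # modality attention weights, sum to 1
    fused = np.tensordot(w, feats, axes=(0, 0))  # weighted sum -> (H, W, C)
    return fused, w

# Example with random branch features of a hypothetical 8x8x16 stage.
rng = np.random.default_rng(0)
h, wd, c = 8, 8, 16
fused, weights = mea_fuse(rng.standard_normal((h, wd, c)),
                          rng.standard_normal((h, wd, c)),
                          rng.standard_normal((h, wd, c)))
print(fused.shape, weights)
```

In the actual network such aggregation would operate on learned multi-level CNN features at each encoder and decoder stage; the sketch only conveys the idea of modality-wise attentive weighting before integration.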




            Published In

            International Journal of Computer Vision  Volume 132, Issue 11
            Nov 2024
            668 pages

            Publisher

            Kluwer Academic Publishers

            United States

            Publication History

            Published: 04 June 2024
            Accepted: 08 March 2024
            Received: 01 April 2023

            Author Tags

            1. RGB-D video dataset
            2. Neural networks
            3. Salient object detection

            Qualifiers

            • Research-article

