

A Unified Transformer Framework for Group-Based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection

Published: 05 April 2023

Abstract

Since we live in a dynamic world, humans tend to discover objects by learning from a group of images or several frames of a video. In computer vision, many researchers accordingly study co-segmentation (CoS), co-saliency detection (CoSD), and video salient object detection (VSOD) to discover co-occurring objects. However, previous approaches design separate networks for these similar tasks, which makes them difficult to transfer from one task to another; they also fail to take full advantage of the inter- and intra-image cues within a group of images. In this paper, we tackle these issues from a unified view and introduce a single framework, termed UFGS (Unified Framework for Group-based Segmentation). Specifically, we first introduce a transformer block that views image features as patch tokens and captures their long-range dependencies through self-attention, which helps the network excavate patch-level structural similarities among the relevant objects. Furthermore, we propose an intra-MLP learning module that produces a self-mask to keep the network from partial activation. Extensive experiments on four CoS benchmarks (PASCAL, iCoseg, Internet, and MSRC), three CoSD benchmarks (Cosal2015, CoSOD3k, and CoCA), and five VSOD benchmarks (DAVIS$_{16}$, FBMS, ViSal, SegV2, and DAVSOD) show that our method outperforms other state-of-the-art methods on all three tasks in both accuracy and speed while using the same network architecture, running in real time at up to 140 FPS.
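
The abstract describes its two mechanisms only at a high level: a transformer block that flattens grouped image features into patch tokens and relates them with self-attention, and an intra-MLP module that yields a self-mask against partial activation. Below is a minimal PyTorch sketch of how such components could look; all names, shapes, and design details (GroupAttentionBlock, IntraMLPMask, the residual re-weighting) are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of the two ideas named in the abstract: (1) self-attention
# over patch tokens pooled from a whole group of images, and (2) an intra-MLP
# self-mask that re-weights features to counter partial activation.
import torch
import torch.nn as nn


class GroupAttentionBlock(nn.Module):
    """Self-attention over patch tokens gathered from a group of images."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C, H, W) backbone features for the N images of one group.
        n, c, h, w = feats.shape
        # Flatten every spatial location of every image into one long token
        # sequence so attention can relate patches *across* the group.
        tokens = feats.flatten(2).permute(0, 2, 1).reshape(1, n * h * w, c)
        x = self.norm1(tokens)
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens.reshape(n, h * w, c).permute(0, 2, 1).reshape(n, c, h, w)


class IntraMLPMask(nn.Module):
    """Per-image MLP that predicts a soft self-mask over spatial positions."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feats.shape
        tokens = feats.flatten(2).permute(0, 2, 1)      # (N, H*W, C)
        mask = torch.sigmoid(self.mlp(tokens))          # (N, H*W, 1)
        mask = mask.permute(0, 2, 1).reshape(n, 1, h, w)
        # Residual re-weighting so the whole object, not just its most
        # discriminative part, stays activated.
        return feats * mask + feats


if __name__ == "__main__":
    group = torch.randn(5, 256, 14, 14)   # 5 related images, toy features
    out = IntraMLPMask(256)(GroupAttentionBlock(256)(group))
    print(out.shape)  # torch.Size([5, 256, 14, 14])

Flattening all N x H x W positions of a group into a single token sequence is what would let self-attention compare patches across images rather than only within one; the sigmoid self-mask then re-weights features so activation is not confined to the most discriminative region.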



Information

          Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 10405 pages

          Publisher

          IEEE Press

          Publication History

          Published: 05 April 2023

          Qualifiers

          • Research-article


          Cited By

• (2024) UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection. IEEE Transactions on Multimedia, vol. 26, pp. 7622–7635. DOI: 10.1109/TMM.2024.3369922. Online publication date: 26-Feb-2024.
• (2024) ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection. International Journal of Computer Vision, vol. 132, no. 11, pp. 5173–5191. DOI: 10.1007/s11263-024-02051-5. Online publication date: 1-Nov-2024.
• (2024) Self-supervised Co-salient Object Detection via Feature Correspondences at Multiple Scales. Computer Vision – ECCV 2024, pp. 231–250. DOI: 10.1007/978-3-031-72673-6_13. Online publication date: 29-Sep-2024.
• (2023) Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion. Proceedings of the 31st ACM International Conference on Multimedia, pp. 2744–2755. DOI: 10.1145/3581783.3612133. Online publication date: 26-Oct-2023.
• (2023) Multi-View Graph Embedding Learning for Image Co-Segmentation and Co-Localization. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4942–4956. DOI: 10.1109/TCSVT.2023.3339181. Online publication date: 4-Dec-2023.
