

A Unified Transformer Framework for Group-Based Segmentation: Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection

Published: 05 April 2023

Abstract

Since we live in a dynamic world, humans tend to discover objects by learning from a group of images or several frames of a video. In computer vision, many researchers accordingly study co-segmentation (CoS), co-saliency detection (CoSD), and video salient object detection (VSOD) to discover co-occurring objects. However, previous approaches design separate networks for these similar tasks, which makes them difficult to transfer from one task to another; they also fail to take full advantage of the inter- and intra-image cues within a group of images. In this paper, we tackle these issues from a unified view and introduce a single framework, termed UFGS (Unified Framework for Group-based Segmentation). Specifically, we first introduce a transformer block that views image features as patch tokens and captures their long-range dependencies through self-attention, which helps the network excavate patch-level structural similarities among the relevant objects. Furthermore, we propose an intra-MLP learning module that produces a self-mask to keep the network from partial activation. Extensive experiments on four CoS benchmarks (PASCAL, iCoseg, Internet, and MSRC), three CoSD benchmarks (Cosal2015, CoSOD3k, and CoCA), and five VSOD benchmarks (DAVIS$_{16}$, FBMS, ViSal, SegV2, and DAVSOD) show that our method outperforms other state-of-the-art methods on all three tasks in both accuracy and speed while using the same network architecture, running in real time at up to 140 FPS.
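
The abstract describes its two mechanisms only at a high level: a transformer block that flattens grouped image features into patch tokens and relates them with self-attention, and an intra-MLP module that yields a self-mask against partial activation. Below is a minimal PyTorch sketch of how such components could look; all names, shapes, and design details (GroupAttentionBlock, IntraMLPMask, the residual re-weighting) are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of the two ideas named in the abstract: (1) self-attention
# over patch tokens pooled from a whole group of images, and (2) an intra-MLP
# self-mask that re-weights features to counter partial activation.
import torch
import torch.nn as nn


class GroupAttentionBlock(nn.Module):
    """Self-attention over patch tokens gathered from a group of images."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, C, H, W) backbone features for the N images of one group.
        n, c, h, w = feats.shape
        # Flatten every spatial location of every image into one long token
        # sequence so attention can relate patches *across* the group.
        tokens = feats.flatten(2).permute(0, 2, 1).reshape(1, n * h * w, c)
        x = self.norm1(tokens)
        tokens = tokens + self.attn(x, x, x, need_weights=False)[0]
        tokens = tokens + self.mlp(self.norm2(tokens))
        return tokens.reshape(n, h * w, c).permute(0, 2, 1).reshape(n, c, h, w)


class IntraMLPMask(nn.Module):
    """Per-image MLP that predicts a soft self-mask over spatial positions."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                 nn.Linear(dim, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feats.shape
        tokens = feats.flatten(2).permute(0, 2, 1)      # (N, H*W, C)
        mask = torch.sigmoid(self.mlp(tokens))          # (N, H*W, 1)
        mask = mask.permute(0, 2, 1).reshape(n, 1, h, w)
        # Residual re-weighting so the whole object, not just its most
        # discriminative part, stays activated.
        return feats * mask + feats


if __name__ == "__main__":
    group = torch.randn(5, 256, 14, 14)   # 5 related images, toy features
    out = IntraMLPMask(256)(GroupAttentionBlock(256)(group))
    print(out.shape)  # torch.Size([5, 256, 14, 14])

Flattening all N x H x W positions of a group into a single token sequence is what would let self-attention compare patches across images rather than only within one; the sigmoid self-mask then re-weights features so activation is not confined to the most discriminative region.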



Information

          Published In

IEEE Transactions on Multimedia, Volume 26, 2024, 10405 pages

          Publisher

          IEEE Press

          Publication History

          Published: 05 April 2023

          Qualifiers

          • Research-article


          Cited By

• (2024) UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection. IEEE Transactions on Multimedia, vol. 26, pp. 7622–7635. DOI: 10.1109/TMM.2024.3369922. Online publication date: 26-Feb-2024.
• (2024) ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection. International Journal of Computer Vision, vol. 132, no. 11, pp. 5173–5191. DOI: 10.1007/s11263-024-02051-5. Online publication date: 1-Nov-2024.
• (2024) Self-supervised Co-salient Object Detection via Feature Correspondences at Multiple Scales. Computer Vision – ECCV 2024, pp. 231–250. DOI: 10.1007/978-3-031-72673-6_13. Online publication date: 29-Sep-2024.
• (2023) Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion. Proceedings of the 31st ACM International Conference on Multimedia, pp. 2744–2755. DOI: 10.1145/3581783.3612133. Online publication date: 26-Oct-2023.
• (2023) Multi-View Graph Embedding Learning for Image Co-Segmentation and Co-Localization. IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4942–4956. DOI: 10.1109/TCSVT.2023.3339181. Online publication date: 4-Dec-2023.
