DOI: 10.1007/978-3-030-58536-5_46
Article

Learning What to Learn for Video Object Segmentation

Published: 23 August 2020

Abstract

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
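The idea of predicting a parametric target model by minimizing a first-frame segmentation error can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a linear target model over per-pixel feature vectors, a squared-error objective with a small regularizer, and a few steepest-descent steps with exact step length. The names `fit_target_model` and `segment` and the toy features are hypothetical. Each step uses only differentiable arithmetic, so in an autodiff framework the unrolled steps could in principle be trained end to end.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def fit_target_model(features, mask, steps=5, reg=0.01):
    """Fit linear weights w so that dot(w, x_i) approximates mask_i.

    Minimizes L(w) = sum_i (w.x_i - y_i)^2 + reg * |w|^2 with a few
    steepest-descent steps (an assumed optimizer, chosen for illustration).
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(steps):
        # Residuals of the current model on the first frame.
        residuals = [dot(w, x) - y for x, y in zip(features, mask)]
        # Gradient of L: 2 * (sum_i r_i * x_i + reg * w).
        grad = [
            2.0 * (sum(r * x[d] for r, x in zip(residuals, features)) + reg * w[d])
            for d in range(dim)
        ]
        gg = dot(grad, grad)
        if gg == 0.0:
            break  # already at the minimum
        # Exact step length for a quadratic: |g|^2 / (g^T H g),
        # where H = 2 * (X^T X + reg * I).
        xg = [dot(x, grad) for x in features]
        g_h_g = 2.0 * (dot(xg, xg) + reg * gg)
        alpha = gg / g_h_g
        w = [wd - alpha * gd for wd, gd in zip(w, grad)]
    return w


def segment(weights, features, threshold=0.5):
    """Label each pixel 1 (target) or 0 (background) by thresholding its score."""
    return [1 if dot(weights, x) > threshold else 0 for x in features]


# Toy first frame: two target pixels and two background pixels (made-up features).
first_frame_features = [[1.0, 0.1], [1.0, 0.1], [0.1, 1.0], [0.1, 1.0]]
first_frame_mask = [1.0, 1.0, 0.0, 0.0]

weights = fit_target_model(first_frame_features, first_frame_mask)

# Features from a later frame: one target-like pixel, one background-like pixel.
later_frame_features = [[0.9, 0.2], [0.2, 0.9]]
print(segment(weights, later_frame_features))
```

In this toy setting the only supervision is the first-frame mask, which mirrors the inference-time constraint the abstract describes.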

Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II
Aug 2020
839 pages
ISBN:978-3-030-58535-8
DOI:10.1007/978-3-030-58536-5

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 23 August 2020

Qualifiers

  • Article

Citations

Cited By

  • (2024) OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework. Computer Vision – ECCV 2024, pp. 20–40. DOI: 10.1007/978-3-031-73636-0_2. Online publication date: 29-Sep-2024
  • (2023) From ViT features to training-free video object segmentation via streaming-data mixture models. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 10995–11007. DOI: 10.5555/3666122.3666607. Online publication date: 10-Dec-2023
  • (2023) Complementary Coarse-to-Fine Matching for Video Object Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6, pp. 1–21. DOI: 10.1145/3596496. Online publication date: 12-Jul-2023
  • (2023) Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks. Proceedings of the 31st ACM International Conference on Multimedia, pp. 8598–8607. DOI: 10.1145/3581783.3611827. Online publication date: 26-Oct-2023
  • (2023) SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation. Proceedings of the 31st ACM International Conference on Multimedia, pp. 7481–7490. DOI: 10.1145/3581783.3611804. Online publication date: 26-Oct-2023
  • (2023) DSNet: Efficient Lightweight Model for Video Salient Object Detection for IoT and WoT Applications. Companion Proceedings of the ACM Web Conference 2023, pp. 1286–1295. DOI: 10.1145/3543873.3587592. Online publication date: 30-Apr-2023
  • (2022) Spatial and Temporal Guidance for Semi-supervised Video Object Segmentation. Neural Information Processing, pp. 97–109. DOI: 10.1007/978-3-031-30111-7_9. Online publication date: 22-Nov-2022
  • (2022) Video Object Segmentation via Structural Feature Reconfiguration. Computer Vision – ACCV 2022, pp. 588–605. DOI: 10.1007/978-3-031-26293-7_35. Online publication date: 4-Dec-2022
  • (2022) Robust Visual Tracking by Segmentation. Computer Vision – ECCV 2022, pp. 571–588. DOI: 10.1007/978-3-031-20047-2_33. Online publication date: 23-Oct-2022
  • (2022) Tackling Background Distraction in Video Object Segmentation. Computer Vision – ECCV 2022, pp. 446–462. DOI: 10.1007/978-3-031-20047-2_26. Online publication date: 23-Oct-2022