
DOI: 10.1145/3265987.3265992
research-article

Joint Object Tracking and Segmentation with Independent Convolutional Neural Networks

Published: 15 October 2018

Abstract

Object tracking and segmentation are important research topics in computer vision. They provide, respectively, the trajectory and the boundary of an object based on its appearance and shape features. Most studies on tracking and segmentation focus on methods for encoding object features. However, the tracking trajectory and the segmentation mask are usually obtained separately, even though both tasks require similar visual information. Therefore, in this paper, we propose a CNN-based joint object tracking and segmentation framework that provides a segmentation mask while improving the performance of the object tracker. In our model, the tracking model determines the trajectory of the target object as a bounding box in each frame. Given the bounding box in each frame, the segmentation model predicts a dense mask of the target object within that box. The segmentation mask is then used to refine the bounding box for the tracking model. We evaluate our algorithm on the DAVIS benchmark dataset using the AUC score and mean IoU, and show that the proposed framework improves the performance of the original tracker.
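The per-frame interaction described in the abstract (track, segment inside the box, refine the box from the mask) can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `track` and `segment` are hypothetical callables standing in for the two independent CNNs, and both the refinement rule (replace the tracker's box with the tightest box around the predicted foreground pixels, falling back to the tracker's box when the mask is nearly empty) and feeding the refined box into the next frame are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A box is (x0, y0, x1, y1): x0/y0 inclusive, x1/y1 exclusive, in pixels.
Box = Tuple[int, int, int, int]
# mask[y][x] is True for target pixels, in full-frame coordinates.
Mask = List[List[bool]]


def box_from_mask(mask: Mask) -> Optional[Box]:
    """Return the tightest box around all foreground pixels, or None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), ys[0], max(xs) + 1, ys[-1] + 1)


def refine_box(tracker_box: Box, mask: Mask, min_fg_pixels: int = 1) -> Box:
    """Hypothetical refinement rule (an assumption, not taken from the paper):
    use the mask's tight box unless the mask has fewer than `min_fg_pixels`
    foreground pixels, in which case keep the tracker's box."""
    fg = sum(v for row in mask for v in row)
    if fg < min_fg_pixels:
        return tracker_box
    refined = box_from_mask(mask)
    return refined if refined is not None else tracker_box


def run_joint(frames: Sequence[object], init_box: Box,
              track: Callable[[object, Box], Box],
              segment: Callable[[object, Box], Mask]) -> List[Tuple[Box, Mask]]:
    """Per-frame loop with hypothetical `track`/`segment` stand-ins."""
    box = init_box
    outputs: List[Tuple[Box, Mask]] = []
    for frame in frames:
        box = track(frame, box)       # tracker proposes a box from the previous box
        mask = segment(frame, box)    # segmenter predicts a dense mask inside that box
        box = refine_box(box, mask)   # assumed: mask-derived box is fed back to the tracker
        outputs.append((box, mask))
    return outputs
```

Keeping the tracker's box as a fallback means a failed segmentation cannot collapse the trajectory to nothing; other refinement rules (for example, blending the two boxes) would fit the same loop.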

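The two evaluation measures named in the abstract can be illustrated in plain Python. This sketch assumes that the AUC score is the area under an OTB-style success plot (the fraction of frames whose predicted-box overlap exceeds each threshold in 0, 0.05, ..., 1, averaged over thresholds) and that mean IoU is the per-frame mask intersection-over-union averaged over frames. The function names are hypothetical, and the paper's exact protocol (including how ground-truth boxes are derived from DAVIS masks) is not given here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x1 > x0, y1 > y0
Mask = List[List[bool]]                  # mask[y][x] is True for target pixels


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Pixel-wise IoU of two binary masks of the same size."""
    inter = sum(p and g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    union = sum(p or g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return inter / union if union else 1.0  # assumed: two empty masks count as a match


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mask IoU averaged over frames."""
    scores = [mask_iou(p, g) for p, g in zip(preds, gts)]
    if not scores:
        raise ValueError("no frames to evaluate")
    return sum(scores) / len(scores)


def success_auc(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box],
                num_thresholds: int = 21) -> float:
    """Assumed OTB-style AUC: mean success rate over evenly spaced thresholds."""
    overlaps = [box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not overlaps:
        raise ValueError("no frames to evaluate")
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

The strict `>` comparison follows the convention commonly used for success plots, so even a perfect overlap does not count at the threshold 1.0; toolkits may differ in such details, so AUC values are only comparable under the same protocol.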

Cited By

  • (2022) Perceived Integrity of Distributed Streaming Media Based on AWTC-TT Algorithm Optimization. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/7522174. Online publication date: 1-Jan-2022.
  • (2020) DeepKey. ACM Transactions on Intelligent Systems and Technology 11:4 (1-24). DOI: 10.1145/3393619. Online publication date: 31-May-2020.
  • (2020) Video Object Segmentation and Tracking. ACM Transactions on Intelligent Systems and Technology 11:4 (1-47). DOI: 10.1145/3391743. Online publication date: 25-May-2020.


Published In

CoVieW'18: Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild
October 2018
45 pages
ISBN: 9781450359764
DOI: 10.1145/3265987
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.


Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. joint model
2. masking
3. object tracking
4. video object segmentation


Funding Sources

• the Ministry of Science and ICT of Korea
• the Ministry of Education

Conference

MM '18: ACM Multimedia Conference
October 22, 2018
Seoul, Republic of Korea
