research-article

Joint Object Tracking and Segmentation with Independent Convolutional Neural Networks

Authors:

Jongwoo Lim

CoVieW'18: Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild

Pages 7 - 13

https://doi.org/10.1145/3265987.3265992

Published: 15 October 2018

Abstract

Object tracking and segmentation are important research topics in computer vision. They provide the trajectory and the boundary of an object, respectively, based on its appearance and shape features. Most studies on tracking and segmentation focus on methods for encoding object features. However, the tracking trajectory and the segmentation mask are usually obtained separately, even though both tasks rely on similar visual information. In this paper, we therefore propose a CNN-based joint object tracking and segmentation framework that produces a segmentation mask while also improving the performance of the object tracker. In our model, the tracking model estimates the trajectory of the target object as a bounding box in each frame. Given the bounding box in each frame, the segmentation model predicts a dense mask of the target object inside that box. The segmentation mask is then used to refine the bounding box for the tracking model. We evaluate our algorithm on the DAVIS benchmark dataset using the AUC score and mean IoU, and show that the proposed framework improves the performance of the original tracker.
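The track, segment, refine loop and the two evaluation measures named above can be illustrated with a small sketch. It is not the authors' implementation: the refinement rule (shrink the tracker's box to the tight bounds of the predicted foreground), the `(x, y, w, h)` box format, the nested-list binary masks, and the function names `refine_box`, `box_iou`, `mask_iou`, and `success_auc` are all hypothetical choices made for illustration. The AUC here is assumed to be the common success-plot AUC (mean success rate over overlap thresholds 0, 0.05, ..., 1), and mean IoU is assumed to be the per-frame mask Jaccard index averaged over frames.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x, y, w, h) in pixels
Mask = List[List[int]]                   # binary mask, mask[row][col] in {0, 1}


def refine_box(box: Box, mask: Mask) -> Box:
    """Shrink `box` to the tight bounds of the foreground pixels in `mask`.

    Assumption: `mask` was predicted inside the crop given by `box`, one mask
    cell per pixel, so mask indices are offsets from the box's top-left corner.
    If the mask is empty, the tracker's box is kept unchanged.
    """
    x, y, _, _ = box
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return box
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return (x + left, y + top, right - left + 1, bottom - top + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Jaccard index of two equally sized binary masks (1.0 if both are empty)."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 1.0


def success_auc(ious: Sequence[float], steps: int = 21) -> float:
    """Mean fraction of frames whose overlap exceeds each threshold in [0, 1]."""
    if not ious:
        return 0.0
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(1 for v in ious if v > t) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy 4x4 mask predicted inside a tracker box at (10, 20) of size 4x4.
    mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    refined = refine_box((10, 20, 4, 4), mask)
    print(refined)                              # (11, 21, 2, 2)
    print(box_iou(refined, (10, 20, 4, 4)))     # 0.25
```

Taking the tight box of the mask is the simplest possible refinement rule; because the mask is assumed to lie inside the tracker's crop, it can only shrink the box. A practical system would likely predict the mask on an enlarged crop around the box so that refinement can also grow or shift it.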

Cited By

Yu W, Jiang J, Zhai Y, Xu P. (2022). Perceived Integrity of Distributed Streaming Media Based on AWTC-TT Algorithm Optimization. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/7522174. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/7522174

Zhang X, Yao L, Huang C, Gu T, Yang Z, Liu Y. (2020). DeepKey. ACM Transactions on Intelligent Systems and Technology 11:4 (1-24). DOI: 10.1145/3393619. Online publication date: 31-May-2020.
https://dl.acm.org/doi/10.1145/3393619

Yao R, Lin G, Xia S, Zhao J, Zhou Y. (2020). Video Object Segmentation and Tracking. ACM Transactions on Intelligent Systems and Technology 11:4 (1-47). DOI: 10.1145/3391743. Online publication date: 25-May-2020.
https://dl.acm.org/doi/10.1145/3391743

Index Terms

Joint Object Tracking and Segmentation with Independent Convolutional Neural Networks
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Tracking
        Video segmentation

Information

Published In

CoVieW'18: Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild

October 2018

45 pages

ISBN: 9781450359764

DOI: 10.1145/3265987

General Chairs:
Kwanghoon Sohn, Yonsei University, Korea
Ming-Hsuan Yang, University of California at Merced, USA

Program Chairs:
Hyeran Byun, Yonsei University, Korea
Jongwoo Lim, Hanyang University, Korea
Jison Hsu, NTUST, Taiwan
Stephen Lin, Microsoft Research, China

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the Ministry of Science and ICT of Korea
the Ministry of Education

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22, 2018

Seoul, Republic of Korea

Bibliometrics

Total Citations: 12
Total Downloads: 171
Downloads (last 12 months): 5
Downloads (last 6 weeks): 1

Reflects downloads up to 05 Mar 2025
