research-article

Joint Graph Learning and Video Segmentation via Multiple Cues and Topology Calibration

Published: 01 October 2016

Abstract

Video segmentation is an active research area with a wide range of proposed approaches. Graph-based methods, which achieve top performance on recent benchmarks, usually focus either on obtaining a precise similarity graph or on designing efficient graph-cutting strategies. However, these two components are typically handled in two separate steps, so the similarity graph obtained in the first step may not be optimal for segmentation, which can lead to suboptimal results. In this paper, we propose a novel framework, joint graph learning and video segmentation (JGLVS), which learns the similarity graph and the video segmentation simultaneously. JGLVS learns the similarity graph by assigning adaptive neighbors to each vertex based on multiple cues (appearance, motion, boundary, and spatial information). Meanwhile, a new rank constraint is imposed on the Laplacian matrix of the similarity graph, such that the number of connected components in the resulting similarity graph exactly equals the number of segments. Furthermore, JGLVS automatically weighs the multiple cues and calibrates the pairwise distances between superpixels based on their topological structure. Empirical results on the challenging VSB100 benchmark show that JGLVS achieves promising performance, outperforming the state of the art by up to 11% on the BPR metric.
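The rank constraint mentioned in the abstract rests on a standard fact from spectral graph theory: for a graph with nonnegative edge weights, the multiplicity of the zero eigenvalue of the Laplacian equals the number of connected components, so a Laplacian of rank n - c corresponds to exactly c components. The following is a minimal sketch of that fact only, not the authors' JGLVS optimization; the toy similarity matrix and the helper names `laplacian` and `count_components` are assumptions made for illustration.

```python
import numpy as np

def laplacian(S):
    """Unnormalized graph Laplacian L = D - W of the symmetrized similarity matrix."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W

def count_components(S, tol=1e-9):
    """Count connected components as the multiplicity of the zero eigenvalue of L."""
    eigvals = np.linalg.eigvalsh(laplacian(S))
    return int(np.sum(eigvals < tol))

# Hypothetical similarity graph over 5 vertices (stand-ins for superpixels):
# vertices {0, 1, 2} form one block and {3, 4} another, with no cross edges.
S = np.array([
    [0.0, 0.9, 0.4, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.4, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.8, 0.0],
])

n = S.shape[0]
c = count_components(S)
rank_L = np.linalg.matrix_rank(laplacian(S), tol=1e-9)
print(f"components = {c}, rank(L) = {rank_L}, n - c = {n - c}")

# Adding a single weak cross edge merges the two blocks into one component.
S_linked = S.copy()
S_linked[2, 3] = S_linked[3, 2] = 0.1
c_linked = count_components(S_linked)
print(f"components after linking = {c_linked}")
```

In this toy case the segments can be read directly off the block structure of the graph, which illustrates why fixing the Laplacian rank ties the learned graph to a target number of segments.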





      Published In

      MM '16: Proceedings of the 24th ACM international conference on Multimedia
      October 2016
      1542 pages
      ISBN: 9781450336031
      DOI: 10.1145/2964284


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. graph-based method
      2. multiple cues
      3. topology
      4. video segmentation

      Qualifiers

      • Research-article

      Funding Sources

      • FP7 EC project
      • National Natural Science Foundation of China
      • the Fundamental Research Funds for the Central Universities

      Conference

      MM '16: ACM Multimedia Conference
      October 15 - 19, 2016
      Amsterdam, The Netherlands

      Acceptance Rates

      MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


      Cited By

      • (2021) Unsupervised 2D dimensionality reduction by jointly learning structural and temporal correlation. Applied Intelligence. DOI: 10.1007/s10489-021-02439-7. Online publication date: 18-Aug-2021
      • (2019) Automatic Annotation of Airborne Images by Label Propagation Based on a Bayesian-CRF Model. Remote Sensing, 11:2 (145). DOI: 10.3390/rs11020145. Online publication date: 13-Jan-2019
      • (2019) Predicting Future Instance Segmentation with Contextual Pyramid ConvLSTMs. Proceedings of the 27th ACM International Conference on Multimedia (2043-2051). DOI: 10.1145/3343031.3350949. Online publication date: 15-Oct-2019
      • (2019) Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web, 22:2 (555-570). DOI: 10.1007/s11280-018-0556-3. Online publication date: 1-Mar-2019
      • (2019) ProfitLeader. World Wide Web, 22:2 (533-553). DOI: 10.1007/s11280-018-0537-6. Online publication date: 1-Mar-2019
      • (2019) Exploiting long-term temporal dynamics for video captioning. World Wide Web, 22:2 (735-749). DOI: 10.1007/s11280-018-0530-0. Online publication date: 1-Mar-2019
      • (2018) Coarse-to-fine image co-segmentation with intra and inter rank constraints. Proceedings of the 27th International Joint Conference on Artificial Intelligence (719-725). DOI: 10.5555/3304415.3304518. Online publication date: 13-Jul-2018
      • (2018) Cumulative Nets for Edge Detection. Proceedings of the 26th ACM international conference on Multimedia (1847-1855). DOI: 10.1145/3240508.3240688. Online publication date: 15-Oct-2018
      • (2018) Boosting Scene Parsing Performance via Reliable Scale Prediction. Proceedings of the 26th ACM international conference on Multimedia (492-500). DOI: 10.1145/3240508.3240657. Online publication date: 15-Oct-2018
      • (2018) Learning Deep Spatio-Temporal Dependence for Semantic Video Segmentation. IEEE Transactions on Multimedia, 20:4 (939-949). DOI: 10.1109/TMM.2017.2759504. Online publication date: 1-Apr-2018
