
A Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos

  • Conference paper
  • MultiMedia Modeling (MMM 2018)
  • Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

This paper presents an algorithm for the temporal segmentation of user-generated videos into visually coherent parts that correspond to individual video capturing activities. The latter include camera pan and tilt, changes in focal length, and camera displacement. The proposed approach identifies these activities by extracting and evaluating the region-level spatio-temporal distribution of the optical flow over sequences of neighbouring video frames. The performance of the algorithm was evaluated against several state-of-the-art techniques and variations of them, with the help of a newly constructed ground-truth dataset. Extensive evaluation indicates the competitiveness of the proposed approach in terms of detection accuracy, and highlights its suitability for analysing large collections of data in a time-efficient manner.
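The region-level analysis of optical flow outlined in the abstract can be illustrated with a minimal numpy sketch. Everything here is an assumption for illustration: the `classify_motion` helper, the 3×3 grid, and the thresholds are not taken from the paper, which additionally detects camera displacement and aggregates evidence over sequences of frames rather than a single flow field.

```python
import numpy as np

def classify_motion(flow, grid=3, mag_thr=0.5):
    """Coarsely label a dense optical-flow field (H x W x 2) by camera motion.

    Hypothetical sketch of region-level flow analysis: the frame is split
    into a grid x grid layout, the mean flow vector of each region is
    computed, and the spatial pattern of these vectors is inspected.
    Thresholds are illustrative, not the paper's.
    """
    h, w = flow.shape[:2]
    means, centres = [], []
    for i in range(grid):
        for j in range(grid):
            block = flow[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            means.append(block.reshape(-1, 2).mean(axis=0))
            # region centre, expressed relative to the frame centre
            centres.append(((j + 0.5) * w / grid - w / 2,
                            (i + 0.5) * h / grid - h / 2))
    means, centres = np.asarray(means), np.asarray(centres)
    mags = np.linalg.norm(means, axis=1)
    if mags.mean() < mag_thr:
        return "static"
    global_mean = means.mean(axis=0)
    if np.linalg.norm(global_mean) > 0.8 * mags.mean():
        # all regions move coherently in one direction: pan or tilt
        return "pan" if abs(global_mean[0]) >= abs(global_mean[1]) else "tilt"
    # radial pattern: region flows (anti-)aligned with offset from frame centre
    radial = np.sum(means * centres, axis=1)
    outer = np.linalg.norm(centres, axis=1) > min(h, w) / (2 * grid)
    if np.all(radial[outer] > 0):
        return "zoom-in"
    if np.all(radial[outer] < 0):
        return "zoom-out"
    return "complex"

# Example: a uniform rightward flow field reads as a camera pan
pan = np.zeros((60, 80, 2))
pan[..., 0] = 2.0
print(classify_motion(pan))  # pan
```

In a full pipeline the `flow` field would come from a dense optical-flow estimator between neighbouring frames, and a capturing activity would be declared only when the same label persists over several consecutive frame pairs.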


Notes

  1. Some of the works reviewed in Sect. 2 use datasets (TRECVid 2007 rushes summarization, UT Ego, ADL and GTEA Gaze) that were designed for assessing methods targeting specific types of analysis, such as video rushes segmentation [3] and the identification of everyday activities [30]; thus, ground-truth sub-shot segmentation is not available for them.

  2. http://mklab.iti.gr/project/annotated-dataset-sub-shot-segmentation-evaluation.

References

  1. Abdollahian, G., et al.: Camera motion-based analysis of user generated video. IEEE Trans. Multimed. 12(1), 28–41 (2010)

  2. Apostolidis, E., et al.: Fast shot segmentation combining global and local visual descriptors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6583–6587 (2014). http://mklab.iti.gr/project/video-shot-segm

  3. Bai, L., et al.: Automatic summarization of rushes video using bipartite graphs. Multimed. Tools Appl. 49(1), 63–80 (2010)

  4. Bay, H., et al.: SURF: speeded up robust features. In: Proceedings of the 9th European Conference on Computer Vision, pp. 404–417 (2006)

  5. Benois-Pineau, J., Lovell, B.C., Andrews, R.J.: Motion estimation in colour image sequences. In: Fernandez-Maloigne, C. (ed.) Advanced Color Image Processing and Analysis, pp. 377–395. Springer, New York (2013). https://doi.org/10.1007/978-1-4419-6190-7_11

  6. Bouguet, J.Y.: Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corp. 5(1–10), 4 (2001)

  7. Chu, W.T., et al.: Video copy detection based on bag of trajectory and two-level approximate sequence. In: Proceedings of the Computer Vision, Graphics, and Image Processing Conference (2010)

  8. Cooray, S.H., et al.: An interactive and multi-level framework for summarising user generated videos. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 685–688 (2009)

  9. Cooray, S.H., et al.: Identifying an efficient and robust sub-shot segmentation method for home movie summarisation. In: 10th International Conference on Intelligent Systems Design and Applications, pp. 1287–1292 (2010)

  10. Cricri, F., et al.: Multimodal event detection in user generated videos. In: IEEE International Symposium on Multimedia, pp. 263–270 (2011)

  11. Dumont, E., et al.: Rushes video summarization using a collaborative approach. In: Proceedings of the 2nd ACM TRECVID Video Summarization Workshop, pp. 90–94 (2008)

  12. Durik, M., et al.: Robust motion characterisation for video indexing based on MPEG2 optical flow. In: International Workshop on Content-Based Multimedia Indexing, pp. 57–64 (2001)

  13. Fischler, M.A., et al.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

  14. González-Díaz, I., et al.: Temporal segmentation and keyframe selection methods for user-generated video search-based annotation. Expert Syst. Appl. 42(1), 488–502 (2015)

  15. Guo, Y., et al.: Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy 18(3), 73 (2016)

  16. Haller, M., et al.: A generic approach for motion-based video parsing. In: 15th European Signal Processing Conference, pp. 713–717 (2007)

  17. Karaman, S., et al.: Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia. Multimed. Tools Appl. 69(3), 743–771 (2014)

  18. Kim, J.G., et al.: Efficient camera motion characterization for MPEG video indexing. In: Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1171–1174 (2000)

  19. Lan, D.J., et al.: A novel motion-based representation for video mining. In: Proceedings of the International Conference on Multimedia and Expo, pp. 469–472 (2003)

  20. Liu, Y., et al.: Rushes video summarization using audio-visual information and sequence alignment. In: Proceedings of the 2nd ACM TRECVID Video Summarization Workshop, pp. 114–118 (2008)

  21. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157 (1999)

  22. Mei, T., et al.: Near-lossless semantic video summarization and its applications to video analysis. ACM Trans. Multimed. Comput. Commun. Appl. 9(3), 16:1–16:23 (2013)

  23. Ngo, C.W., et al.: Video summarization and scene detection by graph modeling. IEEE Trans. Circ. Syst. Video Tech. 15(2), 296–305 (2005)

  24. Nitta, N., et al.: Content analysis for home videos. ITE Trans. Media Tech. Appl. 1(2), 91–100 (2013)

  25. Ojutkangas, O., Peltola, J., Järvinen, S.: Location based abstraction of user generated mobile videos. In: Atzori, L., Delgado, J., Giusto, D. (eds.) MobiMedia 2011. LNICST, vol. 79, pp. 295–306. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30419-4_25

  26. Pan, C.M., et al.: NTU TRECVID-2007 fast rushes summarization system. In: Proceedings of the 1st ACM TRECVID Video Summarization Workshop, pp. 74–78 (2007)

  27. Rublee, E., et al.: ORB: an efficient alternative to SIFT or SURF. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2564–2571 (2011)

  28. Shi, J., et al.: Good features to track. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994)

  29. Wang, G., et al.: Motch: an automatic motion type characterization system for sensor-rich videos. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1319–1320 (2012)

  30. Xu, J., et al.: Gaze-enabled egocentric video summarization via constrained submodular maximization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2235–2244 (2015)


Acknowledgements

This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement H2020-732665 EMMA.

Author information

Correspondence to Vasileios Mezaris.


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Apostolidis, K., Apostolidis, E., Mezaris, V. (2018). A Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos. In: Schoeffmann, K., et al. (eds.) MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol. 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_3

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science (R0)
