Abstract
This paper presents an algorithm for the temporal segmentation of user-generated videos into visually coherent parts that correspond to individual video capturing activities. These activities include camera pan and tilt, change in focal length, and camera displacement. The proposed approach identifies them by extracting and evaluating the region-level spatio-temporal distribution of the optical flow over sequences of neighbouring video frames. The performance of the algorithm was evaluated, with the help of a newly constructed ground-truth dataset, against several state-of-the-art techniques and variations of them. Extensive evaluation indicates the competitiveness of the proposed approach in terms of detection accuracy, and highlights its suitability for analysing large collections of data in a time-efficient manner.
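The core idea of the abstract — inferring the camera activity from the region-level spatial distribution of optical flow — can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it assumes a dense flow field is already available (e.g. from any optical-flow estimator), partitions the frame into a grid, and applies illustrative heuristics (a pan/tilt produces a shared translation across regions, while a focal-length change produces flow that diverges from or converges towards the frame centre). The function name, grid size, and thresholds are hypothetical.

```python
import numpy as np

def classify_camera_motion(flow, grid=3, mag_thresh=0.5):
    """Heuristic sketch: label the dominant camera activity from a dense
    optical-flow field of shape (H, W, 2), using the mean flow vector in
    each cell of a grid x grid partition of the frame. The labels and
    thresholds are illustrative, not those of the paper."""
    h, w, _ = flow.shape
    means = np.empty((grid, grid, 2))    # mean flow vector per region
    centers = np.empty((grid, grid, 2))  # region centre, relative to frame centre
    for i in range(grid):
        for j in range(grid):
            ys, ye = i * h // grid, (i + 1) * h // grid
            xs, xe = j * w // grid, (j + 1) * w // grid
            means[i, j] = flow[ys:ye, xs:xe].reshape(-1, 2).mean(axis=0)
            centers[i, j] = ((xs + xe) / 2 - w / 2, (ys + ye) / 2 - h / 2)
    # No significant motion anywhere: treat the camera as static.
    if np.linalg.norm(means.reshape(-1, 2), axis=1).mean() < mag_thresh:
        return "static"
    # Zoom: region flow aligns with the radial direction from the frame centre.
    radial = np.einsum('ijk,ijk->ij', means, centers)
    if np.all(radial >= 0) and np.any(radial > 0):
        return "zoom_in"
    if np.all(radial <= 0) and np.any(radial < 0):
        return "zoom_out"
    # Pan/tilt: all regions share roughly one translation direction.
    avg = means.reshape(-1, 2).mean(axis=0)
    return "pan" if abs(avg[0]) >= abs(avg[1]) else "tilt"
```

For instance, a uniform rightward flow field is labelled `"pan"`, while a field pointing radially outward from the frame centre is labelled `"zoom_in"`. A real system would additionally aggregate such per-frame labels over sequences of neighbouring frames, as the abstract describes.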
Notes
- 1.
Some of the works reported in Sect. 2 use datasets (TRECVid 2007 rushes summarization, UT Ego, ADL and GTEA Gaze) that were designed for assessing methods targeting other types of analysis, such as video rushes segmentation [3] and the identification of everyday activities [30]; hence, ground-truth sub-shot segmentation is not available for them.
References
Abdollahian, G., et al.: Camera motion-based analysis of user generated video. IEEE Trans. Multimed. 12(1), 28–41 (2010)
Apostolidis, E., et al.: Fast shot segmentation combining global and local visual descriptors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6583–6587 (2014). http://mklab.iti.gr/project/video-shot-segm
Bai, L., et al.: Automatic summarization of rushes video using bipartite graphs. Multimed. Tools Appl. 49(1), 63–80 (2010)
Bay, H., et al.: SURF: speeded up robust features. In: Proceedings of the 9th European Conference on Computer Vision, pp. 404–417 (2006)
Benois-Pineau, J., Lovell, B.C., Andrews, R.J.: Motion estimation in colour image sequences. In: Fernandez-Maloigne, C. (ed.) Advanced Color Image Processing and Analysis, pp. 377–395. Springer, New York (2013). https://doi.org/10.1007/978-1-4419-6190-7_11
Bouguet, J.Y.: Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm. Intel Corp. 5(1–10), 4 (2001)
Chu, W.T., et al.: Video copy detection based on bag of trajectory and two-level approximate sequence. In: Proceedings of the Computer Vision, Graphics, and Image Processing Conference (2010)
Cooray, S.H., et al.: An interactive and multi-level framework for summarising user generated videos. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 685–688 (2009)
Cooray, S.H., et al.: Identifying an efficient and robust sub-shot segmentation method for home movie summarisation. In: 10th International Conference on Intelligent Systems Design and Applications, pp. 1287–1292 (2010)
Cricri, F., et al.: Multimodal event detection in user generated videos. In: IEEE International Symposium on Multimedia, pp. 263–270 (2011)
Dumont, E., et al.: Rushes video summarization using a collaborative approach. In: Proceedings of the 2nd ACM TRECVID Video Summarization Workshop, pp. 90–94 (2008)
Durik, M., et al.: Robust motion characterisation for video indexing based on MPEG2 optical flow. In: International Workshop on Content-Based Multimedia Indexing, pp. 57–64 (2001)
Fischler, M.A., et al.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
González-Díaz, I., et al.: Temporal segmentation and keyframe selection methods for user-generated video search-based annotation. Expert Syst. Appl. 42(1), 488–502 (2015)
Guo, Y., et al.: Selecting video key frames based on relative entropy and the extreme studentized deviate test. Entropy 18(3), 73 (2016)
Haller, M., et al.: A generic approach for motion-based video parsing. In: 15th European Signal Processing Conference, pp. 713–717 (2007)
Karaman, S., et al.: Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia. Multimed. Tools Appl. 69(3), 743–771 (2014)
Kim, J.G., et al.: Efficient camera motion characterization for MPEG video indexing. In: Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1171–1174 (2000)
Lan, D.J., et al.: A novel motion-based representation for video mining. In: Proceedings of the International Conference on Multimedia and Expo, pp. 469–472 (2003)
Liu, Y., et al.: Rushes video summarization using audio-visual information and sequence alignment. In: Proceedings of the 2nd ACM TRECVID Video Summarization Workshop, pp. 114–118 (2008)
Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157 (1999)
Mei, T., et al.: Near-lossless semantic video summarization and its applications to video analysis. ACM Trans. Multimed. Comput. Commun. Appl. 9(3), 16:1–16:23 (2013)
Ngo, C.W., et al.: Video summarization and scene detection by graph modeling. IEEE Trans. Circ. Syst. Video Tech. 15(2), 296–305 (2005)
Nitta, N., et al.: Content analysis for home videos. ITE Trans. Media Tech. Appl. 1(2), 91–100 (2013)
Ojutkangas, O., Peltola, J., Järvinen, S.: Location based abstraction of user generated mobile videos. In: Atzori, L., Delgado, J., Giusto, D. (eds.) MobiMedia 2011. LNICST, vol. 79, pp. 295–306. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30419-4_25
Pan, C.M., et al.: NTU TRECVID-2007 fast rushes summarization system. In: Proceedings of the 1st ACM TRECVID Video Summarization Workshop, pp. 74–78 (2007)
Rublee, E., et al.: ORB: an efficient alternative to SIFT or SURF. In: Proceedings of IEEE International Conference on Computer Vision, pp. 2564–2571 (2011)
Shi, J., et al.: Good features to track. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600 (1994)
Wang, G., et al.: Motch: an automatic motion type characterization system for sensor-rich videos. In: Proceedings of the 20th ACM International Conference on Multimedia, pp. 1319–1320 (2012)
Xu, J., et al.: Gaze-enabled egocentric video summarization via constrained submodular maximization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2235–2244 (2015)
Acknowledgements
This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreement H2020-732665 EMMA.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Apostolidis, K., Apostolidis, E., Mezaris, V. (2018). A Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science(), vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_3