Shot boundary detection using multimodal Siamese network

Multimedia Tools and Applications

Abstract

Shot Boundary Detection (SBD) is one of the most important pre-processing tasks in intelligent video analysis applications, so an efficient SBD method is essential. A wide variety of methods have been proposed in the literature, but only a few adopt a multimodal approach. In this work, we introduce a new multimodal technique for shot boundary detection that learns a distance measure between audiovisual features using a Siamese network. The proposed system consists of two models: a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) based model for the audio modality and the pre-trained EfficientNet model for the visual modality. The proposed network learns a similarity score from the image embedding features and from the Power Spectral Density (PSD), which serves as the audio feature. The similarity scores obtained from the network are then used to build a signal that represents the audio-visual change. We then apply a global threshold to detect transitions and an adaptive threshold to differentiate between the detected transition types (abrupt or gradual). An experimental study on standard datasets (TRECVID 2001 and TRECVID 2007) showed that introducing the audio features yields an improvement over state-of-the-art models, with an F1 score of 91.36% overall and 89.06% for gradual transitions. The proposed approach can be incorporated into different multimedia applications to reduce their complexity.
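The pipeline described in the abstract (audio PSD features, a similarity-based change signal, then a global and an adaptive threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation: cosine similarity stands in for the learned Siamese score, the PSD is a plain periodogram, and the adaptive rule (local mean plus `k` standard deviations, with a run-length limit `max_abrupt_len`) is an assumption chosen for illustration. All function and parameter names are hypothetical.

```python
import numpy as np


def power_spectral_density(samples, sample_rate):
    """Periodogram estimate of the PSD of one 1-D audio window."""
    samples = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(samples)
    psd = np.abs(spectrum) ** 2 / (sample_rate * samples.size)
    freqs = np.fft.rfftfreq(samples.size, d=1.0 / sample_rate)
    return freqs, psd


def cosine_similarity(a, b):
    """Stand-in for the learned Siamese similarity score (assumption)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def change_signal(features):
    """Dissimilarity (1 - similarity) between consecutive feature vectors."""
    return np.array([
        1.0 - cosine_similarity(features[i], features[i + 1])
        for i in range(len(features) - 1)
    ])


def detect_transitions(signal, global_threshold, window=5, k=3.0,
                       max_abrupt_len=1):
    """Return (start, end, kind) for each run above the global threshold.

    Assumed adaptive rule: a run is "abrupt" when it is at most
    max_abrupt_len samples long and its peak exceeds the local mean plus
    k standard deviations of the neighbouring samples; otherwise it is
    labelled "gradual".
    """
    signal = np.asarray(signal, dtype=float)
    above = signal > global_threshold
    transitions = []
    i, n = 0, signal.size
    while i < n:
        if not above[i]:
            i += 1
            continue
        # Extend the run while consecutive samples stay above the threshold.
        j = i
        while j + 1 < n and above[j + 1]:
            j += 1
        # Neighbourhood on both sides of the run, excluding the run itself.
        context = np.concatenate(
            (signal[max(0, i - window):i], signal[j + 1:j + 1 + window])
        )
        if context.size:
            local_threshold = context.mean() + k * context.std()
        else:
            local_threshold = global_threshold
        peak = signal[i:j + 1].max()
        is_abrupt = (j - i + 1) <= max_abrupt_len and peak > local_threshold
        transitions.append((i, j, "abrupt" if is_abrupt else "gradual"))
        i = j + 1
    return transitions
```

In this sketch, a single sharp peak in the change signal is reported as an abrupt transition, while a sustained run of moderately high values is reported as a gradual one. In practice the thresholds would need tuning on annotated data such as the TRECVID sets named above.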

Funding

The authors have no relevant financial or non-financial interests to disclose.

Author information

Corresponding author

Correspondence to Bouyahi Mohamed.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ben Ayed Yassine contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohamed, B., Yassine, B.A. Shot boundary detection using multimodal Siamese network. Multimed Tools Appl 83, 5055–5078 (2024). https://doi.org/10.1007/s11042-023-15428-4

  • DOI: https://doi.org/10.1007/s11042-023-15428-4
