Shot boundary detection using multimodal Siamese network

Bouyahi Mohamed¹ &
Ben Ayed Yassine¹

Abstract

Shot Boundary Detection (SBD) is a key pre-processing step in intelligent video analysis applications, so an efficient SBD method is important. A wide variety of methods have been proposed in the literature for this task, but only a few of them adopt a multimodal approach. In this work, we introduce a new multimodal technique for shot boundary detection that learns a distance measure between audiovisual features using a Siamese network. The proposed system consists of two models: a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) based model for the audio modality and the pre-trained EfficientNet model for the visual modality. The network learns a similarity score from the image embedding features and from the Power Spectral Density (PSD), which serves as the audio feature. The similarity scores produced by the network are then used to build a signal that represents the audio-visual change. A global threshold is applied to this signal to detect transitions, and an adaptive threshold is then used to classify each detected transition as abrupt or gradual. The experimental study, carried out on standard datasets (TRECVID 2001 and TRECVID 2007), showed that introducing the audio features improved on state-of-the-art models, with an F1 score of 91.36% overall and 89.06% for gradual transitions. The proposed approach can be incorporated into different multimedia applications to reduce their complexity.
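
The two-stage thresholding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the threshold formulas, so the global threshold (mean plus `k` standard deviations of the change signal), the adaptive threshold (a fraction `beta` of each candidate's own peak), the width rule `max_cut_width`, the function name `detect_transitions`, and the sample scores are all assumptions chosen for illustration. The similarity scores stand in for the output of the Siamese network.

```python
from statistics import mean, pstdev


def detect_transitions(similarity, k=0.5, beta=0.5, max_cut_width=1):
    """Label transitions in a sequence of similarity scores.

    similarity[i] is assumed to be the score in [0, 1] between segment i
    and segment i + 1. k, beta and max_cut_width are hypothetical
    parameters, not values taken from the paper.
    """
    if not similarity:
        return []

    # Audio-visual change signal: low similarity means a large change.
    change = [1.0 - s for s in similarity]

    # Global threshold over the whole signal (assumed form: mean + k * std).
    global_thr = mean(change) + k * pstdev(change)

    # Group consecutive above-threshold positions into candidate transitions.
    runs, start = [], None
    for i, c in enumerate(change):
        if c > global_thr and start is None:
            start = i
        elif c <= global_thr and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(change) - 1))

    transitions = []
    for start, end in runs:
        peak = max(range(start, end + 1), key=change.__getitem__)
        # Adaptive threshold (assumed): a fraction of this candidate's peak.
        adaptive_thr = beta * change[peak]
        left = peak
        while left > 0 and change[left - 1] > adaptive_thr:
            left -= 1
        right = peak
        while right < len(change) - 1 and change[right + 1] > adaptive_thr:
            right += 1
        # A narrow peak is treated as a cut, a wide one as a gradual change.
        width = right - left + 1
        label = "abrupt" if width <= max_cut_width else "gradual"
        transitions.append((left, right, label))
    return transitions


# Made-up scores: one sharp drop (index 3) and one slow dip (indices 7-11).
scores = [0.95, 0.96, 0.94, 0.10, 0.95, 0.96, 0.95,
          0.70, 0.45, 0.30, 0.45, 0.70, 0.95, 0.96]
print(detect_transitions(scores))  # [(3, 3, 'abrupt'), (8, 10, 'gradual')]
```

In this sketch the global threshold only finds where the signal is unusually high, while the per-candidate threshold measures how wide each peak is relative to its own height, which is one simple way to separate a one-step cut from a change spread over several steps.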

Funding

The authors have no relevant financial or non-financial interests to disclose.

Author information

Authors and Affiliations

MIRACL Laboratory: Multimedia, InfoRmation Systems and Advanced Computing, Sfax University, Sfax, Tunisia
Bouyahi Mohamed & Ben Ayed Yassine

Corresponding author

Correspondence to Bouyahi Mohamed.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ben Ayed Yassine contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohamed, B., Yassine, B.A. Shot boundary detection using multimodal Siamese network. Multimed Tools Appl 83, 5055–5078 (2024). https://doi.org/10.1007/s11042-023-15428-4

Received: 17 February 2022
Revised: 25 March 2023
Accepted: 18 April 2023
Published: 30 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15428-4
