Abstract
The growing demand for high-quality streaming video has intensified research into Video Super-Resolution (VSR) methods. Applying VSR on the user end raises video resolution without consuming additional bandwidth, by exploiting local or edge computing resources. The abundance of high-quality video content and the relative ease of VSR dataset generation have made Deep Neural Network-based VSR (DNN-VSR) approaches popular; such datasets are typically built by pairing down-sampled high-resolution videos with their low-resolution counterparts as training instances. However, current DNN-VSR techniques mostly focus on upscaling videos down-sampled with methods such as Bicubic Interpolation (BI), without accounting for the codec loss inherent in video streaming applications, which limits their practicality. This study examines five state-of-the-art (SOTA) DNN-VSR algorithms and compares their performance on streaming videos, using FFmpeg (Fast Forward Moving Picture Experts Group) to emulate codec loss. The analysis also incorporates subjective testing to address the limitations of objective metrics for VSR evaluation. The paper concludes with a discussion of the results and outlines directions for further research in the domain.
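The codec-loss emulation described above can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: it builds FFmpeg command lines that first re-encode a high-resolution clip with lossy H.264 compression (emulating streaming codec loss) and then down-sample the result with bicubic scaling, yielding a degraded low-resolution training input. The file names, CRF value, and scale factor are illustrative assumptions.

```python
import subprocess

def build_degradation_commands(hr_path, lr_path, crf=32, scale=0.25,
                               tmp_path="codec_loss.mp4"):
    """Build FFmpeg commands that emulate streaming codec loss.

    Step 1 re-encodes the high-resolution clip with lossy H.264
    (a higher CRF means stronger compression artifacts); step 2
    down-samples the result with bicubic interpolation, mirroring
    typical VSR dataset generation. Paths/parameters are illustrative.
    """
    encode = ["ffmpeg", "-y", "-i", hr_path,
              "-c:v", "libx264", "-crf", str(crf), tmp_path]
    downscale = ["ffmpeg", "-y", "-i", tmp_path,
                 "-vf", f"scale=iw*{scale}:ih*{scale}:flags=bicubic",
                 lr_path]
    return [encode, downscale]

def run_pipeline(hr_path, lr_path):
    # Requires an ffmpeg binary on PATH.
    for cmd in build_degradation_commands(hr_path, lr_path):
        subprocess.run(cmd, check=True)
```

Training pairs built this way contain both down-sampling loss and compression artifacts, unlike BI-only pairs, which is the gap the study highlights.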
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Funding
The research leading to these results received funding from AIT President’s Doctoral Scholarship 2020.
Ethics declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Cite this article
He, X., Qiao, Y., Lee, B. et al. A comparative study of super-resolution algorithms for video streaming application. Multimed Tools Appl 83, 43493–43512 (2024). https://doi.org/10.1007/s11042-023-17230-8