DOI: 10.1007/978-3-031-20068-7_31

FAST-VQA: Efficient End-to-End Video Quality Assessment with Fragment Sampling

Published: 23 October 2022

Abstract

Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos, which hinders them from learning better video-quality-related representations via end-to-end training. Existing approaches typically rely on naive sampling schemes such as resizing and cropping to reduce this cost, but these schemes clearly corrupt quality-related information in videos and are therefore suboptimal for learning good VQA representations. There is thus an urgent need for a quality-retaining sampling scheme for VQA. In this paper, we propose Grid Mini-patch Sampling (GMS), which captures local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled on uniform grids. These mini-patches are spliced and aligned temporally; we term the result fragments. We further build the Fragment Attention Network (FANet), specially designed to take fragments as inputs. Together, fragments and FANet form the FrAgment Sample Transformer for VQA (FAST-VQA), which enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos. The learned video-quality-related representations can also be transferred to smaller VQA datasets, boosting performance in those scenarios. Extensive experiments show that FAST-VQA performs well on inputs of various resolutions while retaining high efficiency. We publish our code at https://github.com/timothyhtimothy/FAST-VQA.
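The GMS scheme the abstract describes can be sketched in a few lines of code. The following is a minimal NumPy sketch, not the authors' implementation (which lives in the linked repository); the function name `grid_minipatch_sample` and the default 7×7 grid of 32×32 mini-patches are illustrative assumptions. Each frame is divided into a uniform grid, one mini-patch is cut from each cell at raw resolution, and the same spatial offsets are reused across frames so the spliced fragments stay temporally aligned.

```python
import numpy as np

def grid_minipatch_sample(video, grid=7, patch=32, rng=None):
    """Sketch of Grid Mini-patch Sampling (GMS).

    video: array of shape (T, H, W, C).
    Each frame is split into a grid x grid layout of cells; one
    patch x patch mini-patch is cut from each cell at raw resolution.
    The same per-cell offsets are shared across all frames, keeping
    the resulting fragments temporally aligned.
    Returns an array of shape (T, grid*patch, grid*patch, C).
    """
    rng = np.random.default_rng(rng)
    t, h, w, c = video.shape
    ch, cw = h // grid, w // grid  # grid-cell size
    assert ch >= patch and cw >= patch, "each cell must fit a mini-patch"
    out = np.empty((t, grid * patch, grid * patch, c), video.dtype)
    for i in range(grid):
        for j in range(grid):
            # one random offset per cell, reused for every frame
            dy = rng.integers(0, ch - patch + 1) + i * ch
            dx = rng.integers(0, cw - patch + 1) + j * cw
            out[:, i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
                video[:, dy:dy+patch, dx:dx+patch]
    return out

# e.g. a 32-frame 1080p clip shrinks to a 224x224 fragment volume
frag = grid_minipatch_sample(np.zeros((32, 1080, 1920, 3), np.uint8))
print(frag.shape)  # (32, 224, 224, 3)
```

Because the mini-patches are kept at raw resolution, local quality cues (blur, noise, compression artifacts) survive, while the uniform grid preserves a coarse notion of global composition; this is the trade-off that plain resizing or cropping cannot offer.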




Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VI
Oct 2022, 803 pages
ISBN: 978-3-031-20067-0
DOI: 10.1007/978-3-031-20068-7
Publisher: Springer-Verlag, Berlin, Heidelberg


    Author Tags

    1. Video quality assessment
    2. fragments

    Cited By

    • Using Spatial-Temporal Attention for Video Quality Evaluation. International Journal of Intelligent Systems (2024). DOI: 10.1155/2024/5514627
    • MT-VQA: A Multi-task Approach for Quality Assessment of Short-form Videos. Proceedings of the 3rd Workshop on Quality of Experience in Visual Multimedia Applications (2024), pp. 30–38. DOI: 10.1145/3689093.3689181
    • Dual-Criterion Quality Loss for Blind Image Quality Assessment. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 7823–7832. DOI: 10.1145/3664647.3681250
    • Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 6033–6042. DOI: 10.1145/3664647.3680964
    • Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 9913–9922. DOI: 10.1145/3664647.3680907
    • Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 7793–7802. DOI: 10.1145/3664647.3680868
    • Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 9970–9979. DOI: 10.1145/3664647.3680598
    • Q-Ground: Image Quality Grounding with Large Multi-modality Models. Proceedings of the 32nd ACM International Conference on Multimedia (2024), pp. 486–495. DOI: 10.1145/3664647.3680575
    • GMS-3DQA: Projection-Based Grid Mini-patch Sampling for 3D Model Quality Assessment. ACM Transactions on Multimedia Computing, Communications, and Applications 20(6), 1–19 (2024). DOI: 10.1145/3643817
    • Conformer Based No-Reference Quality Assessment for UGC Video. Advanced Intelligent Computing Technology and Applications (2024), pp. 464–472. DOI: 10.1007/978-981-97-5597-4_39
