Abstract
Given recent advances in learned video prediction, we investigate whether a simple video codec that uses a pretrained deep model to predict the next frame from previously encoded/decoded frames, without sending any motion side information, can compete with standard video codecs based on block motion compensation. The differences between the original frames and the learned frame predictions are encoded by a standard still-image (intra) codec. Experimental results show that, on 10 MPEG test videos, the rate-distortion performance of this simple codec with symmetric complexity is on average better than that of the x264 codec, but does not yet reach the level of the x265 codec. This result demonstrates the power of learned frame prediction (LFP), since unlike motion compensation, LFP does not use any information from the current picture. The implications of training with \(\ell^1\), \(\ell^2\), or combined \(\ell^2\) and adversarial losses on prediction performance and compression efficiency are also analyzed.
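To make the coding loop described above concrete, the following is a minimal illustrative sketch, not the authors' implementation. The names encode_sequence, predictor, intra_codec, and num_context are hypothetical interfaces and assumptions; the number of context frames, the residual offset mapping, and the choice of intra codec (a still-image codec such as BPG could play this role) are likewise assumptions rather than the paper's exact settings. The key point it illustrates is that the prediction is formed only from previously decoded frames, so the decoder can reproduce it without any motion side information.

import numpy as np

def encode_sequence(frames, predictor, intra_codec, num_context=4):
    """frames: list of HxW uint8 frames; returns a list of intra-coded bitstreams."""
    bitstreams = []
    decoded = []  # reconstructed frames, identical at encoder and decoder
    for t, frame in enumerate(frames):
        if t < num_context:
            # The first frames are coded directly as intra pictures.
            bits = intra_codec.encode(frame)
            recon = intra_codec.decode(bits)
        else:
            # Predict frame t from the last num_context *decoded* frames only,
            # so the decoder can form the identical prediction without side info.
            context = np.stack(decoded[-num_context:], axis=0)
            prediction = predictor(context)  # pretrained network, frozen at test time
            residual = frame.astype(np.int16) - prediction.astype(np.int16)
            # Shift the residual into [0, 255] so an 8-bit still-image codec can code it.
            bits = intra_codec.encode(np.clip(residual + 128, 0, 255).astype(np.uint8))
            decoded_residual = intra_codec.decode(bits).astype(np.int16) - 128
            recon = np.clip(prediction.astype(np.int16) + decoded_residual,
                            0, 255).astype(np.uint8)
        bitstreams.append(bits)
        decoded.append(recon)  # closed loop: future predictions use reconstructions
    return bitstreams

The decoder mirrors this loop: it forms the same prediction from its own reconstructed frames, decodes the residual with the intra codec, and adds the two, which is why no motion vectors or other prediction parameters need to be transmitted.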
Additional information
A. M. Tekalp acknowledges support from the TUBITAK project 217E033 and Turkish Academy of Sciences (TUBA).
Cite this article
Sulun, S., Tekalp, A.M.: Can learned frame prediction compete with block motion compensation for video coding? SIViP 15, 401–410 (2021). https://doi.org/10.1007/s11760-020-01751-y