Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design

Gen Li¹³,
Zhihao Shu¹⁴,
Jie Ji¹³,
Minghai Qin¹³,¹⁴,
Fatemeh Afghah¹³,
Wei Niu¹⁴ &
Xiaolong Ma¹³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15125)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Deep neural networks (DNNs) are widely used in computer vision applications. An emerging trend in video distribution systems is to exploit the overfitting property of DNNs to perform video resolution upscaling. By splitting a video into chunks and overfitting a super-resolution (SR) model to each chunk, this scheme of SR models plus video chunks can replace traditional video transmission and improve both video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to substantial model-switching overhead and a large memory footprint on the user end. To resolve these problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline (Dy-DCA) that reduces the number of models to one, which helps improve performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we design a framework that optimizes the dynamic features in Dy-DCA (e.g., dynamic shapes, sizes, and control flow) to enable a series of compilation optimizations, including fused code generation and static execution planning. With these techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimizations, we achieve a 1.7$\times$ speedup while reducing memory consumption by up to 1.61$\times$. Code is available at https://github.com/coulsonlee/Dy-DCA-ECCV2024.
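
The two figures of merit quoted in the abstract, PSNR and frames per second, can be made concrete with a short sketch. This is the standard textbook definition of PSNR, not the authors' evaluation code. The function names `psnr` and `frame_budget_ms` are hypothetical, and the sketch assumes 8-bit pixel values supplied as flat sequences.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two flat pixel sequences.

    Assumption: pixels are flattened to equal-length sequences and
    max_value is the peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def frame_budget_ms(fps):
    """Average time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    high_res = [100, 100, 100, 100]      # toy ground-truth pixels
    upscaled = [110, 90, 110, 90]        # toy SR output, off by 10 everywhere
    print(f"PSNR: {psnr(high_res, upscaled):.2f} dB")
    print(f"Per-frame budget at 33 FPS: {frame_budget_ms(33):.1f} ms")
```

A higher PSNR means the upscaled frame is closer to the ground-truth high-resolution frame. A sustained 33 FPS leaves roughly 30 ms per frame for the whole on-device pipeline.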

Acknowledgments

This work is partly supported by the National Science Foundation CCF-2312616, CCF-2427875, CNS-2232048 and National Aeronautics and Space Administration (NASA) 80NSSC23K1393. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF and NASA.

Author information

Authors and Affiliations

Clemson University, Clemson, USA
Gen Li, Jie Ji, Minghai Qin, Fatemeh Afghah & Xiaolong Ma
University of Georgia, Athens, USA
Zhihao Shu, Minghai Qin & Wei Niu

Corresponding author

Correspondence to Gen Li.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

About this paper

Cite this paper

Li, G. et al. (2025). Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15125. Springer, Cham. https://doi.org/10.1007/978-3-031-72855-6_21

DOI: https://doi.org/10.1007/978-3-031-72855-6_21
Published: 09 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72854-9
Online ISBN: 978-3-031-72855-6
eBook Packages: Computer Science (R0)
