Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15125))

Abstract

Deep neural networks (DNNs) are widely used in computer vision applications. An emerging trend in video distribution systems is to exploit the overfitting property of DNNs for video resolution upscaling. By splitting a video into chunks and overfitting a super-resolution (SR) model to each chunk, this scheme of SR models plus video chunks can replace traditional video transmission and improve both video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead in model switching and memory footprint at the user end. To resolve these problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline (Dy-DCA), which reduces the number of models to one and improves performance while conserving computational resources. Additionally, to achieve real acceleration at the user end, we design a framework that optimizes the dynamic features in Dy-DCA (e.g., dynamic shapes, sizes, and control flow) to enable a series of compilation optimizations, including fused code generation and static execution planning. With these techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7× speedup while saving up to 1.61× memory consumption. Code is available at https://github.com/coulsonlee/Dy-DCA-ECCV2024.
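To make two terms from the abstract concrete, the following is a minimal sketch, not the authors' implementation: it shows how a frame sequence might be split into fixed-length chunks and how PSNR, the quality metric reported above, is conventionally computed. The function names `split_into_chunks` and `psnr`, the fixed chunk length, and the flat pixel-list representation are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence


def split_into_chunks(frames: Sequence, chunk_len: int) -> List[Sequence]:
    """Split a frame sequence into consecutive chunks of at most chunk_len frames.

    Hypothetical helper: the paper's chunking policy is not described in the abstract.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists.

    Uses the standard definition 10 * log10(max_value^2 / MSE).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    chunks = split_into_chunks(list(range(7)), 3)
    print(chunks)
    print(psnr([100.0, 100.0], [101.0, 99.0]))
```

In a per-chunk overfitting scheme, one SR model would be trained for each element returned by `split_into_chunks`; the approach described in the abstract instead aims to serve all chunks with a single dynamic model.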

Acknowledgments

This work is partly supported by the National Science Foundation CCF-2312616, CCF-2427875, CNS-2232048 and National Aeronautics and Space Administration (NASA) 80NSSC23K1393. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF and NASA.

Author information

Corresponding author

Correspondence to Gen Li.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, G. et al. (2025). Data Overfitting for On-device Super-Resolution with Dynamic Algorithm and Compiler Co-design. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15125. Springer, Cham. https://doi.org/10.1007/978-3-031-72855-6_21

  • DOI: https://doi.org/10.1007/978-3-031-72855-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72854-9

  • Online ISBN: 978-3-031-72855-6

  • eBook Packages: Computer Science, Computer Science (R0)
