Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

Published: 10 November 2024

Abstract

Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize extracting temporal information from the continuous event flow, overemphasizing low-frequency texture features in the scene and producing over-smoothed, blurry artifacts. Addressing this challenge requires integrating conditional information, encompassing temporal features, low-frequency texture, and high-frequency events, to guide the Denoising Diffusion Probabilistic Model (DDPM) toward accurate and natural outputs. To this end, we introduce the Temporal Residual Guided Diffusion Framework, a novel approach that effectively leverages both temporal and frequency-based event priors. Our framework incorporates three key conditioning modules: a pre-trained low-frequency intensity estimation module, a temporal recurrent encoder module, and an attention-based high-frequency prior enhancement module. To capture temporal scene variations from the events at the current moment, we employ a temporal-domain residual image as the target of the diffusion model. By combining these three conditioning paths with the temporal residual target, our framework reconstructs high-quality videos from event flow while mitigating the artifacts and over-smoothing common in previous approaches. Extensive experiments on multiple benchmark datasets validate the superior performance of our framework compared to prior event-based reconstruction methods.
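
To make the described pipeline concrete, below is a minimal PyTorch sketch of the training step the abstract implies: a DDPM whose denoising target is the temporal-domain residual between consecutive frames, conditioned on three paths (a low-frequency intensity estimate, a recurrent temporal encoding of the event stream, and attention-enhanced high-frequency event features). Every class, function, and hyperparameter here (ConditioningPaths, ResidualDenoiser, training_step, the 5-channel event voxel grid, the feature widths) is a hypothetical illustration under these assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a temporal-residual conditional DDPM training step.
# Module names, shapes, and widths are illustrative assumptions, not the
# paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditioningPaths(nn.Module):
    """Stand-ins for the three conditioning modules named in the abstract:
    low-frequency intensity estimation (pre-trained in the paper), a temporal
    recurrent encoder, and attention-based high-frequency enhancement."""

    def __init__(self, event_ch=5, feat_ch=32, heads=4):
        super().__init__()
        self.low_freq = nn.Conv2d(event_ch, feat_ch, 3, padding=1)
        self.recurrent = nn.Conv2d(event_ch + feat_ch, feat_ch, 3, padding=1)
        self.high_in = nn.Conv2d(event_ch, feat_ch, 3, padding=1)
        self.attn = nn.MultiheadAttention(feat_ch, heads, batch_first=True)

    def forward(self, events, hidden):
        low = self.low_freq(events)                    # low-frequency texture prior
        hidden = torch.tanh(                           # temporal state carried across windows
            self.recurrent(torch.cat([events, hidden], dim=1)))
        b, c, h, w = low.shape
        tok = self.high_in(events).flatten(2).transpose(1, 2)  # (B, H*W, C) tokens
        high, _ = self.attn(tok, tok, tok)             # self-attention over event features
        high = high.transpose(1, 2).reshape(b, c, h, w)
        return (low, hidden, high), hidden


class ResidualDenoiser(nn.Module):
    """Predicts the noise added to the temporal residual, given the three
    condition maps and a crude scalar timestep embedding."""

    def __init__(self, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 + 3 * feat_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1),
        )

    def forward(self, x_t, t_frac, conds):
        t_map = t_frac.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:])
        return self.net(torch.cat([x_t, t_map, *conds], dim=1))


def training_step(denoiser, paths, events, hidden, prev_frame, frame, alpha_bar):
    """One DDPM step on x0 = frame - prev_frame (the temporal residual)."""
    x0 = frame - prev_frame                            # temporal-domain residual target
    t = torch.randint(0, alpha_bar.numel(), (x0.size(0),), device=x0.device)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps         # forward diffusion q(x_t | x0)
    conds, hidden = paths(events, hidden)              # three conditioning maps
    pred = denoiser(x_t, t.float() / alpha_bar.numel(), conds)
    return F.mse_loss(pred, eps), hidden.detach()      # standard epsilon-prediction loss
```

A usage sketch under the same assumed shapes: with `events` as a (B, 5, H, W) voxel grid per reconstruction step, initialize `hidden = torch.zeros(B, 32, H, W)` and `alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), 0)`, then loop `loss, hidden = training_step(...)` over consecutive frames. At inference, one would run the reverse diffusion to sample a residual and add it to the previous reconstruction to obtain the next frame.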



Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XL
Sep 2024, 587 pages
ISBN: 978-3-031-73660-5
DOI: 10.1007/978-3-031-73661-2
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Event camera
2. Diffusion model
3. Video reconstruction
