DOI: 10.5555/3600270.3603022
Research article

Deep equilibrium approaches to diffusion models

Published: 28 November 2022

Abstract

Diffusion-based generative models are extremely effective in generating high-quality images, with generated samples often surpassing the quality of those produced by other models under several metrics. One distinguishing feature of these models, however, is that they typically require long sampling chains to produce high-fidelity images. This presents a challenge not only through the lens of sampling time, but also from the inherent difficulty in backpropagating through these chains in order to accomplish tasks such as model inversion, i.e., approximately finding latent states that generate known images. In this paper, we view diffusion models from a different perspective, namely as a (deep) equilibrium (DEQ) fixed point model. Specifically, we extend the recent denoising diffusion implicit model (DDIM) [68] and model the entire sampling chain as a joint, multivariate fixed point system. This setup provides an elegant unification of diffusion and equilibrium models, and shows benefits in 1) single image sampling, as it replaces the typical fully-serial sampling process with a parallel one; and 2) model inversion, where we can leverage fast gradients in the DEQ setting to much more quickly find the noise that generates a given image. The approach is also orthogonal, and thus complementary, to other methods used to reduce sampling time or improve model inversion. We demonstrate our method's strong performance across several datasets, including CIFAR10, CelebA, and LSUN Bedroom and Churches.
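To make the fixed-point view concrete, here is a minimal sketch (ours, not the authors' released code) of the idea: stack all DDIM latents into one joint state and repeatedly apply the deterministic DDIM update at every timestep in parallel until the stacked state stops changing. The names `eps_theta` (a noise-prediction network accepting a batch of latents and their timesteps) and `alphas` (the cumulative noise schedule, with `alphas[0] = 1`) are assumed placeholders; the paper solves this system with Anderson acceleration [5] rather than the naive iteration shown here.

```python
import torch

def ddim_step(x_t, eps, a_t, a_prev):
    # Deterministic DDIM update (eta = 0): map x_t to x_{t-1} given predicted noise.
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps

def parallel_ddim_sample(eps_theta, x_T, alphas, max_iters=50, tol=1e-4):
    # x_T: a single noise image of shape (C, H, W).
    # Joint state X holds current estimates of (x_{T-1}, ..., x_0).
    T = len(alphas) - 1
    X = x_T.unsqueeze(0).repeat(T, 1, 1, 1)
    ts = torch.arange(T, 0, -1)                   # timesteps T, T-1, ..., 1
    for _ in range(max_iters):
        # Inputs to each update: (x_T, x_{T-1}, ..., x_1), read from the current state.
        inputs = torch.cat([x_T.unsqueeze(0), X[:-1]], dim=0)
        eps = eps_theta(inputs, ts)               # one batched network call per sweep
        X_new = torch.stack([
            ddim_step(inputs[i], eps[i], alphas[ts[i]], alphas[ts[i] - 1])
            for i in range(T)
        ])
        if (X_new - X).norm() <= tol * X.norm():  # stacked state reached a fixed point
            X = X_new
            break
        X = X_new
    return X[-1]                                  # x_0, the generated image
```

Each sweep is a single batched forward pass, so the serial chain's latency is traded for parallel compute; for inversion, the same fixed point admits inexpensive approximate gradients (e.g., in the spirit of the Jacobian-free and phantom gradients of [26, 28]) instead of backpropagating through the full chain.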

Supplementary Material

Additional material (3600270.3603022_supp.pdf)
Supplemental material.

References

[1]
Rameen Abdal, Yipeng Qin, and Peter Wonka. Image2StyleGAN: How to embed images into the StyleGAN latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4432-4441, 2019. (Cited on 10)
[2]
Rameen Abdal, Peihao Zhu, Niloy J Mitra, and Peter Wonka. StyleFlow: Attribute-conditioned exploration of StyleGAN-generated images using conditional continuous normalizing flows. ACM Transactions on Graphics (TOG), 40(3):1-21, 2021. (Cited on 2, 10)
[3]
Brandon Amos. Tutorial on amortized optimization for learning to optimize over continuous domains. arXiv preprint arXiv:2202.00665, 2022. (Cited on 9)
[4]
Brandon Amos and J. Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In International Conference on Machine Learning (ICML), 2017. (Cited on 9)
[5]
Donald G Anderson. Iterative procedures for nonlinear integral equations. Journal of the ACM (JACM), 1965. (Cited on 3, 4, 9, 17, 21)
[6]
Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Deep equilibrium models. Neural Information Processing Systems (NeurIPS), 2019. (Cited on 2, 3, 6, 9, 19)
[7]
Shaojie Bai, Vladlen Koltun, and J Zico Kolter. Multiscale deep equilibrium models. Neural Information Processing Systems (NeurIPS), 2020. (Cited on 9)
[8]
Shaojie Bai, Vladlen Koltun, and J Zico Kolter. Stabilizing equilibrium models by Jacobian regularization. arXiv preprint arXiv:2106.14342, 2021. (Cited on 6, 9)
[9]
Shaojie Bai, Zhengyang Geng, Yash Savani, and J Zico Kolter. Deep equilibrium optical flow estimation. arXiv preprint arXiv:2204.08442, 2022. (Cited on 6, 9)
[10]
Shaojie Bai, Vladlen Koltun, and J Zico Kolter. Neural deep equilibrium solvers. In International Conference on Learning Representations, 2022. (Cited on 9)
[11]
David Bau, Hendrik Strobelt, William Peebles, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, and Antonio Torralba. Semantic photo manipulation with a generative image prior. arXiv preprint arXiv:2005.07727, 2020. (Cited on 10)
[12]
Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. In International Conference on Machine Learning, pages 537-546. PMLR, 2017. (Cited on 10)
[13]
Charles G Broyden. A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation, 1965. (Cited on 3, 9, 19)
[14]
Kelvin CK Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, and Chen Change Loy. GLEAN: Generative latent bank for large-factor image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14245-14254, 2021. (Cited on 10)
[15]
Qi Chen, Yifei Wang, Yisen Wang, Jiansheng Yang, and Zhouchen Lin. Optimization-induced graph implicit nonlinear diffusion. In International Conference on Machine Learning, pages 3648-3661. PMLR, 2022. (Cited on 9)
[16]
Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Neural Information Processing Systems (NeurIPS), 2018. (Cited on 9)
[17]
Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. ILVR: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938, 2021. (Cited on 10)
[18]
Hyungjin Chung, Byeongsu Sim, and Jong Chul Ye. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. arXiv preprint arXiv:2112.05146, 2021. (Cited on 10)
[19]
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 2021. (Cited on 1, 9)
[20]
Josip Djolonga and Andreas Krause. Differentiable learning of submodular models. Advances in Neural Information Processing Systems, 30, 2017. (Cited on 9)
[21]
Priya L. Donti, David Rolnick, and J Zico Kolter. DC3: A learning method for optimization with hard constraints. In International Conference on Learning Representations (ICLR), 2021. (Cited on 9)
[22]
Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. Augmented neural ODEs. In Neural Information Processing Systems (NeurIPS), 2019. (Cited on 9)
[23]
Laurent El Ghaoui, Fangda Gu, Bertrand Travacca, and Armin Askari. Implicit deep learning. arXiv preprint arXiv:1908.06315, 2019. (Cited on 9)
[24]
Thorsten Falk, Dominic Mai, Robert Bensch, Özgün Çiçek, Ahmed Abdulkadir, Yassine Marrakchi, Anton Böhm, Jan Deubner, Zoe Jäckel, Katharina Seiwald, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods, 16(1):67-70, 2019. (Cited on 17)
[25]
Zhili Feng and J Zico Kolter. On the neural tangent kernel of equilibrium models, 2021. (Cited on 9)
[26]
Samy Wu Fung, Howard Heaton, Qiuwei Li, Daniel McKenzie, Stanley Osher, and Wotao Yin. Fixed point networks: Implicit depth models with Jacobian-free backprop. arXiv e-prints, 2021. (Cited on 6)
[27]
Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, and Zhouchen Lin. Is attention better than matrix decomposition? In International Conference on Learning Representations (ICLR), 2021. (Cited on 6, 9)
[28]
Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, and Zhouchen Lin. On training implicit models. Neural Information Processing Systems (NeurIPS), 2021. (Cited on 6, 9, 17)
[29]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014. (Cited on 3, 10)
[30]
Albert Gu, Karan Goel, and Christopher Re. Efficiently modeling long sequences with structured state spaces. In International Conference on Learning Representations (ICLR), 2022. (Cited on 9)
[31]
Fangda Gu, Heng Chang, Wenwu Zhu, Somayeh Sojoudi, and Laurent El Ghaoui. Implicit Graph Neural Networks. In Neural Information Processing Systems (NeurIPS), pages 11984-11995, 2020. (Cited on 9)
[32]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017. (Cited on 7)
[33]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Neural Information Processing Systems (NeurIPS), 2020. (Cited on 2, 3, 6, 8, 9, 10, 17, 20)
[34]
Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, and Aaron Hertzmann. Transforming and projecting images into class-conditional generative networks. In European Conference on Computer Vision, pages 17-34. Springer, 2020. (Cited on 10)
[35]
Thibaut Issenhuth, Ugo Tanielian, Jérémie Mary, and David Picard. Edibert, a generative model for image editing. arXiv preprint arXiv:2111.15264, 2021. (Cited on 2)
[36]
Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon Tamir. Robust compressed sensing MRI with deep generative priors. Advances in Neural Information Processing Systems, 34:14938-14954, 2021. (Cited on 10)
[37]
Zahra Kadkhodaie and Eero P Simoncelli. SNIPS: Solving linear inverse problems using the prior implicit in a denoiser. arXiv preprint arXiv:2007.13640, 2020. (Cited on 10)
[38]
Kenji Kawaguchi. On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers. In International Conference on Learning Representations (ICLR), 2020. (Cited on 9)
[39]
Bahjat Kawar, Gregory Vaksman, and Michael Elad. Snips: Solving noisy inverse problems stochastically. Advances in Neural Information Processing Systems, 34:21757-21769, 2021. (Cited on 10)
[40]
Gwanghyun Kim and Jong Chul Ye. DiffusionCLIP: Text-guided image manipulation using diffusion models. arXiv preprint arXiv:2110.02711, 2021. (Cited on 10)
[41]
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. (Cited on 17)
[42]
Diederik P Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. arXiv preprint arXiv:2107.00630, 2021. (Cited on 1)
[43]
J. Zico Kolter, David Duvenaud, and Matthew Johnson. Deep implicit layers tutorial - neural ODEs, deep equilibrium models, and beyond. Neural Information Processing Systems Tutorial, 2020. (Cited on 9, 17)
[44]
Zhifeng Kong and Wei Ping. On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:2106.00132, 2021. (Cited on 1, 9)
[45]
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations (ICLR), 2021. (Cited on 9)
[46]
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009. (Cited on 2, 6)
[47]
Haoying Li, Yifan Yang, Meng Chang, Shiqi Chen, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. SRDiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 2022. (Cited on 10)
[48]
Mingjie Li, Yisen Wang, and Zhouchen Lin. CerDEQ: Certifiable deep equilibrium model. In International Conference on Machine Learning, 2022. (Cited on 9)
[49]
Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, and Sanja Fidler. EditGAN: High-precision semantic image editing. Advances in Neural Information Processing Systems, 34:16331-16345, 2021. (Cited on 2)
[50]
Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, and Zhouchen Lin. Global convergence of over-parameterized deep equilibrium models. arXiv preprint arXiv:2205.13814, 2022. (Cited on 9)
[51]
Juncheng Liu, Kenji Kawaguchi, Bryan Hooi, Yiwei Wang, and Xiaokui Xiao. EIGNN: Efficient infinite-depth graph neural networks. In Neural Information Processing Systems (NeurIPS), 2021. (Cited on 9)
[52]
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015. (Cited on 2, 6)
[53]
Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, and Jun Zhu. Implicit normalizing flows. In International Conference on Learning Representations (ICLR), 2021. (Cited on 9)
[54]
Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021. (Cited on 1, 9)
[55]
Chenlin Meng, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021. (Cited on 2, 10)
[56]
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021. (Cited on 10)
[57]
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162-8171. PMLR, 2021. (Cited on 1, 10)
[58]
Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. Diffusion models for adversarial purification. arXiv preprint arXiv:2205.07460, 2022. (Cited on 2)
[59]
Junyoung Park, Jinhyun Choo, and Jinkyoo Park. Convergent graph solvers. arXiv preprint arXiv:2106.01680, 2021. (Cited on 9)
[60]
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017. (Cited on 17)
[61]
Guim Perarnau, Joost Van De Weijer, Bogdan Raducanu, and Jose M Alvarez. Invertible conditional GANs for image editing. arXiv preprint arXiv:1611.06355, 2016. (Cited on 10)
[62]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022. (Cited on 9)
[63]
Ardavan Saeedi, Matthew Hoffman, Stephen DiVerdi, Asma Ghandeharioun, Matthew Johnson, and Ryan Adams. Multimodal prediction and personalization of photo edits with deep generative models. In International Conference on Artificial Intelligence and Statistics, pages 1309-1317. PMLR, 2018. (Cited on 2)
[64]
Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. arXiv preprint arXiv:2104.07636, 2021. (Cited on 10)
[65]
Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations (ICLR), 2022. (Cited on 1, 9)
[66]
Hiroshi Sasaki, Chris G Willcocks, and Toby P Breckon. UNIT-DDPM: Unpaired image translation with denoising diffusion probabilistic models. arXiv preprint arXiv:2104.05358, 2021. (Cited on 10)
[67]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning (ICML), 2015. (Cited on 2, 3, 9, 20)
[68]
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020. (Cited on 1, 2, 3, 4, 6, 8, 9, 17, 20, 22)
[69]
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019. (Cited on 1, 9)
[70]
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021. (Cited on 9)
[71]
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021. (Cited on 9, 10)
[72]
Po-Wei Wang, Priya Donti, Bryan Wilder, and Zico Kolter. Satnet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In International Conference on Machine Learning (ICML), 2019. (Cited on 9)
[73]
Tiancai Wang, Xiangyu Zhang, and Jian Sun. Implicit Feature Pyramid Network for Object Detection. arXiv preprint arXiv:2012.13563, 2020. (Cited on 9)
[74]
Colin Wei and J Zico Kolter. Certified robustness for deep equilibrium models via interval bound propagation. In International Conference on Learning Representations, 2022. (Cited on 9)
[75]
Ezra Winston and J. Zico Kolter. Monotone operator equilibrium networks. In Neural Information Processing Systems (NeurIPS), 2020. (Cited on 9)
[76]
Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015. (Cited on 6)
[77]
Jiapeng Zhu, Yujun Shen, Deli Zhao, and Bolei Zhou. In-domain GAN inversion for real image editing. In European Conference on Computer Vision, pages 592-608. Springer, 2020. (Cited on 2, 10)
[78]
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597-613. Springer, 2016. (Cited on 10)

Code: https://github.com/ashwinipokle/deq-ddim



Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, November 2022, 39114 pages

      Publisher

      Curran Associates Inc.

      Red Hook, NY, United States


