DOI: 10.1145/3641519.3657525
Research Article · Open Access

LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Published: 13 July 2024

Abstract

We present LooseControl, which enables generalized depth conditioning for diffusion-based image generation. ControlNet, the state of the art for depth-conditioned image generation, produces remarkable results but relies on access to detailed depth maps for guidance. Creating such exact depth maps is challenging in many scenarios. This paper introduces a generalized version of depth conditioning that enables new content creation workflows. Specifically, we allow (C1) scene boundary control, for loosely specifying scenes with only boundary conditions, and (C2) 3D box control, for specifying the layout locations of target objects rather than their exact shape and appearance. Using LooseControl, along with text guidance, users can create complex environments (e.g., rooms, street views) by specifying only scene boundaries and the locations of primary objects. Further, we provide two editing mechanisms to refine the results: (E1) 3D box editing lets the user refine an image by changing, adding, or removing boxes while freezing the image style, yielding minimal changes apart from those induced by the edited boxes. (E2) Attribute editing proposes possible editing directions that change one particular aspect of the scene, such as the overall object density or a specific object. Tests and comparisons with baselines demonstrate the generality of our method. We believe that LooseControl can become an important design tool for easily creating complex environments and can be extended to other forms of guidance channels. The project page can be found at https://shariqfarooq123.github.io/loose-control/.
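The box-conditioning workflow above can be pictured with a short sketch: rasterize a few coarse boxes into a rough depth map and hand it to a depth ControlNet through the Hugging Face diffusers API. This is a minimal illustration under stated assumptions, not the authors' released code; the render_loose_depth helper, its simplified box format, and the stock lllyasviel/sd-controlnet-depth checkpoint (for which LooseControl's fine-tuned weights would be substituted) are illustrative choices.

```python
# Hedged sketch: coarse box layout -> depth image -> depth ControlNet.
# The rendering helper and model choices are illustrative, not the paper's code.
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def render_loose_depth(boxes, size=(512, 512), near=0.1, far=10.0):
    """Rasterize screen-aligned boxes (x0, y0, x1, y1, depth) into a coarse
    depth map; nearer boxes overwrite farther ones, background sits at far."""
    h, w = size
    depth = np.full((h, w), far, dtype=np.float32)
    for x0, y0, x1, y1, d in sorted(boxes, key=lambda b: -b[4]):
        depth[y0:y1, x0:x1] = np.clip(d, near, far)
    # Depth ControlNets expect a nearer-is-brighter grayscale image.
    norm = 255.0 * (1.0 - (depth - near) / (far - near))
    return Image.fromarray(norm.astype(np.uint8)).convert("RGB")

# Two boxes loosely standing in for a sofa and a floor lamp.
loose_depth = render_loose_depth([(60, 260, 300, 460, 2.5),
                                  (340, 180, 420, 460, 4.0)])

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

image = pipe("a cozy living room with a sofa and a floor lamp",
             image=loose_depth, num_inference_steps=30).images[0]
image.save("loose_control_sketch.png")
```

With a stock depth ControlNet such boxes would be followed fairly literally; LooseControl's contribution is to generalize the conditioning so that coarse boxes and scene boundaries are interpreted loosely, leaving object shape and appearance to the text prompt.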

Supplemental Material

  • MP4 File: presentation
  • PDF File: Appendix


Cited By

  • SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (2024), 1–25. Online publication date: 13 Oct 2024. https://doi.org/10.1145/3654777.3676361
  • ZeST: Zero-Shot Material Transfer from a Single Image. Computer Vision – ECCV 2024 (2024), 370–386. Online publication date: 30 Sep 2024. https://doi.org/10.1007/978-3-031-73232-4_21
  • DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting. Computer Vision – ECCV 2024 (2024), 324–342. Online publication date: 2 Oct 2024. https://doi.org/10.1007/978-3-031-72658-3_19


Information

Published In

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
July 2024
1106 pages
ISBN: 9798400705250
DOI: 10.1145/3641519
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 July 2024


Author Tags

  1. control
  2. depth condition
  3. diffusion models
  4. generative models
  5. guided editing
  6. layout control
  7. partial specification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGGRAPH '24

Acceptance Rates

Overall acceptance rate: 1,822 of 8,601 submissions (21%)


Article Metrics

  • Downloads (last 12 months): 1,189
  • Downloads (last 6 weeks): 288

Reflects downloads up to 09 Jan 2025.
