Computer Science > Computer Vision and Pattern Recognition

arXiv:2312.13252 (cs)

[Submitted on 20 Dec 2023]

Title: Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet


Abstract: While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model with several advancements: log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity, and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training mixture than is common and an efficient diffusion parameterization, our method, DMD (Diffusion for Metric Depth), achieves a 25% reduction in relative error (REL) on zero-shot indoor datasets and a 33% reduction on zero-shot outdoor datasets over the current SOTA, using only a small number of denoising steps. For an overview see this https URL
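
The quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the depth bounds `d_min` and `d_max`, the mapping to [-1, 1], and all function names are assumptions chosen for illustration. The FOV formula is the standard pinhole relation, and REL is taken to be the usual mean absolute relative error.

```python
import math


def log_depth_encode(depth_m, d_min=0.5, d_max=80.0):
    """Map metric depth in meters to [-1, 1] on a log scale.

    d_min and d_max are assumed bounds, not values taken from the paper.
    """
    d = min(max(depth_m, d_min), d_max)  # clamp into the representable range
    unit = math.log(d / d_min) / math.log(d_max / d_min)  # in [0, 1]
    return 2.0 * unit - 1.0


def log_depth_decode(code, d_min=0.5, d_max=80.0):
    """Invert log_depth_encode: map a value in [-1, 1] back to meters."""
    unit = (code + 1.0) / 2.0
    return d_min * math.exp(unit * math.log(d_max / d_min))


def fov_deg(extent_px, focal_px):
    """Pinhole field of view in degrees along one image axis."""
    return math.degrees(2.0 * math.atan(extent_px / (2.0 * focal_px)))


def relative_error(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)
```

A log-scale encoding spends comparable resolution on a 1 m to 2 m indoor range and a 40 m to 80 m outdoor range, which is one plausible reading of why it helps joint indoor and outdoor modeling. The FOV value is the kind of scalar a model could be conditioned on when camera intrinsics are known.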

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.13252 [cs.CV]
	(or arXiv:2312.13252v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.13252

Submission history

From: Saurabh Saxena
[v1] Wed, 20 Dec 2023 18:27:47 UTC (21,371 KB)
