Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.01569 (cs)

[Submitted on 3 Mar 2024]

Title: Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV

Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

Abstract: Self-supervised learning is the key to unlocking generic computer vision systems. By eliminating the reliance on ground-truth annotations, it allows scaling to much larger data quantities. Unfortunately, self-supervised monocular depth estimation (SS-MDE) has been limited by the absence of diverse training data. Existing datasets have focused exclusively on urban driving in densely populated cities, resulting in models that fail to generalize beyond this domain.
To address these limitations, this paper proposes two novel datasets: SlowTV and CribsTV. These are large-scale datasets curated from publicly available YouTube videos, containing a total of 2M training frames. They offer an incredibly diverse set of environments, ranging from snowy forests to coastal roads, luxury mansions and even underwater coral reefs. We leverage these datasets to tackle the challenging task of zero-shot generalization, outperforming every existing SS-MDE approach and even some state-of-the-art supervised methods.
The generalization capabilities of our models are further enhanced by a range of components and contributions: 1) learning the camera intrinsics, 2) a stronger augmentation regime targeting aspect ratio changes, 3) support frame randomization, 4) flexible motion estimation, 5) a modern transformer-based architecture. We demonstrate the effectiveness of each component in extensive ablation experiments. To facilitate the development of future research, we make the datasets, code and pretrained models available to the public at this https URL.
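As a rough illustration of component 2 (augmentation targeting aspect ratio changes), the sketch below samples a random crop box with a randomly chosen aspect ratio. It is not taken from the paper or its released code: the function name `sample_aspect_crop`, the ratio range, and the log-uniform sampling are all assumptions made for this example.

```python
import math
import random


def sample_aspect_crop(height, width, min_ratio=0.5, max_ratio=2.0, rng=None):
    """Return (top, left, crop_h, crop_w) for a random crop of a height x width image.

    Hypothetical helper: the ratio range and sampling scheme are illustrative
    assumptions, not the authors' settings.
    """
    rng = rng or random.Random()

    # Sample the aspect ratio (width / height) log-uniformly, so that
    # a 1:2 crop is as likely as a 2:1 crop.
    ratio = math.exp(rng.uniform(math.log(min_ratio), math.log(max_ratio)))

    # Largest box with (approximately) that ratio that still fits in the image.
    crop_w = max(1, min(width, int(round(height * ratio))))
    crop_h = max(1, min(height, int(round(crop_w / ratio))))

    # Place the box at a random position inside the image.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


if __name__ == "__main__":
    # Example: draw a few crops from a 192 x 640 frame.
    rng = random.Random(0)
    for _ in range(3):
        print(sample_aspect_crop(192, 640, rng=rng))
```

In a pipeline that uses known camera intrinsics, a crop like this would also require shifting the principal point by the crop offset; a model that learns the intrinsics, as the abstract describes, would not depend on that bookkeeping.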

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2403.01569 [cs.CV]
	(or arXiv:2403.01569v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.01569

Submission history

From: Jaime Spencer
[v1] Sun, 3 Mar 2024 17:29:03 UTC (25,054 KB)
