Open access
Author
Date
2023
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
Standard supervised learning, which dominates recognition tasks in computer vision, requires retraining models from scratch for every new task or domain. This is inefficient and ignores related information or prior knowledge that could make learning more efficient and robust. In this thesis, we explore how to use such inductive knowledge, together with data, to improve recognition models for dense visual prediction tasks. We focus on two types of knowledge sharing: multi-task learning and domain adaptation.

We first present two neural network architectures that learn multiple dense prediction tasks concurrently and achieve state-of-the-art prediction quality on all of them. These architectures avoid negative transfer among tasks by modifying either the encoder or the decoder of the network. The first method uses neural architecture search to automatically create branching structures in the encoder; owing to a resource-aware objective function, it finds effective branching structures for various dense prediction tasks within constrained resource budgets. The second method introduces a decoder module that improves task predictions through attention-based cross-task contexts. In the process, we examine how different types of context affect each task.

In the second part of the thesis, we discuss two methods for domain-adaptive semantic segmentation. Both use image-level correspondences to improve the adaptation of models trained in normal conditions (clear weather, daytime) to adverse conditions. The first method is a general extension of self-training-based unsupervised domain adaptation: it aligns the normal image with the adverse image using a dense matching network and refines the adverse prediction with the normal prediction through an adaptive label correction mechanism. The second is a method for source-free domain-adaptive semantic segmentation. Through cross-domain contrastive learning, it learns condition-invariant features, which enables strong generalization. In summary, this thesis demonstrates how knowledge transfer between tasks and domains can improve dense prediction in computer vision.
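The attention-based cross-task context mentioned in the abstract can be sketched in a few lines. The following NumPy toy (function names, shapes, and the residual refinement are our own illustrative assumptions, not the thesis implementation) refines one task's features by attending over another task's features:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_attention(query_feats, context_feats):
    """Refine one task's features (queries, shape (N, d)) with attention
    over another task's features (context, shape (M, d)).

    Illustrative sketch only: scaled dot-product attention followed by a
    residual connection, so each query is augmented with a weighted
    combination of cross-task context features.
    """
    d = query_feats.shape[1]
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (N, M) similarities
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return query_feats + weights @ context_feats          # residual refinement
```

For example, with queries from a depth head and context from a segmentation head, each depth feature is augmented by the segmentation features it attends to most strongly.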
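The cross-domain contrastive learning used in the source-free method can be illustrated with an InfoNCE-style loss, a standard contrastive formulation we assume here purely for illustration: features of corresponding pixels in the normal and adverse images form a positive pair, while other features act as negatives.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor feature.

    anchor, positive: 1-D feature vectors, e.g. from corresponding pixels
    in a normal-condition and an adverse-condition image (illustrative
    assumption, not the thesis code).
    negatives: 2-D array with one negative feature per row.
    Minimizing the loss pulls the positive pair together in cosine
    similarity and pushes the negatives away, encouraging features that
    are invariant to the imaging condition.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    pos = np.exp(cos(anchor, positive) / temperature)
    neg = sum(np.exp(cos(anchor, n) / temperature) for n in negatives)
    return -np.log(pos / (pos + neg))
```

The loss is near zero when the anchor and positive align and the negatives do not, and grows as the anchor drifts toward a negative.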
Permanent link
https://doi.org/10.3929/ethz-b-000648597
Publication status
published
External links
Search print copy at ETH Library
Contributors
Examiner: Van Gool, Luc
Examiner: Roth, Stefan
Examiner: Sattler, Torsten
Examiner: Sakaridis, Christos
Publisher
ETH Zurich
Organisational unit
03514 - Van Gool, Luc (emeritus)