DOI: 10.1007/978-3-031-72973-7
Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part L
2024 Proceeding
  • Editors:
  • Aleš Leonardis,
  • Elisa Ricci,
  • Stefan Roth,
  • Olga Russakovsky,
  • Torsten Sattler,
  • Gül Varol
Publisher:
  • Springer-Verlag
  • Berlin, Heidelberg
Conference:
European Conference on Computer Vision, Milan, Italy, 29 September 2024
ISBN:
978-3-031-72972-0
Published:
14 November 2024

Article
Revisit Human-Scene Interaction via Space Occupancy
Abstract

Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks. However, one of the major obstacles is its limited data scale. High-quality data with simultaneously captured human and 3D environments is ...

Article
Face-Adapter for Pre-trained Diffusion Models with Fine-Grained ID and Attribute Control
Abstract

Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the ...

Article
WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Abstract

Recently, learned image compression (LIC) has achieved great progress and even outperformed the traditional approach using DCT or discrete wavelet transform (DWT). However, LIC mainly reduces spatial redundancy in the autoencoder networks and ...

Article
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models Without Fine-Tuning
Abstract

Recently, transformer-based large vision models, e.g., the Segment Anything Model (SAM) and Stable Diffusion (SD), have achieved remarkable success in the computer vision field. However, the quartic complexity within the transformer’s Multi-Head ...

Article
Mitigating Background Shift in Class-Incremental Semantic Segmentation
Abstract

Class-Incremental Semantic Segmentation (CISS) aims to learn new classes without forgetting the old ones, using only the labels of the new classes. To achieve this, two popular strategies are employed: 1) pseudo-labeling and knowledge distillation ...

Article
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Abstract

This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-...

Article
BKDSNN: Enhancing the Performance of Learning-Based Spiking Neural Networks Training with Blurred Knowledge Distillation
Abstract

Spiking neural networks (SNNs), which mimic biological neural systems to convey information via discrete spikes, are well-known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for ...

Article
Agent Attention: On the Integration of Softmax and Linear Attention
Abstract

The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel ...

Article
Learning by Aligning 2D Skeleton Sequences and Multi-modality Fusion
Abstract

This paper presents a self-supervised temporal video alignment framework which is useful for several fine-grained human activity understanding applications. In contrast with the state-of-the-art method of CASA, where sequences of 3D skeleton ...

Article
Resolving Scale Ambiguity in Multi-view 3D Reconstruction Using Dual-Pixel Sensors
Abstract

Multi-view 3D reconstruction, namely structure-from-motion and multi-view stereo, is an essential component in 3D computer vision. In general, multi-view 3D reconstruction suffers from unknown scale ambiguity unless a reference object of known ...

Article
Object-Oriented Anchoring and Modal Alignment in Multimodal Learning
Abstract

Modality alignment has been of paramount importance in recent developments of multimodal learning, which has inspired many innovations in multimodal networks and pre-training tasks. Single-stream networks can effectively leverage self-attention ...

Article
Towards Stable 3D Object Detection
Abstract

In autonomous driving, the temporal stability of 3D object detection greatly impacts driving safety. However, detection stability cannot be assessed by existing metrics such as mAP and MOTA, and consequently is less explored by the ...

Article
FYI: Flip Your Images for Dataset Distillation
Abstract

Dataset distillation synthesizes a small set of images from a large-scale real dataset such that synthetic and real images share similar behavioral properties (e.g., distributions of gradients or features) during a training process. Through ...

Article
On-the-Fly Category Discovery for LiDAR Semantic Segmentation
Abstract

LiDAR semantic segmentation is important for understanding the surrounding environment in autonomous driving. Existing methods assume closed-set situations with the same training and testing label space. However, in the real world, unknown classes ...

Article
Dual-Camera Smooth Zoom on Mobile Phones
Abstract

When zooming between dual cameras on a mobile, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user’s zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ) ...

Article
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Abstract

Point cloud completion aims to reconstruct the geometry of partial point clouds captured by various sensors. Traditionally, training a point cloud model is carried out on synthetic datasets, which have limited categories and deviate significantly ...

Article
CONDA: Condensed Deep Association Learning for Co-salient Object Detection
Abstract

Inter-image association modeling is crucial for co-salient object detection. Despite satisfactory performance, previous methods still have limitations on sufficient inter-image association modeling. Because most of them focus on image feature ...

Article
Cascade Prompt Learning for Vision-Language Model Adaptation
Abstract

Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks. However, current learnable prompt tokens are primarily used for the single phase of ...

Article
PolyRoom: Room-Aware Transformer for Floorplan Reconstruction
Abstract

Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point ...

Article
BenchLMM: Benchmarking Cross-Style Visual Capability of Large Multimodal Models
Abstract

Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning on data in common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains ...

Article
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
Abstract

Transformer-based restoration methods achieve significant performance as the self-attention (SA) of the Transformer can explore non-local information for better high-resolution image reconstruction. However, the key dot-product SA requires ...

Article
HENet: Hybrid Encoding for End-to-End Multi-task 3D Perception from Multi-view Cameras
Abstract

Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird’s-eye-view (BEV) semantic segmentation. To improve perception precision, ...

Article
Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
Abstract

Source free domain adaptation (SFDA) aims to transfer a model trained on a labeled source domain to an unlabeled target domain without accessing source data. Recent SFDA methods predominantly rely on self-training, which supervises the model with ...

Article
Customized Generation Reimagined: Fidelity and Editability Harmonized
Abstract

Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts. However, customized generation suffers from an inherent trade-...

Article
AUFormer: Vision Transformers Are Parameter-Efficient Facial Action Unit Detectors
Abstract

Facial Action Units (AU) is a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable ...

Article
Improving Video Segmentation via Dynamic Anchor Queries
Abstract

Modern video segmentation methods adopt feature transitions between anchor and target queries to perform cross-frame object association. The smooth feature transitions between anchor and target queries enable these methods to achieve satisfactory ...

Article
Controllable Contextualized Image Captioning: Directing the Visual Narrative Through User-Defined Highlights
Abstract

Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further ...
