Author: Su, Hao : Search

research-article

Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-rank Decomposition

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7172–7181, https://doi.org/10.1145/3664647.3681588

To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. ...

Article

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Computer Vision – ECCV 2024, Pages 143–163, https://doi.org/10.1007/978-3-031-73039-9_9

Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may ...

Article

CONDENSE: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

Computer Vision – ECCV 2024, Pages 19–38, https://doi.org/10.1007/978-3-031-72949-2_2

To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D ...

Article

GLAD: A Global-Attention-Based Diffusion Model for Infrared and Visible Image Fusion

Advanced Intelligent Computing Technology and Applications, Pages 345–356, https://doi.org/10.1007/978-981-97-5600-1_30

Infrared and visible image fusion (IVIF) is a widely used approach to enhance scenario understanding, which fuses the salience of infrared images and the texture details of visible images. Existing methods typically focus on extracting local ...

research-article

Open Access

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers, Article No.: 124, Pages 1–11, https://doi.org/10.1145/3641519.3657427

Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance ...

research-article

Open Access

An audio‐based risky flight detection framework for quadrotors

IET Cyber-Systems and Robotics (CSY2), Volume 6, Issue 1, https://doi.org/10.1049/csy2.12105

Drones have increasingly collaborated with human workers in some workspaces, such as warehouses. The failure of a drone flight may bring potential risks to human beings' life safety during some aerial tasks. One of the most common flight ...

research-article

Open Access

General-Purpose Sim2Real Protocol for Learning Contact-Rich Manipulation With Marker-Based Visuotactile Sensors

IEEE Transactions on Robotics (TOR), Volume 40, Pages 1509–1526, https://doi.org/10.1109/TRO.2024.3352969

Visuotactile sensors can provide rich contact information, having great potential in contact-rich manipulation tasks with reinforcement learning (RL) policies. Sim2Real technique tackles the challenge of RL's reliance on a large amount of ...

research-article

OpenShape: scaling up 3D shape representation towards open-world understanding

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1944, Pages 44860–44879

We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up ...

research-article

OpenIllumination: a multi-illumination dataset for inverse rendering evaluation on real objects

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1607, Pages 36951–36962

We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. For each image in the dataset, we provide accurate camera ...

research-article

Deductive verification of chain-of-thought reasoning

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1580, Pages 36407–36433

Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can ...

research-article

DiffVL: scaling up soft body manipulation using vision-language driven differentiable physics

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1301, Pages 29875–29900

Combining gradient-based trajectory optimization with differentiable physics simulation is an accurate and efficient technique for solving soft-body manipulation problems. Using a well-crafted optimization objective, the solver can quickly converge onto ...

research-article

One-2-3-45: any single image to 3D mesh in 45 seconds without per-shape optimization

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 976, Pages 22226–22246

Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but ...

research-article

MARVEL: Raster Gray-Level Manga Vectorization via Primitive-Wise Deep Reinforcement Learning

IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 4, Pages 2677–2693, https://doi.org/10.1109/TCSVT.2023.3309786

Manga is a fashionable Japanese-style comic form that is composed of black-and-white strokes and is generally displayed as raster images on digital devices. Typical mangas have simple textures, wide lines, and few color gradients, which are vectorizable ...

research-article

ActiveZero++: Mixed Domain Learning Stereo and Confidence-Based Depth Completion With Zero Annotation

IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 12, Pages 14098–14113, https://doi.org/10.1109/TPAMI.2023.3305399

Learning-based stereo methods usually require a large scale dataset with depth, however obtaining accurate depth in the real domain is difficult, but groundtruth depth is readily available in the simulation domain. In this article we propose a new ...

poster

Construction of Hardware Course Group in New Engineering Computer Specialty Facing Complex Engineering Problems

ACM TURC '23: Proceedings of the ACM Turing Award Celebration Conference - China 2023, Pages 122–124, https://doi.org/10.1145/3603165.3607430

How to implement the construction of hardware course group for complex engineering problems in the context of the New engineering education certification in China, taking the cultivation of Computer System Capability as an example, through a series of ...

research-article

Dictionary Fields: Learning a Neural Basis Decomposition

ACM Transactions on Graphics (TOG), Volume 42, Issue 4, Article No.: 156, Pages 1–12, https://doi.org/10.1145/3592135

We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates. More specifically, we factorize a ...

research-article

Abstract-to-executable trajectory translation for one-shot task generalization

ICML '23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 1411, Pages 33850–33882

Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve ...

research-article

Reparameterized policy learning for multimodal trajectory optimization

ICML '23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 567, Pages 13957–13975

We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian ...

research-article

On pre-training for visuo-motor control: revisiting a learning-from-scratch baseline

ICML '23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 506, Pages 12511–12526

In this paper, we examine the effectiveness of pretraining for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly ...

research-article

Online Detection of 1D and 2D Hierarchical Super-Spreaders in High-Speed Networks

APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking, Pages 109–115, https://doi.org/10.1145/3600061.3600080

Traditionally, a firewall tracks the per-flow spread of each source and destination IP address to detect network scans and DDoS attacks. It is not designed with hierarchical IP addresses in mind. However, cyberattacks nowadays become more stealthy. To ...
