- research-article, October 2024
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-rank Decomposition
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7172–7181, https://doi.org/10.1145/3664647.3681588
To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. ...
- Article, October 2024
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may ...
- Article, October 2024
CONDENSE: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
- Xiaoshuai Zhang,
- Zhicheng Wang,
- Howard Zhou,
- Soham Ghosh,
- Danushen Gnanapragasam,
- Varun Jampani,
- Hao Su,
- Leonidas Guibas
Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D ...
- Article, August 2024
GLAD: A Global-Attention-Based Diffusion Model for Infrared and Visible Image Fusion
Advanced Intelligent Computing Technology and Applications, Pages 345–356, https://doi.org/10.1007/978-981-97-5600-1_30
Abstract: Infrared and visible image fusion (IVIF) is a widely used approach to enhance scenario understanding, which fuses the salience of infrared images and the texture details of visible images. Existing methods typically focus on extracting local ...
- research-article, July 2024
A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose
SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers, Article No.: 124, Pages 1–11, https://doi.org/10.1145/3641519.3657427
Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance ...
- research-article, January 2024
An audio‐based risky flight detection framework for quadrotors
Abstract: Drones have increasingly collaborated with human workers in some workspaces, such as warehouses. The failure of a drone flight may bring potential risks to human beings' life safety during some aerial tasks. One of the most common flight ...
- research-article, January 2024
General-Purpose Sim2Real Protocol for Learning Contact-Rich Manipulation With Marker-Based Visuotactile Sensors
IEEE Transactions on Robotics (TOR), Volume 40, Pages 1509–1526, https://doi.org/10.1109/TRO.2024.3352969
Visuotactile sensors can provide rich contact information, having great potential in contact-rich manipulation tasks with reinforcement learning (RL) policies. Sim2Real technique tackles the challenge of RL's reliance on a large amount of ...
- research-article, May 2024
OpenShape: scaling up 3D shape representation towards open-world understanding
- Minghua Liu,
- Ruoxi Shi,
- Kaiming Kuang,
- Yinhao Zhu,
- Xuanlin Li,
- Shizhong Han,
- Hong Cai,
- Fatih Porikli,
- Hao Su
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1944, Pages 44860–44879
We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up ...
- research-article, May 2024
OpenIllumination: a multi-illumination dataset for inverse rendering evaluation on real objects
- Isabella Liu,
- Linghao Chen,
- Ziyang Fu,
- Liwen Wu,
- Haian Jin,
- Zhong Li,
- Chin Ming Ryan Wong,
- Yi Xu,
- Ravi Ramamoorthi,
- Zexiang Xu,
- Hao Su
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1607, Pages 36951–36962
We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. For each image in the dataset, we provide accurate camera ...
- research-article, May 2024
Deductive verification of chain-of-thought reasoning
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1580, Pages 36407–36433
Large Language Models (LLMs) significantly benefit from Chain-of-Thought (CoT) prompting in performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can ...
- research-article, May 2024
DiffVL: scaling up soft body manipulation using vision-language driven differentiable physics
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 1301, Pages 29875–29900
Combining gradient-based trajectory optimization with differentiable physics simulation is an accurate and efficient technique for solving soft-body manipulation problems. Using a well-crafted optimization objective, the solver can quickly converge onto ...
- research-article, May 2024
One-2-3-45: any single image to 3D mesh in 45 seconds without per-shape optimization
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 976, Pages 22226–22246
Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but ...
- research-article, August 2023
MARVEL: Raster Gray-Level Manga Vectorization via Primitive-Wise Deep Reinforcement Learning
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 34, Issue 4, Pages 2677–2693, https://doi.org/10.1109/TCSVT.2023.3309786
Manga is a fashionable Japanese-style comic form that is composed of black-and-white strokes and is generally displayed as raster images on digital devices. Typical mangas have simple textures, wide lines, and few color gradients, which are vectorizable ...
- research-article, August 2023
ActiveZero++: Mixed Domain Learning Stereo and Confidence-Based Depth Completion With Zero Annotation
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 12, Pages 14098–14113, https://doi.org/10.1109/TPAMI.2023.3305399
Learning-based stereo methods usually require a large-scale dataset with depth. However, obtaining accurate depth in the real domain is difficult, whereas ground-truth depth is readily available in the simulation domain. In this article we propose a new ...
- poster, September 2023
Construction of Hardware Course Group in New Engineering Computer Specialty Facing Complex Engineering Problems
ACM TURC '23: Proceedings of the ACM Turing Award Celebration Conference - China 2023, Pages 122–124, https://doi.org/10.1145/3603165.3607430
How to implement the construction of hardware course group for complex engineering problems in the context of the New engineering education certification in China, taking the cultivation of Computer System Capability as an example, through a series of ...
- research-article, July 2023
Dictionary Fields: Learning a Neural Basis Decomposition
ACM Transactions on Graphics (TOG), Volume 42, Issue 4, Article No.: 156, Pages 1–12, https://doi.org/10.1145/3592135
We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates. More specifically, we factorize a ...
- research-article, July 2023
Abstract-to-executable trajectory translation for one-shot task generalization
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 1411, Pages 33850–33882
Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve ...
- research-article, July 2023
Reparameterized policy learning for multimodal trajectory optimization
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 567, Pages 13957–13975
We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian ...
- research-article, July 2023
On pre-training for visuo-motor control: revisiting a learning-from-scratch baseline
- Nicklas Hansen,
- Zhecheng Yuan,
- Yanjie Ze,
- Tongzhou Mu,
- Aravind Rajeswaran,
- Hao Su,
- Huazhe Xu,
- Xiaolong Wang
ICML'23: Proceedings of the 40th International Conference on Machine Learning, Article No.: 506, Pages 12511–12526
In this paper, we examine the effectiveness of pretraining for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly ...
- research-article, September 2023
Online Detection of 1D and 2D Hierarchical Super-Spreaders in High-Speed Networks
APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking, Pages 109–115, https://doi.org/10.1145/3600061.3600080
Traditionally, a firewall tracks the per-flow spread of each source and destination IP address to detect network scans and DDoS attacks. It is not designed with hierarchical IP addresses in mind. However, cyberattacks have become more stealthy. To ...