- Westlake University
- Hangzhou, Zhejiang, China
- https://akawincent.github.io/
- https://www.zhihu.com/people/wincent-84
- @pu_wen99907
Stars
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[NeurIPS 2025] E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
[ICCV 2025] Official impl. of "MV-Adapter: Multi-view Consistent Image Generation Made Easy"
A simple state update rule to enhance length generalization for CUT3R
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perception and reasoning in VLMs.
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Code for FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
Official repo and evaluation implementation of VSI-Bench
🚀🚀 Efficient implementations of Native Sparse Attention
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
4DNeX: Feed-Forward 4D Generative Modeling Made Easy
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
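This project aims to share the technical principles of large models and hands-on experience with them (large-model engineering and deploying large-model applications).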
Reference PyTorch implementation and models for DINOv3
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
✨✨Latest Advances on Multimodal Large Language Models
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes"
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
Neural Scene Flow Prior (NeurIPS 2021 spotlight)
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence