- NWPU -> NKU
- Tianjin, China
- (UTC +08:00)
- https://jbwang1997.github.io/
Stars
PyTorch implementation of "Genie: Generative Interactive Environments", Bruce et al. (2024).
Universal 3D World Reconstruction with Any-Prior Prompting
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
[ICCV 2023 Oral] Game-theoretic modeling and learning of Transformer-based interactive prediction and planning
[NeurIPS 2025] Official implementation for "Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling"
[ICCV 2025] SuperDec: 3D Scene Decomposition with Superquadric Primitives.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official Implementation of DA^2: Depth Anything in Any Direction
SOTAMak1r / Infinite-Forcing
Forked from guandeh17/Self-Forcing
Infinite-Forcing: Towards Infinite-Long Video Generation
[NeurIPS 2025 (Spotlight)] The implementation for the paper "4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos"
[NeurIPS'25 Spotlight] GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
[NeurIPS 2025] RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
A minimal implementation of DeepMind's Genie world model
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (ICCV 2025)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Official implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction