This repository is the official implementation of LMM-Det, a simple yet effective approach that leverages a Large Multimodal Model for vanilla object Detection without relying on specialized detection modules.
LMM-Det: Make Large Multimodal Models Excel in Object Detection
Jincheng Li*, Chunyu Xie*, Ji Ao, Dawei Leng†, Yuhui Yin (*Equal Contribution, †Corresponding Author)
- 🚀 [2025/08/01] We have updated the LMM-Det GitHub repository; you can now test our models!
- 🚀 [2025/07/24] We released the LMM-Det paper: Make Large Multimodal Models Excel in Object Detection.
- 🚀 [2025/06/26] LMM-Det has been accepted by ICCV'25.
```bash
# Remember to modify Line 7 in deploy.sh before running it.
bash deploy.sh
```
We also provide the official weights of OWLv2-ViT.
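If you would rather pull a public OWLv2 checkpoint directly from the Hugging Face Hub, a command along the lines of the sketch below should work. Note that the repo id `google/owlv2-base-patch16-ensemble` and the target directory are assumptions and may differ from the exact OWLv2-ViT weights used by LMM-Det.

```bash
# Sketch: download a public OWLv2 checkpoint from the Hugging Face Hub.
# The repo id and local directory below are assumptions, not necessarily
# the exact checkpoint shipped with LMM-Det.
pip install -U "huggingface_hub[cli]"
huggingface-cli download google/owlv2-base-patch16-ensemble \
    --local-dir checkpoints/owlv2-base-patch16-ensemble
```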
We have released the customized dataset used in Stage IV.
For curation details, please refer to [Custom Data].
Step 1: Download the COCO dataset. You can put COCO under LMM-Det/data or create a soft link using ln -s.
Step 2: Modify the COCO path in Lines 4-5 of LMM-Det/scripts/eval/eval_coco_model_w_sft_data.sh.
Step 3: Download the model and put it into LMM-Det/checkpoints. (A consolidated sketch of Steps 1-3 follows the command below.)
```bash
bash evaluate.sh
```
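For convenience, here is a minimal end-to-end sketch of Steps 1-3. The COCO location (/path/to/coco) and the data/coco link name are placeholders, not paths mandated by this repository; adjust them to your local setup and to whatever the evaluation script actually expects.

```bash
# Minimal sketch of the evaluation setup; /path/to/coco and the data/coco
# link name are placeholders, not paths mandated by this repository.
cd LMM-Det

# Step 1: make COCO visible under LMM-Det/data via a soft link (or copy it there).
mkdir -p data
ln -s /path/to/coco data/coco

# Step 2: edit Lines 4-5 of scripts/eval/eval_coco_model_w_sft_data.sh by hand
#         so that they point at your local COCO path.

# Step 3: place the downloaded checkpoint under LMM-Det/checkpoints, then evaluate.
mkdir -p checkpoints
bash evaluate.sh
```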
We are seeking academic interns in the multimodal field. If you are interested, please send your resume to xiechunyu@360.cn.
```bibtex
@misc{li2025lmmdet,
      title={LMM-Det: Make Large Multimodal Models Excel in Object Detection},
      author={Jincheng Li and Chunyu Xie and Ji Ao and Dawei Leng and Yuhui Yin},
      year={2025},
      eprint={2507.18300},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.18300},
}
```
This project is licensed under the Apache License (Version 2.0).
This work wouldn't be possible without the incredible open-source code of these projects. Huge thanks!