Code release for "SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency"
- Models.
- Evaluation pipeline.
- Training pipeline.
- Clone this repository and navigate to the SAISA folder:

```shell
git clone https://github.com/icip-cas/SAISA.git
cd SAISA
```
- Install the package:

```shell
conda create -n saisa python=3.10 -y
conda activate saisa
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install additional packages for evaluation with lmms-eval:

```shell
cd lmms_eval
pip install -e .
```
Chat about images using SAISA:

```shell
python -m llava.serve.cli \
    --model-path yuanqianhao/saisa-vicuna \
    --image-file "https://llava-vl.github.io/static/images/view.jpg"
```
LMMs-Eval is an evaluation framework designed for consistent and efficient evaluation of large multimodal models (LMMs).
Evaluate SAISA-Vicuna:

```shell
export MODEL_PATH="yuanqianhao/saisa-vicuna"
export MODEL_NAME="saisa_vicuna"
export CONV_MODE="v1"

accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/
```
Evaluate SAISA-LLaMA3:

```shell
export MODEL_PATH="yuanqianhao/saisa-llama3"
export MODEL_NAME="saisa_llama3"
export CONV_MODE="llama_3"

accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/
```
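The two invocations above differ only in the checkpoint, conversation template, and log suffix, so both models (and any additional lmms-eval tasks) can be swept with a small shell loop. The sketch below is a dry run that only prints each command instead of launching it; the loop structure and any task beyond `mmmu_val` are our own convenience, not part of the official pipeline:

```shell
# Dry-run sweep: print one lmms-eval command per SAISA checkpoint.
# Checkpoints and conv templates match the commands above.
cmds=""
for spec in "yuanqianhao/saisa-vicuna v1 saisa_vicuna" \
            "yuanqianhao/saisa-llama3 llama_3 saisa_llama3"; do
  set -- $spec  # word-split into: model path, conv template, log suffix
  for task in mmmu_val; do
    cmd="accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
--model llava --model_args pretrained=$1,conv_template=$2 \
--tasks $task --batch_size 1 --log_samples_suffix $3 --output_path ./logs/"
    echo "$cmd"
    cmds="$cmds$cmd
"
  done
done
```

Pipe the output to `sh` (or replace the `echo` with `eval`) to actually launch the runs.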
See Evaluation.md for more details.
This work is built upon LLaVA and lmms-eval.
If you find SAISA useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{yuan2025saisa,
  title={SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency},
  author={Yuan, Qianhao and Liu, Yanjiang and Lu, Yaojie and Lin, Hongyu and He, Ben and Han, Xianpei and Sun, Le},
  journal={arXiv preprint arXiv:2502.02458},
  year={2025}
}
```