Source code for Distilling Textual Priors from LLM to Efficient Image Fusion
We use Python 3.10.6 with PyTorch 2.0.1, CUDA 11.8, and Lightning 2.2.0. The full environment is listed in requirements.txt, but it contains some packages that are not necessary for this project. We therefore recommend following the guide below to install the key libraries first, and then pulling any missing packages from requirements.txt based on the error messages.
First, install the key libraries:
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install pytorch_lightning==2.2.0
If you get an error about pytorch_lightning, you can replace it with pip install lightning==2.2.0, or edit the code to use:
import pytorch_lightning as pl # old version
import lightning.pytorch as pl # new version
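Alternatively, a small compatibility shim keeps the code working with either package (a minimal sketch):
# Compatibility shim: prefer the new package name, fall back to the old one
try:
    import lightning.pytorch as pl      # new package name (lightning)
except ImportError:
    import pytorch_lightning as pl      # old package name (pytorch_lightning)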
Next, install CLIP with the following command:
pip install git+https://github.com/openai/CLIP.git
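To verify the CLIP installation, you can load a model and encode a text prompt (a minimal sketch; the ViT-B/32 backbone here is only an example, check the configs for the backbone this project actually uses):
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
# "ViT-B/32" is an example backbone, not necessarily the one used in this repo
model, preprocess = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    tokens = clip.tokenize(["an infrared image of pedestrians at night"]).to(device)
    text_features = model.encode_text(tokens)  # shape (1, 512) for ViT-B/32
print(text_features.shape)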
Then, run the following command to install the missing libraries:
pip install easydict tensorboardX
If you still encounter errors, you can install the missing libraries from requirements.txt. If you prefer to install everything at once with pip install -r requirements.txt, note that all testing was done on Ubuntu 22.04 LTS.
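A quick sanity check that the environment matches the tested versions (a minimal sketch):
import torch
import lightning.pytorch as pl   # or: import pytorch_lightning as pl

print(torch.__version__)          # expected: 2.0.1+cu118
print(torch.version.cuda)         # expected: 11.8
print(torch.cuda.is_available())  # expected: True on a GPU machine
print(pl.__version__)             # expected: 2.2.0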
We use the following datasets; you can download them from their official websites.
- MSRS, M3FD, and RoadScene for IVF (infrared-visible fusion)
- Harvard Medical Image Fusion Datasets for medical image fusion
We provide the anno folder, which contains example code for using Qwen2-VL to generate text annotations for the images in the data folder. We suggest generating the text annotations before inference and training to save time. Refer to Qwen2-VL for more details (Qwen2.5-VL is now recommended). The transformers and flash-attn libraries are required.
pip install transformers
pip install flash-attn==2.5.9.post1 --no-build-isolation
# recommended version if you use the same PyTorch and Lightning versions as above
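For reference, a minimal annotation sketch in the spirit of the code in anno might look as follows. This is an illustrative sketch rather than the repository's actual script: the model ID, prompt, and image path are placeholders, and it additionally assumes the qwen-vl-utils helper package (pip install qwen-vl-utils).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # assumes: pip install qwen-vl-utils

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # placeholder model ID
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "data/example.png"},  # placeholder image path
        {"type": "text", "text": "Describe the salient objects in this image."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
annotation = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
print(annotation)  # save this text alongside the image for training/inference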
The usage of the annotation code is documented through argparse. We recommend adjusting the parameters to your own needs, so we do not provide detailed usage here. If you encounter a CUDA out-of-memory error, you can use AWQ or GPTQ quantization.
pip install autoawq # for AWQ
pip install auto-gptq optimum # for GPTQ
# Example code for loading quantized models (AWQ/GPTQ)
from transformers import AutoModelForCausalLM, AutoTokenizer
# Model repository path (Hugging Face Hub ID or local path)
model_path_awq = "path/to/your/qwen-vl-awq-model" # Assuming this is an AWQ quantized model
model_path_gptq = "path/to/your/qwen-vl-gptq-model" # Assuming this is a GPTQ quantized model
# Load Tokenizer (usually the same as the original model)
# Use either the AWQ or GPTQ path depending on which model you are loading below
tokenizer = AutoTokenizer.from_pretrained(model_path_awq, trust_remote_code=True)
# Load AWQ model (requires autoawq to be installed)
print("Loading AWQ model...")
model_awq = AutoModelForCausalLM.from_pretrained(
model_path_awq,
trust_remote_code=True,
device_map="auto" # Automatically map to GPU
)
print("AWQ model loaded.")
# You can now use model_awq for inference
# ... perform inference ...
# del model_awq # Release GPU memory if needed
# Load GPTQ model (requires auto-gptq and optimum to be installed)
print("\nLoading GPTQ model...")
model_gptq = AutoModelForCausalLM.from_pretrained(
model_path_gptq,
trust_remote_code=True,
device_map="auto" # Automatically map to GPU
# For GPTQ, you might sometimes need to specify use_safetensors=True/False
# use_safetensors=True # If the model uses the .safetensors format
)
print("GPTQ model loaded.")
# You can now use model_gptq for inference
# ... perform inference ...
# del model_gptq # Release GPU memory
You can edit configs/Train_text_xrestormer.yaml to change the parameters. Then run the following commands to train and test:
python train.py --config configs/Train_text_xrestormer.yaml # train
python test.py --config configs/Train_text_xrestormer.yaml --test_all -i sliding_window # test
# Note that full_image mode may use a lot of memory on large images, while sliding_window mode takes more time.
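For intuition, sliding_window mode tiles the input into overlapping patches, runs the model on each tile, and averages the overlaps; this bounds memory at the cost of extra forward passes. Below is an illustrative sketch, not the repository's implementation: for simplicity it assumes a model that maps a single image tensor to an output of the same spatial size, whereas the actual fusion model takes several inputs.
import torch

def sliding_window_inference(model, image, window=256, stride=224):
    # image: (1, C, H, W); overlapping tiles are averaged via a weight map
    _, _, h, w = image.shape
    out = torch.zeros_like(image)
    weight = torch.zeros(1, 1, h, w, device=image.device)
    ys = list(range(0, max(h - window, 0) + 1, stride))
    xs = list(range(0, max(w - window, 0) + 1, stride))
    if ys[-1] + window < h:  # make the last tile reach the bottom border
        ys.append(h - window)
    if xs[-1] + window < w:  # make the last tile reach the right border
        xs.append(w - window)
    with torch.no_grad():
        for y in ys:
            for x in xs:
                tile = image[:, :, y:y + window, x:x + window]
                out[:, :, y:y + window, x:x + window] += model(tile)
                weight[:, :, y:y + window, x:x + window] += 1
    return out / weight  # average overlapping predictions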
We provide a pre-trained distilled model for the IVF task in the Experiments/EXP_IVF folder. Due to size limitations, we do not include the teacher model. This model was trained on the MSRS dataset (note that all competing methods were trained exclusively on MSRS, with the other datasets used solely for testing) and demonstrates strong generalization on IVF tasks. Notably, it remains effective even on medical tasks outside its training scope, though the reported medical-task metrics do not come from this model. For medical datasets, you can train the model using configs/Train_text_xrestormer_Med.yaml.
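If you want to inspect or load the provided checkpoint manually, something along the following lines should work (an illustrative sketch with hypothetical file and class names; check the actual contents of Experiments/EXP_IVF and the training code for the real ones).
import torch

ckpt_path = "Experiments/EXP_IVF/model.ckpt"  # hypothetical filename

# Inspect the raw checkpoint; a Lightning checkpoint stores the weights
# under the 'state_dict' key along with epoch/optimizer metadata
state = torch.load(ckpt_path, map_location="cpu")
print(state.keys())

# Or restore through Lightning (FusionModel is a hypothetical class name):
# model = FusionModel.load_from_checkpoint(ckpt_path)
# model.eval()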
Comparison with other methods:
IVF datasets:
Harvard Medical Image Fusion Datasets:
If you find our work useful, please cite it as:
@misc{zhang2025distillingtextualpriorsllm,
title={Distilling Textual Priors from LLM to Efficient Image Fusion},
author={Ran Zhang and Xuanhua He and Ke Cao and Liu Liu and Li Zhang and Man Zhou and Jie Zhang},
year={2025},
eprint={2504.07029},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.07029},
}