DSRAN

Code for the journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

Introduction

This is the official source code for the Dual Semantic Relations Attention Network (DSRAN) proposed in our journal paper Learning Dual Semantic Relations with Graph Attention for Image-Text Matching (TCSVT 2020). It is built on top of VSE++ in PyTorch.

The framework of DSRAN:

Results on the MSCOCO and Flickr30K datasets, with either a GRU or a BERT text encoder (Rsum is the sum of the six recall scores):

GRU model:

Dataset      Image-to-Text         Text-to-Image
             R@1   R@5   R@10      R@1   R@5   R@10    Rsum
MSCOCO-1K    80.4  96.7  98.7      64.2  90.4  95.8    526.2
MSCOCO-5K    57.6  85.6  91.9      41.5  71.9  82.1    430.6
Flickr30k    79.6  95.6  97.5      58.6  85.8  91.3    508.4

BERT model:

Dataset      Image-to-Text         Text-to-Image
             R@1   R@5   R@10      R@1   R@5   R@10    Rsum
MSCOCO-1K    80.6  96.7  98.7      64.5  90.8  95.8    527.1
MSCOCO-5K    57.9  85.3  92.0      41.7  72.7  82.8    432.4
Flickr30k    80.5  95.5  97.9      59.2  86.0  91.9    511.0

Requirements and Installation

We recommend the following dependencies.

  • Python 3.6
  • PyTorch 1.1.0
  • NumPy (>1.12.1)
  • torchtext
  • pycocotools
  • nltk

Download data

Download the raw images, the pre-computed image features, the pre-trained BERT models, the pre-trained ResNet152 model, and the pre-trained DSRAN models. The raw images can be downloaded from VSE++:

wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar

We refer to the path of the files extracted from data.tar as $DATA_PATH; only the raw images (coco and f30k) are used.

The pre-computed image features can be obtained from VLP. These zip files should be extracted into the folder data/joint-pretrain. We refer to the path of the extracted region bbox file (.h5) as $REGION_BBOX_FILE, and to the regional feature directories (feat_cls_1000/ for MSCOCO and trainval/ for Flickr30K) as $FEATURE_PATH.
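
To sanity-check the download, the snippet below is a minimal sketch (assuming the h5py package; the internal key layout comes from VLP and is not documented here) that lists a few entries of the region bbox file:

import h5py

# Substitute the real path, i.e. $REGION_BBOX_FILE.
path = "coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5"
with h5py.File(path, "r") as f:
    # Print the first few top-level keys and their stored objects.
    for name in list(f.keys())[:5]:
        print(name, f[name])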

The pre-trained ResNet152 model can be downloaded from torchvision and placed in the root directory:

wget https://download.pytorch.org/models/resnet152-b121ed2d.pth
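
As a quick check, the following sketch (our illustration, not a repo script) loads the downloaded checkpoint into torchvision's ResNet-152 definition:

import torch
from torchvision import models

# Build an uninitialized ResNet-152 and load the torchvision checkpoint.
model = models.resnet152()
state = torch.load("resnet152-b121ed2d.pth", map_location="cpu")
model.load_state_dict(state)
print("loaded ResNet-152 weights")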

For our trained DSRAN models, you can download runs.zip from Google Drive, or GRU.zip together with BERT.zip from BaiduNetDisk (extraction code: 1119). There are 8 models in total (4 for each dataset).

The pre-trained BERT models are obtained from an old version of transformers. Note that there is now a simpler way of using BERT, as seen in the current transformers library; we will update the code in the future. The pre-trained models we use can be downloaded from the same Google Drive and BaiduNetDisk (extraction code: 1119) links. We refer to the path of the files extracted from uncased_L-12_H-768_A-12.zip as $BERT_PATH.
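
For reference, the "simpler way" mentioned above looks roughly like the following with the modern transformers API (this is not what the repo currently uses; bert-base-uncased corresponds to the uncased_L-12_H-768_A-12 checkpoint):

from transformers import BertModel, BertTokenizer

# Download/load BERT directly instead of the old checkpoint files.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("two dogs play on the grass", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)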

Data Structure

├── data/
|   ├── coco/           /* MSCOCO raw images
|   |   ├── images/
|   |   |   ├── train2014/
|   |   |   ├── val2014/
|   |   ├── annotations/
|   ├── f30k/           /* Flickr30K raw images
|   |   ├── images/
|   |   ├── dataset_flickr30k.json
|   ├── joint-pretrain/           /* pre-computed image features
|   |   ├── COCO/
|   |   |   ├── region_feat_gvd_wo_bgd/
|   |   |   |   ├── feat_cls_1000/           /* $FEATURE_PATH
|   |   |   |   ├── coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5  /* $REGION_BBOX_FILE
|   |   |   ├── annotations/
|   |   ├── flickr30k/
|   |   |   ├── region_feat_gvd_wo_bgd/
|   |   |   |   ├── trainval/                /* $FEATURE_PATH
|   |   |   |   ├── flickr30k_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5  /* $REGION_BBOX_FILE
|   |   |   ├── annotations/
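
Before training, a small sanity check such as the sketch below (our addition, not a repo script; adjust the paths to your own $DATA_PATH / $FEATURE_PATH / $REGION_BBOX_FILE layout) can confirm the expected directories exist:

from pathlib import Path

# Expected locations from the tree above, relative to the repo root.
expected = [
    "data/coco/images/train2014",
    "data/f30k/images",
    "data/joint-pretrain/COCO/region_feat_gvd_wo_bgd/feat_cls_1000",
    "data/joint-pretrain/flickr30k/region_feat_gvd_wo_bgd/trainval",
]
for p in expected:
    print(p, "OK" if Path(p).exists() else "MISSING")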

Evaluate trained models

Test on a single model (see the recall-computation sketch after this list):

  • Test on MSCOCO dataset (1K and 5K simultaneously):

    • Test on BERT-based models:
    python evaluation_bert.py --model BERT/cc_model1 --fold --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
    • Test on GRU-based models:
    python evaluation.py --model GRU/cc_model1 --fold --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
  • Test on Flickr30K dataset:

    • Test on BERT-based models:
    python evaluation_bert.py --model BERT/f_model1 --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
    • Test on GRU-based models:
    python evaluation.py --model GRU/f_model1 --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
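
For intuition, the evaluation scripts report recall@K; a conceptual sketch of the image-to-text recall computation (not the repo's evaluation.py, and assuming the standard 5-captions-per-image protocol of MSCOCO/Flickr30K) is:

import numpy as np

def i2t_recall(sims, ks=(1, 5, 10)):
    # sims: (n_images, 5 * n_images) similarity matrix, where captions
    # 5*i .. 5*i+4 are the ground truth for image i.
    n_images = sims.shape[0]
    ranks = np.zeros(n_images)
    for i in range(n_images):
        order = np.argsort(sims[i])[::-1]          # captions, best first
        gt = np.arange(5 * i, 5 * i + 5)           # ground-truth caption ids
        ranks[i] = np.where(np.isin(order, gt))[0].min()  # best-ranked GT
    return {f"R@{k}": 100.0 * np.mean(ranks < k) for k in ks}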

Test on a two-model ensemble with re-ranking (see the fusion sketch after this list):

/* Remember to modify "$DATA_PATH", "$REGION_BBOX_FILE" and "$FEATURE_PATH" in the .sh files.

  • Test on MSCOCO dataset (1K and 5K simultaneously):

    • Test on BERT-based models:
    sh test_bert_cc.sh
    • Test on GRU-based models:
    sh test_gru_cc.sh
  • Test on Flickr30K dataset:

    • Test on BERT-based models:
    sh test_bert_f.sh
    • Test on GRU-based models:
    sh test_gru_f.sh
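
Conceptually, the ensemble scripts fuse the predictions of two trained checkpoints; one simple form of such fusion is averaging the two similarity matrices before ranking, sketched below (sims_a.npy and sims_b.npy are hypothetical dumps of each model's image-caption similarity matrix, not files the repo produces under those names):

import numpy as np

sims_a = np.load("sims_a.npy")   # model 1: (n_images, n_captions)
sims_b = np.load("sims_b.npy")   # model 2: same shape
sims_ens = (sims_a + sims_b) / 2.0   # rank retrieval results on this matrix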

Train new models

Train a model with BERT on MSCOCO:

python train_bert.py --data_path "$DATA_PATH" --data_name coco --num_epochs 18 --batch_size 320 --lr_update 9 --logger_name runs/cc_bert --bert_path "$BERT_PATH" --ft_bert --warmup 0.1 --K 4 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"

Train a model with BERT on Flickr30K:

python train_bert.py --data_path "$DATA_PATH" --data_name f30k --num_epochs 12 --batch_size 128 --lr_update 6 --logger_name runs/f_bert --bert_path "$BERT_PATH" --ft_bert --warmup 0.1 --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"

Train a model with GRU on MSCOCO:

python train.py --data_path "$DATA_PATH" --data_name coco --num_epochs 18 --batch_size 300 --lr_update 9 --logger_name runs/cc_gru --use_restval --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"

Train a model with GRU on Flickr30K:

python train.py --data_path "$DATA_PATH" --data_name f30k --num_epochs 16 --batch_size 128 --lr_update 8 --logger_name runs/f_gru --use_restval --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"
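
The --lr_update flag appears to follow the VSE++ convention of a step learning-rate schedule; a minimal sketch of that convention (the authoritative behavior is whatever train.py implements) is:

def adjust_learning_rate(base_lr, epoch, lr_update):
    # Decay the learning rate by 10x after every `lr_update` epochs,
    # e.g. --lr_update 9 halves the 18-epoch MSCOCO run at epoch 9.
    return base_lr * (0.1 ** (epoch // lr_update))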

Acknowledgement

We thank Linyang Li for help with the code and for providing some of the computing resources.

Reference

If DSRAN is useful for your research, please cite our paper:

@ARTICLE{9222079,
  author={Wen, Keyu and Gu, Xiaodong and Cheng, Qingrong},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Dual Semantic Relations With Graph Attention for Image-Text Matching}, 
  year={2021},
  volume={31},
  number={7},
  pages={2866-2879},
  doi={10.1109/TCSVT.2020.3030656}}

License

Apache License 2.0
