Abstract: To date, Transformer-based pre-trained models have demonstrated powerful capabilities for modality representation, leading to a shift towards a fully end-to-end paradigm for multimodal downstream tasks such as image captioning, and enabling better performance and faster inference. However, the grid features extracted by pre-trained models lack regional visual information, which leads the model to describe object content inaccurately. Thus, the applicability of pre-trained models to image captioning remains largely underexplored. Toward this goal, this paper proposes a novel end-to-end image captioning method based on Visual Region Aggregation and Dual-level Collaboration (VRADC). Specifically, to learn regional visual information, this paper designs a visual region aggregation module that aggregates grid features with similar semantics to obtain a compact visual region representation. Next, a dual-level collaboration module uses cross-attention to learn more representative semantic information from the two types of visual features, which in turn yields more fine-grained descriptions. Experimental results on the MSCOCO and Flickr30k datasets show that the proposed method, VRADC, significantly improves the quality of image captions and achieves state-of-the-art performance.
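To make the two components concrete, the following is a minimal sketch, in PyTorch-style Python, of how visual region aggregation and dual-level collaboration could be wired together. All names (VisualRegionAggregation, DualLevelCollaboration, num_regions, the soft-assignment aggregation, and the fusion layer) are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Hedged sketch: assumed PyTorch-style modules, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualRegionAggregation(nn.Module):
    """Aggregate grid features with similar semantics into compact region features."""

    def __init__(self, dim: int, num_regions: int):
        super().__init__()
        # Learnable soft assignment of each grid cell to a small set of semantic regions
        # (an assumed realization of "aggregating grid features with similar semantics").
        self.assign = nn.Linear(dim, num_regions)

    def forward(self, grid_feats: torch.Tensor) -> torch.Tensor:
        # grid_feats: (batch, num_grids, dim)
        weights = F.softmax(self.assign(grid_feats), dim=-1)        # (B, G, R)
        # Weighted average of grid features per region -> compact region representation.
        region_feats = torch.einsum("bgr,bgd->brd", weights, grid_feats)
        region_feats = region_feats / weights.sum(dim=1).clamp(min=1e-6).unsqueeze(-1)
        return region_feats                                          # (B, R, dim)


class DualLevelCollaboration(nn.Module):
    """Cross-attend decoder queries to both grid-level and region-level visual features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn_grid = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_region = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, queries, grid_feats, region_feats):
        # Cross-attention over the two visual feature levels.
        ctx_grid, _ = self.attn_grid(queries, grid_feats, grid_feats)
        ctx_region, _ = self.attn_region(queries, region_feats, region_feats)
        # Combine both contexts into a single representation for caption decoding.
        return self.fuse(torch.cat([ctx_grid, ctx_region], dim=-1))


if __name__ == "__main__":
    B, G, D = 2, 49, 512              # batch, 7x7 grid features, feature dim (assumed)
    grid = torch.randn(B, G, D)
    vra = VisualRegionAggregation(dim=D, num_regions=8)
    dlc = DualLevelCollaboration(dim=D)
    regions = vra(grid)               # (2, 8, 512) compact region features
    words = torch.randn(B, 20, D)     # placeholder decoder-side word queries
    fused = dlc(words, grid, regions)
    print(fused.shape)                # torch.Size([2, 20, 512])
```

The design choice illustrated here is that region features are derived from the same grid features (no external detector), so the method stays fully end-to-end while the decoder can still draw on region-level semantics alongside the raw grid features.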