Research on Data Augmentation Strategy Methods for Image Caption

Authors:

Mengdi Liu

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering

Pages 1383 - 1388

https://doi.org/10.1145/3573428.3573673

Published: 15 March 2023

Abstract

Data augmentation can effectively expand the number of samples in a dataset and increase their diversity. Image captioning is the generation of a descriptive sentence for an image, and the quality of the image directly affects the accuracy of that description. In this paper, we study and analyze data augmentation and the VizWiz dataset, and find that data augmentation can effectively simulate the image quality problems present in VizWiz. To improve the accuracy of the image captioning model on VizWiz, this paper presents a method based on a data augmentation strategy that mainly uses four augmentation operators to simulate camera shake, out-of-focus blur, flash, and low-light conditions. The strategy space also contains the basic translate, shear, and contrast image operations. The method achieves BLEU_1 of 62.5, BLEU_4 of 23.1, ROUGE_L of 46.6, and CIDEr of 49.6 on the VizWiz dataset.
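To make the strategy space in the abstract concrete, the following is a minimal sketch of what such operators could look like. It is not the paper's implementation: the function names, parameter values, and the uniform random sampling of operators are all assumptions chosen for illustration, and images are modeled as small grayscale grids of floats in [0, 1] so that the example needs only the Python standard library.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1] (assumed representation)


def _clip(v: float) -> float:
    return max(0.0, min(1.0, v))


def camera_shake(img: Image, length: int = 3) -> Image:
    """Stand-in for camera shake: horizontal motion blur over `length` pixels."""
    w = len(img[0])
    return [[sum(row[min(x + k, w - 1)] for k in range(length)) / length
             for x in range(w)] for row in img]


def out_of_focus(img: Image, radius: int = 1) -> Image:
    """Stand-in for defocus: box blur over a (2r+1) x (2r+1) window, edges clamped."""
    h, w = len(img), len(img[0])
    area = (2 * radius + 1) ** 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    total += img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            row.append(total / area)
        out.append(row)
    return out


def flash(img: Image, gain: float = 1.5, offset: float = 0.2) -> Image:
    """Stand-in for flash: brighten and saturate highlights."""
    return [[_clip(v * gain + offset) for v in row] for row in img]


def low_light(img: Image, gamma: float = 2.0, scale: float = 0.6) -> Image:
    """Stand-in for low light: gamma-darken, then scale down."""
    return [[_clip(scale * v ** gamma) for v in row] for row in img]


def translate(img: Image, dx: int = 1) -> Image:
    """Shift right by dx pixels (clamped to [0, width]), filling with black."""
    w = len(img[0])
    dx = max(0, min(dx, w))
    return [[0.0] * dx + row[:w - dx] for row in img]


def shear(img: Image, factor: float = 0.5) -> Image:
    """Horizontal shear: each row y is shifted right by int(factor * y) pixels."""
    return [translate([row], int(factor * y))[0] for y, row in enumerate(img)]


def contrast(img: Image, factor: float = 1.5) -> Image:
    """Scale each pixel's distance from the mean intensity by `factor`."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    return [[_clip(mean + factor * (v - mean)) for v in row] for row in img]


# Hypothetical strategy space: four quality-degradation operators plus three basic ones.
OPS = {
    "camera_shake": camera_shake, "out_of_focus": out_of_focus,
    "flash": flash, "low_light": low_light,
    "translate": translate, "shear": shear, "contrast": contrast,
}


def apply_random_policy(img: Image, n_ops: int = 2, rng=None):
    """Apply n_ops distinct operators drawn uniformly at random (an assumed policy)."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(OPS), n_ops)
    for name in chosen:
        img = OPS[name](img)
    return img, chosen
```

Calling `apply_random_policy(image, n_ops=2)` returns the augmented image together with the names of the operators applied. Uniform sampling is used here only because it is the simplest policy to show; the abstract does not state how operators are selected from the strategy space.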

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software development process management
      1. Software development methods
        Agile software development

Published In

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering

October 2022

1999 pages

ISBN:9781450397148

DOI:10.1145/3573428

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

EITCE 2022: 2022 6th International Conference on Electronic Information Technology and Computer Engineering

October 21 - 23, 2022

Xiamen, China

Acceptance Rates

Overall Acceptance Rate 508 of 972 submissions, 52%
