DOI: 10.1145/3573428.3573673
research-article

Research on Data Augmentation Strategy Methods for Image Caption

Published: 15 March 2023

Abstract

Data augmentation can effectively expand the number of samples in a dataset and increase their diversity. Image captioning generates a descriptive sentence for an image, and the quality of the input image directly affects the accuracy of that description. In this paper, we study data augmentation and the VizWiz dataset and find that data augmentation can effectively simulate the image quality problems present in VizWiz. To improve the accuracy of an image captioning model on the VizWiz dataset, we present a method based on a data augmentation strategy that mainly uses four augmentation operators to simulate camera shake, out-of-focus blur, flash, and low-light conditions. The strategy space also contains basic translate, shear, and contrast operations. The method achieves BLEU_1 of 62.5, BLEU_4 of 23.1, ROUGE_L of 46.6, and CIDEr of 49.6 on the VizWiz dataset.
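The abstract does not specify how the four degradation operators are implemented. As a rough illustration only, the sketch below shows one plausible way to simulate each condition with NumPy: a horizontal motion-blur kernel for camera shake, a disk kernel for out-of-focus blur, a brightness gain for flash, and a gamma curve with scaling for low light. All function names, kernel shapes, and default parameters here are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# NOTE: every operator below is a hypothetical stand-in, not the paper's code.
# Images are float arrays of shape (H, W, 3) with values in [0, 1].

def filter_image(img, kernel):
    """Apply a small odd-sized 2D kernel to each channel, with edge padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    h, w = img.shape[:2]
    padded = np.pad(img, ((ph, ph), (pw, pw), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w, :]
    return np.clip(out, 0.0, 1.0)

def camera_shake(img, length=9):
    """Assumed model: horizontal motion blur (length must be odd)."""
    kernel = np.zeros((length, length))
    kernel[length // 2, :] = 1.0 / length
    return filter_image(img, kernel)

def out_of_focus(img, radius=3):
    """Assumed model: defocus blur with a uniform disk kernel."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (x * x + y * y <= radius * radius).astype(np.float64)
    return filter_image(img, disk / disk.sum())

def flash(img, gain=1.6):
    """Assumed model: overexposure via a brightness gain, then clipping."""
    return np.clip(img * gain, 0.0, 1.0)

def low_light(img, gamma=2.0, scale=0.6):
    """Assumed model: darken with a gamma curve and a global scale."""
    return np.clip(scale * img ** gamma, 0.0, 1.0)

OPERATORS = [camera_shake, out_of_focus, flash, low_light]

def random_degrade(img, rng):
    """Pick one of the four operators uniformly at random and apply it."""
    op = OPERATORS[rng.integers(len(OPERATORS))]
    return op(img)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))  # stand-in for a real training image
    degraded = random_degrade(image, rng)
    print(degraded.shape, float(degraded.min()), float(degraded.max()))
```

Uniform random selection is used here only to keep the sketch short; a strategy space like the one the abstract describes would typically also include the translate, shear, and contrast operations and might attach a probability and magnitude to each operator.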



Published In

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022
1999 pages
ISBN: 9781450397148
DOI: 10.1145/3573428

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Computer vision
  2. Data augmentation
  3. Image caption

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EITCE 2022

Acceptance Rates

Overall Acceptance Rate 508 of 972 submissions, 52%
