DOI: 10.1145/3341162.3345601

Neural caption generation over figures

Published: 09 September 2019

Abstract

Figures are human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning: automatically generating a natural language description of a given figure. We create a new dataset for figure captioning, FigCAP. To achieve accurate generation of the labels that appear in figures, we propose the Label Maps Attention Model. Extensive experiments show that our method outperforms the baselines. A successful solution to this task would make figure content accessible to people with visual impairments, by providing input to a text-to-speech system, and would enable automatic parsing of vast repositories of documents in which figures are pervasive.
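The page does not describe the Label Maps Attention Model itself, but captioning decoders of this kind typically attend over encoded figure regions at each generation step, in the style of "show, attend and tell". The following is a minimal NumPy sketch of one such attention step; all dimensions, weight names, and initializations are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 49 encoded figure regions, 128-d region features,
# 64-d decoder hidden state.
num_regions, feat_dim, hid_dim = 49, 128, 64

V = rng.standard_normal((num_regions, feat_dim))   # encoder region features
h = rng.standard_normal(hid_dim)                   # current decoder hidden state
W_v = 0.1 * rng.standard_normal((feat_dim, hid_dim))
W_h = 0.1 * rng.standard_normal((hid_dim, hid_dim))
w_a = 0.1 * rng.standard_normal(hid_dim)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Additive attention: score each region against the decoder state,
# normalize into weights, and form a context vector as a weighted
# sum of region features. The context would then condition the next
# word prediction of the caption decoder.
scores = np.tanh(V @ W_v + h @ W_h) @ w_a          # shape (num_regions,)
alpha = softmax(scores)                            # attention weights, sum to 1
context = alpha @ V                                # shape (feat_dim,)
```

In a full captioning model this step would run inside an LSTM decoding loop, with `h` updated after every generated token; a label-map variant would additionally bias `alpha` toward regions containing text labels.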





Published In

UbiComp/ISWC '19 Adjunct: Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers
September 2019
1234 pages
ISBN:9781450368698
DOI:10.1145/3341162
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. CNN
  2. LSTM
  3. figure captioning
  4. neural networks

Qualifiers

  • Research-article

Conference

UbiComp '19

Acceptance Rates

Overall Acceptance Rate 764 of 2,912 submissions, 26%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 6
Reflects downloads up to 22 Nov 2024


Cited By

  • (2024) EC: A Tool for Guiding Chart and Caption Emphasis. IEEE Transactions on Visualization and Computer Graphics 30:1, 120-130. DOI: 10.1109/TVCG.2023.3327150. Online publication date: 1-Jan-2024
  • (2023) What Is the Difference Between a Mountain and a Molehill? Quantifying Semantic Labeling of Visual Features in Line Charts. 2023 IEEE Visualization and Visual Analytics (VIS), 161-165. DOI: 10.1109/VIS54172.2023.00041. Online publication date: 21-Oct-2023
  • (2023) PatFig: Generating Short and Long Captions for Patent Figures. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2835-2841. DOI: 10.1109/ICCVW60793.2023.00305. Online publication date: 2-Oct-2023
  • (2023) Scientific document processing: challenges for modern learning methods. International Journal on Digital Libraries 24:4, 283-309. DOI: 10.1007/s00799-023-00352-7. Online publication date: 24-Mar-2023
  • (2023) RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic. Document Analysis and Recognition - ICDAR 2023, 66-83. DOI: 10.1007/978-3-031-41682-8_5. Online publication date: 21-Aug-2023
  • (2022) A Dataset of Alt Texts from HCI Publications. Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, 1-12. DOI: 10.1145/3517428.3544796. Online publication date: 23-Oct-2022
  • (2022) How accessible is my visualization? Evaluating visualization accessibility with Chartability. Computer Graphics Forum 41:3, 57-70. DOI: 10.1111/cgf.14522. Online publication date: 12-Aug-2022
  • (2022) Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE Transactions on Visualization and Computer Graphics 28:1, 1073-1083. DOI: 10.1109/TVCG.2021.3114770. Online publication date: 1-Jan-2022
  • (2022) AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization. IEEE Transactions on Visualization and Computer Graphics 28:12, 5049-5070. DOI: 10.1109/TVCG.2021.3099002. Online publication date: 1-Dec-2022
  • (2021) Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-11. DOI: 10.1145/3411764.3445443. Online publication date: 6-May-2021
