DOI: 10.1145/3341162.3345601

Neural caption generation over figures

Published: 09 September 2019

Abstract

Figures are human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning: automatically generating a natural language description of a given figure. We create a new dataset for figure captioning, FigCAP. To achieve accurate generation of the labels that appear in figures, we propose the Label Maps Attention Model. Extensive experiments show that our method outperforms the baselines. A successful solution to this task would make figure content accessible to people with visual impairments, by providing input to a text-to-speech system, and would enable automatic parsing of vast repositories of documents in which figures are pervasive.
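The page does not describe the Label Maps Attention Model itself, but captioning decoders of this kind typically attend over encoded figure regions at each generation step, in the style of "show, attend and tell". The following is a minimal NumPy sketch of one such attention step; all dimensions, weight names, and initializations are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 49 encoded figure regions, 128-d region features,
# 64-d decoder hidden state.
num_regions, feat_dim, hid_dim = 49, 128, 64

V = rng.standard_normal((num_regions, feat_dim))   # encoder region features
h = rng.standard_normal(hid_dim)                   # current decoder hidden state
W_v = 0.1 * rng.standard_normal((feat_dim, hid_dim))
W_h = 0.1 * rng.standard_normal((hid_dim, hid_dim))
w_a = 0.1 * rng.standard_normal(hid_dim)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Additive attention: score each region against the decoder state,
# normalize into weights, and form a context vector as a weighted
# sum of region features. The context would then condition the next
# word prediction of the caption decoder.
scores = np.tanh(V @ W_v + h @ W_h) @ w_a          # shape (num_regions,)
alpha = softmax(scores)                            # attention weights, sum to 1
context = alpha @ V                                # shape (feat_dim,)
```

In a full captioning model this step would run inside an LSTM decoding loop, with `h` updated after every generated token; a label-map variant would additionally bias `alpha` toward regions containing text labels.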





Published In

UbiComp/ISWC '19 Adjunct: Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers
September 2019
1234 pages
ISBN:9781450368698
DOI:10.1145/3341162
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. CNN
  2. LSTM
  3. figure captioning
  4. neural networks

Qualifiers

  • Research-article

Conference

UbiComp '19

Acceptance Rates

Overall Acceptance Rate 764 of 2,912 submissions, 26%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 6
Reflects downloads up to 22 Nov 2024


Cited By

  • (2024) EC: A Tool for Guiding Chart and Caption Emphasis. IEEE Transactions on Visualization and Computer Graphics 30:1, 120-130. DOI: 10.1109/TVCG.2023.3327150. Online publication date: 1-Jan-2024
  • (2023) What Is the Difference Between a Mountain and a Molehill? Quantifying Semantic Labeling of Visual Features in Line Charts. 2023 IEEE Visualization and Visual Analytics (VIS), 161-165. DOI: 10.1109/VIS54172.2023.00041. Online publication date: 21-Oct-2023
  • (2023) PatFig: Generating Short and Long Captions for Patent Figures. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2835-2841. DOI: 10.1109/ICCVW60793.2023.00305. Online publication date: 2-Oct-2023
  • (2023) Scientific document processing: challenges for modern learning methods. International Journal on Digital Libraries 24:4, 283-309. DOI: 10.1007/s00799-023-00352-7. Online publication date: 24-Mar-2023
  • (2023) RealCQA: Scientific Chart Question Answering as a Test-Bed for First-Order Logic. Document Analysis and Recognition - ICDAR 2023, 66-83. DOI: 10.1007/978-3-031-41682-8_5. Online publication date: 21-Aug-2023
  • (2022) A Dataset of Alt Texts from HCI Publications. Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility, 1-12. DOI: 10.1145/3517428.3544796. Online publication date: 23-Oct-2022
  • (2022) How accessible is my visualization? Evaluating visualization accessibility with Chartability. Computer Graphics Forum 41:3, 57-70. DOI: 10.1111/cgf.14522. Online publication date: 12-Aug-2022
  • (2022) Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE Transactions on Visualization and Computer Graphics 28:1, 1073-1083. DOI: 10.1109/TVCG.2021.3114770. Online publication date: 1-Jan-2022
  • (2022) AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization. IEEE Transactions on Visualization and Computer Graphics 28:12, 5049-5070. DOI: 10.1109/TVCG.2021.3099002. Online publication date: 1-Dec-2022
  • (2021) Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-11. DOI: 10.1145/3411764.3445443. Online publication date: 6-May-2021
