M-TBQA: Multimodal Table-Based Question Answering

Published: 16 April 2024
DOI: 10.1145/3650215.3650255

Abstract

In recent years, there has been considerable research interest in table-based question answering (TBQA). These studies aim to understand table content and generate answers to questions posed over a given table. With the rapid growth of the Internet, table content is no longer confined to pure text but also includes images. However, most previous methods overlook the important visual information linked to table cells and focus solely on parsing questions into logical forms over textual tabular data, which prevents TBQA from being extended to multimodal applications. Therefore, we propose a novel task, multimodal table-based question answering (M-TBQA), which requires performing multimodal question answering over tabular data that contains images. Specifically, M-TBQA first identifies candidate rows relevant to the question using a table-relevance net and then predicts the expected answer using an answer prediction net. This approach effectively harnesses multimodal information from both the tabular data and the images to predict answers accurately, and it highlights the significant role that images play in the M-TBQA task. To facilitate further research, the dataset will be released at https://github.com/cooperResearch001/M-TBQA/tree/main
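
As a rough illustration of the two-stage pipeline described in the abstract (row relevance selection followed by answer prediction), the following minimal PyTorch sketch scores each table row against the question and then predicts an answer from the top-scoring rows' text and image features. All names (TableRelevanceNet, AnswerPredictionNet, answer_question), dimensions, mean-pooled embeddings, and the fixed answer vocabulary are illustrative assumptions, not the authors' actual architecture.

# Minimal sketch of a two-stage M-TBQA-style pipeline; all module names,
# dimensions, and pooling choices are assumptions for illustration only.
import torch
import torch.nn as nn

class TableRelevanceNet(nn.Module):
    """Scores how relevant each table row is to the question."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, question, row_text, row_image):
        # question: (dim,); row_text, row_image: (num_rows, dim)
        q = question.unsqueeze(0).expand_as(row_text)
        feats = torch.cat([q, row_text, row_image], dim=-1)
        return self.scorer(feats).squeeze(-1)      # (num_rows,) relevance logits

class AnswerPredictionNet(nn.Module):
    """Predicts an answer distribution from the question and the selected rows."""
    def __init__(self, dim: int = 256, num_answers: int = 1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_answers)
        )

    def forward(self, question, sel_text, sel_image):
        # Pool the selected rows' text and image features, then classify.
        pooled_text = sel_text.mean(dim=0)
        pooled_image = sel_image.mean(dim=0)
        feats = torch.cat([question, pooled_text, pooled_image], dim=-1)
        return self.head(feats)                    # (num_answers,) answer logits

def answer_question(question, row_text, row_image, relevance_net, answer_net, k=3):
    """Select the top-k question-relevant rows, then predict the answer."""
    scores = relevance_net(question, row_text, row_image)
    top = torch.topk(scores, k=min(k, scores.numel())).indices
    return answer_net(question, row_text[top], row_image[top])

if __name__ == "__main__":
    dim, num_rows = 256, 8
    rel, ans = TableRelevanceNet(dim), AnswerPredictionNet(dim, num_answers=1000)
    q = torch.randn(dim)                  # question embedding (e.g. from a text encoder)
    rows_t = torch.randn(num_rows, dim)   # per-row text embeddings
    rows_i = torch.randn(num_rows, dim)   # per-row image embeddings (e.g. from a CNN)
    logits = answer_question(q, rows_t, rows_i, rel, ans)
    print(logits.shape)                   # torch.Size([1000])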


    Published In

    ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application
    October 2023
    1065 pages
    ISBN: 9798400709449
    DOI: 10.1145/3650215

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ICMLCA 2023
