M-TBQA: Multimodal Table-Based Question Answering

Published: 16 April 2024
DOI: 10.1145/3650215.3650255

Abstract

In recent years, there has been considerable research interest in table-based question answering (TBQA). These studies aim to understand table content and generate answers to questions posed over a given table. With the rapid growth of the Internet, table content is no longer confined to pure text but also includes images. However, most previous methods overlook the important visual information linked to table cells and focus solely on parsing questions into logical forms over textual tabular data, which prevents TBQA from being extended to multimodal applications. Therefore, we propose a novel task, multimodal table-based question answering (M-TBQA), which requires performing multimodal question answering over tabular data that contains images. Specifically, M-TBQA first identifies candidate rows relevant to the question using a table-relevance net and then predicts the expected answer using an answer prediction net. This approach effectively harnesses multimodal information from both the tabular data and the images to predict answers accurately, and it highlights the significant role that images play in the M-TBQA task. To facilitate further research, the dataset will be released at https://github.com/cooperResearch001/M-TBQA/tree/main
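
As a rough illustration of the two-stage pipeline described in the abstract (row relevance selection followed by answer prediction), the following minimal PyTorch sketch scores each table row against the question and then predicts an answer from the top-scoring rows' text and image features. All names (TableRelevanceNet, AnswerPredictionNet, answer_question), dimensions, mean-pooled embeddings, and the fixed answer vocabulary are illustrative assumptions, not the authors' actual architecture.

# Minimal sketch of a two-stage M-TBQA-style pipeline; all module names,
# dimensions, and pooling choices are assumptions for illustration only.
import torch
import torch.nn as nn

class TableRelevanceNet(nn.Module):
    """Scores how relevant each table row is to the question."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, question, row_text, row_image):
        # question: (dim,); row_text, row_image: (num_rows, dim)
        q = question.unsqueeze(0).expand_as(row_text)
        feats = torch.cat([q, row_text, row_image], dim=-1)
        return self.scorer(feats).squeeze(-1)      # (num_rows,) relevance logits

class AnswerPredictionNet(nn.Module):
    """Predicts an answer distribution from the question and the selected rows."""
    def __init__(self, dim: int = 256, num_answers: int = 1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_answers)
        )

    def forward(self, question, sel_text, sel_image):
        # Pool the selected rows' text and image features, then classify.
        pooled_text = sel_text.mean(dim=0)
        pooled_image = sel_image.mean(dim=0)
        feats = torch.cat([question, pooled_text, pooled_image], dim=-1)
        return self.head(feats)                    # (num_answers,) answer logits

def answer_question(question, row_text, row_image, relevance_net, answer_net, k=3):
    """Select the top-k question-relevant rows, then predict the answer."""
    scores = relevance_net(question, row_text, row_image)
    top = torch.topk(scores, k=min(k, scores.numel())).indices
    return answer_net(question, row_text[top], row_image[top])

if __name__ == "__main__":
    dim, num_rows = 256, 8
    rel, ans = TableRelevanceNet(dim), AnswerPredictionNet(dim, num_answers=1000)
    q = torch.randn(dim)                  # question embedding (e.g. from a text encoder)
    rows_t = torch.randn(num_rows, dim)   # per-row text embeddings
    rows_i = torch.randn(num_rows, dim)   # per-row image embeddings (e.g. from a CNN)
    logits = answer_question(q, rows_t, rows_i, rel, ans)
    print(logits.shape)                   # torch.Size([1000])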


    Published In

    ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application
    October 2023
    1065 pages
    ISBN: 9798400709449
    DOI: 10.1145/3650215

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    ICMLCA 2023
