A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis

Published: 07 June 2024 · DOI: 10.1145/3652583.3658115

Abstract

Recently, multimodal sentiment analysis of social media posts has received increasing attention, as it can effectively improve single-modality sentiment analysis by leveraging the complementary information between text and images. Despite their success, current methods still suffer from two weaknesses: (1) current approaches to obtaining image representations do not capture sentiment information, leaving a significant gap between the image representations and the prediction results; (2) current methods ignore the sentiment expressed by symbols (emoticons, emojis) in the text, even though these symbols can effectively reflect a user's sentiment. To address these issues, we propose a sentimental prompt framework with a visual text encoder (SPFVTE). Specifically, for the first problem, instead of using the image representation directly, we project it into a prompt and use prompt learning to capture the sentiment information in images by learning a sentiment-specific prompt. For the second problem, considering that people derive the meanings of emojis and emoticons from their visual form, we propose rendering the text as an image and using a visual text encoder to capture the sentiment contained in emojis and emoticons. We have conducted experiments on three public multimodal sentiment datasets, and the results show that our method significantly and consistently outperforms state-of-the-art methods. The datasets and source code can be found at https://github.com/JinFish/SPFVTE.
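
To make the two mechanisms described in the abstract concrete, the following is a minimal sketch in PyTorch and Pillow. It is not the authors' implementation (see the linked repository for that); the names ImagePromptProjector and render_text_as_image, the default dimensions, and the fusion layout are illustrative assumptions. The first part projects a pooled image feature into soft prompt tokens that are prepended to the text embeddings; the second renders raw post text as an image so that a pixel-based text encoder can read glyph shapes directly.

    import torch
    import torch.nn as nn
    from PIL import Image, ImageDraw

    class ImagePromptProjector(nn.Module):
        # Hypothetical sketch: map a pooled image feature to a short
        # sequence of soft prompt embeddings prepended to the text
        # embeddings, so prompt learning can tune them for sentiment.
        def __init__(self, img_dim=768, txt_dim=768, num_prompts=4):
            super().__init__()
            self.num_prompts, self.txt_dim = num_prompts, txt_dim
            self.proj = nn.Linear(img_dim, num_prompts * txt_dim)

        def forward(self, img_feat, txt_embeds):
            # img_feat: (batch, img_dim); txt_embeds: (batch, seq_len, txt_dim)
            prompts = self.proj(img_feat).view(-1, self.num_prompts, self.txt_dim)
            return torch.cat([prompts, txt_embeds], dim=1)

    def render_text_as_image(text, width=448, height=20):
        # Render the post text (emoticons included) onto a white canvas.
        # Rendering actual emoji glyphs would additionally require loading
        # a font with emoji coverage via ImageFont.truetype.
        canvas = Image.new("RGB", (width, height), "white")
        ImageDraw.Draw(canvas).text((2, 2), text, fill="black")
        return canvas

    # Toy usage with random features standing in for real encoder outputs.
    fusion = ImagePromptProjector()
    fused = fusion(torch.randn(2, 768), torch.randn(2, 12, 768))
    print(fused.shape)  # torch.Size([2, 16, 768])
    img = render_text_as_image("What a great day! :)")

Under these assumptions, the fused sequence would be fed to the text encoder, while images rendered this way would go to a visual text encoder that reads sentiment cues from the glyphs themselves.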


Cited By

  • (2024) EEBERT: An Emoji-Enhanced BERT Fine-Tuning on Amazon Product Reviews for Text Sentiment Classification. IEEE Access, Vol. 12, 131954–131967. DOI: 10.1109/ACCESS.2024.3456039.



    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. multimodal fusion
    2. multimodal sentiment analysis
    3. social media posts

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    ICMR '24

    Acceptance Rates

    Overall acceptance rate: 254 of 830 submissions (31%)



    Article Metrics

    • Downloads (last 12 months): 233
    • Downloads (last 6 weeks): 44

    Reflects downloads up to 29 Nov 2024.

