

An image-text consistency driven multimodal sentiment analysis approach for social media

Published: 01 November 2019

Highlights

We propose an image-text consistency measure for image-text posts.
We develop a multimodal sentiment analysis approach for image-text posts.
The proposed approach achieves superior performance on the Flickr benchmark dataset.

Abstract

Social media users increasingly express their opinions and share their experiences with both images and text, rather than with text alone as in conventional social media. Consequently, conventional text-based sentiment analysis has evolved into the more complex task of multimodal sentiment analysis. To effectively exploit the information in both the visual and the textual content of image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach first measures the correlation between the image and the text, and then applies a multimodal adaptive sentiment analysis method. Specifically, the mid-level visual features extracted by the conventional SentiBank approach are used to represent visual concepts, and they are integrated with textual, visual, and social features to build a machine-learning sentiment classifier. Extensive experiments demonstrate the superior performance of the proposed approach.
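The abstract only sketches the idea of consistency-driven fusion, so the following is a minimal illustrative sketch, not the paper's actual method: a hypothetical late-fusion rule in which an image-text consistency score (here approximated by cosine similarity between hypothetical modality embeddings) controls how much the image-level sentiment score contributes relative to the text-level score. All function names, weights, and embeddings below are assumptions for illustration.

```python
# Illustrative sketch only (not the paper's exact method): adaptive late
# fusion where an image-text consistency score c in [0, 1] weights the
# two modality-level sentiment scores. All names here are hypothetical.

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def fuse_sentiment(text_score, image_score, text_emb, image_emb):
    """Adaptively fuse modality sentiment scores (each in [-1, 1]).

    When the two embeddings agree (high consistency), both modalities
    contribute; when they disagree, fusion leans on the text score,
    assumed here to be the more reliable sentiment signal.
    """
    c = max(0.0, cosine_similarity(text_emb, image_emb))  # consistency in [0, 1]
    w_image = 0.5 * c        # image weight grows with consistency
    w_text = 1.0 - w_image
    return w_text * text_score + w_image * image_score

# Consistent post: both modalities contribute to the fused score.
s_consistent = fuse_sentiment(0.8, 0.6, [1.0, 0.0, 1.0], [0.9, 0.1, 0.8])

# Inconsistent post (orthogonal embeddings): fusion falls back to text.
s_inconsistent = fuse_sentiment(0.8, -0.6, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The design choice sketched here is that inconsistency should reduce trust in the weaker modality rather than average conflicting signals; the paper's adaptive method may differ in how the consistency measure is computed and applied.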




Published In

Information Processing and Management: an International Journal, Volume 56, Issue 6, November 2019, 457 pages.

Publisher

Pergamon Press, Inc., United States

      Author Tags

      1. Multimodal sentiment analysis
      2. Textual sentiment
      3. Visual sentiment
      4. Social media

      Qualifiers

      • Research-article

      Cited By

• (2024) Modality-specific and -shared Contrastive Learning for Sentiment Analysis. Proceedings of the 2024 International Conference on Multimedia Retrieval, pp. 731–739. DOI: 10.1145/3652583.3658004. Online publication date: 30-May-2024.
• (2024) Enhancing image sentiment analysis. Information Processing and Management: an International Journal 61(4). DOI: 10.1016/j.ipm.2024.103749. Online publication date: 1-Jul-2024.
• (2024) A multimodal fusion network with attention mechanisms for visual–textual sentiment analysis. Expert Systems with Applications: An International Journal 242:C. DOI: 10.1016/j.eswa.2023.122731. Online publication date: 16-May-2024.
• (2024) Deep CNN with late fusion for real time multimodal emotion recognition. Expert Systems with Applications: An International Journal 240:C. DOI: 10.1016/j.eswa.2023.122579. Online publication date: 15-Apr-2024.
• (2024) Capsule network-based deep ensemble transfer learning for multimodal sentiment analysis. Expert Systems with Applications: An International Journal 239:C. DOI: 10.1016/j.eswa.2023.122454. Online publication date: 1-Apr-2024.
• (2024) Detect Text Forgery with Non-forged Image Features: A Framework for Detection and Grounding of Image-Text Manipulation. Pattern Recognition and Computer Vision, pp. 366–380. DOI: 10.1007/978-981-97-8795-1_25. Online publication date: 18-Oct-2024.
• (2023) Scraping Relevant Images from Web Pages without Download. ACM Transactions on the Web 18(1), pp. 1–27. DOI: 10.1145/3616849. Online publication date: 11-Oct-2023.
• (2023) Multimodal Sentiment Analysis: A Survey of Methods, Trends, and Challenges. ACM Computing Surveys 55(13s), pp. 1–38. DOI: 10.1145/3586075. Online publication date: 13-Jul-2023.
• (2023) Image–Text Multimodal Sentiment Analysis Framework of Assamese News Articles Using Late Fusion. ACM Transactions on Asian and Low-Resource Language Information Processing 22(6), pp. 1–30. DOI: 10.1145/3584861. Online publication date: 17-Feb-2023.
• (2023) A Deep Multi-level Attentive Network for Multimodal Sentiment Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 19(1), pp. 1–19. DOI: 10.1145/3517139. Online publication date: 5-Jan-2023.
