DOI: 10.1145/3664647.3681060

Generating Multimodal Metaphorical Features for Meme Understanding

Published: 28 October 2024

Abstract

Understanding a meme is challenging, as the metaphorical information it contains requires intricate interpretation to grasp the intended meaning fully. Previous works have attempted to facilitate computational understanding of memes by introducing human-annotated metaphors as extra input features to machine learning models. However, these approaches mainly focus on formulating a linguistic representation of the metaphor (extracted from the text appearing in a meme), while ignoring the connection between the metaphor and the corresponding visual features (e.g., objects in the meme image). In this paper, we argue that a more comprehensive understanding of memes can only be achieved through joint modelling of both their visual and linguistic features. To this end, we propose an approach that generates Multimodal Metaphorical features for Meme Classification, named MMMC. Leveraging a text-conditioned generative adversarial network, MMMC derives visual characteristics from the linguistic attributes of metaphorical concepts, conveying the underlying metaphorical concept more effectively. The linguistic and visual features are then integrated into a set of multimodal metaphorical features for classification purposes. We perform extensive experiments on a benchmark metaphorical meme dataset, MET-Meme. Experimental results show that MMMC significantly outperforms existing baselines on the tasks of emotion classification and intention detection. Our code and dataset are available at https://github.com/liaolianfoka/MMMC.
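To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the data flow: a generator conditioned on a metaphor-text embedding produces a synthetic visual feature, which is concatenated with the linguistic feature and classified. All names (TextConditionedGenerator, MMMCClassifier), layer sizes, and the seven-way class count are illustrative assumptions; the simple MLP generator here stands in for the adversarially trained text-conditioned GAN used in the paper.

import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    # Hypothetical stand-in for the text-conditioned GAN generator: maps a
    # metaphor-text embedding (plus noise) to a visual feature vector meant
    # to depict the metaphorical concept.
    def __init__(self, text_dim=768, noise_dim=100, visual_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, visual_dim),
        )

    def forward(self, text_emb):
        # Noise lets the generator produce varied depictions of the same
        # concept, as in a standard conditional GAN.
        noise = torch.randn(text_emb.size(0), self.noise_dim, device=text_emb.device)
        return self.net(torch.cat([text_emb, noise], dim=-1))

class MMMCClassifier(nn.Module):
    # Fuses the generated visual feature with the linguistic metaphor feature
    # into a joint multimodal metaphorical feature, then classifies it.
    def __init__(self, text_dim=768, visual_dim=512, num_classes=7):
        super().__init__()
        self.generator = TextConditionedGenerator(text_dim=text_dim, visual_dim=visual_dim)
        self.head = nn.Sequential(
            nn.Linear(text_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_emb):
        visual_feat = self.generator(text_emb)              # generated visual feature
        fused = torch.cat([text_emb, visual_feat], dim=-1)  # multimodal metaphorical feature
        return self.head(fused)

# Toy usage: a batch of 4 metaphor-text embeddings (e.g., BERT [CLS] vectors).
model = MMMCClassifier()
logits = model(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 7]) -- one logit per (assumed) emotion class

In the actual model, the generator would be trained adversarially against a discriminator on real meme images and the classifier head trained on MET-Meme labels; everything here is randomly initialised purely to illustrate shapes and data flow.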




      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
      ISBN: 9798400706868
      DOI: 10.1145/3664647

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. meme understanding
      2. metaphor
      3. multimodal

      Qualifiers

      • Research-article

      Conference

      MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne, VIC, Australia

      Acceptance Rates

      MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
      Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
