DOI: 10.1145/3664647.3681060

Generating Multimodal Metaphorical Features for Meme Understanding

Published: 28 October 2024

Abstract

Understanding a meme is challenging, as the metaphorical information it contains requires intricate interpretation to grasp the intended meaning fully. Previous works have attempted to facilitate computational understanding of memes by introducing human-annotated metaphors as extra input features to machine learning models. However, these approaches mainly focus on formulating a linguistic representation of the metaphor (extracted from the text appearing in a meme), while ignoring the connection between the metaphor and the corresponding visual features (e.g., objects in the meme image). In this paper, we argue that a more comprehensive understanding of memes can only be achieved through joint modelling of both their visual and linguistic features. To this end, we propose an approach that generates Multimodal Metaphorical features for Meme Classification, named MMMC. Leveraging a text-conditioned generative adversarial network, MMMC derives visual characteristics from the linguistic attributes of metaphorical concepts, conveying the underlying metaphorical concept more effectively. The linguistic and visual features are then integrated into a set of multimodal metaphorical features for classification purposes. We perform extensive experiments on a benchmark metaphorical meme dataset, MET-Meme. Experimental results show that MMMC significantly outperforms existing baselines on the tasks of emotion classification and intention detection. Our code and dataset are available at https://github.com/liaolianfoka/MMMC.
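To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the data flow: a generator conditioned on a metaphor-text embedding produces a synthetic visual feature, which is concatenated with the linguistic feature and classified. All names (TextConditionedGenerator, MMMCClassifier), layer sizes, and the seven-way class count are illustrative assumptions; the simple MLP generator here stands in for the adversarially trained text-conditioned GAN used in the paper.

import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    # Hypothetical stand-in for the text-conditioned GAN generator: maps a
    # metaphor-text embedding (plus noise) to a visual feature vector meant
    # to depict the metaphorical concept.
    def __init__(self, text_dim=768, noise_dim=100, visual_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, visual_dim),
        )

    def forward(self, text_emb):
        # Noise lets the generator produce varied depictions of the same
        # concept, as in a standard conditional GAN.
        noise = torch.randn(text_emb.size(0), self.noise_dim, device=text_emb.device)
        return self.net(torch.cat([text_emb, noise], dim=-1))

class MMMCClassifier(nn.Module):
    # Fuses the generated visual feature with the linguistic metaphor feature
    # into a joint multimodal metaphorical feature, then classifies it.
    def __init__(self, text_dim=768, visual_dim=512, num_classes=7):
        super().__init__()
        self.generator = TextConditionedGenerator(text_dim=text_dim, visual_dim=visual_dim)
        self.head = nn.Sequential(
            nn.Linear(text_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_emb):
        visual_feat = self.generator(text_emb)              # generated visual feature
        fused = torch.cat([text_emb, visual_feat], dim=-1)  # multimodal metaphorical feature
        return self.head(fused)

# Toy usage: a batch of 4 metaphor-text embeddings (e.g., BERT [CLS] vectors).
model = MMMCClassifier()
logits = model(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 7]) -- one logit per (assumed) emotion class

In the actual model, the generator would be trained adversarially against a discriminator on real meme images and the classifier head trained on MET-Meme labels; everything here is randomly initialised purely to illustrate shapes and data flow.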




      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
      ISBN: 9798400706868
      DOI: 10.1145/3664647

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. meme understanding
      2. metaphor
      3. multimodal

      Qualifiers

      • Research-article

      Conference

      MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne, VIC, Australia

      Acceptance Rates

      MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
      Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
