
Humor Detection System for MuSE 2023: Contextual Modeling, Pseudo Labelling, and Post-smoothing

Published: 29 October 2023

Abstract

Humor detection has emerged as an active research area within the field of artificial intelligence, and it has made remarkable progress over the past few decades with the development of deep learning. This paper introduces a novel framework aimed at enhancing the model's understanding of humorous expressions. Specifically, we consider the impact of the correspondence between labels and features. To obtain more effective models with limited training samples, we employ pseudo labeling, a widely used semi-supervised learning technique. Furthermore, we apply a post-smoothing strategy to eliminate abnormally high predictions. In addition, to alleviate over-fitting on the validation set, we create 10 different random subsets of the training data, train a model on each, and aggregate their predictions. To verify the effectiveness of our strategy, we evaluate its performance on the Cross-Cultural Humour sub-challenge at MuSe 2023. Experimental results demonstrate that our system achieves an AUC score of 0.9112, surpassing the baseline models by a substantial margin.
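The abstract names three strategies: pseudo labeling on unlabeled data, post-smoothing of the predicted humor scores, and aggregation over 10 random training subsets. The sketch below is a minimal, hypothetical illustration of how these pieces could fit together, assuming frame-level humor probabilities in [0, 1]; the function names (pseudo_label, post_smooth, subset_ensemble) and parameters (confidence threshold, window size, subset fraction) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the three strategies described in the abstract.
# All names and hyperparameters are illustrative, not the authors' code.
import numpy as np

def pseudo_label(predict_proba, unlabeled_x, threshold=0.9):
    """Keep unlabeled samples whose predicted humor probability is confident
    and assign them hard pseudo labels (assumed confidence threshold)."""
    probs = predict_proba(unlabeled_x)                      # shape (n,)
    confident = (probs >= threshold) | (probs <= 1.0 - threshold)
    labels = (probs >= 0.5).astype(np.int64)
    return unlabeled_x[confident], labels[confident]

def post_smooth(preds, window=5, clip_quantile=0.99):
    """Moving-average smoothing over time, then clip abnormally high scores."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(preds, kernel, mode="same")
    upper = np.quantile(smoothed, clip_quantile)
    return np.minimum(smoothed, upper)

def subset_ensemble(train_fn, predict_fn, x, y, x_test,
                    n_subsets=10, subset_frac=0.8, seed=0):
    """Train one model per random training subset and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    all_preds = []
    for _ in range(n_subsets):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        model = train_fn(x[idx], y[idx])
        all_preds.append(predict_fn(model, x_test))
    return np.mean(all_preds, axis=0)
```

Under these assumptions, a final score sequence would be obtained by calling post_smooth on the output of subset_ensemble, with pseudo_label used beforehand to enlarge the training set passed to train_fn.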

Supplementary Material

MP4 File (045video.mp4)
Presentation video





      Information & Contributors

      Information

      Published In

      MuSe '23: Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation
      November 2023
      113 pages
      ISBN:9798400702709
      DOI:10.1145/3606039
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 October 2023

      Author Tags

      1. contextual modeling
      2. humor detection
      3. post-smoothing
      4. pseudo labeling

      Qualifiers

      • Research-article

      Funding Sources

• National Natural Science Foundation of China
• Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park
• Open Research Projects of Zhejiang Lab

      Conference

      MM '23

      Acceptance Rates

      Overall Acceptance Rate 14 of 17 submissions, 82%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

• Downloads (Last 12 months): 274
• Downloads (Last 6 weeks): 19

Reflects downloads up to 13 Feb 2025

      Citations

Cited By

• (2024) AMTN: Attention-Enhanced Multimodal Temporal Network for Humor Detection. Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 65-69. https://doi.org/10.1145/3689062.3689375. Online publication date: 28-Oct-2024.
• (2024) The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition. Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 1-9. https://doi.org/10.1145/3689062.3689088. Online publication date: 28-Oct-2024.
• (2024) Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions. Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 52-59. https://doi.org/10.1145/3689062.3689087. Online publication date: 28-Oct-2024.
• (2024) DPP: A Dual-Phase Processing Method for Cross-Cultural Humor Detection. Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 70-78. https://doi.org/10.1145/3689062.3689080. Online publication date: 28-Oct-2024.
• (2024) Generative Action Procedure Manzai Scenario Based on Maslow's Stages of Need Theory. Advances in Network-Based Information Systems, 319-327. https://doi.org/10.1007/978-3-031-72325-4_31. Online publication date: 20-Sep-2024.
• (2023) MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. Proceedings of the 31st ACM International Conference on Multimedia, 9723-9725. https://doi.org/10.1145/3581783.3610943. Online publication date: 26-Oct-2023.
