Abstract
The importance of contextual information in most natural language processing (NLP) applications cannot be overemphasized: extracting context yields significant improvements in many NLP tasks, including emotion recognition from text. This paper surveys transformer-based models for NLP and highlights the strengths and weaknesses of each. The models discussed include Generative Pre-training (GPT) and its variants, Transformer-XL, the Cross-lingual Language Model (XLM), and Bidirectional Encoder Representations from Transformers (BERT). Given BERT's strength and popularity in text-based emotion detection, the paper then reviews recent works that propose BERT-based models for this task, presenting for each work its contributions, results, limitations, and the datasets used. We also outline future research directions to encourage further work on text-based emotion detection with these models.
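To make the surveyed setting concrete, the sketch below shows one common way such BERT-based emotion classifiers are built: a pre-trained encoder fine-tuned with a classification head over a fixed emotion label set. It uses the Hugging Face transformers library with PyTorch; the checkpoint name, label set, and toy examples are illustrative assumptions, not a specific system reviewed in the paper.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# fine-tune a pre-trained BERT encoder for text-based emotion classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

EMOTIONS = ["anger", "fear", "joy", "love", "sadness", "surprise"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(EMOTIONS)
)

# Toy labelled examples standing in for an emotion dataset (e.g. tweets or dialogue turns).
texts = ["I can't believe you did that to me!", "What a wonderful surprise, thank you!"]
labels = torch.tensor([EMOTIONS.index("anger"), EMOTIONS.index("joy")])

# Tokenize with padding/truncation so the batch fits BERT's fixed-length input.
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few illustrative fine-tuning steps
    outputs = model(**batch, labels=labels)  # cross-entropy loss over emotion labels
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: pick the highest-scoring emotion for a new sentence.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("I miss her so much.", return_tensors="pt")).logits
print(EMOTIONS[logits.argmax(dim=-1).item()])
```

The approaches reviewed in the paper typically vary the checkpoint, the pooling strategy, and how conversational context is presented to the encoder, but the fine-tuning loop itself generally follows this pattern.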