1 Introduction
Emotional intelligence can be considered the mental ability to reason validly with emotional information and to use emotions to enhance thought [44]. Empowering dialogue systems with emotional intelligence can enhance many applications, such as self-disclosure in AI mental therapy, smart customer service, and virtual humans in the Metaverse. To achieve this goal, the machine must understand users' emotions, generate appropriate emotions, and express them coherently in conversations.
Although existing works are relatively mature at recognizing emotions in the dialogue context [15, 40, 60] and rendering specific emotions in responses [10, 80], performance in generating appropriate emotions is still limited. Following the problem settings in empathetic responding [54], most existing studies [35, 76, 78, 79] infer appropriate emotions for the response without modeling the personality of speakers. Consequently, emotions in responses may be inconsistent with the preceding context. As a result, users may feel they are still talking to rigid machines and reduce their engagement; an example is shown in Figure 1.
To fill this research gap, we propose a new task, Personality-affected Emotion Generation, which equips the dialogue system with personality traits and generates emotions for responses that are always consistent with the given personality trait. Affective instability is a core feature of personality disorder [64], which suggests that a stable personality trait for the dialogue system helps resolve inconsistency in emotional responses. Besides, it has been shown that personality, i.e., the big-five personality model [11], can be represented as temperament in the Valence-Arousal-Dominance (VAD) space for emotions [45, 46]. These findings suggest that different personalities have different impacts on emotional expression. Moreover, similar to the one-to-many nature [77] of dialogue, multiple emotions can be appropriate for a response in a similar conversation context, but only one can be selected for the response each time in the dialogue system. Accordingly, personality provides an additional reasoning condition that narrows down the search space and thus simplifies emotion generation.
However, the proposed task entails two challenges in practice. The first is heterogeneously integrating personality and emotional factors. Personality plays an important role in inferring appropriate emotions, as discussed above. According to trait theory, personality is usually represented as dimensions or spectrums with extremes at both ends. However, the aspects described by these dimensions are defined in psychological analysis, which differs from the ways we describe emotional factors in conversations (e.g., discrete emotion labels or dimensional affective vectors). Hence, integrating personality with emotional factors to facilitate our task is difficult. The second challenge is extracting multi-granularity emotional information (i.e., emotional factors) from the dialogue context. Affective information from conversational data facilitates the emotion generation process. Nevertheless, it can be captured at multiple levels, e.g., utterance-level semantic content, manually annotated emotion labels, and token-level emotional embeddings. Combining affective information of different granularities and aspects is challenging.
Inspired by personality studies and mood state analysis in human-computer interaction (HCI) [17, 21, 45], we propose to simulate the mood transition process affected by the given personality traits for emotion generation to address the above challenges. In detail, we first project the mood states and emotions of the dialogue system as regions and discrete points in the VAD space. Then, the transition among mood states is modeled as shifting among the different regions, where the emotional factors within the dialogue context provide the shifting variation and the personality is modeled as the shifting weight. In this way, the influence of personality is described in the same VAD space in which the emotional factors (e.g., mood states and emotions) are represented. The mood transition process is modeled based on psychology and HCI domain knowledge, while the parameters are learned from daily conversation data during training. To extract the multi-granularity affective information from the dialogue context, we design an attention layer to align the semantic representations from BERT [12] with the utterance-level emotion annotations and the token-level VAD embeddings. Finally, we observe that different people may express different emotions even in similar mood states or dialogue contexts, according to their personalities. Hence, we generate the response emotion (i.e., a specific point corresponding to a discrete emotion label in the VAD space) within the predicted mood state region, also conditioned on the given personality trait.
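As a minimal sketch of this idea, the mood state can be treated as a point in the 3-D VAD space that is shifted by the emotional factors of the context, with the personality acting as the shifting weight. All numeric values and the single scalar weight below are made-up illustrations, not the learned parameters of our model:

```python
# Hypothetical sketch of personality-affected mood transition in VAD space:
# the context contributes a shift vector, and the personality trait acts
# as a weight on that shift. Values are illustrative only.

def transition_mood(mood_vad, context_shift, personality_weight):
    """Shift a mood state in VAD space by a personality-weighted delta."""
    return tuple(m + personality_weight * d
                 for m, d in zip(mood_vad, context_shift))

# e.g., a mildly positive mood nudged by an angry context utterance
initial_mood = (0.2, 0.1, 0.0)   # (valence, arousal, dominance)
angry_shift = (-0.4, 0.5, 0.2)   # emotional factors from the context
trait_weight = 0.8               # a stronger trait -> a larger shift

new_mood = transition_mood(initial_mood, angry_shift, trait_weight)
```

In the actual model the weight is not a single scalar but is learned from the trait vector; this sketch only conveys the geometric intuition.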
To facilitate related research, we construct the Personality Emotion Lines Dataset (PELD), which includes 6,510 dialogue triples from 711 pieces of daily conversations with emotion labels and annotated personality traits. The emotion labels and personality annotations are collected and re-organized from other researchers' work [22, 51, 75] analyzing the script of the famous TV series Friends. We further explore the mood transition patterns of different characters on PELD and observe that personality traits have a greater influence on mood transitions related to negative emotions. Besides, we analyze the distributions of mood state transitions in PELD and the correlations between personality and mood transitions to facilitate future research.
We conduct extensive experiments on the PELD dataset for evaluation. The results verify that by integrating the personality and the mood transition regression modeling, our method achieves significantly better F-scores than the BERT-based model in Emotion Generation, especially for minority emotions. Besides, we also conduct an extensive ablation study to empirically validate the effectiveness of personality and mood transitions in the Emotion Generation task.
Our key contributions are summarized as follows:
—
To the best of our knowledge, we are the first to raise the issue of the effect of personality on generating appropriate emotions in dialogue systems. We propose the task of Personality-affected Emotion Generation, identify its challenging issues, and propose a solution that simulates the mood transition process in the dialogue system, inspired by psychological findings.
—
We construct a dialogue script dataset PELD with emotion and personality annotations from several existing corpora. Besides, we analyze the distributions of mood state transitions in PELD and the correlations between personality and mood transitions to facilitate future research.
—
We conduct extensive experiments on PELD to evaluate the effectiveness of our method. The results verify that integrating the personality and the mood transition regression yields significantly higher F-scores than base models in Emotion Generation, especially for minority emotions.
In the following content, we first review the related studies in Section 2. Section 3 describes the background and preliminary knowledge of our work. Section 4 presents the problem definition of personality-affected emotion generation and the methodology we propose based on mood transition prediction. Section 5 introduces the construction and analysis of the PELD dataset. Section 6 introduces the experimental setup. Section 7 describes the evaluation results and analysis. Finally, we conclude and discuss future work in Section 8.
5 The PELD Dataset
5.1 Dataset Construction & Statistics
To facilitate related research, we construct the Personality Emotion Lines Dataset (PELD), an affective dialogue dataset with personality traits for speakers and emotion annotations for utterances. In PELD, each sample is represented as a dyadic dialogue triple \(\lbrace (U_1, U_2, U_3), M_i, M_r, E_r, P\rbrace\), as shown in Figure 4.
\(M_i\) and \(M_r\) are the mood states expressed in \(U_1\) and \(U_3\), P is the personality trait, and \(E_r\) is the emotion label to be generated.
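For concreteness, one such triple can be laid out as a plain record. The field names below are our own illustration of the structure, not the released schema of PELD:

```python
# A PELD dialogue triple as a plain dict. Field names are hypothetical;
# the released data format may differ.
sample = {
    "utterances": ["U1 text ...", "U2 text ...", "U3 text ..."],
    "mood_init": "M1",          # M_i, mood state expressed in U1
    "mood_resp": "M2",          # M_r, mood state expressed in U3
    "emotion_resp": "Sadness",  # E_r, emotion label to be generated
    "personality": [0.6, 0.4, 0.5, 0.7, 0.3],  # big-five trait vector P
}
```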
According to the book Television Dialogue: The Sitcom Friends vs. Natural Conversation [53], television conversations and natural conversations are basically the same in terms of linguistic features. This classic script is widely analyzed in dialogue research [22, 27, 28]. Therefore, we construct the triples in PELD from the dialogue scripts in Friends based on existing studies.
Specifically, the utterances and their emotion labels are mainly adopted from the dialogues in the MELD [51] and EmoryNLP [75] datasets, two famous datasets analyzing emotional expressions in Friends. The VAD vector of each mood state refers to Table 3. To keep consistency, each dialogue triple in PELD is constructed within the same dialogue in the original datasets. The personality traits in our dataset are adopted from the personality annotations in 711 different dialogues [22]. For confidence, we only keep the personality traits of the six main roles in Friends, as their annotations are the most frequent. According to the annotations, a role may exhibit different aspects of its personality in different conversation scenarios. Thus, for each main role, we average its annotated personality traits over all dialogues by \(P = \frac{1}{K}\sum _{i=1}^K{P_i}\) for simplification, where K is the number of annotations. The averaged results are shown in Table 4.
We also calculate the standard deviation of the personality traits on each dimension. It suggests that Extroversion and Conscientiousness show the largest distinctions among the five traits across the six main roles. These two traits, especially Extroversion, play important roles in mood state variation and emotion generation, according to the previous discussion. Therefore, this also reflects the suitability of PELD for studying personality.
We split PELD into Train, Valid, and Test sets with a ratio of approximately 8:1:1. There are 6,527 triples in PELD. The total number of unique utterances in PELD (10,468) is less than the sum of the original MELD (13,708) and EmoryNLP (9,489) because of the large overlap between these two datasets. Besides, not all dialogues include the six main roles and are suitable for constructing triples. The overall statistics of the dataset are shown in Table 5. The average length of utterances is 9.32 in the whole PELD, which also conforms to the length of short sentences in daily conversation.
Similar to existing emotional conversation datasets [7, 32], PELD also suffers from the emotion imbalance issue. Utterances labeled as Neutral are the majority (44.6%), while Fear and Disgust only take small portions (6.9% and 1.9%). Although this reflects the actual emotion distribution in daily conversation, it also challenges machine learning models to identify and generate emotions. We tried several automatic methods for data augmentation, such as synonym substitution, back-translation, and the Easy Data Augmentation (EDA) proposed in Reference [68]. However, most of the synthetic samples are either odd or identical to the original samples. The reason might be that short utterances in conversation offer limited options for replacing synonyms or adding and deleting words.
The mood state distribution in PELD is the sum of the emotions within each mood state. The triples are roughly evenly distributed among the six main roles, ranging from 977 (Phoebe) to 1,159 (Rachel). Hence, there is no explicitly dominant personality in our dataset, which makes models trained on it more robust.
5.2 Mood Transitions in PELD
After constructing PELD, we further explore the dataset from the aspect of mood transitions. As the triples in PELD are constructed for analyzing the transitions between \(M_i\) in \(U_1\) and \(M_r\) in \(U_3\), we show the mood state and emotion distributions in \(U_1\) and \(U_3\) in Table 6, respectively.
We can see that for both mood states and emotions, the distributions in \(U_1\) and \(U_3\) are similar, which means the transitions of mood states and emotions are balanced in PELD triples, i.e., no particular mood state or emotion transition dominates during the dialogues. This conforms to the nature of daily conversations. Moreover, the proportions of all mood states and emotions are also similar to the overall statistics of PELD, which suggests that the mood states and emotions are also evenly distributed across the triples.
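The per-role transition statistics used here come from counting \((M_i, M_r)\) pairs and row-normalizing into probabilities. A sketch of that computation, assuming a four-state mood inventory {Neutral, M1, M2, M3} (the actual inventory may differ) and toy pairs:

```python
# Build a row-normalized mood transition matrix from (M_i, M_r) pairs.
# The mood inventory and the pairs below are illustrative assumptions.
MOODS = ["Neutral", "M1", "M2", "M3"]

def transition_matrix(pairs):
    idx = {m: i for i, m in enumerate(MOODS)}
    counts = [[0.0] * len(MOODS) for _ in MOODS]
    for m_i, m_r in pairs:
        counts[idx[m_i]][idx[m_r]] += 1
    # normalize each row into transition probabilities
    for row in counts:
        total = sum(row)
        if total:
            for j in range(len(row)):
                row[j] /= total
    return counts

pairs = [("M1", "Neutral"), ("M1", "M1"), ("M2", "Neutral"), ("M2", "M2")]
mat = transition_matrix(pairs)
```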
Since mood transitions are affected by personality traits, as discussed above, we exhibit the mood transition patterns of different roles with different personality traits in Figure 5. In general, among the six transition matrices, most of the first columns are in darker colors, which indicates that most transitions occur from other mood states to Neutral, as it is the majority in PELD. Besides, darker blocks are also more likely to occur on or around the diagonals of the transition matrices, suggesting that preceding mood states tend to transition to the same or similar ones.
As for individual differences, nearly 60% of \(M_3\) in Joey and Phoebe transitions to Neutral and \(M_1\), which shows that they handle negative mood states better. This also correlates with their lower Extraversion compared to the others in Table 4. Specifically, over 80% of \(M_1\) in Joey remains Neutral or \(M_1\), which shows that Joey is an especially optimistic person. As for the ratio of the unpleasant mood states \(M_2\) and \(M_3\) changing to the pleasant mood state \(M_1\), Phoebe achieves the highest.
Moreover, to highlight the individual differences in mood transitions among the six main roles in detail, we also show the standard deviations (Std) of each row in the transition matrices of the six main roles in Figure 6. The red bar chart shows the Std of the infinity norms of rows in the mood transition matrix, which indicates the diversity of the most probable resulting mood states from the same mood state across different roles. The blue bar chart shows the Std of the L2-norms, which generally describes the difference in how different roles transition from one mood state to the others. Detailed calculations of the standard deviations are shown below:
\[
Std_{2}^{i} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(\Vert \mathcal{M}_{j}^{i}\Vert_{2} - \mu_{2}^{i}\right)^{2}}, \qquad
Std_{inf}^{i} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(\Vert \mathcal{M}_{j}^{i}\Vert_{inf} - \mu_{inf}^{i}\right)^{2}},
\]
where \(\Vert \mathcal{M}_{j}^{i}\Vert_2 = \sqrt{\sum_{k=1}^{n}(\mathcal{R}^{i}_{k})^{2}}\) is the L2-norm of the ith mood state's transition row for role j among the six roles, and \(\Vert \mathcal{M}_{j}^{i}\Vert_{inf}\) is the corresponding infinity norm. m is the number of roles, and n is the number of columns in each mood transition matrix. \(\mathcal{R}^{i}\) is the ith row in the mood transition matrix, indicating the resulting mood states of the ith mood state's transitions. \(\mu_{2}^{i}\) and \(\mu_{inf}^{i}\) are the means of \(\Vert \mathcal{M}^{i}\Vert_2\) and \(\Vert \mathcal{M}^{i}\Vert_{inf}\) over the six roles, respectively.
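These standard deviations can be computed directly from the per-role matrices. A sketch on toy 2x2 matrices (not the PELD values), using the population standard deviation over the roles:

```python
# Row-wise norm standard deviations across roles' transition matrices:
# for mood state i, take the L2-norm and infinity-norm of row i in every
# role's matrix, then the std of each norm across the m roles.
import math

def row_norm_stds(matrices, i):
    l2 = [math.sqrt(sum(x * x for x in m[i])) for m in matrices]
    linf = [max(abs(x) for x in m[i]) for m in matrices]
    def std(xs):
        mu = sum(xs) / len(xs)
        return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return std(l2), std(linf)

roles = [
    [[0.8, 0.2], [0.5, 0.5]],   # toy transition matrix of role A
    [[0.6, 0.4], [0.1, 0.9]],   # toy transition matrix of role B
]
std_l2, std_inf = row_norm_stds(roles, 1)
```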
Both charts show similar patterns of mood transitions. Unpleasant moods (\(M_2\) and \(M_3\)) vary the most across roles, while roles behave more similarly when processing Neutral and the pleasant mood \(M_1\) in conversation. Besides, the standard deviations of the unpleasant mood states (\(M_2\) and \(M_3\)) are relatively higher than those of the pleasant mood state and Neutral on average, which means the individual differences are larger. Hence, we infer that personality traits have a greater influence on mood transitions from negative emotions.
5.3 Personality-aware Mood State Transitions
Although we have conducted an extensive analysis of the mood transitions of different characters, these conclusions may have certain limitations because the characters' personality traits were specially designed by the scriptwriters for the TV series.
To explore more of the influence of personality on mood transitions, we investigate the personality-aware mood state transitions. Specifically, we calculate the Spearman correlations between mood state transitions and different personality traits in PELD, as shown in Figure
7.
Figure 7 shows the mood state variation from \(U_1\) to \(U_3\) in the PELD triples, where we illustrate the variations in the separate V and A dimensions. We can see that Conscientiousness (C) has a higher positive correlation with mood state variation in the V-dimension. It suggests that the mood states of speakers with higher Conscientiousness tend to become more positive during conversations. In other words, if these speakers initially experience Fear, Anger, Sadness, or Disgust, then they are more likely to transition toward Surprise or Joy as the conversation progresses. On the contrary, speakers with higher Openness (O) have a higher negative mood state variation in the V-dimension, which means these speakers are more likely to be positive at first and relatively negative later in the conversations. In terms of emotion, these speakers are more likely to be in Surprise or Joy first, then shift to Fear, Anger, Sadness, or Disgust.
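The trait-versus-variation correlation can be sketched with the plain rank-based Spearman formula (ties not handled here; a library routine would normally be used). The trait intensities and valence variations below are toy values, not the PELD measurements:

```python
# Spearman correlation between a personality trait's intensity and the
# valence variation V(U3) - V(U1) across triples, via the rank formula
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). No tie correction.

def spearman(xs, ys):
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

conscientiousness = [0.2, 0.5, 0.7, 0.9]   # toy trait intensity per triple
valence_variation = [-0.1, 0.0, 0.2, 0.4]  # toy V(U3) - V(U1) per triple
rho = spearman(conscientiousness, valence_variation)
```

A perfectly monotone relationship, as in this toy data, yields rho = 1.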
It is worth noting that the conclusions we draw from analyzing mood state transition patterns based on the Big Five in PELD still have certain limitations. Personality expression, especially in conversational contexts, is affected by cultural backgrounds, social factors, and even different conversation situations. Therefore, analysis results from any single conversation dataset will have specific limitations.
However, our intent is not to derive a universal conclusion about the influence of personality on mood transitions in conversations from a social science perspective. The methods we present, such as calculating mood transition matrices, comparing the norms of these matrices, and examining the correlation between changes in different affective dimensions (V, A) and the intensity of different personality traits, can be used to analyze various conversation datasets with Big Five personality and affective annotations.
6 Experiment
6.1 Evaluation Task
To validate the effectiveness of our proposed method, we conducted the Emotion Generation task on PELD. Emotion Generation requires the model to generate the appropriate response emotion in the upcoming utterance based on the preceding dialogue context in a dyadic conversation scenario. The emotions here are discrete labels of basic emotions (i.e., Anger, Disgust, Fear, Joy, Neutral, Sadness, and Surprise).
We evaluate the performance by the F-scores of single emotions. Besides, the overall performance is also measured from two aspects: the macro-averaged F-score (m-avg) and the weighted-averaged F-score (w-avg) of all seven emotions. A higher m-avg indicates the model performs relatively better in predicting each category, while a higher w-avg means the model better predicts the mood states or emotions with larger proportions in the dataset. Our evaluation is under the assumption that the emotions expressed in the upcoming utterance are the emotions generated by the speaker.
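The two aggregate metrics differ only in how per-class F1 scores are combined: macro weights every emotion class equally, while weighted weights each class by its support. A pure-Python sketch on toy labels (a library such as scikit-learn would normally be used instead):

```python
# Macro- vs. weighted-averaged F1 on toy labels.
from collections import Counter

def f1_per_class(y_true, y_pred, labels):
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_and_weighted_f1(y_true, y_pred, labels):
    f1 = f1_per_class(y_true, y_pred, labels)
    support = Counter(y_true)
    macro = sum(f1.values()) / len(labels)               # equal class weight
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted

y_true = ["Neutral", "Neutral", "Joy", "Anger"]
y_pred = ["Neutral", "Neutral", "Joy", "Neutral"]
m, w = macro_and_weighted_f1(y_true, y_pred, ["Neutral", "Joy", "Anger"])
```

Here the model misses the minority class Anger entirely, so the macro score (0.6) is pulled down much more than the weighted score (0.65), which is exactly the gap the m-avg/w-avg comparison exposes.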
6.2 Ablation Study Setting
Although there are existing studies on emotion recognition in conversation [34, 61], there is no existing model that solves the Emotion Generation task raised in our work. Therefore, we conduct an ablation study to evaluate the effectiveness of the different modules of our model design and the ways we utilize the personality and model the mood transition. The ablation study compares the performances of the following baseline models:
BERT-base: BERT-base [12] is a famous pre-trained language model with 12 transformer layers, over 110 million parameters, and 12 attention heads. It is capable of capturing bidirectional context for a wide range of downstream NLU tasks. BERT-base is pre-trained on a large text corpus of around 3.3 billion words. This vast corpus allows BERT to learn a wide range of linguistic patterns and structures, making it a highly effective pre-trained model for natural language processing tasks. We use the pre-trained BERT-base model, corresponding to \(E_n\) in our model, to encode the preceding dialogue context into a semantic representation, then directly predict the emotion for the response through a classification head.
BERT-Mood: based on the vanilla BERT-base model, we add mood state classification as an auxiliary task to enhance Emotion Generation. In BERT-Mood, we first use the BERT-base model to encode the preceding dialogue context into a semantic representation. Then, the semantic representation is fed into two classification heads: Emotion Generation and mood state classification. The sum of the losses from these two classification tasks is back-propagated to the BERT-base model to update its parameters. This baseline verifies whether the mood state information helps the Emotion Generation task.
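A minimal sketch of this summed-loss objective, reduced to plain-Python cross-entropy on toy logits. In the real model, the two heads sit on top of the shared BERT encoder and the summed loss is back-propagated through it; the numbers here are illustrative only:

```python
# BERT-Mood-style multi-task objective: two classification heads share one
# encoder representation, and their cross-entropy losses are summed.
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example: logsumexp - logit[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

emotion_logits = [2.0, 0.5, 0.1]  # toy output of the Emotion Generation head
mood_logits = [1.5, 0.2]          # toy output of the mood-state head
loss = cross_entropy(emotion_logits, 0) + cross_entropy(mood_logits, 0)
```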
BERT-MT: BERT-MT (Mood Transition) integrates the mood transition module (as described in Section 5.2) into BERT-base. In BERT-MT, the transitioned mood state is concatenated with the dialogue context to help generate \(E_r\), where we use both the regression loss of the mood state generation and the classification loss of Emotion Generation for supervision. In this baseline, the personality is ablated to evaluate its effectiveness. To eliminate the influence of model scale, we keep the number of parameters the same as in our model and randomly initialize the personality-related parameters.
BERT-P: In BERT-P, we concatenate the preceding context representation from BERT-base and the specified personality trait vector. Then, the concatenated representation is fed into the classification head for Emotion Generation. This baseline evaluates whether merely concatenating the personality trait vector can enhance Emotion Generation.
6.3 Implementation Details
To facilitate the reproduction of our model, we provide the implementation details as follows. The dialogue flows in PELD are fed into the models in batches of size 16. When encoding the preceding dialogue context, we pad all the utterances with [PAD] to a MAX_LEN of 128. We adopt the pre-trained BERT-base model from Huggingface. We set the warm-up to 0.05 of the total training steps. Besides, we use Adam [25] as the optimization algorithm during training. The learning rate for all models is set to 1e-5. All models for testing are selected according to the best performance on the validation set within 50 epochs of training. To ensure the reliability of the results, we run each experiment 10 times with different random seeds and report the average performance and the standard deviations, where the random seeds are used to split the datasets and initialize the model parameters. Finally, we release our code and data on GitHub.
7 Results and Analysis
In this section, we report and analyze the experimental results of Emotion Generation in our ablation study by answering several research questions:
RQ1:
Does our model outperform baseline methods in Emotion Generation?
RQ2:
What is the correlation between the mood transition process and Emotion Generation?
RQ3:
Are mood transitions and Emotion Generation easier to predict for certain personalities?
RQ4:
Are there any cases to show how our model generates the emotions for the response?
7.1 Does our model improve over baseline methods with the mood transition and personality in Emotion Generation?
We report the performance of Emotion Generation on the test set of PELD in Table 7. To show the significance of our method's outperformance, we conduct Welch's t-test [70] between our model and all the baseline models. Welch's t-test, also known as Welch's unequal variances t-test, compares the means of two independent groups when the assumption of equal variances is violated. It is a modification of the traditional Student's t-test that allows for unequal variances between the compared groups. Since our results were derived from different models, we cannot guarantee that their variances are equal or similar; therefore, we opted for Welch's t-test instead of the standard t-test.
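The Welch's t statistic itself is simple to compute from the per-seed scores: the pooled-variance term of Student's t-test is replaced by the sum of the two per-group variance terms. The scores below are made-up illustrations, not our reported results:

```python
# Welch's t statistic: t = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b),
# with sample (n-1) variances. Scores below are illustrative only.
import math

def welch_t(a, b):
    def mean_var(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return mu, var
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

ours = [0.41, 0.43, 0.42, 0.44, 0.40]      # e.g., w-avg over 5 toy seeds
baseline = [0.36, 0.39, 0.35, 0.38, 0.37]
t = welch_t(ours, baseline)
```

The p-value then comes from the t distribution with the Welch-Satterthwaite degrees of freedom; a library routine such as a statistics package's unequal-variance t-test is typically used for that step.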
As we can see in Table 7, the overall performance of all models is moderately low, which indicates the task's difficulty. The reason is that Emotion Generation is a seven-class generation task without knowing the response content. Besides, it also suffers from the imbalance issue, as shown in the data distribution in Table 6. Looking at the average F-scores, the w-avgs of all models are higher than the m-avgs. This verifies that generating the majority emotion (i.e., Neutral) is easier than generating the other emotions. Our model statistically significantly outperforms all baseline models in w-avg, Anger, Disgust, and Neutral. Besides, it also surpasses multiple baseline models in m-avg, Surprise, Joy, and Sadness. The results verify that our model design efficiently utilizes the mood transition process and the personality to improve Emotion Generation.
The lowest performance occurs when we only use the BERT-base model for the Emotion Generation task: the m-avg is 0.242, and the F-score of Disgust is only 0.012. However, when we integrate the mood state transition and the personality trait into the BERT model, the performance immediately improves, especially in generating Disgust. This suggests that our method improves the base model by raising the performance on minority emotions.
We first compare the performances of BERT, BERT-Mood, and BERT-MT to illustrate how different utilizations of mood states influence Emotion Generation. Comparing BERT and BERT-Mood, we can see that integrating the mood state classification task decreases the F-scores of majority emotions such as Anger, Joy, and Neutral but increases those of the remaining minority emotions. Consequently, although the m-avg remains the same, the w-avg goes down. Hence, predicting the mood state helps generate minority emotions.
Then, comparing BERT-MT with BERT-Mood, the F-scores of Fear and Sadness decrease, while the F-scores of all other emotions increase. Besides, both the m-avg and the w-avg improve over BERT-Mood. Hence, integrating the mood transition process in Emotion Generation fixes the performance drop on majority emotions observed in BERT-Mood.
Then, we focus on the personality. Comparing BERT-P with BERT, we can see that although integrating the personality trait slightly improves the m-avg, the F-scores of some majority emotions, e.g., Neutral and Anger, are lower, so the w-avg also decreases. However, when the personality is utilized as the mood transition weight in our model, the performance is much better. Hence, in the Emotion Generation task, directly concatenating the personality trait causes unstable improvement, but when modeled as the mood transition weight, the personality helps achieve a robust and significant enhancement.
7.2 What is the correlation between the mood transition process and Emotion Generation?
To elucidate the correlation between the mood transition process and Emotion Generation, we present the impact of distinct mood transition processes on the generation of different emotions. Specifically, we conduct Emotion Generation experiments with 10 random seeds and calculate the Spearman correlation coefficients between the results of mood transition (the F-scores of accurately predicting each mood in the response) and the F-scores of the different emotions. The correlations are illustrated in Figure 8. In particular, the last row in Figure 8 indicates the Spearman correlation coefficient between the average F-score of all the mood transition predictions and the F-score of each generated emotion. Similarly, the last column in Figure 8 is the Spearman correlation coefficient between the average F-score of all emotion generations and the F-score of each mood transition prediction.
In general, Emotion Generation is partially conditioned on the results of mood transition; accurate prediction of mood transition means that the information provided to Emotion Generation is correct. The last row of Figure
8 shows that the overall predictions for mood transitions are positively correlated with the generation of all emotions. Besides, the last column shows that the transition predictions of all single mood states are also positively correlated with Emotion Generation.
The last column shows that the F-score of predicting M1 has the highest correlation with the average result of Emotion Generation. We infer that this is because the emotions in M1 (Joy and Surprise) take a large proportion of all emotions in PELD, so accurately predicting M1 helps correctly generate these emotions. Moreover, the high correlation between M3 and Emo_avg shows that, since M3 corresponds only to the minority emotion Sadness, correctly predicting M3 helps eliminate interference in Emotion Generation.
We further look at the correlations between specific mood states and emotions. The accurate prediction of M3 is highly correlated with the generation of Sadness, the highest correlation value among all, again because M3 corresponds only to one emotion, Sadness, as mentioned before. However, we also notice that the results of some mood transitions are negatively correlated with Emotion Generation, such as Neutral mood states and Anger, or M2 and Fear. It indicates that when these mood transitions are accurately predicted, the performance on some emotions decreases. We infer that the two losses \(\mathcal {L}_{mood}\) and \(\mathcal {L}_{emo}\) in our model inevitably conflict with each other. However, because the introduction of mood transition improves Emotion Generation as a whole, as mentioned above, we retain the current design.
7.3 Are mood transitions and Emotion Generation easier to predict for certain personalities?
We analyze the results of our method (averaged from 10 random seeds) on the test set based on different personalities. The results are shown in Figure
9.
Specifically, for the triples in the test set, we manually labeled the speakers with binary big-five personality categories according to the original annotations in FriendsPersona. That is, if a triple occurs in FriendsPersona, we directly adopt the annotations; otherwise, we manually label the personality categories of the speakers.
The results in Figure 9 show that the mood transition results of different personalities (both mood_macro and mood_weighted) vary considerably. The mood states of speakers with high NEU are easier to predict, while those of speakers with high EXT are relatively more difficult to capture. We infer the reason might be that NEU speakers are more sensitive in conversation: their mood states are more easily influenced by the dialogue content, so the dialogue content, serving as the mood transition variable in our method, provides more valuable information. In contrast, EXT speakers tend to be outgoing and talkative in most dialogue contexts, so their subtle mood state transitions are more difficult for our method to capture.
As for Emotion Generation, the emotion_macro scores of different personalities also vary, but the emotion_weighted scores are more similar to each other. The reason is that correctly predicting the majority emotion Neutral counts for more in emotion_weighted than in emotion_macro, and Neutral is expressed the most by all speakers regardless of their personalities. Besides, the patterns of EXT and NEU speakers remain similar in emotion_macro due to their characteristics.
7.4 Are there any cases to show how our model generates the emotions for the response?
We conduct a case study and analyze the result samples of our method on the test set to show how our method works in real conversation scenarios. As shown in Table 8, Utterance 1 and the Ground Truth Response are from the same speaker with a known personality.
First, we show two representative samples where our model correctly generates the emotion for the response. The first sample occurs when the two speakers are talking about fetal movement; both speakers express Joy in the context, so our model generates Joy for the response, which is verified by the ground truth. In the second sample, Rachel is sad in Utterance 1; after being comforted and helped by Michael in Utterance 2, she becomes happy in the response. This sample verifies that our model is able to generate the Joy emotion given Rachel's personality: with relatively higher Extroversion, Rachel's mood state is more easily influenced by others, and she expresses emotions with higher Arousal.
Moreover, we also show an example where our model makes a mistake. In the third case, our model wrongly generates Anger because the two speakers argue angrily in the context. However, the content of Rachel's response clearly demonstrates her willingness to attend the lecture. This case also illustrates a limitation of our method: without knowing and modeling enough background knowledge of the speakers, it is difficult to generate appropriate emotions in conversations.
8 Conclusion and Future Work
In this work, we raise a new task of personality-affected emotion generation and propose a new perspective to solve it through personality-affected mood transition. Besides, we construct a dialogue script dataset PELD with emotion and personality labels to facilitate related research. We conduct extensive experiments on PELD to evaluate the effectiveness of our method. The results verify that integrating the personality and the mood transition regression significantly improves the performance in emotion generation, especially in minority emotions.
In future research, we intend to focus on two issues: (1) personality effects on emotions in multi-modality scenarios and (2) personality effects on response generation. Facial expressions, voices, gestures, and environmental information are also vital in emotional interaction, but they are not captured in purely text-based dialogue systems. Besides, as seen from the statistics of PELD, the most common emotion in the dialogue scripts is still Neutral. One possible reason is that other subtle affective information is not captured in the text. Therefore, our future work will continue to investigate personality effects on emotions in multi-modality scenarios. Besides, the influence of personality on language usage has also been studied in existing works [8, 63]. To construct intelligent dialogue systems with personality, it is also important to investigate how a given personality influences the semantic content in response generation [20].