
1 Introduction

Nowadays, driver monitoring is a topic of paramount importance, because distraction and inattention are a relevant safety concern and a leading factor in car crashes [1]. For example, being distracted can make drivers less aware of other road users such as pedestrians, cyclists and road workers, as well as less observant of road rules such as speed limits and junction controls. Therefore, among driver assistance systems and autonomous driving systems, the detection of the driver's state is especially important, since this state may reduce the effectiveness and the benefits of these systems, e.g. when the driver is not ready to take back control or to respond to a given warning [2, 3].

Currently, several Driver Monitoring Systems (DMSs) offered by carmakers are based on:

  • Vehicle dynamics data, used to detect “abnormal behavior” and thus identify tiredness or drowsiness;

  • An internal camera, which detects where the driver is looking, used to determine visual distraction (e.g., eyes off the road) and drowsiness.

However, these systems neglect the driver's emotional state, despite research demonstrating that emotion and attention are linked and that both have an impact on performance [4]. In particular, driving behavior varies strongly with emotional state [4], and it is well known that aggressive driving caused by an inability to manage one's emotions is a major cause of accidents [5, 6]. Negative emotions, such as anger, frustration, anxiety, fear, and stress, can alter perception and decision-making, leading to the misinterpretation of events, and can even affect physical capabilities [7, 8].

To improve driving safety, it is therefore necessary to equip cars with emotion-aware systems based on AI technologies, able to detect and monitor human emotions and to react in case of hazard, e.g. by providing warnings and appropriate stimuli for emotion regulation, or even by acting on the dynamics of the vehicle.

In this context, this paper proposes to equip the car with a DMS which, thanks to a multimedia sensing network and an affective intelligent interface, is able to:

  • Monitor the emotional state and degree of attention of the driver and passengers, by analyzing their facial expressions and eye gaze, respectively;

  • Map both the cognitive and emotional states to the car interface features in a responsive way, in order to increase human wellbeing and safety.

The paper presents an overall description of the proposed system and focuses on a preliminary experiment aimed at assessing the feasibility of using a low-cost emotion detection system based on facial expression analysis in a car simulator context.

2 Research Background

Induced emotions may affect subjective judgment, driving performance, and perceived workload during car driving. In particular, induced anger appears to reduce the driver's subjective safety level and to degrade driving performance compared to neutral and fear conditions. Happiness also led to degraded driving performance compared to neutral and fear. Fear did not have any significant effect on subjective judgment, driving performance, or perceived workload [9].

As is well known, a number of environmental factors may arouse emotions (e.g., road traffic conditions, concurrent tasks such as phone conversations, etc.). For example, people spend most of their driving time listening to music on the radio or a CD player, and this affects driving attention because of its emotional nature. It has been observed that, while neutral music does not seem to influence driving attention, sad music tends to lead to risk-free driving, whereas happy music is associated with more dangerous driving [4].

However, despite the importance and prevalence of affective states in driving, no systematic approach has yet been developed to relate the driver's emotional state to driving performance [10]. Moreover, the results of empirical studies seem to indicate that traditional influence mechanisms (e.g., those based on the valence and arousal dimensions) may not be able to explain the affective effects on driving. In fact, affective states in the same valence or arousal dimension (e.g., anger and fear both have negative valence and high arousal) showed different performance results [9]. Consequently, to explain phenomena as complex as the effects of emotions on driving, a more elaborate framework needs to be defined.

Today, several methods and technologies allow the recognition of human emotions, differing in their level of intrusiveness. Obviously, the use of invasive instruments based on biofeedback sensors (e.g., ECG, EEG, or other biometric sensors) can affect the subjects' behavior; in particular, it may compromise their spontaneity and consequently the emotions they experience [11]. Consequently, in recent years several efforts have been made to improve non-intrusive emotion recognition systems based on speech analysis and facial expression analysis. In particular, facial emotion analysis aims to recognize patterns from facial expressions and to connect them to emotions, based on a given theoretical model. Most of these systems implement Deep Learning algorithms based on Convolutional Neural Networks (CNNs), which take different kinds of images as input and make predictions based on a trained model [12, 13]. The most robust algorithms developed so far recognize only Ekman & Friesen's primary emotions (i.e., anger, fear, disgust, surprise, joy and sadness) [14]. In fact, most of the databases of facial expressions currently available are based on these emotions.

In the past, several emotion-aware car interfaces have been proposed, which use bio-signals collected through wearable devices [15, 16] or audio signal analysis, e.g., the detection of vocal inflection changes [17]. To the best of our knowledge, only [10] attempted to combine facial expression and speech analysis, and only at a conceptual level, while no study has actually tested the effectiveness of facial expression recognition systems for emotion recognition in a motor vehicle context.

3 The Proposed System

The scheme in Fig. 1 describes the proposed overall system architecture. It is characterized by the following subsystems:

Fig. 1. The proposed system architecture

  • The Driver Attention Detection (DAD) system: it acquires data (e.g., eye gaze) through the eye tracking system already embedded in the car simulator and processes it using the machine learning algorithms described in [18] and [19].

  • The Emotion Recognition (ER) system: a software module capable of recognizing Ekman's universal emotions by means of a convolutional neural network based on the Keras and TensorFlow frameworks, trained by merging three public datasets (i.e., the lab-generated CK+, FER+ and AffectNet) and built on one of the most popular CNN architectures (i.e., VGG13); a minimal sketch of such a classifier is given after this list. The trained network has been evaluated on the EmotioNet 2018 challenge dataset [20]. The results reported in [21] evidenced high recognition accuracy, especially for happiness, anger and neutral states.

  • The DMS system: it takes the data from the ER and DAD systems as input and processes them according to rules, in order to determine the prevailing emotion and the degree of attention in a given time interval. It then makes a holistic assessment of the risk associated with the cognitive and emotional state of the driver, using appropriate decisional algorithms (e.g., based on Bayesian Belief Networks, as suggested in [22]); a simplified rule-based sketch of this fusion step is given after this list. For example, it associates a high risk when it detects strong excitement due to anger or happiness (which, as demonstrated in previous studies, may alter driving skills) together with a low level of attention, while it associates a medium level of risk when it detects an altered emotional state but a high level of attention.

  • The Driving Style Detection Module: it senses driving behavior through the acquisition of CAN data from the real or simulated vehicle. Several studies [23, 24] indicate that driving-related parameters are significant and objective indicators of driver impairment. For example, steering frequency (e.g. expressed as the Steering wheel Reversal Rate) is considered a relevant indicator of driving task demand [25], while the Standard Deviation of Lateral Position (SDLP), i.e. the result of the vehicle's movement induced by the driver's steering actions with respect to the road environment (specifically, the position relative to the center of the lane), is considered a direct indicator of driving performance [26]. Safety-related insights can also be obtained by computing the Time To Collision (TTC), i.e. the time required for two vehicles to collide if they maintain their present speed on the same path [27]; simplified sketches of these indicators are also given after this list. Nowadays, in-vehicle DMSs are often designed to exploit these data, and the topic of Driver Monitoring and Assistance Systems (DMAS), based on system intervention depending on the driver's monitored status, is gaining attention [28].

  • The Smart Car Interface: it behaves as a Smart Car Agent, interacting with the driver to adapt the level of automation, support the decision-making process, or provide audio and video alerts when an affective state that could compromise driving performance is detected. Moreover, it can support emotion regulation and provide a unique user experience, e.g. through the selection of a suitable music playlist according to the emotional state, or the activation of LED lighting to change the cabin colors and create a more engaging atmosphere.
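The sketch below illustrates how a VGG13-style classifier for the six Ekman emotions plus a neutral class could be assembled with Keras; the input size, filter counts and training settings are illustrative assumptions and do not reproduce the exact configuration evaluated in [21].

```python
# Minimal sketch of a VGG13-style CNN for 7-class facial-emotion recognition
# (six Ekman emotions plus neutral). All hyperparameters are assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 7            # anger, disgust, fear, happiness, sadness, surprise, neutral
INPUT_SHAPE = (64, 64, 1)  # assumed grayscale face crops

def build_vgg13(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Five VGG-style stages: two 3x3 convolutions followed by 2x2 max pooling.
    for filters in (64, 128, 256, 512, 512):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_vgg13()
model.summary()
```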
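The fusion step of the DMS can be approximated, for illustration only, by a small rule table over the prevailing emotion and the attention level; the thresholds, labels and function names below are assumptions, and more elaborate decisional algorithms (e.g., Bayesian Belief Networks [22]) are envisioned for this task.

```python
# Hedged sketch of the DMS rule-based fusion of emotion and attention into a risk level.
from collections import Counter
from typing import List

HIGH_AROUSAL = {"anger", "happiness", "fear", "surprise"}

def prevailing_emotion(window_labels: List[str]) -> str:
    """Most frequent per-frame emotion label within the considered time interval."""
    return Counter(window_labels).most_common(1)[0][0]

def risk_level(window_labels: List[str], attention: float) -> str:
    """Map the prevailing emotion and an attention score in [0, 1] to a risk level."""
    excited = prevailing_emotion(window_labels) in HIGH_AROUSAL
    attentive = attention >= 0.6          # assumed attention threshold
    if excited and not attentive:
        return "high"                     # strong excitement and low attention
    if excited or not attentive:
        return "medium"                   # altered state or reduced attention
    return "low"

print(risk_level(["anger", "anger", "neutral", "anger"], attention=0.3))  # -> high
```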
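For the driving-style indicators, the following sketch shows simplified computations of the Steering wheel Reversal Rate and of the Time To Collision; the 2-degree reversal gap, the sampling step and the variable names are illustrative assumptions, and production implementations based on CAN data are considerably more elaborate.

```python
# Hedged sketch of two driving-style indicators: a simplified SWRR and the TTC.
import numpy as np

def steering_reversal_rate(steering_deg, dt, gap_deg=2.0):
    """Approximate reversals per minute: direction changes larger than gap_deg degrees."""
    reversals, direction, last_extreme = 0, 0, steering_deg[0]
    for angle in steering_deg[1:]:
        delta = angle - last_extreme
        if direction >= 0 and delta <= -gap_deg:     # swing switched towards the left
            reversals, direction, last_extreme = reversals + 1, -1, angle
        elif direction <= 0 and delta >= gap_deg:    # swing switched towards the right
            reversals, direction, last_extreme = reversals + 1, 1, angle
        elif (direction >= 0 and angle > last_extreme) or (direction <= 0 and angle < last_extreme):
            last_extreme = angle                     # keep following the current swing
    return reversals / (len(steering_deg) * dt / 60.0)

def time_to_collision(gap_m, ego_speed_ms, lead_speed_ms):
    """Seconds until impact with the lead vehicle at present speeds; inf if not closing."""
    closing = ego_speed_ms - lead_speed_ms
    return gap_m / closing if closing > 0 else float("inf")

# Synthetic example: a 0.1 Hz steering oscillation sampled at 10 Hz for one minute.
t = np.arange(0, 60, 0.1)
swrr = steering_reversal_rate(5 * np.sin(2 * np.pi * 0.1 * t), dt=0.1)
print(f"SWRR = {swrr:.1f} reversals/min, TTC = {time_to_collision(30.0, 25.0, 20.0):.1f} s")
```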

4 System Implementation and Experimental Test

As a first step in the investigation of the system requirements, 4k full-spectrum cameras were integrated into the driving simulator. A preliminary test was then conducted to assess the effectiveness of the proposed emotion recognition system in a driving context.

4.1 Participants

Twenty participants were involved and randomly assigned to the control group (5 males and 5 females) or to the experimental group (6 males and 4 females). All participants had held a valid driving license for at least 3 years and had normal hearing. The average age was 26.6 for the control group and 29.4 for the experimental group.

4.2 Materials

The RE:Lab simulator used in this study is a static driving simulator composed of a complete automobile, fully functional pedals and dashboard, and a large screen showing a highway scenario. The simulation engine is SCANeR Studio 1.8 [19]. The simulated highway scenarios were designed using actual European geometric route design standards and consisted of a simple long O-shaped path without any crossroads. During the simulation, several driving parameters were monitored: in particular, the simulator recorded the steering wheel movements, the force applied to the pedals and the real-time position of the vehicle in the simulation scenario. These raw data were used to compute the driving performance indicators used as dependent variables.

Furthermore, the images from a 4k camera installed on the simulator were used as input for the emotion recognition system, while the acoustic stimuli were delivered through a high-performance headset.

4.3 Experimental Procedure

Each trial lasted approximately 20 min for the control group and 15 min for the experimental group. All participants first drove for a 5-min period in order to familiarize themselves with the simulator; after that, both groups were asked to complete a 6-min driving task during which, for the experimental group only, the six acoustic stimuli of approximately 5 s each were delivered, with a 45-s interval between them.

In addition to the driving task, the control group also received a listening task with the same stimuli used during the driving task of the experimental group.

At the end of the tasks, all participants were asked to fill in a NASA-TLX questionnaire to rate their perceived workload.

4.4 The Task

The driving task was the same for the two 6-min driving periods (experimental and control group). The subjects were required to drive in a highway setting without traffic and to respect the rules of the road as they would in a natural setting; information on the speed limit was provided directly within the scenario as road signs, and the subjects could see their speed on the dashboard speedometer.

The listening task was performed by the participants of the control group while sitting at the simulator station wearing headphones, but without driving and with the simulator switched off; in this case only the camera stream for the emotion recognition system was processed.

4.5 Dependent Variables

Firstly, this study explored the effect of acoustic stimulation on emotions in both driving and non-driving contexts, using a convolutional neural network model to recognize the six Ekman universal emotions [21]. The classification outputs for the six universal emotions were combined to create an engagement index, used as a metric of emotional involvement as opposed to the absence of emotions.

Furthermore, the emotion indexes were combined in order to differentiate the emotional engagement according to the arousal-valence model [29]. On this basis, four new indexes were created (a computation sketch is given after the list below):

  • Engagement classified on the arousal axis:

    • Active engagement (Surprise, Fear, Anger, Happiness);

    • Passive engagement (Sadness and Disgust);

  • Engagement classified on the valence axis:

    • Positive engagement (Happiness and Surprise);

    • Negative engagement (Anger, Disgust, Sadness, Fear).
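A minimal sketch of how these indexes can be derived from the per-sample emotion probabilities is given below; the column names and the percentage scale are assumptions made for illustration.

```python
# Hedged sketch of the engagement indexes computed from per-sample emotion probabilities.
import pandas as pd

EMOTIONS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]
ACTIVE   = ["surprise", "fear", "anger", "happiness"]   # high arousal
PASSIVE  = ["sadness", "disgust"]                       # low arousal
POSITIVE = ["happiness", "surprise"]                    # positive valence
NEGATIVE = ["anger", "disgust", "sadness", "fear"]      # negative valence

def engagement_indexes(frames: pd.DataFrame) -> pd.DataFrame:
    """frames: one row per sample, one column per emotion probability (in %)."""
    out = pd.DataFrame(index=frames.index)
    out["engagement"] = frames[EMOTIONS].sum(axis=1)   # complement of the neutral value
    out["active"]     = frames[ACTIVE].sum(axis=1)
    out["passive"]    = frames[PASSIVE].sum(axis=1)
    out["positive"]   = frames[POSITIVE].sum(axis=1)
    out["negative"]   = frames[NEGATIVE].sum(axis=1)
    return out

# One-sample example: 10% happiness, 5% surprise, remaining probability neutral.
sample = pd.DataFrame([{**{e: 0.0 for e in EMOTIONS}, "happiness": 10.0, "surprise": 5.0}])
print(engagement_indexes(sample))
```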

All these indexes were compared across the different experimental scenarios: driving task with and without stimuli, and exposure to stimuli with and without the driving task.

Secondly, the effects of the emotional state and engagement on driving performance were analyzed. To this end, the standard deviation of the lane position (SDLP) and the standard deviation of the steering wheel angle (SDSTW) were considered as indicators of lateral control performance, while the standard deviation of speed and the standard deviation of the pressure on the gas pedal were considered as indicators of longitudinal control performance.
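The sketch below shows how these four indicators could be computed from a simulator log; the column names and units are assumptions made for illustration.

```python
# Hedged sketch of the four driving-performance indicators used as dependent variables.
import pandas as pd

def performance_indicators(log: pd.DataFrame) -> dict:
    """log columns assumed: lane_position_m, steering_angle_deg, speed_kmh, gas_pedal."""
    return {
        # lateral control
        "SDLP":  log["lane_position_m"].std(),     # std. dev. of lateral position
        "SDSTW": log["steering_angle_deg"].std(),  # std. dev. of steering wheel angle
        # longitudinal control
        "SD_speed": log["speed_kmh"].std(),
        "SD_gas":   log["gas_pedal"].std(),
    }

# Tiny synthetic log, for illustration only.
log = pd.DataFrame({
    "lane_position_m":    [0.10, 0.20, 0.00, -0.10],
    "steering_angle_deg": [1.0, -2.0, 0.5, 0.0],
    "speed_kmh":          [98.0, 101.0, 99.5, 100.0],
    "gas_pedal":          [0.30, 0.35, 0.28, 0.31],
})
print(performance_indicators(log))
```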

Finally, the cognitive workload was assessed for each participant through the NASA-TLX survey [30].

5 Results

5.1 Classification of the Acoustic Stimuli

Seven different sounds were used to elicit emotions in the participants, while the emotion recognition system was used to classify the participants' reactions to such stimuli in both driving and non-driving contexts, in accordance with Ekman's theory of fundamental emotions.

The stimuli were chosen taking the International Affective Digitized Sounds (IADS) [31] as inspiration, drawing on other datasets available online and freely accessible; these sounds, however, had not already been standardized or categorized.

Among all the stimuli, the neutral indicator (namely the absence of emotions) showed the highest value: this may be explained by the nature of the acoustic stimuli, which may not be sufficiently engaging to trigger a strong emotional reaction. However, the emotion recognition system was able to detect the presence of the six fundamental emotions in different percentages for each stimulus (Fig. 2). Apart from the neutral indicator, the main emotion detected was sadness, which was associated with 5 stimuli (car crash, human vomiting, zombie, car horn and scream of pain), followed by happiness, which showed a higher level for 2 stimuli (laughing child and fart/raspberry).

Fig. 2. Fundamental emotions recorded after emotion induction through stimuli in all participants

5.2 Engagement

Together with the Ekman fundamental emotions, the sum of the percentage values associated with the probability of occurrence of each of Ekman's emotions (i.e., Joy, Sadness, Anger, Fear, Disgust and Surprise) was recorded for every tenth of a second. This value may be regarded as a representation of the engagement level of each participant, complementary to the “neutral/not revealed” value of emotion (Fig. 3).

Fig. 3. Differences in engagement among groups

The level of engagement was compared among the different groups using the non-parametric Wilcoxon test, due to the lack of normality of the distributions. The group engaged in the primary driving task while listening to the auditory stimuli (Mdn = 23.92) showed a significantly lower level of engagement, Z = 12157669, p < 0.001, compared with the group exposed to the acoustic stimuli without the driving task (Mdn = 43.53). The comparison between the group engaged in the driving task with acoustic stimuli (Mdn = 23.92) and the group without stimuli (Mdn = 30.26) also showed a statistically significant difference (Z = 10501813, p < 0.001), with a lower level of emotional engagement for the participants who drove without acoustic stimuli.
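For reproducibility, a between-group comparison of this kind can be run with SciPy, where the Wilcoxon rank-sum test for independent samples is exposed as the equivalent Mann-Whitney U test; the placeholder data below are assumptions and do not reproduce the experimental values.

```python
# Hedged sketch of the non-parametric between-group comparison (Wilcoxon rank-sum /
# Mann-Whitney U). The arrays below are synthetic placeholders, not experimental data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
engagement_driving_with_stimuli = rng.normal(24, 8, size=500)   # placeholder group 1
engagement_listening_only = rng.normal(43, 10, size=500)        # placeholder group 2

stat, p_value = mannwhitneyu(engagement_driving_with_stimuli,
                             engagement_listening_only,
                             alternative="two-sided")
print(f"U = {stat:.0f}, p = {p_value:.4g}")
```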

Engagement and Arousal

When we consider the level of engagement as distributed along the arousal continuum, we find a significantly higher level of active engagement (anger, happiness, surprise and fear) (Z = 7235033, p < 0.001) for the group exposed to acoustic stimuli while driving (Mdn = 12.2) compared with the group not exposed to stimuli while driving (Mdn = 6.12). Passive engagement (sadness and disgust), on the other hand, was significantly higher (Z = 13898945, p < 0.001) in the group not exposed to stimuli (Mdn = 3.52) than in the group exposed to stimuli while driving (Mdn = 0.74) (Fig. 4).

Fig. 4. Differences between groups in terms of passive and active engagement.

Regarding the comparison of the groups exposed to acoustic stimuli with and without the driving task, we observe a significantly higher level of active engagement (Z = 7792699, p < 0.001) in the group with the driving task (Mdn = 12.2) compared with the group without it (Mdn = 6.38). At the same time, a statistically significant lower level of passive engagement (Z = 12981028, p < 0.001) was found for the group exposed to stimuli while driving (Mdn = 2.2) than for the group without the driving task (Mdn = 0.74).

Engagement and Valence

When we compare engagement of opposite valence between the groups engaged in the driving task with and without stimuli, we find a statistically significant difference in positive engagement (Z = 7379057, p < 0.001) for the group exposed to stimuli (Mdn = 3.22) compared with the group not exposed to stimuli during the driving task (Mdn = 1.35); conversely, if we compare negative engagement between the same groups, we find a significantly higher level of negative engagement in the group not exposed to stimuli (Mdn = 26.51) compared with the group exposed to stimuli while driving (Fig. 5).

Fig. 5. Differences between groups in terms of negative and positive engagement.

Similarly, positive engagement appears to be significantly higher (Z = 9193588, p < 0.001) in the group exposed to stimuli while driving (Mdn = 3.22) compared with the group exposed to stimuli without the driving task (Mdn = 1.92), while negative engagement appears significantly higher (Z = 12741859, p < 0.001) in the group without the driving task (Mdn = 25.89) compared with the group exposed to stimuli during the driving task (Mdn = 4.74).

5.3 Driving Performance

Driving performance was compared between the group driving with acoustic stimuli and the group driving without stimuli. Of the four performance indicators considered, only the standard deviation of the lane position (SDLP) was found to be both statistically different and normally distributed. Comparing the means with a t-test, SDLP was significantly higher, t(15.9) = 4.3043, p < 0.001, in the group not exposed to acoustic stimuli (M = 0.53, SD = 0.13) than in the group exposed to stimuli while driving (M = 0.31, SD = 0.09) (Fig. 6).

Fig. 6. Effect of emotion stimulation on driving performance.

5.4 Workload

The cognitive workload of each group was assessed using the NASA-TLX, which defines workload as “the sources of loading imposed by different tasks” [30].

The overall workload rating in this study was medium-low, falling in the bottom half of the NASA-TLX scale in all the experimental conditions (values range from 0 to 20). The workload was overall higher in the control group (driving task without acoustic stimuli), with a mean value of 8.18, while the experimental group (driving task with acoustic stimuli) reported a mean workload of 5.3. For both groups, the item with the highest rating was Effort, with values of 11 in the experimental condition and 6.5 in the control group.

6 Conclusion

This paper introduced, at a conceptual level, an emotion-aware DMS which exploits a low-cost emotion recognition system based on facial expression analysis. A preliminary experiment was carried out in order to assess the feasibility of the proposed system in a driving context.

The results showed that the considered emotion recognition system is capable of correctly qualifying drivers' emotions in a driving simulation context: it associated negative emotions (sadness, anger) with stimuli labeled with negative valence (e.g., car crash, human vomiting, car horn), and a positive emotion (happiness) with stimuli labeled with positive valence (e.g., laughing child).

However, a high percentage of the neutral indicator was recorded for all the stimuli, and the detected levels of engagement were overall low. This may be due to the nature of the stimuli used to induce emotion: emotion induction by means of sound stimuli is not as effective as induction through audiovisual stimuli or other techniques, such as autobiographical recall [32]. Future studies should therefore use other methods to induce emotions while driving.

Exposure to the auditory stimuli increased the participants' active engagement during the driving task. The same auditory stimuli engaged participants in a more positive (or less negative) way during driving than in the no-driving situation. In addition, the driving task with emotional stimulation resulted in a lower perceived mental workload and better performance in terms of lateral control of the driving trajectory.

This may be due to the fact that the driving route was very quiet and repetitive, and participants may therefore have found it boring. In fact, boredom can be representative of low-activation affective states while driving [10]. As a result, the sounds, regardless of their nature, may have contributed to increasing the enjoyment, and consequently the positive affective engagement, of the participants during the driving task. However, future studies should be carried out to better investigate the effect of induced emotions in more realistic driving tasks.