Vocal features obtained through automated methods in verbal fluency tasks can aid the identification of mixed episodes in bipolar disorder

2021, Translational Psychiatry

Luisa Weiner 1,2,3 ✉, Andrea Guidi 4,5, Nadège Doignon-Camus 1, Anne Giersch 1, Gilles Bertschy 1,2,6 and Nicola Vanello 4,5

© The Author(s) 2021

There is a lack of consensus on the diagnostic thresholds that could improve the detection accuracy of bipolar mixed episodes in clinical settings. Some studies have shown that voice features could be reliable biomarkers of manic and depressive episodes compared to euthymic states, but none thus far have investigated whether they could aid the distinction between mixed and non-mixed acute bipolar episodes. Here we investigated whether vocal features acquired via verbal fluency tasks could accurately classify mixed states in bipolar disorder using machine learning methods. Fifty-six patients with bipolar disorder were recruited during an acute episode (19 hypomanic, 8 mixed hypomanic, 17 with mixed depression, 12 with depression). Nine different trials belonging to four conditions of verbal fluency tasks—letter, semantic, free word generation, and associational fluency—were administered. Spectral and prosodic features in three conditions were selected for the classification algorithm. Using the leave-one-subject-out (LOSO) strategy to train the classifier, we calculated the accuracy rate, the F1 score, and the Matthews correlation coefficient (MCC). For depression versus mixed depression, the accuracy and F1 scores were high, i.e., 0.83 and 0.86, respectively, and the MCC was 0.64. For hypomania versus mixed hypomania, the accuracy and F1 scores were also high, i.e., 0.86 and 0.75, respectively, and the MCC was 0.57.
Given the high rates of correctly classified subjects, vocal features quickly acquired via verbal fluency tasks seem to be reliable biomarkers that could be easily implemented in clinical settings to improve diagnostic accuracy.

Translational Psychiatry (2021) 11:415; https://doi.org/10.1038/s41398-021-01535-z

INTRODUCTION

Mixed episodes, wherein depressive and manic symptoms co-occur, are frequently experienced during the course of bipolar disorder (BD), and are associated with a more recurrent and unfavorable illness course [1]. Among patients experiencing an acute mood episode, the frequencies of mixed depression and mixed mania range between 20–70% and 30–40%, respectively, depending on the samples and the diagnostic thresholds used [1, 2]. The identification of mixed episodes has important consequences for care, as mixed episodes require different treatment options compared to their non-mixed forms. Indeed, patients with mixed mania have a significantly poorer response to lithium compared to patients with pure mania [3] but show better treatment responses with stabilizing antiepileptic drugs and some atypical antipsychotics [3]. In individuals with mixed depression, antidepressants are less effective [4] and may increase the likelihood of a switch to mania and the risk of suicide [3, 4]. Therefore, antidepressant prescriptions need to be handled with caution, and the use of antipsychotics and mood stabilizers has been warranted [4]. Prior studies have shown that clinicians often fail to recognize symptoms from the opposite polarity in predominantly depressive or manic episodes [5]. This might be due to an over-reliance on the more predominant symptoms presented by the patient, and to the specific phenomenology of mixed episodes, including increased anxiety and emotional lability related to hyperarousal [1, 6], which has thus far been overlooked by classifications [7].
Moreover, the restrictive diagnostic thresholds for symptoms of opposite polarity in mania and depression may also be involved in the underdiagnosis of mixed episodes [8]. In order to increase diagnostic specificity, in the most recent version of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [9], symptoms shared by mania and depression (i.e., distractibility, irritability, and psychomotor agitation, or DIP) were excluded from the mixed symptoms specifier that can be applied to major depressive or hypomanic/manic episodes. However, DIP symptoms are commonly found during mixed episodes [7], and the DSM-5 criteria have been found to lack sensitivity, allowing the diagnosis of mixed depression, for instance, in only one out of four cases [8]. Recently, studies have shown that the presence of fewer concurrent symptoms from the opposite polarity, regardless of their overlapping nature—e.g., in the case of a depressive episode, a score above 2 on the Young Mania Rating Scale (YMRS) [10]—had better sensitivity and specificity than the current DSM-5 diagnostic criteria for mixed episodes "with mixed features" [2, 11]. There is a lack of consensus in the literature thus far on the diagnostic thresholds that could improve the detection accuracy of mixed episodes in BD.

Affiliations: 1 INSERM 1114, Strasbourg, France. 2 University Hospital of Strasbourg, Strasbourg, France. 3 Laboratoire de Psychologie des Cognitions, Université de Strasbourg, Strasbourg, France. 4 Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy. 5 Research Center "E. Piaggio", University of Pisa, Largo L. Lazzarino 1, 56122 Pisa, Italy. 6 Fédération de Médecine Translationnelle de Strasbourg, Université de Strasbourg, Strasbourg, France. ✉ email: luisa.weiner@chru-strasbourg.fr
Received: 29 September 2020. Revised: 5 July 2021. Accepted: 26 July 2021.
Here we investigated whether vocal features acquired via verbal fluency tasks (VFT) could accurately classify mixed states in BD using machine learning methods. To our knowledge, no large-scale studies have targeted the biomarkers that could aid the distinction between mixed and non-mixed BD episodes. Yet, recent studies have shown that automated methods relying on physiological parameters, e.g., speech prosody and heartbeat variability, could provide relevant information regarding mood states in BD [12, 13]. Voice acoustical analyses, in particular, may provide a powerful and easy to implement complement to the standard clinical interview for mood episodes in BD, as speech changes have been found to be sensitive and valid measures of depression and mania [12]. Specifically, through the automated use of speech parameters, promising results have been reported in the classification of depressive and manic episodes relative to euthymia [12, 14]. Very few studies have reported whether and how speech is modified during mixed episodes. Speech changes are characteristic of manic and depressive symptomatology, as assessed by specific items in clinician-rated scales such as the YMRS [10] (item 6 assesses speech rate and item 9 assesses loudness), and the Quick Inventory for Depressive Symptomatology (QIDS [15]; item 15 assesses speech rate), respectively. In depressive episodes, intonation has been typically described as flat, speech rate is slower than usual, and voice intensity and frequency are decreased [16]. Conversely, fewer studies have been conducted in mania, but faster than usual speech rate (i.e., pressure of speech), and increased voice volume are among the most commonly reported symptoms in mania [17], and have been hypothesized to be related to hyperarousal [14, 17]. 
In mixed episodes, a study from 1938 highlighted that, in one patient experiencing a mixed episode, pressure of speech was associated with vigorous articulatory movements, wide pitch range, fast speech tempo, and infrequent prosodic pauses [18]. From a psychoacoustic perspective, the correlates of such speech changes refer to mood-modulated prosodic and spectral features [14, 19]. According to Scherer’s [20] model, arousal, in particular, is associated with physiological changes in respiration, phonation, and articulation, which lead, in turn, to emotion and mood-specific patterns in acoustic parameters. For instance, speech fundamental frequency (F0), i.e., the inverse of the vocal folds opening and closing cycle period, which is related to the perceived pitch or voice tone, is used to voluntarily convey specific information to the listener, thus creating intonational events with language-specific meaning [14]. In addition to F0, long-term average spectrum (LTAS) has been described as a key feature involved in the differentiation of discrete emotions [21]. LTAS is related to voice quality, which is involved in how the listener perceives one’s voice as either creaky, breathy or tense [21]. Importantly, LTAS results from the speaker’s anatomy which determines both the width of the potential operating range, and its long-term muscular adjustments of the larynx or the supraglottic vocal tract [19]. Recently, Guidi et al. [14, 21] found that voice quality (LTAS) and prosodic features (F0 dynamics) were particularly sensitive and specific to longitudinally-evaluated nonmixed mood episodes in BD. In this study, we investigated specific speech parameters—i.e., voice quality (LTAS) and prosody (F0 dynamics and pauses)—that have been found to be modulated by mood changes using VFT. VFT are easy to administer, well-validated and widely used neuropsychological tests targeting spontaneous word production during an allotted time [22]. 
In VFT, subjects are instructed to generate words according to specified rules based on phonemic or semantic criteria ("letter" and "semantic" fluency, respectively), in the absence of a specified criterion (free word generation), or through the continuous association of words following a cue word (associational fluency) [22]. The latter unrestrictive conditions are closer to natural discourse, while they also circumvent the methodological pitfalls of natural speech, such as pragmatics and syntax [23]. In BD, word count has been found to be impaired in letter and semantic VFT in euthymia [24]. Moreover, in unconstrained VFT (free word generation), Weiner et al. [23] found an increased number of switches from one conceptual unit (word or cluster) to another in patients with mixed symptoms compared to non-mixed depressed patients. VFT thus seems to be a sensitive means of measuring the mixed symptoms found in depression and mania. To our knowledge, studies focusing on whether and how vocal parameters could aid the identification of mixed episodes in BD are lacking. The present study aims at assessing the classification accuracy of mixed episodes, using a machine learning approach, via the automated analysis of voice quality (LTAS) and prosodic (F0-related features and pauses) vocal parameters obtained via different VFT conditions, i.e., conditions differing in terms of their retrieval rules but also, in associational tasks, in terms of the valence of the cue words [25]. Consistent with other studies which have used natural speech conditions for the classification of acute non-mixed states relative to euthymia [12], compared to a standard clinical assessment, we expected classification accuracy to be high in the detection of mixed relative to non-mixed episodes using vocal parameters in VFT.

METHODS

Participants

Fifty-six patients aged 19–64 (M = 41.12, SD = 13.05) with BD were recruited from inpatient and outpatient clinics at the University Hospital of Strasbourg.
Patients fulfilled the criteria for BD according to the DSM-5 [9]. Twenty-four patients had BD type 1, and thirty-two BD type 2. Patients with BD had no history of neurological disorder, ADHD, borderline personality disorder, or substance use disorder within the last 12 months. Mania and depression symptoms were assessed with the YMRS [10] and the QIDS-C16 [15]. Anxiety symptoms were assessed via the Beck Anxiety Inventory (BAI) [26], because anxiety has been consistently linked to mixed symptomatology in BD [1], and to speech changes in healthy individuals [20]. Patients were considered to be in a predominantly depressive or (hypo)manic episode if they fulfilled the DSM-5 criteria for either episode [9]. A YMRS score >5 was considered reflective of hypomania [27], and a QIDS-C16 score >5 was considered reflective of depression [15]. A mixed (hypo)manic label was applied if both manic and depressive symptoms were above their respective cut-offs [28]. Given that mixed symptoms in depression might be observed with very few concurrent hypomanic symptoms [7, 11], our mixed depression group was defined based on the less restrictive criteria of Miller et al. [11]: mixed depression was operationalized as above-threshold depressive symptoms (QIDS-C16 score >5) co-occurring with mild hypomanic symptoms (YMRS score >2 and <6). This cut-off has proved to be more sensitive to mixed depression than the DSM-5 criteria, as DIP features are not precluded from it [11]. Nineteen patients were considered by clinicians as hypomanic and 8 had a mixed hypomanic episode. Among predominantly depressed patients, 17 were clinically assessed as having mixed depression and 12 had a pure depressive episode (cf. Table 1).
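As an illustration, the group-assignment thresholds above can be written as a small decision function. This is a sketch only: the threshold values come from the text, but the function name, structure, and fallback label are ours, and clinical judgement obviously overrides any such rule.

```python
# Hypothetical sketch of the episode-labelling thresholds described above.
# Threshold values come from the text; names and structure are illustrative.

def label_episode(ymrs: int, qids_c16: int) -> str:
    """Assign a mood-episode label from YMRS and QIDS-C16 total scores."""
    hypomanic = ymrs > 5        # YMRS > 5 reflects hypomania [27]
    depressed = qids_c16 > 5    # QIDS-C16 > 5 reflects depression [15]
    if hypomanic and depressed:
        return "mixed hypomania"      # both polarities above cut-off [28]
    if depressed and 2 < ymrs < 6:
        return "mixed depression"     # Miller et al. [11] criteria
    if hypomanic:
        return "hypomania"
    if depressed:
        return "depression"
    return "subthreshold"             # neither scale above cut-off

print(label_episode(ymrs=12, qids_c16=2))   # hypomania
print(label_episode(ymrs=4, qids_c16=12))   # mixed depression
```

Note that the Miller et al. window (YMRS >2 and <6) sits just below the hypomania cut-off, so the two mixed labels cannot overlap.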
Subjects provided written informed consent prior to inclusion in the study, in accordance with the Declaration of Helsinki (ClinicalTrials.gov registration number: NCT02036606). This study was approved by the regional ethics committee of the East of France.

Table 1. Demographic characteristics of the patient samples.

                          Hypomania      Mixed hypomania  Mixed depression  Depression
                          (n = 19)       (n = 8)          (n = 17)          (n = 12)
Age (a)                   37.58 (13.52)  42 (10.3)        42.12 (12.75)     44.83 (14.43)
Sex (F/M)                 12/7           4/4              14/3              8/4
YMRS (a)                  12.58 (3.63)   9.25 (2.37)      4.37 (1.5)        0.83 (0.94)
QIDS-C16 (a)              2.37 (1.5)     9.75 (3.01)      12.37 (3.95)      12.5 (3.58)
Lithium (% yes)           42%            12.5%            56%               50%
Anti-epileptics (% yes)   42%            50%              56%               50%
Antipsychotics (% yes)    63%            37.5%            37.5%             50%
Antidepressants (% yes)   26.5%          37.5%            37.5%             66.5%
Benzodiazepines (% yes)   15.5%          12.5%            25%               25%

YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology. (a) Mean (standard deviation).

Materials and procedure

Following the aforementioned diagnostic clinical assessment, participants completed the Beck Anxiety Inventory (BAI) [26] prior to the administration of the VFT.

Verbal fluency tasks. Nine different trials belonging to four conditions of VFT were administered: (i) one free fluency trial, (ii) six associational fluency trials, (iii) one letter fluency trial, and (iv) one category fluency trial. The four conditions of the VFT were administered in a fixed order starting with the most unrestrictive condition [29]: first the free condition, followed by the six associational conditions, the letter condition, and the category condition. Participants' oral production was recorded using the Audacity© software (44,100 Hz sampling frequency, 24-bit pulse code modulation). A microphone connected to a laptop was used and kept approximately 60 cm from the subjects' mouths. The room was quiet, with low reverberation levels.

Free fluency condition.
In the free fluency trial, participants were asked to produce as many words as possible, with their eyes closed, during 150 s [30].

Associational fluency conditions. Subjects were orally presented with an initial cue word, and they were asked to produce words, during 120 s, with their eyes closed, following the presentation of the initial word. Prior to the task trials, subjects were provided with an example (i.e., the word "glass"). Two types of inductive words were used, i.e., concrete and abstract nouns. For each type, three disyllabic words were chosen, with a negative, neutral, or positive emotional valence (for concrete words, "snake", "kingdom", and "swimming pool", respectively; for abstract words, "pain", "beginning", and "courage", respectively). Words within each triplet were matched in terms of their film subtitle-based frequency in French [31], their concreteness [32], and their emotional valence ratings among French native speakers [33]. The six inductive words were presented in random order.

Letter fluency condition. Subjects were asked to produce as many words as possible starting with the letter 'p', with the exception of proper nouns, during 120 s [29].

Semantic fluency condition. Participants were asked to produce as many words as possible belonging to a specific category, i.e., "animals", during 120 s [29].

Extraction of vocal features

Speech features related both to prosodic information and to voice quality were obtained. The analysis was based on a two-step procedure: first, single words were selected using a voice activity detection algorithm; then, speech features were calculated for each word.

Word detection. The word detection algorithm analyzes the energy of the audio signal as well as its temporal and spectral features.
Specifically, it combines signal intensity and zero crossing rate analyses with a check of whether the word candidate comprises voiced sounds, according to a spectral matching procedure based on Camacho's SWIPE' algorithm [34]. This algorithm compares the spectral content of the audio signal with a spectral template of a sawtooth waveform, mimicking the glottal source signal characteristics. The resulting estimates consist of F0 (pitch) and the strength of the spectral matching (cf. Table 2). We then analyzed signal intensity, zero crossing rate, and spectral strength to detect single words.

Speech feature estimation. We estimated specific features related to prosody (F0-related features and pauses) and voice quality (LTAS-related features; see Table 2). Prosodic features were obtained after word segmentation, by calculating pauses between words as well as word length, and by estimating F0 dynamics (Fig. 1). F0 dynamics (temporal windows of 10 milliseconds) were obtained using the SWIPE' algorithm. For each word, summary statistics of F0, e.g., the median and the median absolute deviation, were estimated. Nine features describing the F0 contour of each word were also analyzed. These features correspond to Taylor's [35] tilt intonational model and describe the relative size and duration of intonational events (Fig. 1). We extended the use of these features to all the voiced segments within each word [14]. The resulting features, Amplitude*, Duration*, and Tilt*, are described in Table 2. These features refer to the shape of the F0 contour within each voiced segment, which allows for the analysis of F0 dynamics, i.e., both rising and falling phases within each voiced segment [14] (Table 2).

Voice quality features were obtained by estimating the LTAS using F0 correction [36]. LTAS is used to identify long-term muscular settings of the larynx and the vocal tract that deviate from neutral settings [19].
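Before turning to the LTAS estimation details, the tilt features named above can be sketched for a single voiced segment. This is a minimal illustration only, assuming an F0 contour sampled every 10 ms with a single rise and fall around the contour peak; function and variable names are ours, not the authors' implementation.

```python
# Illustrative computation of Taylor-style tilt features for one voiced
# segment; assumes the F0 contour rises to its peak and then falls.
import numpy as np

def tilt_features(f0: np.ndarray, dt: float = 0.01):
    """Return (Amplitude*, Duration*, Tilt*) for an F0 contour in Hz."""
    peak = int(np.argmax(f0))
    a_rise = f0[peak] - f0[0]              # rising-phase amplitude (Hz)
    a_fall = f0[-1] - f0[peak]             # falling-phase amplitude (<= 0)
    d_rise = peak * dt                     # rising-phase duration (s)
    d_fall = (len(f0) - 1 - peak) * dt     # falling-phase duration (s)
    eps = 1e-12                            # guard against division by zero
    amp = (a_rise - abs(a_fall)) / (a_rise + abs(a_fall) + eps)
    dur = (d_rise - d_fall) / (d_rise + d_fall + eps)
    tilt = (amp + dur) / 2.0               # mean of Amplitude* and Duration*
    return amp, dur, tilt

# A purely falling contour, as in the lower panel of Fig. 1, yields
# Amplitude*, Duration*, and Tilt* all approximately equal to -1.
amp, dur, tilt = tilt_features(np.array([220.0, 200.0, 180.0, 160.0]))
```

A purely rising contour would symmetrically yield values near +1, so the three features index the balance between rising and falling F0 movement.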
Given that LTAS is estimated using a moving time window approach applied to the voice signal, the obtained amplitude spectra are averaged over all windows. F0 correction reduces the influence of F0 on spectral characteristics, such as the articulatory movements involved in LTAS. Using this procedure, we aimed at minimizing the overlap between F0- and LTAS-related speech features (see Fig. 2). To estimate the F0-corrected LTAS, we modulated the size and the location of the moving window in order to analyze a single glottal cycle per window, using the DYPSA algorithm [37]. The LTAS was estimated using a frequency resolution of 150 Hz, within the whole frequency range (i.e., 0–22,050 Hz). The selected LTAS features are described in Table 2.

Statistical analyses

Statistical analyses were defined in advance and were performed by researchers who were blind to the diagnostic status of patients. In order to select the most relevant vocal parameters for the classification algorithm, we calculated Spearman rho coefficients between clinical questionnaire scores—i.e., YMRS total, QIDS-C16, and BAI—and vocal measures obtained in the nine VFT within two samples of patients: manic and mixed manic, and depressed and mixed depressed. The VFT and the vocal features that were correlated with the clinical measures were then entered into our classification algorithm. A support vector machine (SVM) classifier [38] was used to train discriminative models using the extracted measures. The task consisted of a binary classification, where a model was trained to discriminate between two distinct acute episodes, i.e., manic versus mixed manic and depression versus mixed depression. The leave-one-subject-out (LOSO) strategy was used to train the classifier: LOSO consists of removing all the observations related to one subject (validation set) and training the classifier on the remaining observations (training set).
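The LOSO scheme just described can be sketched as follows, here with scikit-learn's LeaveOneGroupOut and a linear-kernel SVC as stand-ins; the feature matrix, labels, and subject IDs below are synthetic placeholders, not the study's data.

```python
# Minimal LOSO sketch: every observation of one subject is held out,
# the SVM is trained on the rest, and the held-out subject is predicted.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))             # 30 observations x 4 speech features
y = np.tile([0, 1, 0], 10)               # non-mixed (0) vs mixed (1) labels
subjects = np.repeat(np.arange(10), 3)   # 3 VFT observations per subject

predicted = np.empty_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    predicted[test_idx] = clf.predict(X[test_idx])   # held-out subject only
# `predicted` now holds one label per observation, each obtained with the
# corresponding subject excluded from training.
```

Grouping the split by subject, rather than by observation, is what prevents a subject's own voice from leaking into the training set.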
The subject is then classified, and the predicted label is compared with the clinical scoring. This operation is repeated for the remaining subjects in the dataset until a predicted label is obtained for each subject.

Table 2. Features used for the analysis of speech signals.

Prosodic features:
- MedianF0: median of the F0 values estimated within each word. Central tendency of the voiced-sound fundamental frequency.
- MadF0: median absolute deviation of the F0 values estimated within each word. Dispersion index of the voiced-sound fundamental frequency.
- Amplitude*: (Arise − |Afall|) / (Arise + |Afall|). Relative size of the F0 rising- and falling-phase amplitudes.
- Duration*: (Drise − Dfall) / (Drise + Dfall). Relative size of the F0 rising- and falling-phase durations.
- Tilt*: (Amplitude* + Duration*) / 2. Mean of the Amplitude* and Duration* features.
- PosSlope: Arise / Drise. Steepness of the F0 contour during the rising phase.
- AbsNegSlope: |Afall| / Dfall. Steepness of the F0 contour during the falling phase.
- SumDer: sum of PosSlope and AbsNegSlope.
- GlobalSlope: (Arise − |Afall|) / (Drise + Dfall). F0 slope between the first and the final F0 values in each voiced segment.
- Mean_Pause: mean, across a VFT, of the pause lengths between two consecutive words. Position index of the pause length distribution.
- Std_Pause: standard deviation, across a VFT, of the pause lengths between two consecutive words. Dispersion index of the pause length distribution.
- Mean_Speech: mean word length across a VFT. Position index of the word length distribution.
- Std_Speech: standard deviation of word length across a VFT. Dispersion index of the word length distribution.

Voice quality (LTAS) features:
- LTAS_F_median: the median frequency of the power spectrum, which divides the total power into two halves. LTAS shape feature; central tendency index of the LTAS spectrum distribution, reflecting the relative contribution of high and low frequencies.
- LTAS_A_median: the amplitude of the LTAS spectrum corresponding to LTAS_F_median.
- LTAS_Max_A: the maximum amplitude of the LTAS spectrum.
- LTAS_Max_A_F: the frequency corresponding to LTAS_Max_A. LTAS shape feature; its value is expected to be lower than LTAS_F_median.
- LTAS_slope: (LTAS_Max_A − LTAS_A_median) / (LTAS_Max_A_F − LTAS_F_median). LTAS shape feature related to the slope of the LTAS spectrum between the peak and the amplitude corresponding to the median frequency; the lower (more negative), the smaller the contribution of higher frequencies.
- LTAS_Ratio_Max: LTAS_Max_A / LTAS_Max_A_F. LTAS shape feature related to the slope of the LTAS spectrum between the origin and the LTAS peak; given that the maximum peak is at low frequencies, it weights the amplitude of lower frequencies.
- LTAS_Ratio_Median: LTAS_A_median / LTAS_F_median. LTAS shape feature related to the slope of the LTAS spectrum between the origin and the value corresponding to the median frequency.

Before the training and the test steps, each speech feature underwent a subject-specific normalization step, to remove subject-specific speech characteristics, such as gender or vocal tract signature. The MadF0, i.e., the median absolute deviation of F0, obtained for each task was normalized through the use of the median value of F0. This was used to compensate for higher values of F0 variability at higher median values of F0 (measured in Hz) due to intra-individual speaker characteristics. The remaining features were normalized using a z-score normalization, i.e., subtracting the mean and dividing by the standard deviation estimated across each of the 6 tasks.

To account for the potential effect of drugs on classification results, we calculated the chlorpromazine equivalent doses of the antipsychotic drugs, as antipsychotics have been found to have a significant effect on speech [39], and they are the first-line treatment for mixed episodes in BD [4].
Chlorpromazine equivalent dose was hence adopted as a descriptor, and as an input to the classifiers along with the speech features. Classical measures of classification performance were estimated, and the confusion matrix was reported. Specifically, positive and negative predictive values, indicated as PPV and NPV respectively, as well as sensitivity and specificity were estimated. Moreover, we also considered three complementary measures of classification: accuracy, F1 score, and the Matthews correlation coefficient (MCC). Accuracy is defined as the sum of true positive and true negative results divided by the total number of positive and negative results. The F1 score considers both the precision and the recall of the test: precision refers to the number of correct positive results divided by the total number of positive results returned by the classifier, and recall is the number of correct positive results divided by the number of samples that should have been identified as positive. The different measures were estimated using mixed symptoms as the target (i.e., positive state). The confusion matrix describes the number of correctly classified and misclassified subjects in both categories, and thus allows for a quick and detailed view of the classification performance. The MCC was used as a compact descriptor of the confusion matrix results, measuring the relationship between the observed and the ideal results. MCC is not biased by the number of observations, nor does it depend on the choice of the target (i.e., positive state) [40]. An MCC higher than 0.5 was expected, reflecting a strong positive relationship between the actual and the ideal classification results. MCC offers a robust measure of classification performance and can complement the information gained from the accuracy and the F1 score.
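As a worked illustration, all three measures follow directly from confusion-matrix counts; the numbers below are the mixed-depression counts reported in the Results section (15 true positives, 2 false negatives, 9 true negatives, 3 false positives), with mixed symptoms as the positive class.

```python
# Accuracy, F1 and MCC from confusion-matrix counts (mixed = positive class).
import math

tp, fn, tn, fp = 15, 2, 9, 3             # mixed-depression confusion matrix

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)               # correct positives / predicted positives
recall = tp / (tp + fn)                  # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(round(accuracy, 2), round(f1, 2), round(mcc, 2))  # 0.83 0.86 0.64
```

Rounded to two decimals, these counts reproduce the reported values (accuracy 0.83, F1 0.86, MCC 0.64).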
In fact, it can be used in the case of unbalanced data, that is, when the numbers of subjects in the two groups differ, whereas accuracy can be biased by the results in the group with more subjects [40]. Moreover, MCC considers the overall performance of the classification in both groups, while F1 is related to precision and recall and does not consider true negatives.

Fig. 1. F0 dynamics of the word "courage" and Taylor's (2000) tilt model. Upper: the time course of the audio signal for the French word "courage", along with its phonetic transcription. Fricative voiceless sounds are characterized by more rapid changes with respect to voiced sounds (central part of the word). For voiced sounds, the fundamental frequency (F0) can be estimated, and its time contour is shown in red. Lower: Taylor's (2000) tilt model in the case where only the falling phase is present, resulting in geometric parameters Duration* and Amplitude* equal to −1.

RESULTS

Demographic and clinical data

Four patients were not taking any psychotropic medication at the time of the assessment. Of the remaining 52 patients, 45.3% were taking lithium, 53.9% were prescribed antiepileptic drugs, 61.5% were taking antipsychotics, 44.2% were on antidepressants, and 23.1% were taking benzodiazepines. Detailed demographic data are presented in Table 1.

Speech feature and task selection

In the mixed manic and manic groups (n = 27), higher depression scores, measured by the QIDS-C16, were correlated with elevated voice quality parameters (LTAS_A_median, LTAS_Max_A_F, and LTAS_Ratio_Median) on the 'snake' condition only.
In the mixed depression and depression groups (n = 29), higher YMRS total scores were correlated with higher median F0, higher F0 variability as expressed by MadF0 (the dispersion of the voiced-sound fundamental frequency), and higher Tilt* (the mean of Amplitude* and Duration*) on the semantic condition, the letter condition, and several conditions of the associational VFT. Given these results, the selected features for the mixed versus non-mixed algorithm were, for the predominantly (i) manic and (ii) depressed groups, respectively: (i) LTAS measures obtained in the "snake" associational VFT condition, and (ii) MedianF0, MadF0, and Duration* in the semantic and "beginning" associational VFT conditions.

Classification results

Among the predominantly depressed patients, mixed depression was correctly classified in 15 cases, and 2 patients were misclassified as depressed. Pure depression, on the other hand, was correctly classified in 9 cases and misclassified in 3. The resulting accuracy and F1 scores were very high, i.e., 0.83 and 0.86, respectively, and the MCC was equal to 0.64, revealing a good classification performance. Correctly classified mixed depression cases had, on average, higher anxiety scores, as measured by the BAI, compared to misclassified cases of mixed depression. One misclassified case had a predominantly irritable clinical presentation (YMRS score of 2 on the irritability item), whereas the other was more agitated (YMRS score of 2 on the agitation item). Classification and descriptive results on clinical measures are presented in Tables 3 and 4. Among predominantly hypomanic patients, mixed hypomania was correctly classified in 6 cases and misclassified in 2. Pure (hypo)manic states were correctly classified in 17 cases and misclassified in 2. The accuracy and F1 scores were high, i.e., 0.86 and 0.75, respectively.
The good performance of the classifier, as applied to the unbalanced dataset, was confirmed by the MCC, which was 0.57. Correctly classified mixed (hypo)manic, but also pure (hypo)manic, states had, on average, higher anxiety scores on the BAI compared to misclassified cases. Dysphoric psychotic features were part of the ongoing episode in the two misclassified mixed (hypo)manic cases. Classification and descriptive results on the clinical measures are presented in Tables 5 and 6. The analyses were repeated using information about antipsychotic medication (chlorpromazine equivalent dose) as a predictor. The classification results did not improve when medication was added to the input features of the classifier, nor were the best classification performances obtained when any of the selected speech features was substituted by medication.

Fig. 2. Long-term average spectrum (LTAS) estimation strategy and example. Upper: the long-term average spectrum estimation strategy. Lower: an example of LTAS. The features, provided in Table 2, were identified to parsimoniously describe the LTAS shape.

DISCUSSION

Our classification results suggest that vocal features obtained through automated methods in VFT are sensitive measures of manic and depressive symptoms in BD, even in their milder forms, such as subthreshold hypomanic symptoms concurrent with a depressive episode (i.e., mixed depression). Strikingly, ours is the first study to show that classification accuracy using vocal acoustic measures is not only high in acute relative to euthymic states [12, 14, 21], but also in mixed relative to non-mixed acute episodes of the same polarity. These results bear important implications for clinicians, given the high rates of misdiagnosis of mixed states in clinical settings [1], and the higher risk of suicide associated with mixed episodes [1, 3].
Our accuracy results are higher than those previously reported for the classification of hypomanic or depressive episodes relative to euthymia based on natural speech obtained via smartphone [12, 41]. Such is particularly the case for the classification of depression relative to euthymia, whose accuracy (i.e., 0.68) was lower than that reported for hypomanic states (i.e., 0.74) in the largest such study conducted to date, which included 28 patients with BD [12]. In our study, which had the largest sample size to date (n = 56), classification performances were very high for both mixed depression and mixed mania, suggesting that automated methods relying on specific prosodic and spectral features can correctly classify most patients presenting with mixed symptoms. Specifically, in the hypomanic groups, where the number of subjects differed considerably, it is important to highlight the good classification performance indicated by the MCC, which, unlike accuracy, is unbiased by unequal sample sizes.
These results may seem surprising, inasmuch as classification accuracy could be expected to be higher when identifying the presence relative to the absence of symptoms (i.e., acute versus euthymic states), rather than the presence of subthreshold manic and depressive symptoms in acute episodes of opposite polarity. The selection of vocal parameters used as classifiers, the specific phenomenology of mixed states, but also the manner in which voice samples were acquired might explain these results. Firstly, in our study, we focused on a relatively small set of vocal features that had been consistently linked to emotion- and mood-modulated speech changes in healthy individuals [20] and in some studies conducted in individuals with BD [14, 21]. Other studies have used automatic systems to produce a high-dimensional description of the voice signal, but they did not specify which features contributed to the classification results [12, 41].
Hence, the underlying vocal and clinical mechanisms involved in those results could not be fully interpreted. LTAS (voice quality), for instance, has been suggested to highlight different vocal tract settings across mood states [21]. In our study, subjects with depressive symptoms concurrent with hypomania (i.e., mixed hypomania) showed larger amplitude values of low-frequency formants as well as a flatter spectrum. This pattern of results differs from some reports in unipolar depression, whereby a decrease in the second and third formants was found with increasing depression severity [42, 43]. However, consistent with our results, in a longitudinal study in BD [21], larger amplitudes of high-frequency components were found in depression compared to euthymia. Hence, it is possible that LTAS features in bipolar depression differ from those found in unipolar depression, and that depressive symptoms concurrent with hypomania (i.e., mixed hypomania) are characterized by a flatter spectrum akin to that of bipolar depression [43].
In predominantly depressive episodes, concurrent hypomanic symptoms were associated with more variable intonation, as reflected by larger values of tilt features, such as Duration*. Given that the opposite pattern (i.e., lower median pitch and flat intonation) has been reported in patients with non-mixed depression [16, 20], our results are the first to show that voice acoustic measures differ in mixed compared to pure depression [18]. Specifically, more dynamic intonation characteristics (captured by tilt) were found in mixed depression compared to pure depression [18]. Given that increased self-rated anxiety was significantly correlated with higher fundamental frequency (F0; pitch) values, it is likely that increased arousal in mixed depression contributes to the more dynamic intonation found in mixed relative to non-mixed depression [1].
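The pitch features discussed above (MedianF0, MadF0) are simple robust statistics of the frame-wise F0 track. The paper estimated F0 with a sawtooth-waveform-inspired estimator [34]; the sketch below assumes a generic contour in Hz with unvoiced frames marked as NaN, and is illustrative rather than the authors' implementation:

```python
import numpy as np

def f0_features(f0_contour):
    """MedianF0 and MadF0 from a frame-wise pitch track (Hz).

    Unvoiced frames are assumed to be marked with NaN; only voiced
    frames enter the statistics, as in the paper's feature set.
    """
    f0 = np.asarray(f0_contour, dtype=float)
    voiced = f0[~np.isnan(f0)]            # keep voiced frames only
    median_f0 = np.median(voiced)         # MedianF0 (central pitch)
    # MadF0: median absolute deviation, i.e., dispersion of the
    # fundamental frequency of voiced sounds
    mad_f0 = np.median(np.abs(voiced - median_f0))
    return median_f0, mad_f0

# Toy contour: voiced stretches separated by unvoiced (NaN) frames.
contour = [np.nan, 180.0, 185.0, np.nan, 210.0, 200.0, np.nan, 190.0]
med, mad = f0_features(contour)
print(med, mad)  # → 190.0 10.0
```

Larger MadF0 corresponds to the more variable pitch described for mixed states, while the median is less sensitive to pitch-tracking outliers than the mean.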
Table 3. Classification results in depression groups and descriptive clinical measures (mean and SD).

Group               Classification   n    YMRS          QIDS-C16       BAI
Depression          Correct          9    0.78 (0.97)   12.67 (3.77)   18.33 (6.12)
Depression          Incorrect        3    1 (1)         12 (3.60)      16.67 (13.05)
Mixed depression    Correct         15    4.43 (1.60)   12.14 (4.11)   29.14 (11.75)
Mixed depression    Incorrect        2    4 (0)         14 (2.83)      16.50 (13.43)

YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology-Clinician version, BAI Beck Anxiety Inventory.

Table 4. Classifier performance measures for depression groups (mixed symptoms as target).

NPV     PPV     Spec    Sens    Acc     F1      MCC
0.82    0.83    0.75    0.88    0.83    0.86    0.64

NPV negative predictive value, PPV positive predictive value, Spec specificity, Sens sensitivity, Acc accuracy, F1 F1 score, MCC Matthews correlation coefficient.

In addition to heightened anxiety, different clinical symptoms might be related to speech peculiarities in mixed episodes. It should be noted that, in our sample, anxiety scores were higher in mixed relative to non-mixed episodes, but also in correctly classified versus misclassified mixed cases. This suggests that increased anxiety might be at least partially involved in the vocal changes that are determinant for the classification of mixed relative to non-mixed patients [1, 44]. These findings are consistent with those from a number of studies showing increased fundamental frequency (F0), i.e., higher pitch, in patients with anxiety disorders [44] and in healthy anxious individuals [20]. However, the phenomenology of mixed episodes is polymorphous, and might also encompass non-anxious forms. Heterogeneous clinical presentations among patients with mixed episodes might thus explain some of our results. Indeed, according to a study using principal component analysis by Perugi et al.
[45], anxiety is a clinical dimension found in most, but not all, subtypes of severe mixed episodes in BD; these include, for instance, predominantly agitated-irritable mixed depression, anxious-irritable-psychotic mania, and retarded-psychotic mixed depression. In healthy individuals, Laukka et al. [46] found that irritability and resignation were associated with specific changes in F0 (pitch), voice quality (LTAS), and voice intensity measures. Given that our misclassified mixed patients were less anxious, but showed more psychotic features (the two mixed manic cases) or more irritability and agitation (the two mixed depression cases), it is possible that these clinical dimensions are involved in our results. Hence, it seems particularly important to increase our understanding of the specific phenomenology of mixed symptoms in order to improve diagnostic accuracy in clinical settings [7, 11]. In our study, instead of applying the DSM-5 criteria [9], we used a data-driven approach based on Miller et al. [11] and Suppes et al. [28] to determine whether patients were in a mixed episode or not. This approach has been favored in a number of recent studies, which argue for a less-restrictive diagnostic algorithm for the diagnosis of mixed states, including overlapping symptoms (distractibility, irritability, and psychomotor agitation) [7, 11, 45, 47]. While the algorithms we used were less restrictive than those proposed by the DSM-5, there is an ongoing debate about whether even less restrictive approaches focusing on specific features should be favored [7]. It is noteworthy that incorrectly classified pure depression individuals (classified as mixed depression) had a very low YMRS (hypomanic) score of 1. In a recent study [47], a YMRS score of 1 in subjects with bipolar depression was associated with increased racing thoughts and a mixed-suggestive clinical picture characterized by hyperarousal.
Hence, it is possible that misclassified subjects with ‘pure depression’, who had a YMRS score of only 1, might in fact have had a mixed-suggestive clinical presentation, including mild vocal changes that were captured here via automated methods.
Another aspect that might have contributed to our results is the task we used. Unlike studies using natural speech recorded during phone calls [12, 41], which relied on long and diverse speech samples, we obtained our vocal measures from single verbal fluency trials. Most studies conducted in people with BD have focused on free speech samples (i.e., recorded conversations) [12, 41]. This procedure has the advantage of being more ecologically valid than other widely used procedures (e.g., reading), as speech is captured either during a monologue (i.e., describing a memory) or a dialogue (i.e., social interaction) [48]. However, it also has disadvantages, as it requires the acquisition of a large amount of data (e.g., hundreds of hours of recorded conversations) owing to the high contextual dependence of the data (e.g., the kind of interaction or text read, and the type of device used) [49]. Moreover, free speech often relies on the recording of personal conversations, which might raise more ethical concerns than the use of standardized cognitive tasks [48]. Both issues might limit the use of these tasks in clinical settings. Conversely, VFT are an economical and standardized means of measuring language and speech production that circumvents some of the pragmatic and syntactic confounders inherent to free speech [22]. Importantly, VFT have been widely used in BD, and have been found to tap the semantic abnormalities that characterize manic speech [23, 24]. Moreover, a growing body of evidence suggests that there are moderate-to-strong correlations between laboratory-based cognitive performances, such as those acquired here via VFT, and performances acquired in natural settings [50].
In our study, three VFT conditions, lasting 120 s each, were selected for the classifier: two associational conditions and the semantic condition of the VFT. Associational and semantic conditions of VFT probe how words stored in semantic memory are retrieved in a relatively spontaneous fashion [25]. Word count in the semantic VFT has been found to be disproportionately impaired in BD [24], which has been linked to functional semantic abnormalities [23]. Anomalous word retrieval based on semantic cues might thus have interacted with some of the speech changes relevant for mood state classification (i.e., interjections, intonation, and voice quality changes). However, given that speech rate and pause duration (i.e., a proxy of word count) were not among the features correlated with depressive and manic symptoms in our study, it seems unlikely that semantic abnormalities alone could underlie the vocal changes that allowed the classification of mixed versus non-mixed episodes. Consistent with a previous study that found that positive affective category cues were related to a greater number of words produced in VFT in euthymia relative to healthy controls [25], we found here that emotion category cues had an impact on the voice quality features (LTAS) of word production in BD. Indeed, in the manic/hypomanic groups, higher depression scores were correlated with voice quality (LTAS) features on the negative emotion category only (the ‘snake’ condition), highlighting a possible interaction between mood-discrepant emotion and word output in individuals with manic symptoms.

Table 5. Classification results in manic groups and descriptive clinical measures (mean and SD).

Group          Classification   n     YMRS           QIDS-C16      BAI
Mania          Correct          17    12.82 (3.56)   2.47 (1.54)   14.94 (9.84)
Mania          Incorrect        2     10.50 (4.95)   1.50 (0.71)   2 (2.82)
Mixed mania    Correct          6     9.50 (2.74)    9.33 (3.44)   22.20 (10.42)
Mixed mania    Incorrect        2     8.50 (0.71)    11 (0)        18.5 (12.01)

YMRS Young Mania Rating Scale, QIDS-C16 Quick Inventory of Depressive Symptomatology-Clinician version, BAI Beck Anxiety Inventory.

Table 6. Classifier performance measures for manic groups (mixed symptoms as target).

NPV     PPV     Spec    Sens    Acc     F1      MCC
0.89    0.75    0.89    0.75    0.86    0.75    0.57

NPV negative predictive value, PPV positive predictive value, Spec specificity, Sens sensitivity, Acc accuracy, F1 F1 score, MCC Matthews correlation coefficient.

There are some limitations to our study. First, given the small number of misclassified patients, our interpretations regarding misclassification are based on very few cases. The limited amount of data also affects the generalizability of our classifier: in fact, we could not split the data into a training set, a validation set, and a test set. However, this issue was alleviated by the leave-one-subject-out (LOSO) cross-validation procedure that we used, which provides a good estimate of classifier performance [51]. Cross-sectional studies with larger samples are thus needed to further investigate the relationship between specific mood dimensions and classification results in BD. Second, the effect of medication on voice parameters in BD is still largely unknown, although antidepressant medication has been linked to greater pitch variability and improved speech tempo in unipolar depression [16]. Further studies assessing this particular domain are needed. Third, the predictive value of acoustic measures for detecting mixed symptoms in clinical contexts is still unknown; such measures could nevertheless be valuable for the follow-up of patients and the assessment of treatment response and suicide risk [52, 53]. This is particularly relevant in the context of mixed episodes, as they are associated with an increased risk of suicide [1, 3]. Studies with a longitudinal design are hence warranted to address this point.
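The LOSO scheme described above trains on all subjects but one and tests on the held-out subject, so no subject contributes to both training and test data in any fold. A minimal illustrative sketch follows, using a nearest-centroid rule as a simple stand-in for the SVM actually used [38]; the data, feature dimensions, and group sizes are synthetic:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Toy stand-in for the paper's SVM: assign each test sample the
    label of the closest class centroid in feature space."""
    classes = sorted(set(y_train))
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in dists.argmin(axis=1)]

def loso_cv(X, y, subjects, predict=nearest_centroid_predict):
    """Leave-one-subject-out cross-validation: all trials of one subject
    form the test fold; every other subject's trials form the train fold."""
    y_true, y_pred = [], []
    for s in sorted(set(subjects)):
        test = subjects == s
        y_pred.extend(predict(X[~test], y[~test], X[test]))
        y_true.extend(y[test])
    return np.mean(np.array(y_true) == np.array(y_pred))  # pooled accuracy

# Illustrative data: 6 subjects x 3 trials each, 2 features, 2 classes.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(6), 3)
y = np.repeat([0, 1], 9)                   # 3 subjects per class
X = rng.normal(loc=y[:, None] * 3.0, scale=0.5, size=(18, 2))
print(loso_cv(X, y, subjects))
```

Grouping the folds by subject, rather than by trial, is what prevents trials from the same speaker leaking between training and test sets, which would otherwise inflate the performance estimate.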
Relatedly, studies in BD have thus far focused on voice data acquired over longer time periods, allowing intra-subject changes to be modeled. This has proved useful for tracking mood changes (euthymia relative to depression or mania) [12, 14, 21], for building personalized models, and for investigating the long-term development of the illness [54, 55]. Studies with a longitudinal design are needed to address the question of whether speech samples acquired via VFT over several time points can aid the classification of mood states in BD, including mixed episodes. Fourth, VFT speech features were acquired in a laboratory setting, and the feasibility and utility of this task for tracking mood changes in natural settings remain to be tested [48].
In sum, we found high rates of correctly classified subjects based on prosodic and spectral features obtained in three conditions of VFT. Our results suggest that VFT can be a valid and economical means of acquiring speech samples in patients with BD. More specifically, voice quality, pitch, and intonation measures acquired via VFT appear to be reliable and informative potential biomarkers of mixed symptoms in acute episodes of BD. Studies should consider investigating the additive value of combining semantic and speech measures in VFT for the classification of acute mood states in BD. Since most mixed cases go undiagnosed in clinical settings [1, 5], and are associated with an increased risk of suicide [1, 3], vocal features quickly acquired via VFT have the potential to complement the clinical assessment of patients presenting with a mood episode and improve the clinical management of mixed states.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon request.

REFERENCES
1. Swann AC, Lafer B, Perugi G, Frye MA, Bauer M, Bahk WM, et al.
Bipolar mixed states: an international society for bipolar disorders task force report of symptom structure, course of illness, and diagnosis. Am J Psychiatry. 2013;170:31–42. 2. Suppes T, Eberhard J, Lemming O, Young AH, McIntyre RS. Anxiety, irritability, and agitation as indicators of bipolar mania with depressive symptoms: a post hoc analysis of two clinical trials. Int J Bipolar Disord. 2017;5:36. 3. Perugi G, Quaranta G, Dell’Osso L. The significance of mixed states in depression and mania. Curr Psychiatry Rep. 2014;16:486. 4. Stahl SM, Morrissette DA, Faedda G, Fava M, Goldberg JF, Keck PE, et al. Guidelines for the recognition and management of mixed depression. CNS Spectr. 2017;22:203–19. 5. Goldberg JF, Perlis RH, Bowden CL, Thase ME, Miklowitz DJ, Marangell LB, et al. Manic symptoms during depressive episodes in 1,380 patients with bipolar disorder: findings from the STEP-BD. Am J Psychiatry. 2009;166:173–81. 6. Cassidy F. Anxiety as a symptom of mixed mania: implications for DSM-5. Bipolar Disord. 2010;12:437–9. 7. Sani G, Vöhringer PA, Barroilhet SA, Koukopoulos AE, Ghaemi SN. The Koukopoulos mixed depression rating scale (KMDRS): an International Mood Network (IMN) validation study of a new mixed mood rating scale. J Affect Disord. 2018;232:9–16. 8. Perugi G, Angst J, Azorin JM, Bowden CL, Mosolov S, Reis J, et al. Mixed features in patients with a major depressive episode: the BRIDGE-II-MIX study. J Clin Psychiatry. 2015;76:e351–358. 9. APA. Diagnostic and Statistical Manual of Mental Disorders (DSM-5®). American Psychiatric Pub, 2013. 10. Young RC, Biggs JT, Ziegler VE, Meyer DA. A rating scale for mania: reliability, validity and sensitivity. Br J Psychiatry. 1978;133:429–35. 11. Miller S, Suppes T, Mintz J, Hellemann G, Frye MA, McElroy SL, et al. Mixed depression in bipolar disorder: prevalence rate and clinical correlates during naturalistic follow-up in the Stanley Bipolar Network. Am J Psychiatry. 2016;173:1015–23. 12.
Faurholt-Jepsen M, Busk J, Frost M, Vinberg M, Christensen EM, Winther O, et al. Voice analysis as an objective state marker in bipolar disorder. Transl Psychiatry. 2016;6:e856. 13. Faurholt-Jepsen M, Kessing LV, Munkholm K. Heart rate variability in bipolar disorder: a systematic review and meta-analysis. Neurosci Biobehav Rev. 2017;73:68–80. 14. Guidi A, Vanello N, Bertschy G, Gentili C, Landini L, Scilingo EP. Automatic analysis of speech F0 contour for the characterization of mood changes in bipolar patients. Biomed Signal Process Control. 2015;17:29–37. 15. Rush AJ, Trivedi MH, Ibrahim HM, Carmody TJ, Arnow B, Klein DN, et al. The 16-item quick inventory of depressive symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biol Psychiatry. 2003;54:573–83. 16. Cannizzaro M, Harel B, Reilly N, Chappell P, Snyder PJ. Voice acoustical measurement of the severity of major depression. Brain Cogn. 2004;56:30–35. 17. Zhang J, Pan Z, Gui C, Xue T, Lin Y, Zhu J, et al. Analysis on speech signal features of manic patients. J Psychiatr Res. 2018;98:59–63. 18. Newman S, Mather VG. Analysis of spoken language of patients with affective disorders. Am J Psychiatry. 1938;94:913–42. 19. Gobl C, Ní Chasaide A. The role of voice quality in communicating emotion, mood and attitude. Speech Commun. 2003;40:189–212. 20. Scherer KR. Vocal correlates of emotional arousal and affective disturbance. In: Handbook of social psychophysiology. Oxford, England: John Wiley & Sons; 1989. p. 165–97. 21. Guidi A, Schoentgen J, Bertschy G, Gentili C, Landini L, Scilingo EP, et al. Voice quality in patients suffering from bipolar disease. Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:6106–9. 22. Lezak MD. Neuropsychological Assessment. Oxford University Press, 2004. 23.
Weiner L, Doignon-Camus N, Bertschy G, Giersch A. Thought and language disturbance in bipolar disorder quantified via process-oriented verbal fluency measures. Sci Rep. 2019;9:14282. 24. Raucher-Chéné D, Achim AM, Kaladjian A, Besche-Richard C. Verbal fluency in bipolar disorders: a systematic review and meta-analysis. J Affect Disord. 2017;207:359–66. 25. Rossell SL. Category fluency performance in patients with schizophrenia and bipolar disorder: the influence of affective categories. Schizophr Res. 2006;82:135–8. 26. Beck AT, Epstein N, Brown G, Steer RA. An inventory for measuring clinical anxiety: psychometric properties. J Consult Clin Psychol. 1988;56:893–7. 27. Favre S, Aubry JM, Gex-Fabry M, Ragama-Pardos E, McQuillan A, Bertschy G. [Translation and validation of a French version of the Young Mania Rating Scale (YMRS)]. L’Encephale. 2003;29:499–505. 28. Suppes T, Mintz J, McElroy SL, Altshuler LL, Kupka RW, Frye MA, et al. Mixed hypomania in 908 patients with bipolar disorder evaluated prospectively in the Stanley Foundation Bipolar Treatment Network: a sex-specific phenomenon. Arch Gen Psychiatry. 2005;62:1089–96. 29. Cardebat D, Doyon B, Puel M, Goulet P, Joanette Y. [Formal and semantic lexical evocation in normal subjects. Performance and dynamics of production as a function of sex, age and educational level]. Acta Neurol Belg. 1990;90:207–17. 30. Yves J, Laura M. Impacts d’une lésion cérébrale droite sur la communication verbale [Impact of a right-hemisphere lesion on verbal communication]. Rééduc Orthophonique. 2004;42:9–26. 31. New B, Brysbaert M, Veronis J, Pallier C. The use of film subtitles to estimate word frequencies. Appl Psycholinguist. 2007;28:661–77. 32. Desrochers A, Bergeron M. Norms of subjective frequency of use and imagery for a sample of 1,916 French nouns. Can J Exp Psychol. 2000;54:274–325. 33. Syssau A, Font N. Évaluations des caractéristiques émotionnelles d’un corpus de 604 mots.
Bull Psychol. 2012;477:361–7. 34. Camacho A, Harris JG. A sawtooth waveform inspired pitch estimator for speech and music. J Acoust Soc Am. 2008;124:1638–52. 35. Taylor P. Analysis and synthesis of intonation using the Tilt model. J Acoust Soc Am. 2000;107:1697–714. 36. Nordenberg M, Sundberg J. Effect on LTAS of vocal loudness variation. Logop Phoniatr Vocol. 2004;29:183–91. 37. Naylor P, Kounoudes A, Gudnason J, Brookes M. Estimation of glottal closure instants in voiced speech using the DYPSA algorithm. IEEE Trans Audio Speech Lang Process. 2007;15:34–43. 38. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Their Appl. 1998;13:18–28. 39. de Boer JN, Voppel AE, Brederoo SG, Wijnen FNK, Sommer IEC. Language disturbances in schizophrenia: the relation with antipsychotic medication. NPJ Schizophr. 2020;6:1–9. 40. Boughorbel S, Jarray F, El-Anbari M. Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE. 2017;12. https://doi.org/10.1371/journal.pone.0177678. 41. Karam ZN, Provost EM, Singh S, Montgomery J, Archer C, Harrington G, et al. Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech. Proc IEEE Int Conf Acoust Speech Signal Process (ICASSP). 2014;2014:4858–62. 42. Hargreaves WA, Starkweather JA, Blacker KH. Voice quality in depression. J Abnorm Psychol. 1965;70:218–20. 43. Flint AJ, Black SE, Campbell-Taylor I, Gailey GF, Levinton C. Acoustic analysis in the differentiation of Parkinson’s disease and major depression. J Psycholinguist Res. 1992;21:383–9. 44. Weeks JW, Lee CY, Reilly AR, Howell AN, France C, Kowalsky JM, et al. ‘The Sound of Fear’: assessing vocal fundamental frequency as a physiological indicator of social anxiety disorder. J Anxiety Disord. 2012;26:811–22. 45. Perugi G, Medda P, Reis J, Rizzato S, Giorgi Mariani M, Mauri M.
Clinical subtypes of severe bipolar mixed states. J Affect Disord. 2013;151:1076–82. 46. Laukka P, Neiberg D, Forsell M, Karlsson I, Elenius K. Expression of affect in spontaneous speech: acoustic correlates and automatic detection of irritation and resignation. Comput Speech Lang. 2011;25:84–104. 47. Weiner L, Ossola P, Causin JB, Desseilles M, Keizer I, Metzger JY, et al. Racing thoughts revisited: a key dimension of activation in bipolar disorder. J Affect Disord. 2019;255:69–76. 48. Low DM, Bentley KH, Ghosh SS. Automated assessment of psychiatric disorders using speech: a systematic review. Laryngoscope Investig Otolaryngol. 2020;5:96–116. 49. Horwitz R, et al. On the relative importance of vocal source, system, and prosody in human depression. IEEE International Conference on Body Sensor Networks. 2013:1–6. 50. Moore RC, Campbell LM, Delgadillo JD, Paolillo EW, Sundermann EE, Holden J, et al. Smartphone-based measurement of executive function in older adults with and without HIV. Arch Clin Neuropsychol. 2020;35:347–57. 51. Abu-Mostafa YS, Magdon-Ismail M, Lin HT. Learning from data, Vol. 4. AMLBook, 2012. 52. Perugi G, Quaranta G, Dell’Osso L. The significance of mixed states in depression and mania. Curr Psychiatry Rep. 2014;16:486. 53. France DJ, Shiavi RG, Silverman S, Silverman M, Wilkes M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans Biomed Eng. 2000;47:829–37. https://doi.org/10.1109/10.846676. 54. Kessing LV, Munkholm K, Faurholt-Jepsen M, Miskowiak KW, Nielsen LB, Frikke-Schmidt R, et al. The bipolar illness onset study: research protocol for the BIO cohort study. BMJ Open. 2017;7:e015462. 55. Arevian AC, Bone D, Malandrakis N, Martinez VR, Wells KB, Miklowitz DJ, et al. Clinical state tracking in serious mental illness through computational analysis of speech. PLoS ONE. 2020;15:e0225695.

ACKNOWLEDGEMENTS
None.
COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
Correspondence and requests for materials should be addressed to L.W.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021