Article
Open access
Published: 09 December 2023

Using digital phenotyping to capture depression symptom variability: detecting naturalistic variability in depression symptoms across one year using passively collected wearable movement and sleep data

Translational Psychiatry volume 13, Article number: 381 (2023)


Abstract

Major Depressive Disorder (MDD) presents considerable challenges to diagnosis and management due to symptom variability across time. Only recently has work highlighted the clinical implications of interrogating depression symptom variability. Thus, the present work investigates how sociodemographic, comorbidity, movement, and sleep data are associated with long-term depression symptom variability. Participant information (N = 939) included baseline sociodemographic and comorbidity data, longitudinal, passively collected wearable data, and Patient Health Questionnaire-9 (PHQ-9) scores collected over 12 months. An ensemble machine learning approach was used to detect long-term depression symptom variability via: (i) a domain-driven feature selection approach and (ii) an exhaustive feature-inclusion approach. SHapley Additive exPlanations (SHAP) were used to interrogate variable importance and directionality. The composite domain-driven and exhaustive inclusion models were both capable of moderately detecting long-term depression symptom variability (r = 0.33 and r = 0.39, respectively). Our results indicate the incremental predictive validity of sociodemographic, comorbidity, and passively collected wearable movement and sleep data in detecting long-term depression symptom variability.

Introduction

Major Depressive Disorder (MDD) is highly prevalent and burdensome, socially and economically. An estimated 8% of all U.S. adults (~21 M) experienced a depressive episode in the last year [1], and an estimated 6% (15 M) experienced associated severe functional impairment [1]. Depression is ranked in the top twenty leading causes of disability, globally [2] and is estimated to cost $326 billion USD annually, an increase of 38% in the last decade [3]. Many people with MDD do not receive treatment, with one in three people with active symptoms failing to receive care [1]. Further, MDD is frequently misdiagnosed by primary care, which is often the first point of contact for those with clinical symptoms [4].

MDD presents considerable challenges to effective diagnosis and management, due, in part, to its dynamic nature and variable trajectory [5]. The longitudinal course of MDD, as described by the DSM-5, allows for considerable variability across persons, such that some individuals may experience only discrete episodes separated by long periods of remission, while others experience chronic, unrelenting symptoms over years [6]. Research to date has explored person-to-person differences in depression course and variability over time, with empirical evidence for heterogeneity in symptom trajectory [7,8,9], as well as difficulty in predicting longitudinal course [10]. These findings suggest that cross-sectional severity (“level of depression”) and presence (“depressed vs. not depressed”) outcomes alone, while providing informative “snapshots” in time, are insufficient for understanding the naturalistic course of MDD, and thus, the core nature of MDD.

We posit that depression symptom variability, per se, is an important outcome, which has meaningful basic science and translational implications. For the purpose of our study, we define depression symptom variability to mean the degree of within-person variation in reported depression symptom severity across time. Indeed, research to date examining depression temporal dynamics (Nemesure et al. [11]) has revealed considerable within- and between-person symptom variability over time. We provide a theoretical and empirical basis for the importance of depression symptom variability as an outcome. First, variability is important to explore as a core metric of depression’s naturalistic, longitudinal course. Together with other summative longitudinal metrics, such as mean severity, variability provides an important summary of depression’s longitudinal course. Depression symptom variability is a necessary precondition for relapse and remission (i.e., major depressive episodes), which are important outcome and prognostic markers in MDD [6]. Further, depression temporal variability may help to inform diagnostic distinctions, such as that between MDD and Persistent Depressive Disorder (PDD), with the latter theoretically showing less long-term temporal variability than the former as well as more severe functional impairment [12]. Therefore, a nuanced understanding of depression’s course, including an understanding of those factors associated with symptom variability, is fundamental to effective assessment and management. A highly variable course, for instance, would require more frequent assessments to accurately describe the disorder trajectory, and likely more temporally dynamic interventions.

Second, depression symptom variability has been associated with important clinical, prognostic, and treatment outcomes. Specifically, higher depression symptom variability has been positively associated with (i) higher risk of suicide attempts [13], (ii) lower family functioning (in maternal depression) [14], (iii) cognitive decline [15], and (iv) pathological narcissism [16] (an important prognostic marker for mental health treatment) [17]. Depressed mood variability has also been shown to interact with perceived self-esteem instability in predicting future depression at six-month follow-up [18], and a variable, chronic depression course has been associated with all-cause mortality in older adults [19]. In addition, rapid symptom fluctuation in depressed people has been associated with involvement in violence [20]. Given these impactful clinical and prognostic associations, it is of considerable importance to understand naturalistic depression symptom variability, including the personalized features which may contribute to a fluctuating course.

Of important transdiagnostic consideration, there is face validity that depression variability may have a relation to affective instability, the latter of which has been studied in relation to depression utilizing repeat assessment of both high and low-arousal negative affect features [21]; low-arousal negative affect features (e.g., “tired”, “bored”, “droopy”) [22] have considerable overlap with the core neurovegetative depressive symptoms including low energy, depressed mood, and reduced interest [6]. Thus it may be a reasonable assumption that affective instability may be at least partially explained by temporal depression variability, and therefore understanding depression variability may help in understanding affective instability, which is also an important consideration in borderline personality and bipolar disorders [23].

Machine learning methods, operating on highly dimensional datasets, have shown great promise in modeling important clinically relevant outcomes in MDD [24,25,26]. Advances in computing power and passive data streaming have made possible the application of ecologically valid, person-generated health data (e.g., sleep, movement) to personalized depression models [24], complementing more traditional demographic features. Price et al., for example, utilized actigraphy data to effectively detect MDD presence in a large cohort [27]. Naturalistic movement and sleep data are promising candidates for modeling MDD symptom variability, given their established relationship to major depressive episodes and their capacity for predicting depression severity [28, 29]. In particular, sleep and movement problems are core features of depression [6], and sleep problems are a known risk factor for depression recurrence [30], a plausible driver of long-term symptom variability. In addition, such passively collected features have contributed to empirical support for MDD-associated (1) sleep and circadian rhythm irregularities [31, 32], (2) reduced locomotion [33], and (3) reduced daily activity [34]. These efforts inform our understanding of features associated with depression presence and severity, and thereby serve as a benchmark for identifying biodemographic and behavioral characteristics that may also have an association with long-term depression symptom variability.

To build upon efforts by Makhmutova et al. in the development of the Prediction of Severity-Change Depression (PSYCHE-D) model and data source [35, 36], the present work leveraged a stacked ensemble machine learning approach applied to baseline biodemographic (i.e., sociodemographic and comorbidity) features and objective, passively collected wearable movement and sleep data, to explore factors associated with long-term depression symptom variability. Methodologically, our work is unique in its direct model comparisons on the basis of feature selection and feature type. First, we compare a model trained on theory-informed feature selection against a parallel model trained on an exhaustive feature set. Second, we compare a model trained on baseline demographic features to a parallel model trained on passively derived sleep and activity features. Further, we examine the incremental predictive gain when combining both types of features; for all models we utilize a robust stacked ensemble approach. We hypothesized that (1) features having known association with depression presence and severity would also associate with long-term symptom variability. Further, we hypothesized that (2) biodemographic and objective passively collected movement and sleep data each contain complementary information and, thus, when combined would produce improved model prediction compared to either singular information modality, as accounting for complementarity during feature selection has been shown to increase model performance [37, 38]. To test our hypotheses, we used 12-month longitudinal data [39] comprising personal biodemographic data, movement and sleep metrics statistically derived from passively collected wearable accelerometry data, and quarterly PHQ-9 scores. A cross-validation framework, coupled with a stacked ensemble machine learning approach, was implemented to model depression symptom variability using features with empirical associations with depression. For model interpretability, we used an algorithmic approach to quantify the relative importance and directionality of biodemographic features, statistically derived movement and sleep features, and both in concert for predicting depression symptom variability.

Methods

Study sample

The present work used publicly available biodemographic, passively collected wearable movement and sleep, and depression symptom data originally collected over a 12-month period and provided in the PSYCHE-D dataset [40], which was captured as part of the DiSCover Project developed by Evidation Health [39]. Participants were originally recruited via Achievement, a community of adults in the United States that can connect consumer-grade fitness applications and wearables (e.g., Fitbit, Garmin) to the study platform. Participant inclusion was limited in the present analyses to individuals with twelve consecutive months of objective accelerometer information, reflecting non-missing values for some or all of the related movement and sleep metrics for each month, and a reported Patient Health Questionnaire-9 (PHQ-9) [41] composite score completed at baseline and every subsequent 3-month time point for the 12-month study period (N = 939, 70.61% female, 29.39% male, age_mean = 42.55 ± 10.23, 91.37% White, 4.69% Black, 4.05% Hispanic, 2.66% Asian, 2.23% Race not specified, 10.81% required financial assistance from the government) (see Fig. 1). A full description of the original DiSCover Project study design, recruitment protocols, and participant baseline demographic information is provided by Lee et al. [39].

**Fig. 1: A flow diagram representing the selection and exclusion of participants, which led to the 939-participant sample in the present work.**

Study measures

The original PSYCHE-D dataset contains 150 person-generated health data (PGHD) features reflecting baseline biodemographic information, derived passively collected movement and sleep information, and Patient Health Questionnaire-9 (PHQ-9) composite scores (PHQ-9_mean = 6.80 ± 5.72; 42.79% No Depressive Symptoms, 28.78% Mild Depressive Symptoms, 17.61% Moderate Depressive Symptoms, 7.41% Moderately Severe Depressive Symptoms, 3.41% Severe Depressive Symptoms) [40]. The PHQ-9 is a common screening tool for MDD [42] consisting of nine items, each rated by the degree to which it was bothersome over the last two weeks (e.g., feeling down, depressed, or hopeless) [41]. Makhmutova et al. describe the PGHD feature collection and processing in further detail [35]. The dataset was subset for the present analyses to 20 features consisting of a combination of 8 baseline biodemographic features (i.e., Sex, Race, BMI, Pregnancy Status, Money Assistance, Comorbid Diabetes Type I, Comorbid Diabetes Type II, Comorbid Migraines) and 12 derived passively collected movement and sleep features (i.e., Average Awake Activity, Low Physical Activity Duration, Moderate-to-Vigorous Activity Duration, Active Day Count, Sedentary Day Count, Nighttime Sleep Variability, Average Weekday Sleep, Average Weekend Sleep, Sleep Start Time, Variability In Sleep Start Time, Weekly Hypersomnia Count, Weekly Hyposomnia Count). These features were chosen based on known direct or indirect associations with depression, outlined in Supplementary Table 1, as feature engineering and selection informed by domain knowledge has been shown to improve predictive performance and model interpretability [43].

Data preprocessing

All data preprocessing was performed in R (v 4.0.2) [44]. Baseline biodemographic feature data types were interrogated and converted according to their reporting structure (e.g., Migraine comorbidity was converted from numerical to categorical). To account for the missingness of certain biodemographic and movement and sleep-related metrics, multivariate imputation by chained equations (MICE) with predictive mean matching was implemented using the mice package in R [45], as MICE is well-suited to handling high proportions of missing data, and captures the uncertainty associated with approximating missing information [46]. Across all participants, 0.08% of the subsetted biodemographic information was missing, and 15.64% of the subsetted passively collected movement and sleep-related metrics information was missing. Five imputed datasets were therefore generated, reflecting the plausible distribution of missing information, and used for subsequent analyses. Following imputation, summative metrics of the longitudinal passively collected movement and sleep features were derived to represent the average and variability of each selected feature across the twelve-month data collection period. The average was calculated as the mean of the feature’s values, and variability was calculated as the root mean square of successive differences (RMSSD) of the respective feature. The summative features were derived to reflect longitudinal movement and sleep behaviors, as well as to avoid a nested data structure, such that each participant could be represented as a single row with their fixed baseline biodemographic features and their statistically derived movement and sleep features. To interrogate the naturalistic fluctuation in sequential depressive symptoms across a twelve-month period, the RMSSD of depressive symptom change was calculated. As described above, individuals’ composite PHQ-9 scores collected at months 0, 3, 6, 9, and 12 were used to calculate variability in depressive symptoms (RMSSD). Thus, an individual’s PHQ-9_RMSSD represented a single metric of depressive symptom variability that captured fluctuation in symptom expression across the entire study. Additionally, PHQ-9_RMSSD was correlated with mean PHQ-9 score to establish that PHQ-9_RMSSD was not simply a proxy for depression symptom intensity (r = 0.54, R² = 0.29).
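The RMSSD outcome can be illustrated with a minimal sketch. It assumes the conventional definition (the square root of the mean squared difference between consecutive measurements); the text does not spell out the exact formula, and the scores below are hypothetical, not taken from the PSYCHE-D dataset.

```python
import math


def rmssd(scores):
    """Root mean square of successive differences for an ordered series.

    Assumes the conventional definition: sqrt(mean(diff**2)) over the
    n - 1 consecutive differences.
    """
    if len(scores) < 2:
        raise ValueError("RMSSD needs at least two time points")
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical PHQ-9 composite scores at months 0, 3, 6, 9, and 12.
stable = [6, 7, 6, 7, 6]
fluctuating = [2, 10, 3, 11, 2]

print(round(rmssd(stable), 2))       # 1.0
print(round(rmssd(fluctuating), 2))  # 8.03
```

The two hypothetical series have similar means (6.4 and 5.6) but very different RMSSD values, which is the sense in which variability is more than a proxy for symptom intensity.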

Machine learning modeling approach

The present analyses were completed in Python (v 3.9) [47]. The sample was split into a training set (80%), on which a threefold cross-validation framework was applied, and a within-sample, completely held-out test set (20%) used to quantify predictive performance [48], an efficacious approach for obtaining unbiased performance estimates in machine learning modeling [49]. Specifically, a stacked ensemble machine learning approach was used across the five MICE-generated datasets to assess for predictive robustness across the plausible imputation distribution. Stacked ensemble machine learning approaches have shown the capacity to consistently outperform base algorithms in detecting depression [50], by leveraging algorithmically distinct machine learning models (e.g., linear models, tree-based models) to individually train on the data. The individual model predictions are subsequently used as inputs to a final “meta” model, which returns a consensus score. The stacked ensemble algorithms and hyperparameters implemented for the present analysis are provided in Supplementary Table 2. In addition, the cross-validation architecture and random seed chosen for splitting the data were standardized across the three models (baseline biodemographic model; passively collected movement and sleep model; composite model) to reflect consistency across the model progression. Further, an exhaustive feature-inclusion approach was implemented, where all originally collected features were incorporated or transformed for the three respective model types (see Table 1) to evaluate performance with an increased feature space.
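The general shape of such a pipeline can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the base learners, meta-model, hyperparameters, and random seed below are placeholders (the actual choices are in Supplementary Table 2), and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 "participants", 5 features (not PSYCHE-D).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Reserve 20% as a held-out test set that is never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Algorithmically distinct base learners (assumed for illustration).
base_learners = [
    ("ridge", Ridge(alpha=1.0)),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=42)),
]

# cv=3: the meta-model is trained on base-learner predictions produced
# by threefold cross-validation within the training set.
stack = StackingRegressor(
    estimators=base_learners, final_estimator=Ridge(), cv=3
)
stack.fit(X_train, y_train)

# Evaluate on the held-out set with the correlation between actual
# and predicted values.
pred = stack.predict(X_test)
r = np.corrcoef(y_test, pred)[0, 1]
print(f"held-out r = {r:.2f}")
```

In a setup like the one described, this fit-and-evaluate step would be repeated on each of the five imputed datasets with the same split seed, so that performance differences reflect imputation uncertainty and not different partitions.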

Table 1 (A) Model performance of the theory-organized and full variable set stacked ensemble machine learning approaches for the validation and held-out test set(s) for the three model types, reported as correlation ± standard deviation; (B) model performance of the theory-organized and full variable set stacked ensemble machine learning approaches for the validation and held-out test set(s) for the three model types, reported as normalized mean absolute error ± standard deviation.

Model performance

Model performance was reported for the validation and held-out test sets for each of the machine learning models as the mean and standard deviation, across the five MICE-imputed datasets, of correlative strength (r) and normalized mean absolute error (MAE_norm). The MAE_norm provides an outcome-agnostic representation of the model’s error: the mean absolute error is divided by the range of the observed outcome, so the value expresses the average prediction error as a proportion of that range.
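Both metrics are simple to compute, as the following sketch shows. The function names and the four example values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from statistics import mean


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted outcomes."""
    mean_a, mean_p = mean(actual), mean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mae_norm(actual, predicted):
    """Mean absolute error divided by the range of the observed outcome."""
    mae = mean(abs(a - p) for a, p in zip(actual, predicted))
    return mae / (max(actual) - min(actual))


# Hypothetical observed and predicted variability scores.
actual = [0.0, 2.0, 4.0, 6.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(round(pearson_r(actual, predicted), 3))  # 0.956
print(round(mae_norm(actual, predicted), 3))   # 0.083
```

Here the mean absolute error is 0.5 and the observed range is 6, so the predictions are off by about 8% of the outcome range on average.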

Model introspection

To assess the most influential features for model prediction across the three models, SHapley Additive exPlanations (SHAP) were implemented, and the top five most influential features were reported for each model. SHAP provides a method for model introspection by iteratively perturbing the input features and assessing how this affects the model prediction [51]. Thus, SHAP provides a mechanism for determining feature importance, as well as the marginal contribution of each input variable to the model’s prediction at the individual level, represented by the position of each individual value on the x-axis of Fig. 2. Specifically, an individual feature’s SHAP values can be interpreted as that feature’s partial association with the outcome when controlling for all other input features in the model. Collectively, SHAP can estimate the relative magnitude of a feature’s influence on a model’s predictions, directional relationships between features and predicted outcomes, as well as different-order interactions between features.
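The idea behind a feature's marginal contribution can be made concrete with a brute-force Shapley computation on a toy model. This sketch is not the SHAP library and not the authors' code: it enumerates every feature coalition exactly (feasible only for a handful of features), replaces "absent" features with a baseline value, and uses a hypothetical two-feature model with made-up coefficients.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating coalitions.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(features):
    """Hypothetical model: two main effects plus an interaction term."""
    has_migraine, sleep_range = features
    return 1.0 + 2.0 * has_migraine + 0.5 * sleep_range + has_migraine * sleep_range


instance = [1.0, 2.0]   # hypothetical participant
baseline = [0.0, 0.0]   # hypothetical reference values

phis = shapley_values(toy_model, instance, baseline)
print(phis)  # [3.0, 2.0]
```

The two contributions sum to 5.0, which equals the difference between the prediction for the instance (6.0) and the prediction at the baseline (1.0); the interaction term is split evenly between the two features.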

**Fig. 2: Model(s) actual versus predicted values plotted with respective correlative strength and the top five most influential features for the models’ predictions.**

Results

Baseline biodemographic features

Baseline biodemographic modeling results

Baseline biodemographic features were incorporated into a stacked ensemble machine learning approach to detect depression symptom variability (PHQ-9_RMSSD) (Supplementary Table 1). Averaged across the five MICE-imputed datasets, we found a weak, positive correlation (r = 0.27 ± 0.00, MAE_norm 0.14 ± 0.00; see Table 1) between predicted long-term depression symptom variability outcomes and actual long-term depression symptom variability outcomes in the held-out test set (see Fig. 2A).

Relative feature importance and directionality for the baseline biodemographic model

Using SHAP (see Methods section Model introspection), we found comorbid migraines to be the most influential feature in the model’s prediction of higher depression symptom variability, followed by female sex, high body mass index (BMI), required financial assistance, and non-White race (see Fig. 2A and Supplementary Table 1).

Passively collected movement and sleep features

Passively collected movement and sleep modeling results

Statistically derived features from wearable, passively collected movement and sleep data (Supplementary Table 1) were incorporated into a stacked ensemble machine learning model to detect depression symptom variability (PHQ-9_RMSSD). Similar to the biodemographic model, when averaged across the five MICE-imputed datasets, we found a weak, positive correlation (r = 0.27 ± 0.01, MAE_norm 0.14 ± 0.00; see Table 1) between predicted long-term depression symptom variability outcomes and actual long-term depression symptom variability outcomes in the held-out test set (see Fig. 2B).

Relative feature importance and directionality for the passively collected movement and sleep model

Using SHAP (see Methods section Model introspection), we found (1) high weekday sleep duration, (2) high count of nights with less than five hours asleep (hyposomnia) in the last week, (3) lower recent step count, (4) high range of sleep duration, and (5) low weekend sleep duration to be the top five most influential features in the model’s prediction of high depression symptom variability (see Fig. 2B and Supplementary Table 1). The top five features in the passively collected movement and sleep model reflect an average over twelve months.

Combined biodemographic and passively collected movement and sleep features

Biodemographic and passively collected movement and sleep modeling results

Using a composite model of baseline biodemographic features (see Results section Baseline biodemographic features) and statistically derived features from wearable passively collected movement and sleep data (see Results section Passively collected movement and sleep features) we found a moderate, positive correlation (r = 0.33 ± 0.01, MAE_norm 0.14 ± 0.00; see Table 1) between predicted depression score variability outcomes and actual depression score variability outcomes in the held-out test set (see Fig. 2C).

Relative feature importance for the combined biodemographic and passively collected movement and sleep model

Using SHAP (see Methods section Model introspection), we identified (1) comorbid migraines to be most influential in the model’s prediction of high depression symptom variability (PHQ-9_RMSSD), followed by (2) female sex, (3) lower duration of weekend sleep, averaged over 12 months, (4) higher range of time asleep, averaged over 12 months, and (5) higher duration of weekday sleep, averaged over 12 months (see Fig. 2C and Supplementary Table 1).

Exhaustive feature-inclusion modeling results

Complementing the decision to subset biodemographic and passively collected movement and sleep features using theoretical and empirical domain knowledge, we also constructed three parallel stacked ensemble machine learning models operating on the non-subsetted PSYCHE-D [40] feature set, including 49 original and statistically derived biodemographic features, and 222 statistically derived movement and sleep features. The exhaustive feature-inclusion approach showed marginal performance improvement compared to the theory-driven variable selection approach across the three model types (see Table 1 and Fig. 3). Nevertheless, the exhaustive inclusion of all previously collected features introduced increased model complexity and reduced feature interpretability.

**Fig. 3: Comparative analysis incorporating or transforming all originally collected variables for the three respective models.**

Discussion

General overview

The present results demonstrate the successful application of both biodemographic and passively collected movement and sleep features for modeling the novel outcome, long-term depression symptom variability. We found moderate predictive capacity of the biodemographic and passively collected movement and sleep features for long-term depression symptom variability detection when used in concert. This validates our hypotheses that (1) features indicative of depression severity are also indicative of depression symptom variability and (2) complementarity (i.e., unique information) between feature types has predictive utility. Compared with our theory-guided subsetting approach, the non-subsetted feature set yielded modest improvements in predictive performance at the cost of increased model complexity (see Table 1 and Fig. 3).

Implications and importance

The successful application of the biodemographic and passively collected movement features used in the present analysis to detect depression symptom variability has promising mental health clinical implications, strengthening evidence for more objective and naturalistic assessments, with less burden to patients [52]. The work also validates our hypothesis that variables empirically correlated with major depressive disorder (e.g., sex, migraines, sleep disturbances) are also associated with depression symptom variability. While biomarkers of depression severity have been studied more extensively, factors associated with depression symptom variability have received relatively less attention.

In this work, we make the case for (1) variability, per se, as an outcome of high importance, as well as (2) the importance and utility of predicting who is likely to have high variability. First, variability has been linked to important outcomes, including suicide attempts in high-risk individuals [13], as well as family functioning in the case of maternal depression [14]. Thus, symptom variability itself may be a risk factor for important clinical outcomes. Second, long-term symptom variability is a necessary precondition for episodic depression relapse and remission. Relapse and remission counts have obvious importance as clinical outcomes by themselves, and have been associated with poorer long-term prognosis in MDD [53, 54]. Third, predicting person-level variability has implications for personalized medicine [55] approaches to mental healthcare. Identifying who is likely to have higher symptom variability over time would allow for person-tailored assessment frequencies. For instance, a person with high depression symptom variability would require more frequent depression assessments compared to someone with lower depression symptom variability to adequately capture the disorder course over time.

Model introspection and depression symptom variability theory

The presence of migraines was the most influential of the biodemographic features for predicting depression symptom variability and remained so even when combined with statistically derived passively collected movement and sleep features (see Fig. 2). Migraines have been established as highly comorbid with depression [56, 57]; additionally, research has demonstrated that migraines may perturb the naturalistic course of depression, prolonging the time to depression remission [58]. However, the direct relationship of migraines to depression symptom variability is not well understood. A plausible explanation stems from research demonstrating depression exacerbation in concurrence with migraine headache onset (a phenomenon reported in nearly one-third of a depressed sample) [59]. Given the discrete and episodic nature of migraine headaches [60], as well as the empirical support for simultaneity in migraine onset and depression exacerbation, it would follow that such patients would show heightened variability in their depression over time.

Following migraines, the next most influential features for modeling depression symptom variability in the biodemographic model included: (i) female sex, (ii) high BMI, (iii) required financial assistance, and (iv) non-White race. These findings may be contextualized in research to date, which demonstrated that females had a considerably higher rate of depressive episodes [61]; higher episode frequency may theoretically serve as a proxy for variability. Further, required financial assistance may be a proxy for lower socioeconomic status, a known correlate of depression [62]; specific to variability, a large longitudinal cohort study (N = 12,650) showed socioeconomic status predicted long-term patterns of change in intra-individual depression symptom variability [63]. However, it is also important to consider that markers of variability in depression, such as race and sex, could also be markers for events such as racism and discrimination, which may, themselves, have an episodic course [64]. While racism and discrimination have been shown to predict depressive symptoms longitudinally [65], discriminatory events have also been shown to cause acute exacerbations in depression [66]. Such depression “spikes” over time may make the course appear more variable.

Movement and sleep features derived from passively collected actigraphic data demonstrated capacity for modeling depression symptom variability. Sleep behaviors were highly represented among the most influential features in the movement and sleep model, as well as the composite model (see Fig. 2B, C). Specifically, sleep duration (for both weekends and weekdays), range of sleep duration, and nights spent with hyposomnia were the most influential sleep-related features. These findings are generally consistent with well-established knowledge of the close relationship between sleep, activity, and depression [6, 67], validated with passively collected, objective data [33]. Notably, sleep quality and duration have bidirectional associations with psychosocial functioning amongst young adults [68]. Moreover, short sleep duration and poor sleep quality are associated with a higher prevalence of depressive symptoms among university students [69]. This suggests a complex relationship between sleep and depression that is not merely unidirectional, but rather complicated by biopsychosocial variables.

Further, specific sleep profiles have been empirically correlated with longitudinal depression symptom variability [70], perhaps suggesting the existence of sleep markers for MDD variation. Curiously, sleep quality correlates more strongly with psychosocial functioning than sleep duration among young adults [68]. Our findings (range of sleep duration, nights with hyposomnia, and sleep duration) may be further contextualized in research linking similar features (i.e., total sleep time and day-to-day variability in total sleep time) to next-day mood and depressive symptoms [71]. It follows that changes in mood may track with changes in sleep; thus, a higher range of nightly sleep duration would imply a wider range of depression severity. Given the multifactorial nature of sleep, its architecture, quality, and duration collectively, yet intricately, influence depression outcomes. Both insufficient and excessive sleep durations have been shown to elevate depression risk [72, 73], with the latter being particularly pertinent when coupled with sustained poor sleep quality. Factors such as emotional exhaustion and stress, whether stemming from academic demands [74] or shift work [75], further complicate the intricate relationship between sleep and depression.

Recall that, in addition to a feature subsetting approach guided by a priori domain knowledge, we comparatively tested an exhaustive feature set approach, using all biodemographic and all movement and sleep features (see Fig. 3C). Despite the reduced interpretability of such a model, conferred by the inclusion of more convoluted statistical features, there is a modest increase in performance (r = 0.39, compared to r = 0.33 with the reduced-feature model), highlighting the utility of such an approach for a performance-driven task. In contrast to the domain-driven approach, the top five most influential features in the exhaustive feature model were all derived from passively collected movement and sleep data; none came from biodemographic information or self-report. Notably, a subset of these features was generated from regression-based statistics on the passively collected movement and sleep data [35]. Such features have not been established in the literature on long-term depression symptom variability, but they do seem to offer a substantive increase in information for the model’s predictions, allowing for increased model performance. These findings warrant further consideration of the utility of feature engineering as it pertains to passively collected movement and sleep data, as it offers clear advantages for tasks strictly concerned with improving predictive performance relating to long-term depression symptom variability.

Strengths, limitations, and future directions

The current study uniquely utilized long-term depression variability as an outcome measure. In addition, our methods allow for a direct comparison between feature selection strategies, specifically theory-informed versus exhaustive, and between feature types, specifically passive sensing-derived features and baseline demographic features. A significant strength of our work lies in our application of a robust stacked ensemble approach, accommodating the potentially complex relationships among features. Despite the strengths and novelty of our work, the study results must be considered in the context of several important limitations, described here. (1) The study population was limited in demographic diversity, and future research would benefit from analyzing a more nationally representative sample when detecting depression symptom variability. Further, depression symptom variability within demographic groups (e.g., gender, race) should be assessed, as influential biodemographic and passively collected movement and sleep features are likely differentially expressed between populations; accounting for such differences would allow for more effective personalized treatment. (2) Recall that the outcome (PHQ-9_RMSSD) is derived from self-reported PHQ-9 scores at 3-month intervals over the course of one year. As such, the temporal resolution of depression symptom variability is limited. (3) A related but distinct limitation inherent in the original study design is the mismatch between the 2-week look-back period of the PHQ-9 and the 3-month interval at which the measurements were collected. In future research investigating depression symptom variability, ecological momentary assessments for depressive symptoms would be preferable. (4) Finally, the choice of one year over which to measure variability has important implications for the applicability and interpretation of results. While one year is likely sufficient to capture a single depressive episode [76], it may be insufficient to capture the temporal dynamics across multiple depressive episodes. Furthermore, while the present investigation of factors associated with depression symptom variability is appropriately conducted on a community sample, given that over one-third of participants (38.8%) reported PHQ-9 scores both below and above the clinical threshold for depression (PHQ-9 ≥ 10), generalizability to a clinical sample remains uncertain. Thus, a future extension of this work would be validation and comparison on a clinical sample to assess both model performance and the features most associated with the model’s predictions.
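To make the outcome measure concrete, the sketch below computes a root mean square of successive differences (RMSSD) over an ordered series of PHQ-9 totals, using the standard RMSSD definition. The function name and the example scores are hypothetical, and the sketch is not the study's own implementation; it only illustrates why a handful of quarterly assessments yields a coarse picture of variability.

```python
import math


def rmssd(scores):
    """Root mean square of successive differences for an ordered series."""
    if len(scores) < 2:
        raise ValueError("RMSSD needs at least two observations")
    # Differences between each assessment and the one before it
    diffs = [later - earlier for earlier, later in zip(scores, scores[1:])]
    # Square, average, and take the square root
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical PHQ-9 totals at successive quarterly assessments
stable = [8, 8, 9, 8, 8]
variable = [4, 12, 5, 14, 6]

print(round(rmssd(stable), 3))
print(round(rmssd(variable), 3))
```

In this made-up example both participants have similar average scores, but the second crosses the PHQ-9 ≥ 10 threshold repeatedly and receives a much larger RMSSD. With only a few successive differences per participant, a single atypical assessment can shift the value considerably, which is the temporal-resolution concern raised in limitation (2).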

Conclusion

In the present work, we emphasize depression symptom variability as an important clinical and research variable in mental health. Variability represents an important attribute of depression’s longitudinal course, as well as a dimension of heterogeneity between depressed persons. In addition, depression symptom variability has been linked to important clinical outcomes, such as suicide. Though much is known of factors associated with point-in-time depression severity, relatively little is known of long-term, naturalistic variability in depression, or of the person-specific factors associated with that variability. In the present work, we explore the capacity of biodemographic and passively collected movement and sleep information to model depression symptom variability. Our results suggest that biodemographic and passively collected data types are each independently associated with depression symptom variability, and they provide evidence of complementarity in predictive capacity. Our work provides an early step toward the complementary, personalized use of unobtrusive data types in addressing the question of depression’s temporal variability.

Data availability

The Prediction of Severity-Change Depression (PSYCHE-D) dataset used in the present manuscript can be accessed at https://zenodo.org/records/5085146.

Code availability

The data that support the findings of this study are available from the corresponding author, GP, upon reasonable request.

References

NSDUH. 2020 National Survey of Drug Use and Health (NSDUH) Releases | CBHSQ Data [Internet]. 2020.
Vos T, Lim SS, Abbafati C, Abbas KM, Abbasi M, Abbasifard M, et al. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. 2020;396:1204–22.
Greenberg PE, Fournier AA, Sisitsky T, Simes M, Berman R, Koenigsberg SH, et al. The economic burden of adults with major depressive disorder in the United States (2010 and 2018). PharmacoEconomics. 2021;39:653–65.
Vermani M, Marcus M, Katzman MA. Rates of detection of mood and anxiety disorders in primary care: a descriptive, cross-sectional study. Prim Care Companion CNS Disord. 2011. http://www.psychiatrist.com/pcc/article/pages/2011/v13n02/10m01013.aspx
Chen LS, Eaton WW, Gallo JJ, Nestadt G. Understanding the heterogeneity of depression through the triad of symptoms, course and risk factors: a longitudinal, population-based study. J Affect Disord. 2000;59:1–11.
American Psychiatric Association APA. Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Arlington, Virginia: American Psychiatric Association; 2013.
Kennedy N, Abbott R, Paykel ES. Longitudinal syndromal and sub-syndromal symptoms after severe depression: 10-year follow-up study. Br J Psychiatry. 2004;184:330–6.
Musliner KL, Munk-Olsen T, Eaton WW, Zandi PP. Heterogeneity in long-term trajectories of depressive symptoms: Patterns, predictors and outcomes. J Affect Disord. 2016;192:199–211.
van Eeden WA, van Hemert AM, Carlier IVE, Penninx BW, Giltay EJ. Severity, course trajectory, and within-person variability of individual symptoms in patients with major depressive disorder. Acta Psychiatr Scand. 2019;139:194–205.
Rushton JL, Forcier M, Schectman RM. Epidemiology of depressive symptoms in the National Longitudinal Study of Adolescent Health. J Am Acad Child Adolesc Psychiatry. 2002;41:199–205.
Nemesure MD, Collins AC, Price GD, Griffin TZ, Pillai A, Nepal S, Heinz MV, Lekkas D, Campbell AT, Jacobson NC. Depressive symptoms as a heterogeneous and constantly evolving dynamical system: Idiographic depressive symptom networks of rapid symptom changes among persons with major depressive disorder. J Psychopathol Clin Sci. (in press).
Schramm E, Klein DN, Elsaesser M, Furukawa TA, Domschke K. Review of dysthymia and persistent depressive disorder: history, correlates, and clinical implications. Lancet Psychiatry. 2020;7:801–12.
Melhem NM, Porta G, Oquendo MA, Zelazny J, Keilp JG, Iyengar S, et al. Severity and variability of depression symptoms predicting suicide attempt in high-risk individuals. JAMA Psychiatr. 2019;76:603–13.
Seifer R, Dickstein S, Sameroff AJ, Magee KD, Hayden LC. Infant mental health and variability of parental depression symptoms. J Am Acad Child Adolesc Psychiatry. 2001;40:1375–82.
Rovner BW, Casten RJ, Leiby BE. Variability in depressive symptoms predicts cognitive decline in age-related macular degeneration. Am J Geriatr Psychiatry. 2009;17:574–81.
Dawood S, Pincus A. Pathological narcissism and the severity, variability, and instability of depressive symptoms. Personal Disord Theory Res Treat. 2018;9:144–54.
Ellison WD, Levy KN, Cain NM, Ansell EB, Pincus AL. The Impact of pathological narcissism on psychotherapy utilization, initial symptom severity, and early-treatment symptom change: a naturalistic investigation. J Pers Assess. 2013;95:291–300.
Franck E, De Raedt R. Self-esteem reconsidered: unstable self-esteem outperforms level of self-esteem as vulnerability marker for depression. Behav Res Ther. 2007;45:1531–41.
Geerlings SW, Beekman ATF, Deeg DJH, Twisk JWR, Tilburg WV. Duration and severity of depression predict mortality in older adults in the community. Psychol Med. 2002;32:609–18.
Odgers CL, Mulvey EP, Skeem JL, Gardner W, Lidz CW, Schubert C. Capturing the Ebb and flow of psychiatric symptoms with dynamical systems models. Am J Psychiatry. 2009;166:575–82.
Bos EH, de Jonge P, Cox RFA. Affective variability in depression: revisiting the inertia–instability paradox. Br J Psychol. 2019;110:814–27.
Feldman Barrett L, Russell JA. Independence and bipolarity in the structure of current affect. J Pers Soc Psychol. 1998;74:967–84.
Henry C, Mitropoulou V, New AS, Koenigsberg HW, Silverman J, Siever LJ. Affective instability and impulsivity in borderline personality and bipolar II disorders: similarities and differences. J Psychiatr Res. 2001;35:307–12.
Heinz MV, Thomas NX, Nguyen ND, Griffin TZ, Jacobson NC. Technological Advances in Clinical Assessment. Comprehensive Clinical Psychology. Elsevier. 2022 p 301–20. https://doi.org/10.1016/b978-0-12-818697-8.00171-0.
Nemesure MD, Heinz MV, Huang R, Jacobson NC. Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence. Sci Rep. 2021;11:1980.
Shatte ABR, Hutchinson DM, Teague SJ. Machine learning in mental health: a scoping review of methods and applications. Psychol Med. 2019;49:1426–48.
Price GD, Heinz MV, Collins AC, Jacobson NC. Detecting major depressive disorder presence using passively-collected wearable movement data in a nationally-representative sample. psyarxiv [Preprint]. 2023. https://psyarxiv.com/9p4xr/
Jacobson NC, Weingarden H, Wilhelm S. Digital biomarkers of mood disorders and symptom change. NPJ Digit Med. 2019;2:3.
Moshe I, Terhorst Y, Opoku Asare K, Sander LB, Ferreira D, Baumeister H, et al. Predicting symptoms of depression and anxiety using smartphone and wearable data. Front Psychiatry. 2021. https://www.frontiersin.org/article/10.3389/fpsyt.2021.625247
Peterson MJ, Benca RM. Sleep in mood disorders. Sleep Med Clin. 2008;3:231–49.
Korszun A, Young EA, Engleberg NC, Brucksch CB, Greden JF, Crofford LA. Use of actigraphy for monitoring sleep and activity levels in patients with fibromyalgia and depression. J Psychosom Res. 2002;52:439–43.
Rykov Y, Thach TQ, Bojic I, Christopoulos G, Car J. Digital biomarkers for depression screening with wearable devices: cross-sectional study with machine learning modeling. JMIR MHealth UHealth. 2021;9:e24872.
Burton C, McKinstry B, Szentagotai Tătar A, Serrano-Blanco A, Pagliari C, Wolters M. Activity monitoring in patients with depression: a systematic review. J Affect Disord. 2013;145:21–8.
Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, et al. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing - UbiComp ’14 Adjunct. Seattle, Washington: ACM Press; 2014. p. 3–14.
Makhmutova M, Kainkaryam R, Ferreira M, Min J, Jaggi M, Clay I. Prediction of self-reported depression scores using person-generated health data from a virtual 1-year mental health observational study. In: Proceedings of the 2021 Workshop on Future of Digital Biomarkers. Virtual Event Wisconsin: ACM; 2021. p. 4–11.
Makhmutova M, Kainkaryam R, Ferreira M, Min J, Jaggi M, Clay I. Predicting changes in depression severity using the PSYCHE-D (prediction of severity change-depression) model involving person-generated health data: longitudinal case-control observational study. JMIR MHealth UHealth. 2022;10:e34148.
Singha S, Shenoy PP. An adaptive heuristic for feature selection based on complementarity. Mach Learn. 2018;107:2027–71.
Zhang Y, Lyu H, Liu Y, Zhang X, Wang Y, Luo J. Monitoring Depression Trends on Twitter During the COVID-19 Pandemic: Observational Study. JMIR Infodemiology. 2021;1:e26769.
Lee JL, Cerrada CJ, Ying Vang MK, Scherer K, Tai C, Tran JLA, et al. The DiSCover Project: protocol and baseline characteristics of a decentralized digital study assessing chronic pain outcomes and behavioral data. Pain Medicine. 2021. https://doi.org/10.1101/2021.07.14.21260523
Makhmutova M, Kainkaryam R, Ferreira M, Min J, Jaggi M, Clay I. PSYCHE-D: predicting change in depression severity using person-generated health data (DATASET). Zenodo. 2021. https://zenodo.org/record/5085146
Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–13.
Arroll B, Goodyear-Smith F, Crengle S, Gunn J, Kerse N, Fishman T, et al. Validation of PHQ-2 and PHQ-9 to screen for major depression in the primary care population. Ann Fam Med. 2010;8:348–53.
Heaton J. An empirical analysis of feature engineering for predictive modeling. IEEE Xplore. 2016. p. 1–6. https://ieeexplore.ieee.org/document/7506650/information.
R Core Team. R: a language and environment for statistical computing [Internet]. R Foundation for Statistical Computing; 2021. Available from: https://www.R-project.org/
Buuren S, van, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45:1–67.
Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64:402.
Van Rossum G, Drake FL. Python 3 reference manual. Scotts Valley, CA: CreateSpace; 2009.
Berrar D. Cross-validation. Encyclopedia Bioinform Comput Biol. 2019;1:542–5.
Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinforma. 2006;7:91.
Tao X, Chi O, Delaney PJ, Li L, Huang J. Detecting depression using an ensemble classifier based on Quality of Life scales. Brain Inf. 2021;8:2.
Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;4765–74.
Heinz MV, Price GD, Ruan F, Klein RJ, Nemesure M, Lopez A, et al. Association of selective serotonin reuptake inhibitor use with abnormal physical movement patterns as detected using a piezoelectric accelerometer and deep learning in a nationally representative sample of noninstitutionalized persons in the US. JAMA Netw Open. 2022;5:e225403.
Klein NS, Holtman GA, Bockting CLH, Heymans MW, Burger H. Development and validation of a clinical prediction tool to estimate the individual risk of depressive relapse or recurrence in individuals with recurrent depression. J Psychiatr Res. 2018;104:1–7.
Ruhe HG, Mocking RJT, Figueroa CA, Seeverens PWJ, Ikani N, Tyborowska A, et al. Emotional biases and recurrence in major depressive disorder. Results of 2.5 years follow-up of drug-free cohort vulnerable for recurrence. Front Psychiatry. 2019. https://www.frontiersin.org/article/10.3389/fpsyt.2019.00145
Berrouiguet S, Perez-Rodriguez MM, Larsen M, Baca-García E, Courtet P, Oquendo M. From eHealth to iHealth: transition to participatory and personalized medicine in mental health. J Med Internet Res. 2018;20:e7412.
Jahangir S, Adjepong D, Al-Shami HA, Malik BH. Is there an association between migraine and major depressive disorder? A narrative review. Cureus. 2020;12:e8551.
Molgat CV, Patten SB. Comorbidity of major depression and migraine—a Canadian population-based study. Can J Psychiatry. 2005;50:832–7.
Fuller-Thomson E, Battiston M, Gadalla TM, Brennenstuhl S. Bouncing back: remission from depression in a 12-year panel study of a representative Canadian community sample. Soc Psychiatry Psychiatr Epidemiol. 2014;49:903–10.
Hung CI, Liu CY, Juang YY, Wang SJ. The impact of migraine on patients with major depressive disorder. Headache J Head Face Pain. 2006;46:469–77.
Headache Classification Committee of the International Headache Society (IHS). The international classification of headache disorders, 3rd edition. Cephalalgia Int J Headache 2018;38:1–211.
Fergusson DM, Boden JM, Horwood LJ. Recurrence of major depression in adolescence and early adulthood, and later mental health, educational and economic outcomes. Br J Psychiatry. 2007;191:335–42.
Everson SA, Maty SC, Lynch JW, Kaplan GA. Epidemiologic evidence for the relation between socioeconomic status and depression, obesity, and diabetes. J Psychosom Res. 2002;53:891–5.
Melchior M, Chastang JF, Head J, Goldberg M, Zins M, Nabi H, et al. Socioeconomic position predicts long-term depression trajectory: a 13-year follow-up of the GAZEL cohort study. Mol Psychiatry. 2013;18:112–21.
Roche MJ, Jacobson NC. Elections have consequences for student mental health: an accidental daily diary study. Psychol Rep. 2019;122:451–64.
English D, Lambert SF, Ialongo NS. Longitudinal associations between experienced racial discrimination and depressive symptoms in african american adolescents. Dev Psychol. 2014;50:1190–6.
Torres L, Ong AD. A daily diary investigation of Latino ethnic identity, discrimination, and depression. Cult Divers Ethn Minor Psychol. 2010;16:561–8.
Tsuno N, Besset A, Ritchie K. Sleep and depression. J Clin Psychiatry. 2005;66:1254–69.
Tavernier R, Willoughby T. Bidirectional associations between sleep (quality and duration) and psychosocial functioning across the university years. Dev Psychol. 2014;50:674–82.
Li W, Yin J, Cai X, Cheng X, Wang Y. Association between sleep duration and quality and depressive symptoms among university students: a cross-sectional study. PLoS ONE. 2020;15:e0238811.
Bi K, Chen S. Sleep profiles as a longitudinal predictor for depression magnitude and variability following the onset of COVID-19. J Psychiatr Res. 2022;147:159–65.
Fang Y, Forger DB, Frank E, Sen S, Goldstein C. Day-to-day variability in sleep parameters and depression risk: a prospective cohort study of training physicians. NPJ Digit Med. 2021;4:1–9.
Amelia VL, Jen HJ, Lee TY, Chang LF, Chung MH. Comparison of the associations between self-reported sleep quality and sleep duration concerning the risk of depression: a nationwide population-based study in Indonesia. Int J Environ Res Public Health. 2022;19:14273.
Furihata R, Uchiyama M, Suzuki M, Konno C, Konno M, Takahashi S, et al. Association of short sleep duration and short time in bed with depression: a Japanese general population survey: short time in bed and depression. Sleep Biol Rhythms. 2015;13:136–45.
Zhou T, Cheng G, Wu X, Li R, Li C, Tian G, et al. The associations between sleep duration, academic pressure, and depressive symptoms among Chinese adolescents: results from China family panel studies. Int J Environ Res Public Health. 2021;18:6134.
Hu Y, Niu Z, Dai L, Maguire R, Zong Z, Hu Y, et al. The relationship between sleep pattern and depression in Chinese shift workers: a mediating role of emotional exhaustion. Aust J Psychol. 2020;72:68–81.
Philipp M, Fickinger M. The definition of remission and its impact on the length of a depressive episode. Arch Gen Psychiatry. 1993;50:407–8.


Funding

This work was supported by the National Institute of Mental Health (NIMH) and the National Institute of General Medical Sciences (NIGMS) (grant number 1 R01 MH123482-01).

Author information

Authors and Affiliations

Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
George D. Price, Michael V. Heinz, Matthew D. Nemesure & Nicholas C. Jacobson
Quantitative Biomedical Sciences Program, Dartmouth College, Lebanon, NH, USA
George D. Price, Matthew D. Nemesure & Nicholas C. Jacobson
Department of Psychiatry, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
Michael V. Heinz & Nicholas C. Jacobson
Department of Psychiatry, Beth Israel Deaconess Medical Center, Boston, MA, USA
Seo Ho Song
Digital Data Design Institute, Harvard Business School, Harvard University, Cambridge, MA, USA
Matthew D. Nemesure
Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
Nicholas C. Jacobson

Authors

George D. Price
Michael V. Heinz
Seo Ho Song
Matthew D. Nemesure
Nicholas C. Jacobson

Contributions

GP, MH, SS, MN, and NJ contributed to the conceptualization, methodology, and writing of the original draft. GP and MH contributed to the validation and visualization of the analysis. GP contributed to the formal analysis. MH and NJ provided supervision to the present work.

Corresponding author

Correspondence to George D. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Price, G.D., Heinz, M.V., Song, S.H. et al. Using digital phenotyping to capture depression symptom variability: detecting naturalistic variability in depression symptoms across one year using passively collected wearable movement and sleep data. Transl Psychiatry 13, 381 (2023). https://doi.org/10.1038/s41398-023-02669-y


Received: 14 June 2023
Revised: 02 November 2023
Accepted: 13 November 2023
Published: 09 December 2023
DOI: https://doi.org/10.1038/s41398-023-02669-y
