A Dual-Camera Eye-Tracking Platform for Rapid Real-Time Diagnosis of Acute Delirium: A Pilot Study
Abstract
Objective: Delirium, an acute confusional state, affects 20-80% of patients in Intensive Care Units (ICUs), one in three medically hospitalized patients, and up to 50% of all patients who have had surgery. Its development is associated with short- and long-term morbidity and an increased risk of death. Yet, we lack any rapid, objective, and automated method to diagnose delirium. Here, we detail the prospective deployment of a novel dual-camera contextual eye-tracking platform. We then use the data from this platform to contemporaneously classify delirium.

Results: We recruited 42 patients, resulting in 210 recordings (114 with delirium, 96 without) of hospitalized patients in ICU across two centers, as part of a prospective multi-center feasibility pilot study. All recordings made with our platform were usable for analysis. We divided the collected data into training and validation cohorts based on the originating center. We trained two Temporal Convolutional Network (TCN) models that classify delirium using a pre-existing manual scoring system, the Confusion Assessment Method in ICU (CAM-ICU), as the training target. The first model uses eye movements only and achieves an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.67 and a mean Average Precision (mAP) of 0.68. The second model adds the point of regard, the part of the scene the patient is looking at, which increases the AUROC to 0.76 and the mAP to 0.81. These models are the first to classify delirium using continuous non-invasive eye-tracking but will require further prospective clinical validation prior to use as a decision-support tool.

Clinical impact: Eye-tracking is a biological signal that can be used to identify delirium in patients in ICU. The platform, alongside the trained neural networks, can automatically, objectively, and continuously classify delirium, aiding the early detection of the deteriorating patient. Future work is aimed at prospective evaluation and clinical translation.
I. Introduction
Delirium is a clinical syndrome of acutely impaired cognition and memory secondary to a wide spectrum of underlying acute pathologies [1], [2], [3], [4]. It affects 20-80% of all patients admitted to an Intensive Care Unit (ICU), up to one in three medically hospitalized patients, and up to one in two patients who have undergone surgery. It is associated with an increased risk of hospital-acquired infections, an increased risk of falls, increased hospital length of stay, and increased cost of stay [5], [6], [7], [8], [9], [10], [11]. The development of delirium also has long-term consequences: following discharge from the hospital, patients who develop delirium have worse cognitive function scores, and this cognitive dysfunction can persist for many years [12], [13], [14].
The diagnosis of delirium in ICU relies on the Confusion Assessment Method in ICU (CAM-ICU). This scoring system assesses consciousness and attention as surrogate markers of cognition and memory [15]. However, it requires manual monitoring and awareness from the clinician to trigger the assessment. The scoring system itself is burdensome and can miss episodes of delirium owing to the intermittent nature of manual testing, which can lead to under-diagnosis [16], [17].
Attempts at automating the diagnosis of delirium have been made but have been fraught with difficulties: quantitative Electroencephalogram (EEG) via the BiSpectral Index (BIS) system has been found to correspond to arousal rather than to delirium itself [18], [19], although other EEG indices have achieved good classification metrics [20], [21], [22], [23]. The main drawback of these techniques is that the patient is instrumented and required to be stationary for the signal to be acquired, which can be difficult as delirious patients can be agitated and combative. Thus, an alternative non-invasive technique that does not instrument the patient is needed.
Eye-tracking has been used for the diagnosis and monitoring of neuropsychiatric diseases including schizophrenia, affective disorders, autism spectrum disorder, and Alzheimer’s dementia [24], [25], [26], [27]. It has been hypothesized to also be diagnostic for delirium due to the joint role of top-down visual attention modulation and memory encoding of the medial temporal lobe [28], [29]. However, no such platform exists for the eye tracking of patients with delirium, as the eye-tracking device has to meet the requirements of clinical safety owing to the acuity of patients with delirium. Patients with delirium can also be agitated and thus a close wearable eye-tracker is not suitable. Thus, we have developed a bespoke eye-tracking platform that uses a pipeline of neural networks and computer vision algorithms facilitating the acquisition of eye movement and Point of Regard (PoR), the part of the scene the patient is looking at, in real-time, at a safe distance [30], [31], [32]. Despite multiple eye-movement-derived indices being used in neuro-psychiatric diseases, such as abnormalities in smooth pursuit in schizophrenia, none have been explored in delirium [26], [33], [34].
To explore eye-tracking for patients suffering from delirium, we developed and validated a novel eye-tracking platform that is suitable for deployment in a clinical environment where it meets criteria for signal acquisition and patient safety [31], [32], [35]. The platform does not require patient level calibration or involvement and does not require a stimulus to be presented and thus is completely non-invasive. In this paper, we describe the prospective multi-center deployment of this camera-based eye-tracking platform for the purpose of ascertaining whether eye-tracking can be used as a biological signal that is diagnostic of delirium.
We use the data gathered from our eye-tracking platform for purposes of classification of delirium using time-series data. We train TCNs on data gathered from one center and validate the models on data gathered from a different center. We train two models - the first uses eye-movement indices only, whilst the second model adds scene contextual information which increases the performance. The result is a model that can take eye-tracking data as input and output a probability of delirium. The resulting models are continuous in nature, can provide a probability of delirium in real-time, and are non-invasive owing to the platform’s nature.
II. Method
We conducted a prospective multi-center pilot study, registered on ClinicalTrials.gov, for the study of eye movements in patients with delirium. It was approved by the Health Research Authority (HRA) and Research & Ethics Committee (REC) (approval number: 20/LO/0162) and was conducted in accordance with the Declaration of Helsinki. Patients were recruited across Chelsea and Westminster Hospital (CWH) and West Middlesex Hospital (WMH), two interdisciplinary general medical and surgical hospitals, between November 2020 and February 2022. Participants were not remunerated for their participation in the study.
A. Protocol
Following recruitment, patients underwent a once-daily assessment of delirium through CAM-ICU by a single medically trained intensivist with formal training in the diagnosis of delirium. Measurements of eye movements and of the fixations on the scene, known as the Point of Regard (PoR), occurred concurrently for 10 minutes daily until discharge from ICU; see Fig. 1 for an overview of the eye-tracking platform data pipeline [15], [30], [31], [32]. To avoid recruiting many patients who were not delirious, we enriched our cohort by pre-selecting against patients at low risk of delirium using the Early PREdiction of DELIRium in ICu patients (E-PRE-DELIRIC) model. To maximize the external validity of the resulting classification models trained on the eye-tracking indices, we used data from CWH for training and development, while data from WMH was used for external validation. Table 1 shows the patient characteristics across the two sites, and Fig. 2 shows the Consolidated Standards of Reporting Trials (CONSORT) diagram for the flow of patients throughout the study.
TABLE 1
| | CWH (Development) | WMH (Validation) | Overall |
|---|---|---|---|
| Sessions | 186 | 76 | 262 |
| CAM-ICU | | | |
| Positive | 75 | 39 | 114 |
| Negative | 69 | 27 | 96 |
| Unable to measure | 42 | 10 | 52 |
| Gender | | | |
| Female | 11 | 4 | 15 |
| Male | 21 | 6 | 27 |
| Age (Years), Mean ± Standard Deviation | | | |
| Frailty Score, Mean ± Standard Deviation | | | |
| APACHE-II Probability (%), Mean ± Standard Deviation | | | |
| Urea (mmol/L), Mean ± Standard Deviation | | | |
| Corticosteroids | | | |
| No | 31 | 10 | 41 |
| Yes | 1 | 0 | 1 |
| Admission Type | | | |
| Elective | 8 | 2 | 10 |
| Emergency | 24 | 8 | 32 |
| ICU Length of Stay (Days), Mean ± Standard Deviation | | | |
| Unit Outcome | | | |
| Alive | 25 | 8 | 33 |
| Dead | 7 | 2 | 9 |
Consent to enter this study was obtained in one of two ways. As the trial studied delirium, a capacity-losing state, an assessment of capacity was performed; if capacity was present, fully informed consent was obtained from the patient directly. Should the participant's mental state, competence, and capacity mean that they were unable to provide consent, the patient's relatives or friends were approached for advice using the same procedure. In situations where a relative or friend was not available, a nominated consultee was sought, who was one of the Medical Consultant Intensivists not involved in the care of the patient at the time of advice. Should the patient recover their capacity, fully informed consent was undertaken to ensure that their participation in the study was in line with their wishes.
The study’s inclusion and exclusion criteria were:
Inclusion Criteria:
1) Aged years
2) Expected risk of delirium as defined by the E-PRE-DELIRIC score to be %
3) Expected Length of Stay days

Exclusion Criteria:
1) Lack of consent
2) Pre-existing diagnosis of dementia
3) Significant visual impairment
4) Non-concordant eyes
5) The inability for facial recognition and eye tracking to be performed reliably
In relation to exclusion criteria - patients with dementia were excluded as the diagnosis of Delirium-on-Dementia is a clinical challenge and the presence of dementia can confound the diagnosis of delirium; CAM-ICU is specifically not validated for this cohort [36]. The last exclusion criterion ‘Patients who were unable to be reliably tracked’ was added to ensure that the data processing pipeline can reliably extract the patient’s face, detect their landmarks, and regress their gaze vector; an example of a patient that would be excluded would be one undergoing maxillofacial surgery, or with a pre-existing facial deformity. This was added to ensure that the data was of high quality and would generalize to the wider population.
B. Eye Tracking Platform
Due to the cognitive and memory dysfunction present in delirium, traditional eye-tracking platforms cannot be used, as they require calibration or must be placed at an unsafely close distance to the patient. With this motivation, we previously developed a camera-based non-invasive platform for the continuous measurement of eye signals, specifically head pose, horizontal and vertical eye angles, and blinking status [30], [31], [32]. The patient's environment was instrumented with two cameras connected to a commercially available laptop. One camera was placed at the foot end of the bed facing the patient (termed the head camera) and another was placed behind the patient facing the same direction as the patient's head (termed the scene camera). Compared to other techniques of eye-gaze regression, our method is non-invasive, accurate and precise from a clinically safe distance, and does not limit the patient or restrict clinical care [37]. We deployed this system prospectively in the two centers. The data pipeline of each camera is depicted in Fig. 1.
In summary, the platform performs facial detection and landmark measurement sequentially on images acquired from the head camera using deep convolutional neural networks. These networks were specifically tuned for use in healthcare to minimize false identification of faces and cope with occlusions from medical equipment. The patient’s 3D head position and rotation relative to the head camera are then estimated by minimizing the re-projection error of the translation and rotation of a generic pre-specified 3D model onto the measured landmarks. The landmarks also facilitated the extraction of the patient’s eye patches which were then used for blink classification and gaze regression using two further neural network ensembles. The platform runs in real-time at a rate of 20 - 30Hz and has been demonstrated to surpass the required accuracy and precision required for this setting [30], [31], [32]. The gaze vectors were then gathered into fixations which intersected with the scene depth image using a novel image-space gaze-scene intersection algorithm which was developed specifically for this setting and is state-of-the-art [32]. The scene camera also performs pose estimation of the head camera by locating a specialized marker (ChArUco board) located above the camera thus ensuring the patient’s measurements are relative to the scene camera. The final outputs of the platform are the patient’s gaze vector for each fixation and the PoR in pixel coordinates. Auxiliary measurements are also stored to ensure data validity.
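The head-pose step described above estimates 3D position and rotation by minimizing the re-projection error of a generic 3D face model onto the measured 2D landmarks. As an illustration only, and not the platform's actual implementation, the following numpy sketch computes that re-projection objective under a simple pinhole camera model; all function names and parameter values are hypothetical:

```python
import numpy as np

def project(points_3d, R, t, fx, fy, cx, cy):
    """Project 3D model points into the image with a pinhole camera model."""
    cam = points_3d @ R.T + t                  # model frame -> camera frame
    u = fx * cam[:, 0] / cam[:, 2] + cx        # perspective divide + intrinsics
    v = fy * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)

def reprojection_error(R, t, model_points, landmarks_2d, fx, fy, cx, cy):
    """Mean Euclidean distance between projected model points and measured
    2D landmarks; pose estimation minimizes this over (R, t)."""
    projected = project(model_points, R, t, fx, fy, cx, cy)
    return float(np.mean(np.linalg.norm(projected - landmarks_2d, axis=1)))
```

A solver (e.g. an iterative Perspective-n-Point routine) would search over rotations and translations to drive this error to a minimum; the sketch only shows the objective being minimized.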
Each 10-minute recording is labeled with CAM-ICU as the ground-truth label for whether the patient was delirious during this recording. To reduce inter-rater variability, CAM-ICU was conducted by the study personnel rather than the bedside nurse.
C. Data Processing
Data acquired by the platform are stored as unordered sequential messages, a format not suitable for machine learning. An offline processing stage therefore takes the raw data messages and converts them into a temporally ordered format suitable for model ingestion.
Following the conversion of the measured data into ordered temporal data, a post-processing stage performs data cleaning, removing gaze and scene data where the blink classifier deems that the patient was blinking, as per [30]; Fig. 3a illustrates how eye closure/blinking was excluded. Following this, gaze data are converted to fixations based on a dispersion filter. This filter calculates the pairwise cosine similarities across a set of gaze vectors; the inverse cosine of the minimum pairwise similarity (equivalently, the widest pairwise angle) gives the dispersion of that set. Formally:

$$\theta(G) = \max_{g_i, g_j \in G} \arccos\!\left(\frac{g_i \cdot g_j}{\lVert g_i \rVert \, \lVert g_j \rVert}\right)$$

where $\lVert \cdot \rVert$ is the Euclidean norm of its argument and $g_i \cdot g_j$ is the dot product of $g_i$ and $g_j$. The result is the dispersion, measured in radians, of the set $G$. The set $G$ is composed of consecutive gaze vectors spanning a duration of 600 ms [38], [39]. If the dispersion of the set is below a threshold, that set is deemed to be a fixation. The dispersion threshold is set to the precision of the gaze tracker at 6° [30].
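The dispersion filter can be sketched in a few lines of numpy, using the paper's 6° threshold; the function names are our own, not the platform's:

```python
import numpy as np

def dispersion_angle(gaze_vectors):
    """Largest pairwise angle (radians) within a set of 3D gaze vectors."""
    g = np.asarray(gaze_vectors, dtype=float)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)   # normalize to unit vectors
    cos = np.clip(g @ g.T, -1.0, 1.0)                  # all pairwise cosines
    return float(np.arccos(cos.min()))                 # widest pair -> dispersion

def is_fixation(gaze_vectors, threshold_deg=6.0):
    """Classify a 600 ms window of gaze vectors as a fixation if its
    dispersion is below the tracker's precision threshold (6 degrees)."""
    return dispersion_angle(gaze_vectors) < np.deg2rad(threshold_deg)
```

In practice the window would slide over the gaze stream in 600 ms spans, emitting a fixation whenever the dispersion test passes.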
Following fixation classification, the PoR is measured by the intersection of the fixation vector with the scene as per [32]. This results in a complete dataset per recording session per participant.
D. Classification Models
We define the task of delirium classification as a supervised binary time-series classification task. As little is known about which visual and eye-movement features are robust for this task, and to maximize external validity, we opted to use TCNs: they are mechanistically simple but provide powerful classification performance on time-series data [40]. We split our data spatially: data from CWH was used for training, whilst data from WMH was used for validation. We report performance metrics from the validation cohort only.
Two models were trained. The first, aimed at classifying delirium from eye movements, takes the horizontal and vertical eye angles of each fixation as the inputs to the TCN. This model aims to establish whether any spatio-temporal eye movement patterns differ between delirious and non-delirious episodes. The second model aims to understand the impact of the scene on eye movements. The part of the scene that the patient is looking at, the PoR, was extracted by intersecting the fixation vector with the depth image acquired by the depth camera [32]. Each patch was then encoded into a 1024-length vector using a ResNet-50 neural network pre-trained on ImageNet and used as the input into the TCN [41]. Fig. 3b illustrates the architecture for both models.
III. Classification Model Specification
As our data consists of long time series of variable length, a causal classification technique that can attend to distant time points is required for optimal classification. Whilst both Long Short-Term Memory (LSTM) networks and TCNs can handle arbitrarily long sequences, multi-layer TCNs have been shown to be superior to LSTMs for long sequences and thus form the basis of our classification models [40].
The TCN acts as a time-series encoder, outputting a fixed-length vector that represents the time series in a multi-dimensional space; the TCN's last hidden output thus forms an encoding of the entire time series. This encoding is fed into a set of fully connected linear layers that output a logit, which is converted to a probability using the softmax function, indicating whether the network classifies the patient's time series as delirious or not.
As each recording session is 10 minutes in length, rather than using the entire recording session for training, we instead use a window of fixed size within that recording to train the network; this window size was subjected to hyper-parameter search, and a window size of 1000 was used. This has the added benefit of data augmentation, as the start of the window can be shuffled, thus augmenting the data available for machine learning.
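The windowing-with-shuffled-start scheme can be sketched as follows; the zero-padding of recordings shorter than the window is our assumption, not stated in the paper:

```python
import numpy as np

def sample_window(sequence, window_size=1000, rng=None):
    """Draw a fixed-size window with a random start offset from a recording.

    Shuffling the start position yields many distinct training examples from
    one recording session, acting as a simple data-augmentation scheme."""
    rng = rng or np.random.default_rng()
    seq = np.asarray(sequence)
    if len(seq) <= window_size:          # assumed: pad short recordings with zeros
        pad = np.zeros((window_size - len(seq),) + seq.shape[1:])
        return np.concatenate([seq, pad], axis=0)
    start = rng.integers(0, len(seq) - window_size + 1)
    return seq[start:start + window_size]
```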
1) Delirium Classification from Eye Movements Model
The TCN for this task is composed of 8 temporal layers followed by 6 fully connected linear layers. Each temporal block consists of a 1-dimensional convolutional filter, with a channel size of 256 and a kernel size of 7 followed by a non-linear activation function (ReLU) and dropout (p=0.05) for regularization.
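The core mechanism of each temporal block is a causal convolution: output at time t depends only on inputs at times up to t. Standard TCNs [40] additionally dilate the convolutions to reach distant time points; the single-channel numpy sketch below illustrates only this mechanism (the actual model uses 256 channels, kernel size 7, ReLU, and dropout):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: out[t] = sum_i kernel[i] * x[t - i*dilation],
    with zero-padding on the left so out has the same length as x and never
    reads future samples."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            out[t] += kernel[i] * xp[pad + t - i * dilation]
    return out
```

Stacking such layers with exponentially growing dilation gives the network a receptive field long enough to summarize an entire fixation window.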
2) Delirium Classification from PoR Model
The PoR, the part of the scene that the patient is looking at, was extracted by intersecting the fixation vector with the scene image [32]. A crop of the scene, a patch, was then encoded into a 1024-length vector using a ResNet-50 neural network pre-trained on ImageNet [41]. The inputs into the TCN (Fig. 1) are thus these fixed-length vectors.
Similar to the eye movement model specification, the PoR classification model is composed of an 8-layer TCN followed by 5 fully connected linear layers. Each temporal block consists of a 1-dimensional convolutional filter, with a channel size of 512 and a kernel size of 7 followed by a non-linear activation function (ReLU) and dropout (p=0.05) for regularization [40].
A. Training Regime
The AdamW optimizer was used [42]. Given that our data is balanced by trial design, non-weighted binary cross-entropy was used as the loss function.
A random search optimization strategy was utilized to find the optimal set of hyper-parameters to maximize performance. A nested cross-validation scheme was used where the training dataset was split into 10 folds where 9 folds were used to train a model with specific hyper-parameters and the last fold was used for validation repeated 10 times and the results averaged [43]. The hyper-parameter optimization random search was bounded within the accepted range of each hyper-parameter. The tuned hyper-parameters were: window-size, TCN layers, channel size, kernel size, number of fully connected layers, learning rate, regularization, and dropout rate [44], [45]. The best-performing model’s parameters were stored and used for validation with the best-validated model used for testing.
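The random search with k-fold cross-validation described above can be sketched as follows; the `train_and_score` callback and the hyper-parameter bounds are illustrative placeholders, not those of the study:

```python
import numpy as np

def random_search_cv(n_samples, train_and_score, n_trials=20, n_folds=10, rng=None):
    """Random hyper-parameter search with k-fold cross-validation.

    `train_and_score(params, train_idx, val_idx)` is a hypothetical callback
    that trains a model with `params` and returns a validation score."""
    rng = rng or np.random.default_rng()
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        params = {                                  # illustrative search space
            "window_size": int(rng.integers(200, 2000)),
            "kernel_size": int(rng.choice([3, 5, 7, 9])),
            "learning_rate": float(10 ** rng.uniform(-5, -2)),
        }
        scores = []
        for k in range(n_folds):                    # leave one fold out each time
            val_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            scores.append(train_and_score(params, train_idx, val_idx))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_params, best_score
```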
B. Performance Metrics
Model performance is demonstrated using the Receiver Operating Characteristic (ROC) curve and the Precision-Recall Curve (PRC) on the testing cohort. The AUROC summarizes the discriminative ability of the model: a value close to 1.0 indicates perfect classification, while 0.5 indicates performance no better than chance. The PRC demonstrates the trade-off between recall and precision, with the mAP summarizing the curve; the closer the mAP is to 1.0, the better the model, with the chance-level baseline equal to the prevalence of the positive class.
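The AUROC can be computed directly as a rank statistic (the Mann-Whitney formulation): the probability that a randomly chosen positive is scored above a randomly chosen negative. A minimal numpy sketch:

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a random
    positive example is ranked above a random negative one (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive ranked above negative
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```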
C. Calibration and Threshold Tuning
Model calibration was analyzed using normalized calibration curves; these are isotonic curves that compare the estimated binned probabilities to the fraction of observed risk, where the diagonal line represents perfect concordance between predictions and observations. Objectively, the Brier score can be used to assess calibration; this is a unit-less score between 0 and 1, where the best-calibrated model attains a score of 0, calculated as the mean squared difference between the predicted probabilities and the observed labels [46].
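The Brier score reduces to a one-line computation over the predicted probabilities and binary outcomes:

```python
import numpy as np

def brier_score(labels, probs):
    """Brier score: mean squared difference between predicted probabilities
    and binary outcomes. 0 is perfectly calibrated, 1 is worst."""
    labels = np.asarray(labels, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.mean((probs - labels) ** 2))
```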
To identify the threshold at which to dichotomize the probabilistic output of the models, we swept the threshold between 0 and 1 and calculated the F1 score, the harmonic mean of precision and recall, at each threshold. The optimal point to dichotomize the probabilistic output of the classification models is the threshold which maximizes the F1 score.
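The threshold sweep can be sketched as follows; the step count is an arbitrary choice for illustration:

```python
import numpy as np

def best_f1_threshold(labels, probs, n_steps=101):
    """Sweep the decision threshold over [0, 1] and return the threshold that
    maximizes the F1 score (harmonic mean of precision and recall)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.0, 1.0, n_steps):
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```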
IV. Results
The eye-tracking platform facilitated the recruitment of 42 critically unwell patients across two centers, resulting in 262 recording sessions. Table 1 shows the patient characteristics across the two sites, and Fig. 2 shows the CONSORT diagram for the flow of patients in the analysis. Recruitment occurred sequentially, and all admissions were screened against the inclusion and exclusion criteria; the majority of excluded patients were post-operative in nature and thus did not meet the inclusion criterion for length of stay, while a small number of the remaining patients failed to meet the prespecified E-PRE-DELIRIC cutoff. Only one patient, who suffered from divergent strabismus, was excluded from the study due to the platform's inability to infer their direction of gaze. Fig. 4 shows the distribution of recordings across the two centers aligned by days since admission.
Comparing the development and validation cohorts, delirium incidence, as defined by CAM-ICU, was similar (chi-squared test). Similarly, many of the collected confounders that can lead to the development of delirium were also found to be similar between the training and validation cohorts (chi-squared test), except the patient's admission urgency (p < 0.01, chi-squared test). Following recording, 52 sessions were excluded because the patient's conscious level, as measured by the Richmond Agitation Sedation Scale (RASS), precluded the measurement of delirium using CAM-ICU. No patients were excluded because of the platform's inability to perform eye-tracking. In total, 136,615 fixations were collected that were suitable for analysis.
Each recording session was labeled as originating from a delirious or non-delirious episode by manual labeling using CAM-ICU at the time of the recording by the research team, to maximize reliability. The data was then split into a training set and a validation set; to maximize external validity, we chose to use data from CWH for model training and data from WMH for validation. We conducted three studies: the first two concerned the creation of classification models converting eye-tracking data into a probability of delirium; these were trained under a supervised scheme using CAM-ICU as the label. We chose to use neural networks for their power as universal function approximators without requiring complex feature specification, as little is known about the specifics of eye movements in patients with delirium.
A. Eye-Movements in Delirium
We first aimed to test whether eye movements themselves, regardless of visual scene attention, differed between episodes of delirium and non-delirium. We trained a model that uses time-series data to classify delirium purely based on the gaze angle. The intuition is that, if the model acts as a universal function approximator, it should be able to identify spatio-temporal differences in eye movements if such differences exist. The trained model (Fig. 3c and 3d) demonstrates adequate discriminatory validation performance with an AUROC of 0.67 and a mAP of 0.68, indicating that eye movements differ between delirious and non-delirious episodes. Table 2 shows the sensitivity and specificity of the model at different thresholds.
TABLE 2
| Threshold | Specificity (Eye Movements) | Sensitivity (Eye Movements) | Specificity (Scene) | Sensitivity (Scene) |
|---|---|---|---|---|
| 0.1 | 0.11 | 0.79 | 0.18 | 0.93 |
| 0.3 | 0.48 | 0.75 | 0.58 | 0.82 |
| 0.5 | 0.62 | 0.71 | 0.64 | 0.76 |
| 0.7 | 0.81 | 0.38 | 0.85 | 0.40 |
| 0.9 | 0.89 | 0.14 | 0.91 | 0.14 |
B. Visual Attention in Delirium
The spatiotemporal eye movement differences outlined in the eye-movements-only model could either be a result of the intrinsic cerebral activity or because of altered processing of the visual information received by the patient. To explore this further, we trained another model that utilizes the contextual information around the patient, namely the scene, for the classification of delirium. The intuition here is that if eye movements are not reactive to the environment, then a model would fail at classifying whether the inputs originated from a delirious episode or a non-delirious episode.
Thus, a crop of the scene that forms the PoR of the participant's fixation was extracted, and another TCN model was then trained on the time series of those image crops. This model demonstrates an increase in the discriminatory performance of the classifier to an AUROC of 0.76 and a mAP of 0.81, an 11% increase in the AUROC and a 16% increase in the mAP when compared to using eye movements alone (Fig. 3c and 3d). This suggests that the different eye movements of delirious and non-delirious recordings can be accounted for, at least in part, by extrinsic scene information.
Both trained models exhibit appropriate training diagnostics given the binary cross-entropy loss function (Fig. 3e). The classifiers are also stable across a wide range of delirium risk (Figs. 3g and 3h). Table 2 shows the sensitivity and specificity of the models at different thresholds.
V. Discussion
Delirium is common and affects a wide range of patients with profound short- and long-term consequences. Yet, no objective marker has been developed. While visual attention has been hypothesized to be diagnostic for delirium, testing this hypothesis has been hampered by the lack of a technological solution that is clinically safe, accurate, precise, and meets empiric requirements. In our previous work, we developed and validated a state-of-the-art eye-tracking platform suitable for the continual non-invasive eye-tracking of delirious patients across two hospitals [30], [31], [32], [35]. In this manuscript, we sought to understand the utility of eye-tracking for the classification of delirium using our platform.
Using the data acquired by the platform from two general medical and surgical ICUs, we conducted two studies. The first was aimed at understanding differences in eye movement characteristics between delirium episodes and non-delirium episodes. A TCN took blink-gated, fixation-adjusted gaze vectors as inputs and predicted delirium. The classification accuracy on a validation dataset suggested that fixation-angle behavior differs between delirious and non-delirious episodes, indicating that fixation angle is separable between patients with and without delirium. To address whether this difference is due to internal or external factors, we trained another TCN: if eye movements were intrinsic, i.e., originating from internal mechanisms without any external influence, then a classifier trained on scene information would not be able to discriminate between delirious and non-delirious episodes. This scene TCN found that scene information, through the extraction of PoRs, increased classification accuracy, suggesting that scene information contributes to the eye movement characteristics of delirium. By adding the PoR as contextual information, the classifier can more accurately delineate between patients suffering from delirium and those who are not. This finding is the first of its kind to shed light on whether visual attention can discriminate episodes of delirium and non-delirium. The performance of the classifiers was stable across a wide range of delirium risk and the classifiers were well-calibrated, but the best-performing model only achieved an AUROC of 0.76 and a mAP of 0.81: good performance, but not yet at the level of clinical utility.
Our study had several strengths. Firstly, it is the first study of its kind that looks at eye-tracking in critically unwell patients. We prospectively gathered eye-tracking data from patients across two centers where the first center’s data was used to create and train the classifier while the second center’s data was used for validation. This provides assurances on the validity of our findings of delirium classification. Secondly, the CAM-ICU test was conducted by the same personnel across both sites decreasing the inter-rater variability and increasing diagnostic confidence. Thirdly, the nature of the dual-camera solution provides a clinically safe system that is non-invasive and hands-free enabling continuous care without any instrumentation, unlike eye-tracking glasses. This makes it deployable across many healthcare institutions including medical and surgical wards, as well as community nursing homes – ICU was chosen for its large concentration of delirious patients as part of a pilot feasibility trial. We envisage that part of the deployment of the system would result in the fixation of the cameras in the patient’s bedspace thus making them a fixture that does not require regular set-up.
Putting the results of this study in a clinical context, we found that eye movements, and specifically where in the scene the patient is directing attention, can be used to classify delirium in an automated way. This provides an objective marker that is free from the moderately high inter-observer variability of the CAM-ICU. This finding is in keeping with current thinking around delirium, where visual attention is thought to be a key diagnostic feature relating to the interplay between working memory and visual processing [47]. Eye movement is also a useful biological signal that is the direct result of a cognitive process in an acutely ill patient, and can serve as a foundation for further clinical translational work, including the development of a clinical decision tool, the understanding of the neurological basis of visual inattention in delirium, and a clinical trial to understand the impact of automatic delirium monitoring on patient outcomes.
A. Limitations
While the study we conducted has several strengths, we wish to highlight some limitations of our approach, ways in which we envisage they could be addressed, and future work in this area.
Firstly, the camera system must be positioned where the patient’s eyes can be viewed. This limits the usability of the system to supine patients – i.e. not lying on their front and not significantly lying on their side. It is a standard of care in ICU to nurse patients at 30° head-up in a supine position to minimize aspiration risk, so the supine, forward-facing position encompasses the majority of patients. Active Vision, the research area concerned with finding the optimal position of cameras to maximize signal acquisition, could provide a potential solution for patients lying on their side. Similarly, occlusion of the patient’s eyes by clinical staff, or other objects, is another limitation. Fig. 1 illustrates that the first stage is facial detection, which, if the face is occluded, would fail, stopping the eye-tracking pipeline from progressing. Taking this limitation to an extreme where the view is obstructed the majority of the time would reduce our system to an intermittent test, similar to the current diagnostic standard of CAM-ICU.
B. Future Work
This pilot feasibility study looking at eye-tracking in ICU could be advanced further in several directions, both from an engineering perspective and a clinical translational perspective. We firstly wish to investigate the classifiers in a prospective setting where their clinical utility would be scrutinized, and secondly, to focus on understanding the neurological basis of visual inattention in delirium.
1) Prospective Evaluation
To achieve our first goal, the classification models, which take the eye-tracking data as input and output a diagnostic probability of delirium, require further work to increase their performance, and prospective evaluation prior to their use as part of a clinical decision tool. Formal decision analysis will also be required to ascertain the impact of automatic delirium monitoring across various thresholds, as well as qualitative work on appropriate threshold tuning prior to informing clinicians of a positive delirium diagnosis. Other diagnostic metrics can also be evaluated, such as time-to-diagnosis, important patient outcomes following intervention against the current standard, and potential harm from false-positive diagnoses. This will also facilitate iterative improvement of the technology alongside clinical outcomes. In addition, appropriate regulatory approvals (e.g. from the Food & Drug Administration in the USA and the Medicines and Healthcare products Regulatory Agency in the UK) will be sought to ensure the technology is safe and effective for use in the clinical setting.
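The threshold-dependent trade-off described above can be made explicit: a classifier's probability output only becomes a decision once a threshold is chosen, and each threshold yields a different sensitivity and specificity, while the AUROC summarizes performance across all thresholds. The sketch below computes both from scratch on a tiny invented score set; the numbers are synthetic and bear no relation to the study's data.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum formulation: the probability that a
    randomly chosen positive outscores a randomly chosen negative."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def operating_point(scores, labels, threshold):
    """Sensitivity and specificity at one decision threshold."""
    pred = scores >= threshold
    sens = (pred & (labels == 1)).sum() / (labels == 1).sum()
    spec = (~pred & (labels == 0)).sum() / (labels == 0).sum()
    return sens, spec

# Synthetic example: 3 delirious (1) and 3 non-delirious (0) recordings
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0])
area = auroc(scores, labels)                       # 8 of 9 pairs ordered correctly
sens, spec = operating_point(scores, labels, 0.5)  # one possible clinical cut-off
```

Sweeping the threshold in `operating_point` traces the ROC curve whose area `auroc` computes, which is why threshold tuning is a separate, clinically driven step after model training.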
2) Neurological Basis of Visual Inattention
For our second goal, understanding the neurological basis of visual inattention, one approach is EEG recording during episodes of delirium, aimed at inspecting the interplay between the medial temporal lobe and the ventral visual processing stream. Alternatively, simulations of visual attention, given the scene, can be compared to the actual visual attention during episodes of delirium, thus contrasting healthy and delirious minds under a cognitive architecture scheme. We would also wish to phenotype delirium using the eye-tracking platform and then compare the phenotypes to the underlying neurological basis of delirium. This would provide a deeper understanding of the neurological basis of delirium and a foundation for the development of novel treatments. The current study was not designed to answer these questions, having a limited set of patients with delirium, and thus future work is required to address them.
3) Phenotyping Delirium Using Eye-Tracking
An interesting direction of future work would be to phenotype the type of delirium based on our eye-tracking classifiers. Beyond hypoactive and hyperactive delirium, which are clinically obvious once delirium is diagnosed, the underlying cause is often not immediately known, and clinicians often correlate the cause with the patient’s current state as well as their underlying diagnosis – e.g. inflammatory delirium if the patient has a concurrent infection, or metabolic delirium if the patient has electrolyte disturbances.
However, the etiology of delirium can differ from the underlying disease – e.g. metabolic delirium owing to sepsis-induced hepatitis while the patient has an infection. A classifier that hints at the cause of delirium could therefore guide different treatments, and this is an important direction for future work.
Many techniques exist in the literature that can reveal the underlying groups without additional labels – e.g. self-supervised contrastive techniques ‘pull’ together data points that are similar and ‘push’ apart data points that are disparate during the training phase [48]. These techniques require a significant number of patients, as the phenotyping would have to be at the level of the patient rather than the recording. Simpler techniques include Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) [49], [50].
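As an illustration of the simpler end of this spectrum, PCA can project per-patient eye-tracking feature vectors into a low-dimensional space where candidate phenotypes may separate visually. The sketch below uses entirely synthetic features with two planted groups; the feature dimensionality and group structure are invented for the example.

```python
import numpy as np

def pca_embed(features, n_components=2):
    """Project per-patient feature vectors onto the top principal
    components for unsupervised phenotype exploration."""
    X = features - features.mean(axis=0)          # centre the data
    # SVD of the centred matrix: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # (n_patients, n_components)

# Toy demo: two synthetic "phenotypes" in a 10-D feature space
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, (20, 10))
group_b = rng.normal(3.0, 1.0, (20, 10))
emb = pca_embed(np.vstack([group_a, group_b]))
# The planted groups separate along the first principal component
gap = abs(emb[:20, 0].mean() - emb[20:, 0].mean())
```

PCA would only reveal linearly separable structure; t-SNE or contrastive embeddings would be needed if the phenotypes differ in more subtle, non-linear ways.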
An alternative technique, linear probing, places a linear layer on one of the hidden layers of the neural network; only that layer is then trained with supervised learning for a classification task, outputting the probability of belonging to one of a pre-specified number of classes [51]. This requires a-priori knowledge of the number of classes and, again, a larger sample size to work effectively.
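A minimal sketch of linear probing, assuming binary classes: the network is frozen, its hidden activations are treated as fixed features, and only a logistic layer is fit on top of them. The activations below are synthetic stand-ins for a frozen layer's output; the dimensions and learning-rate schedule are invented for the example.

```python
import numpy as np

def train_linear_probe(hidden, labels, lr=0.1, epochs=500):
    """Fit only a logistic output layer on frozen hidden activations.
    hidden: (N, D) activations from a frozen network layer."""
    rng = np.random.default_rng(2)
    w, b = rng.normal(0, 0.01, hidden.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))   # sigmoid
        grad = p - labels                             # dL/dz for log-loss
        w -= lr * hidden.T @ grad / len(labels)       # update the probe only
        b -= lr * grad.mean()
    return w, b

# Toy demo: linearly separable "hidden activations" for two classes
rng = np.random.default_rng(3)
h = np.vstack([rng.normal(-1, 0.5, (30, 4)), rng.normal(1, 0.5, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
w, b = train_linear_probe(h, y)
acc = ((1.0 / (1.0 + np.exp(-(h @ w + b))) > 0.5) == y).mean()
```

Because only the probe's weights are updated, the probe's accuracy measures how much class information the frozen layer already encodes, rather than the network's capacity to learn the task.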
VI. Conclusion
Delirium affects a wide range of patients with severe consequences, yet no objective automated diagnostic system has been developed. We demonstrated how eye tracking, performed in a non-invasive, calibration-free manner, can automatically classify delirium in the Intensive Care Unit (ICU) with good performance metrics in a pilot feasibility study. Future work is aimed at improving the performance of the models, validating the classification models for clinical use, and using the biological signal to phenotype delirium to understand its neurological basis.
Acknowledgment
The authors would like to thank the patients, families, and members of staff at Chelsea & Westminster Hospital (CWH) and West Middlesex Hospital (WMH) for their contributions to this work. Ahmed Al-Hindawi and Marcela Vizcaychipi would like to thank the British Medical Association (BMA) Research Fund J Moulton Prize for clinical research into mental health and the CW+ charity for their funding of this project. Ahmed Al-Hindawi would like to thank members of the Personal Robotics Laboratory, Singer Lab, and UCL HAL Lab; specifically, Professor Mervyn Singer, Dr. Nishkantha Arulkumaran, Dr. Steve Harris, Dr. Timothy AC Snow, Dr. Tobias Zimmermann, Dr. Rodrigo Chacon-Quesada, Dr. Tobias Fischer, and Dr. Joshua Elsdon. Marcela Vizcaychipi would like to thank the Westminster Medical School Research Fund for its ongoing support. Yiannis Demiris would like to thank the Royal Academy of Engineering for his chair in Emerging Technologies.
Funding Statement
The work of Ahmed Al-Hindawi and Marcela Vizcaychipi was supported in part by British Medical Association (BMA) Research Fund J Moulton Prize for clinical research into mental health and in part by CW+ Charity. The work of Marcela Vizcaychipi was supported by Westminster Medical School Research Fund.
References
Articles from IEEE Journal of Translational Engineering in Health and Medicine are provided here courtesy of Institute of Electrical and Electronics Engineers