Abstract
Background
Intensive care unit admission (ICUA) triage has been urgently needed to address the shortage of ICU beds during the coronavirus disease 2019 (COVID-19) surge. In silico analysis and an integrated machine learning (ML) approach, based on multi-omics and immune cells (ICs) profiling, might provide solutions to this issue in the framework of predictive, preventive, and personalized medicine (PPPM).
Methods
Multi-omics was used to screen the synchronous differentially expressed protein-coding genes (SDEpcGs), and an integrated ML approach was used to develop and validate a nomogram for the prediction of ICUA. Finally, the independent risk factor (IRF) of ICUA was identified and examined with ICs profiling.
Results
Colony-stimulating factor 1 receptor (CSF1R) and peptidase inhibitor 16 (PI16) were identified as SDEpcGs, and each fold change (FCij) of CSF1R and PI16 was selected to develop and validate a nomogram to predict ICUA. The area under curve (AUC) of the nomogram was 0.872 (95% confidence interval (CI): 0.707 to 0.950) on the training set, and 0.822 (95% CI: 0.659 to 0.917) on the testing set. CSF1R was identified as an IRF of ICUA, expressed in and positively correlated with monocytes which had a lower fraction in COVID-19 ICU patients.
Conclusion
The nomogram and monocytes could provide added value to ICUA prediction and targeted prevention, offering a cost-effective platform for personalized medicine of COVID-19 patients. The log2 fold change (log2FC) of the fraction of monocytes could be monitored simply and economically in primary care, and the nomogram offered an accurate prediction for secondary care in the framework of PPPM.
Introduction
Shortage of ICU beds is a knotty problem during the COVID-19 pandemic
Coronavirus disease 2019 (COVID-19) is a highly contagious, serious disease caused by a betacoronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [1]. COVID-19 has continued to strain public health resources and has caused many sociopolitical problems [2]. COVID-19 patients can present with a wide range of symptoms, from mild to severe [3]. Severe cases, which carry a higher mortality rate, should be admitted to the intensive care unit (ICU) and undergo mechanical ventilation (MV) [4]. The intensive care unit admission (ICUA) criteria for COVID-19 patients are still controversial and remain poorly defined under the paradigm of reactive medicine [5,6,7]. According to current criteria, approximately 5 to 32% of COVID-19 patients are admitted to an ICU, which has placed a huge burden on ICU beds [8, 9]. The shortage of ICU beds has been documented as a very sensitive issue since the COVID-19 pandemic surge.
Integrated machine learning for ICUA prediction in the framework of PPPM
Predictive, Preventive, and Personalized Medicine (PPPM) is an effective integrative approach, which has been promoted by the European Association for Predictive, Preventive and Personalized Medicine (EPMA, http://www.epmanet.eu/) [10, 11]. It mainly comprises three aspects: individual predisposition prediction, targeted preventive measures and personalized treatment algorithms [12]. The paradigm for handling the COVID-19 pandemic is currently shifting from reactive medicine to PPPM. Machine learning (ML) is an important method for applying artificial intelligence technology. To date, only a few studies have applied ML algorithms, such as TabNet, support vector machines and deep neural networks, to predict ICUA of COVID-19 patients [13,14,15]. As a brand-new paradigm for integrative proactive medicine, ML-based PPPM allows individual prediction and prevention strategies to be implemented with suitable predictive algorithms [16]. Integrated ML in the framework of PPPM can combine multi-omics and immune cells (ICs) profiling analysis for ICUA prediction and holds great promise for the practice of ICUA targeted prevention and individual treatment in COVID-19 patients.
Challenges in predicting ICUA of COVID-19 patients
Recently, ICUA-related risk factors, including endothelial markers, inflammasome-related markers, and many other serological markers, were reported to predict ICUA in COVID-19 patients [17, 18]. In addition, image analysis methods were used to assist ICUA prediction [19, 20]. However, most of these top predictors were identified by conventional statistical methods, and their accuracy and reliability need further validation. Because they are hard to interpret, only a few of the above-mentioned algorithms have been used to predict ICUA, and none of the above-mentioned studies addressed targeted prevention of ICUA. The reported predictors lack corresponding targeted prevention measures and are not conducive to personalized medicine. Hence, the prediction of ICUA in COVID-19 patients is a new and challenging research subject. To address this problem, we drew on the paradigm shift from reactive medicine to proactive medicine and sought to establish and verify an optimal predictive model and to screen the independent risk factor (IRF)-related IC of ICUA in COVID-19 patients by using multi-omics (proteomics and transcriptomics) and ICs profiling in the framework of PPPM.
Monocytes immune response to COVID-19
The innate immune system plays an important role in combating SARS-CoV-2 [21]. Monocytes are essential phagocytic cells of the innate immune system. Studies have shown that monocytes participate in and mediate the immune response at different stages in COVID-19 patients [22,23,24]. Although single-cell sequencing and multi-omics analyses have focused on the role of monocytes in COVID-19 and have revealed an association between monocytes and progressive COVID-19 [25, 26], there is still considerable uncertainty about monocytes as an indicator of disease severity. Therefore, it is meaningful to evaluate monocytes and their protein markers for ICUA by multi-omics and ICs profiling analysis.
Working hypothesis
Based on multi-omics and ICs profiling, we aimed to build and validate an integrated ML prediction model to predict ICUA, and to screen the IRF-related IC of ICUA as a prevention target in the framework of PPPM. We hypothesized that the model would perform well and could predict ICUA for secondary care. As a preventive target of ICUA, the IRF-related IC would be easy to monitor in primary care. In this study, we developed and validated a cost-effective ICUA prediction model and a preventive target of ICUA to add value for personalized medicine of COVID-19 patients in the framework of PPPM. The flow chart of the study is presented in Fig. 1.
Methods
Datasets and ethics statement
The multi-omics differentially expressed data (COVID-19 vs non-COVID-19) of proteomics and transcriptomics were screened from the supplementary data of the corresponding literature and verified with “COVID-omics.app” (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7543711/bin/mmc2.xlsx, https://covid-omics.app) [27]. Within the GEO (Gene Expression Omnibus) dataset GSE157103, we identified 26 non-COVID-19 and 100 COVID-19 patients with an available transcriptome expression matrix and demographic and clinical baseline data. The transcriptomic, clinical and demographic data, including the gene expression matrix (transcripts per million (TPM) values), sex, age, Charlson score (CS), MV, and ICU status, were extracted from GSE157103. The TPM values of synchronous differentially expressed protein-coding genes (SDEpcGs) of COVID-19 were extracted from the RNA sequencing data, log2 transformed (log2(TPM + 1)), and integrated with the clinical-demographic data. A record was excluded entirely if any value needed for the calculations was missing. The clinical and multi-omics data were obtained from GSE157103, which has established a patient informed consent mechanism as one of the main prerequisites for opening its data to the public. Therefore, valid informed consent was confirmed to have been obtained.
Identification and annotation of the SDEpcGs of COVID-19 patients
The multi-omics differential expression data and transcriptome expression data in COVID-19 patients were used to extract the SDEpcGs with q value < 0.05 and a significance threshold of log2 fold change (log2 FC) selected through kernel density estimation (KDE) [28]. First, the differentially expressed proteins (DEPs) were screened from the multi-omics differential expression data by threshold (q < 0.05 and KDE-based |log2 FC|). The DEPs with corresponding protein-coding differentially expressed genes (DEGs) (q < 0.05) were identified by searching the UniProt Knowledgebase (UniProtKB) database using the Reviewed (Swiss-Prot), manually annotated section (https://www.uniprot.org/) [29], and the synchronously regulated protein-coding genes (q < 0.05 and KDE-based synchronous log2 FC) were selected as the SDEpcGs. Subsequently, the COVID-19 database (http://www.biomedical-web.com/covid19db) was used to further analyze the differential expression values (TPM values with log2 transformation) of the SDEpcGs in whole blood by using the “wilcox.test” function of the R language (24 healthy controls vs 62 COVID-19 patients; no significant difference (NSD): p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001) [30]. Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed to discover and annotate the molecular function, biological process, cellular component, and pathway of the SDEpcGs by using g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) with a Benjamini–Hochberg false discovery rate (FDR) < 0.05 [31].
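The Benjamini–Hochberg adjustment behind the FDR and q-value thresholds can be illustrated with a minimal sketch. This is the standard step-up procedure written in plain Python, not the code used by g:Profiler or by the authors, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five terms (not taken from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]
```

In this toy example, two raw p-values below 0.05 (0.039 and 0.041) no longer pass once the adjustment is applied, which is the reason for thresholding on the adjusted value rather than the raw one.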
Consensus clustering and clinical-demographic features in COVID-19 patients
To explore the relationship between SDEpcGs and clinical-demographic features in COVID-19 patients, we clustered the SDEpcGs expression patterns of SARS-CoV-2 infected blood samples into different groups with “ConsensusClusterPlus” [32]. To further investigate the role of the SDEpcGs of COVID-19, the relationships between the clusters and clinical-demographic features were studied. Age was divided into two groups (≤ 65 years and > 65 years), and the CS was graded as severe (CS ≥ 5), moderate (2 < CS < 5) or mild (CS ≤ 2). Then, the log2(TPM + 1) values of the SDEpcGs were compared between clusters by using the Wilcoxon test, and the differences between clusters in clinical-demographic features, including ICU status, MV status, age, sex, and CS grades, were systematically analyzed by using the Pearson’s chi-square test or Fisher's exact test. A heatmap and an alluvial diagram were used to probe the cluster-stratified relationships among clinical-demographic features.
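The age and CS cut-offs above can be written as two small helper functions. The function names and label strings are illustrative choices, not identifiers from the study.

```python
def cs_grade(cs):
    """Grade a Charlson score using the cut-offs stated in the text."""
    if cs >= 5:
        return "severe"
    if cs > 2:          # 2 < CS < 5, i.e. 3 or 4 for integer scores
        return "moderate"
    return "mild"       # CS <= 2


def age_group(age):
    """Split age at 65 years, with 65 itself in the younger group."""
    return "<=65" if age <= 65 else ">65"


# Hypothetical patients as (age, CS) pairs, for illustration only.
patients = [(54, 1), (65, 3), (71, 5)]
labels = [(age_group(a), cs_grade(c)) for a, c in patients]
```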
Construction and validation of a nomogram model for predicting ICUA of the COVID-19 patients
To construct a nomogram model that could raise an alert for ICUA of the COVID-19 patients, we used the R function “createDataPartition”, stratified by cluster, to randomly divide the integrated data into training and testing sets in a 1:1 ratio. The expression data in the form of TPM were log2(TPM + 1) transformed, and each fold change (FCij) on log2(TPM + 1)-level gene expression data was calculated using the following formula:
where the indices i and j stand for gene and sample index, respectively.
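The cluster-stratified 1:1 split described above can be sketched as follows. This is a plain-Python analogue, not the caret implementation: the function name is hypothetical, the rounding rule for odd-sized strata is an assumption, and a Python seed of 21 will not reproduce the partition obtained with R's set.seed(21).

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed=21):
    """Split record indices 1:1 within each stratum (here, each cluster)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    train, test = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        # Assumption: an odd-sized stratum gives its extra record to training.
        cut = (len(members) + 1) // 2
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


# Hypothetical cluster labels for 11 patients (not the study data).
clusters = [1] * 7 + [2] * 4
train_idx, test_idx = stratified_half_split(clusters)
```

Splitting within each cluster keeps the cluster proportions nearly equal in both sets, which matters here because the clusters are associated with ICU status.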
The age-adjusted Charlson Comorbidity Index (aCCI) was calculated using the method described in a previous study [33]. The Wilcoxon–Mann–Whitney (WMW) test, also known as the Mann–Whitney U test or the Wilcoxon rank-sum (WRS) test, is a rank-based test that compares values for two independent samples [34]. To check for differences between the training and testing sets, the Pearson’s Chi-square test and the WRS test with an approximation were used to analyze the count data and measurement data with “chisq.test” and “wilcox.test,” respectively. Then, as more appropriate statistical methods, permutation versions of the approximative Pearson’s chi-squared test and the WMW test were performed by using the “chisq_test” and “wilcox_test” functions in the coin package of the R language.
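As a rough illustration of what the rank-based WMW comparison measures, the sketch below computes the U statistic by direct pair counting, which is equivalent to the rank-sum formulation. It does not compute a p-value and is not the R or coin implementation; the sample values are invented.

```python
def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y, ties counted as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def concordance(x, y):
    """U scaled to [0, 1]: the probability that a random x exceeds a random y."""
    return mann_whitney_u(x, y) / (len(x) * len(y))


# Hypothetical measurements in two independent groups (illustration only).
group_a = [0.2, 0.4, 0.5]
group_b = [0.3, 0.6, 0.7, 0.9]
u_ab = mann_whitney_u(group_a, group_b)
```

The scaled statistic is numerically the same quantity as the area under the ROC curve when one group is the positive class and the compared value is the predictor, which links this test to the AUCs reported for the models.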
The least absolute shrinkage and selection operator (LASSO)-logistic regression was performed to construct and validate a nomogram model for assessing ICUA. First, the LASSO algorithm was used to select the important features from the FCij of SDEpcGs, age, CS, and sex in the training set. To build the models for assessing ICUA of the COVID-19 patients, the important features were then included in logistic regression analysis (these models were fit to the data using the “glm” function of the R language). Subsequently, the optimal model was selected as the one with the highest performance score in the model comparison [35]. To further study the relationship between the features and ICU status, we used the optimal model to build a nomogram in the training set, and the variance inflation factors (VIFs) were calculated to measure the multicollinearity of the model. Discrimination and calibration were assessed by using the areas under the curves (AUCs) and the Gruppo italiano per la Valutazione degli interventi in Terapia Intensiva (GiViTI) calibration curves, respectively [36]. Decision curve analysis (DCA) was plotted to evaluate the consistency and clinical efficacy of the models [37].
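For reference, LASSO-penalized logistic regression is commonly written as the optimization problem below. The notation is an assumption introduced here: y_j is the ICU status of sample j (0 or 1), x_j its vector of p candidate features, and λ the penalty strength. The 1/n scaling follows the convention of the glmnet R package, which the text does not explicitly name.

```latex
(\hat{\beta}_0, \hat{\beta})(\lambda) = \arg\min_{\beta_0,\,\beta}
\left\{
-\frac{1}{n}\sum_{j=1}^{n}\Big[\, y_j\,\eta_j - \log\!\big(1 + e^{\eta_j}\big) \Big]
+ \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\},
\qquad \eta_j = \beta_0 + x_j^{\top}\beta
```

Because the penalty is on the absolute values of the coefficients, increasing λ drives some coefficients exactly to zero, and the features whose coefficients remain non-zero at the chosen λ are the ones carried forward into the ordinary logistic regression.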
Identification and assessment of the IRF for ICUA of the COVID-19 patients
We utilized logistic regression analysis to identify the IRF for ICUA of the COVID-19 patients. Then, sensitivity analysis was performed to assess potential confounding factors of the IRF.
Identification of the IRF for ICUA
The feature variables, including the SDEpcGs, age, CS, and sex in the integrated data, were analyzed by univariate logistic regression analysis with ICUA as the response variable. The features with statistical significance in the univariate logistic regression analysis were then included in the multivariate logistic regression analysis. Logistic regression was used to identify the IRF associated with ICUA, and collinearity diagnostics were performed to measure the degree of collinearity.
Sensitivity analysis
The one-at-a-time (OAT) method [38] combined with Boruta algorithm feature screening was used to perform sensitivity analysis of the IRF for ICUA. First, the Boruta algorithm was used to rank the importance of the features among the SDEpcGs, age, sex, and CS. Then, the OAT method was used to build adjusted logistic regression models by adding the ranked variables one at a time, and sensitivity analysis was performed with these adjusted models to assess potential confounding factors of the IRF.
The correlation analysis of the IRF and ICs in COVID-19 patients
CIBERSORTx (https://cibersortx.stanford.edu/) was used to calculate the fractions of 22 ICs, based on the transcriptome expression matrix of COVID-19 and non-COVID-19 patients. Then, the transcriptome expression matrix of COVID-19 patients was used to evaluate the fractions and cell-type specific SDEpcGs expression profiles of 22 ICs by CIBERSORTx [39, 40]. The ICs whose fractions differed significantly (Wilcoxon rank-sum test, P < 0.05) in every comparison, including COVID-19 status, clusters, MV and ICU status, were identified as the intersection ICs, and the log2FCIC of the fraction of each intersection IC was calculated using the following formula:
where the index IC stands for immune cell.
The correlation between the IRF and the fraction of the intersection IC with the threshold of Spearman's rank correlation coefficient (ρSpearman) > 0.6 and p < 0.05 was evaluated.
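The Spearman criterion above can be illustrated with a minimal sketch that ranks both variables (using mid-ranks for ties) and then takes the Pearson correlation of the ranks. The helper names and the five data points are hypothetical, and the p-value part of the threshold is not computed here.

```python
def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the mid-ranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical gene expression and cell fraction values (not study data).
expression = [5.1, 6.3, 4.2, 7.0, 5.8]
fraction = [0.10, 0.14, 0.05, 0.12, 0.11]
rho = spearman_rho(expression, fraction)
passes_threshold = rho > 0.6
```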
Statistical analysis
All statistical calculations in this study were performed in R 4.1.1 software (https://cran.r-project.org/). Unless otherwise specified, statistical significance was defined as p < 0.05. A VIF > 10 was taken to indicate multicollinearity in the logistic regression analysis. The Box–Tidwell method [41] was used to verify the assumption of linearity in the logit for the continuous variables. The interaction term (the cross product of each continuous variable and its natural logarithm) was added to the logistic regression model. The significance level (α) was 0.05/n (n is the number of terms included in the model) after Bonferroni correction. If the interaction term was statistically significant (p < α), there was no linear relationship between the corresponding continuous independent variable and the logit of the response variable (that is, the linearity assumption was not met). The same random seed, set.seed(21), was used for all random processes.
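The two mechanical pieces of the Box–Tidwell check described above, building the x·ln(x) term and comparing its p-value with the Bonferroni-corrected α, can be sketched as below. Fitting the logistic model itself is omitted, the function names are hypothetical, and whether n counts the intercept is not stated in the text.

```python
import math


def box_tidwell_term(values):
    """Return x * ln(x) for each value; defined only for strictly positive x."""
    return [x * math.log(x) for x in values]


def bonferroni_alpha(n_terms, alpha=0.05):
    """Significance level after Bonferroni correction over n_terms model terms."""
    return alpha / n_terms


def violates_linearity(p_interaction, n_terms):
    """True when the x*ln(x) term is significant at the corrected level."""
    return p_interaction < bonferroni_alpha(n_terms)


# Hypothetical continuous predictor values and interaction p-values.
terms = box_tidwell_term([1.0, math.e, 10.0])
flag_small_p = violates_linearity(0.004, n_terms=5)
flag_large_p = violates_linearity(0.03, n_terms=5)
```

With five terms the corrected level is 0.01, so a p-value of 0.03 for the interaction term would not count as evidence against linearity even though it is below 0.05.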
Results
Identification and annotation of SDEpcGs
The expression levels of 517 proteins and their corresponding protein-coding genes were compared between 26 non-COVID-19 and 100 COVID-19 blood samples. We estimated the log2 FC of protein expression by density curve (Fig. 2A), and took q < 0.05 with |log2 FC| > 1 as the threshold for DEPs. Then, we screened 14 DEPs (|log2 FC| > 1, q < 0.05) with corresponding protein-coding DEGs (q < 0.05) (Table 1). The nine-quadrant diagram selected two SDEpcGs (colony-stimulating factor 1 receptor (CSF1R) and peptidase inhibitor 16 (PI16)) from the 14 corresponding protein-coding DEGs in the SARS-CoV-2 infected blood samples with the threshold of q < 0.05 and synchronous log2 FC < -1 (Fig. 2B), and the expression levels of the two SDEpcGs were confirmed to be substantially lower in SARS-CoV-2 infected blood by the COVID-19 database (Fig. 2C, D). The g:Profiler analysis (https://biit.cs.ut.ee/gprofiler/gost) showed 83 GO and 9 KEGG enrichment pathways. CSF1R was involved in 14 biological processes (GO:BP) of the monocyte-macrophage system. Cellular component (GO:CC) analysis indicated that it is located in the CSF1-CSF1R complex, and molecular function (GO:MF) analysis revealed that it has macrophage colony-stimulating factor receptor activity and may be involved in viral protein interaction with cytokine and cytokine receptor (KEGG) (Supplementary Fig. 1A, B).
Identification analysis of SDEpcGs. (A) Kernel density plot of log2 FC of proteins; among these proteins, 193 (37.3%) were hypoexpressed and 324 (62.7%) were hyperexpressed. (B) Nine-quadrant diagram of DEPs with corresponding protein-coding DEGs; CSF1R and PI16 were identified as SDEpcGs with the threshold of q value < 0.05 and synchronous log2 FC < -1. (C, D) Boxplots revealed the significantly different expression of CSF1R and PI16 between healthy controls and COVID-19 patients in the COVID-19 database. p values were defined as: NSD: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001. Expression values of CSF1R and PI16 were TPM values with log2 transformation
Clinical-demographic features of the clusters based on SDEpcGs
Based on the expression of SDEpcGs, consensus clustering was applied, and k = 2 appeared to be the optimal number of clusters, because this value of k had the lowest “proportion of ambiguous clustering” (PAC) and could classify the COVID-19 patients into two distinct clusters (Fig. 3A and B). The associations between clusters and clinical-demographic features were assessed by heatmap, and the clusters were significantly associated with MV and ICU status, but were not significantly correlated with other clinical-demographic characteristics, including sex, age and CS (Fig. 3C). The alluvial diagram describes the relationship of age group, sex, CS grade, MV and ICU status (Fig. 3D). Most patients in cluster 2 needed to enter the ICU or use MV, while non-MV or non-ICU patients essentially corresponded with cluster 1. The boxplot indicated that CSF1R and PI16 were significantly increased in cluster 1 (NSD: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001) (Supplementary Fig. 2A, B).
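The PAC criterion used to choose k can be sketched from a consensus matrix as follows. PAC is the share of sample pairs whose consensus index falls in an intermediate interval; the interval (0.1, 0.9] used here is a common choice in the literature and an assumption, since the text does not state the bounds the authors used. The matrices are toy examples.

```python
def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of sample pairs with an ambiguous consensus index.

    Equals CDF(upper) - CDF(lower) of the pairwise consensus values.
    """
    n = len(consensus)
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for c in pairs if lower < c <= upper)
    return ambiguous / len(pairs)


# Toy consensus matrices for four samples (not derived from the study).
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
noisy = [
    [1.0, 1.0, 0.5, 0.0],
    [1.0, 1.0, 0.4, 0.0],
    [0.5, 0.4, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
pac_clean = pac(clean)
pac_noisy = pac(noisy)
```

A lower PAC means that pairs of samples are almost always either clustered together or kept apart across resampling runs, so the k with the lowest PAC gives the most stable partition.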
Consensus clustering of SDEpcGs and its correlation with clinical-demographic features. (A) Consensus clustering of SDEpcGs for all samples showed that they were more likely to be clustered together when k = 2, with an optimal cumulative distribution function (CDF) curve (B). (C) The heatmap indicates that the cluster has a significant correlation with ICU and MV, and the COVID-19 patients flows were visualized by the alluvial diagram (D)
Construction and validation of a nomogram model for the prediction of ICUA of COVID-19 patients
The integrated data were randomly divided into training and testing sets at a 1:1 ratio, based on clusters, by using the “createDataPartition” function in the caret package of the R language. Two records with missing values of age were deleted. Comparison of the randomly grouped data indicated that the training and testing sets showed no significant differences in ICU, MV, age, sex, CS, aCCI, or FCij of CSF1R and PI16 (Pearson’s Chi-square test p > 0.05, WRS p > 0.05 and permutation test p > 0.05). The count data, such as ICU, MV, and sex, are shown as counts with constituent ratios in brackets, and the measurement data, including age, CS, aCCI, and FCij of CSF1R and PI16, are presented as medians with ranges in brackets (Table 2).
LASSO-logistic regression was used to evaluate the predictive value of five features, including FCij of CSF1R, FCij of PI16, age, CS, and sex, for ICUA. First, FCij of CSF1R, FCij of PI16, age, CS, and sex were included in the LASSO regression analysis of the training set. We used “lambda.min” as the best lambda, and FCij of CSF1R and PI16 were identified as important characteristic variables for ICUA with non-zero coefficients (Fig. 4 A, B; Supplementary Fig. 3 A). Subsequently, univariate and multivariate logistic regression models, based on the FCij of CSF1R and PI16, were fitted. All indices, including Tjur’s R2, RMSE, Sigma, Log_loss, Score_log, Score_spherical, PCP, AIC, and BIC weights, for each model were rescaled to a range from 0 to 1, and their mean value was taken as the Performance-Score, by using the “compare_performance” function in the performance package of the R language. The results showed that the model with FCij of two genes (CSF1R and PI16) was the optimal model, achieving a higher Performance-Score (70.92%) than the single SDEpcG-FCij-based models in the training set (Supplementary Fig. 3 B, Supplementary Table 1). The multicollinearity diagnostics confirmed that there was no serious multicollinearity between the two genes in the optimal model. The VIFs of FCij of CSF1R and PI16 were 1.162 and 1.162, respectively, and the FCij of CSF1R and PI16 met the linearity assumption. The nomogram, based on the optimal model, was built to predict ICUA, and the predicted risk (Pr) of ICUA in COVID-19 patients could be obtained from the nomogram. Each prediction indicator corresponds to a number of points, the sum of the points is defined as the total score, and the Pr corresponding to the total score is the probability of ICUA in COVID-19 patients. For instance, consider a COVID-19 patient with FCij of PI16 at 0.08 (60 points) and FCij of CSF1R at 0.53 (88 points). 
The total score of the two prediction indicators was 60 + 88 = 148, and the corresponding Pr was 0.947 (94.7%), indicating that the patient had a high risk of ICUA (Fig. 4 C). The receiver operating characteristic (ROC) curve obtained for the nomogram model had an AUC of 0.872 (95% Confidence Interval (CI): 0.707 to 0.950) (Fig. 4D), which was better than that of the CS or aCCI-based models in the training set (Supplementary Fig. 3 C, D), and the AUC of the nomogram model was 0.822 (95% CI: 0.659 to 0.917) in the testing set (Fig. 4E). The GiViTI calibration plots of the nomogram showed that the 95% CI area did not cross the diagonal bisector in the training and testing sets, which indicated that the predictions were in good concordance with the actual observations in both sets (Fig. 4F, G). The DCA curve indicated that the nomogram model, over a range of threshold probabilities from 0.2 to 0.8, had a higher net benefit than the CS or aCCI based models (Fig. 4H).
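The Performance-Score used above to pick the optimal model (rescale each index to the range 0 to 1 across models, then average) can be illustrated with a small sketch. This is not the performance package's code: the direction handling, in which indices where lower is better are flipped after rescaling, is an assumption about how such a score is made comparable, and the metric values are invented.

```python
def rescale(values, higher_is_better=True):
    """Min-max rescale one index across models so that 1 is always best."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Arbitrary choice for this sketch: identical models all score 1.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def performance_scores(metrics, higher_is_better):
    """Mean of the rescaled indices for each model.

    metrics maps an index name to a list holding one value per model.
    """
    columns = [rescale(vals, higher_is_better[name]) for name, vals in metrics.items()]
    n_models = len(columns[0])
    return [sum(col[m] for col in columns) / len(columns) for m in range(n_models)]


# Hypothetical index values for three candidate models (not study results).
metrics = {"R2_Tjur": [0.30, 0.20, 0.40], "RMSE": [0.42, 0.45, 0.38]}
direction = {"R2_Tjur": True, "RMSE": False}
scores = performance_scores(metrics, direction)
best_model = scores.index(max(scores))
```

Under this scheme a model reaches 100% only if it is the best on every index, so a top score below 100% means the winning model was not the best on all indices.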
Developed and validated a nomogram model to predict ICUA of COVID-19 patients. (A) The plot of ten-fold cross-validation showed log (Lambda) versus Binomial Deviance along with the number of features. The left dotted vertical line signified the minimum of Binomial Deviance (lambda.min) and the right dotted vertical line meant one standard error from the minimum (lambda.1se). (B) The plot of coefficients for all 5 features (age, Sex, CS, PI16 FCij and CSF1R FCij). The abscissa is the Log Lambda value, and the ordinate is the coefficient of the feature. With the increase of the lambda value, the coefficient of the features changed to 0, one by one, and finally all the coefficients of the feature changed to 0. To shrink the features effectively, the best lambda (lambda.min) was selected, and the features with zero coefficient would be dropped and not affect the stability of the non-zero-coefficient features-based model. (C) A nomogram model, based on the FCij of PI16 and CSF1R, was built to predict ICUA of patients after SARS-CoV-2 infection. (D) ROC curves revealed that the discrimination of the nomogram was better than CS or aCCI (Supplementary Fig. 3 C, D) in the training set, and it also has a good performance in the testing set (E). (F, G) The GiViTI calibration plots of the nomogram for assessing the probability of ICUA in the training and testing sets. The p values of the GiViTI calibration test of the training set and testing set were 0.782 and 0.525 respectively, without any 95% CIs of GiViTI calibration belt in both sets crossing the diagonal bisector line. (H) The DCA curve of the nomogram show higher net benefits for predicting ICUA in the training set, when the threshold probability was set at 0.2 ~ 0.8. There are five DCA curves corresponding to five models in the figure, three of which are mainly colored models (nomogram, CS and aCCI). 
The other two are reference models for decision-making: one is “ALL,” indicating that all patients enter the ICU, and the other is “None,” indicating that no patients enter the ICU. The abscissa is the threshold probability, the ordinate is the net benefit, and the net benefit = true positive proportion − false positive proportion × weight coefficient (the proportion of all patients who are false-positive, weighted by the relative harm of a false-positive and a false-negative clinical consequence, is subtracted from the proportion who are true-positive). The net benefits of the CS and aCCI models are almost the same as those of the “ALL” and “None” models when the probability threshold was set at 0.2 ~ 0.8, and these models are not conducive to clinical decision-making, having lower net benefit and clinical application value than the nomogram model
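The net benefit in the caption can be computed as in the sketch below, using the standard decision-curve weight of threshold / (1 − threshold). The outcomes and predicted risks are made up, and treating a predicted risk equal to the threshold as positive is an assumption of this sketch.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of admitting patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # relative harm of a false positive
    return tp / n - fp / n * weight


def net_benefit_all(y_true, threshold):
    """The "ALL" reference strategy: every patient is treated as positive."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical ICU outcomes (1 = admitted) and predicted risks.
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(y, risk, 0.5)
nb_all = net_benefit_all(y, 0.5)
nb_none = 0.0   # the "None" strategy has no true or false positives
```

A model is only useful at a given threshold if its net benefit exceeds both reference strategies, which is the comparison the DCA panel makes across thresholds from 0.2 to 0.8.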
Identification of the IRF for ICUA of COVID-19 patients
The relationship between the response variable (ICUA) and the features of COVID-19 patients was evaluated using logistic regression analysis. Univariate logistic regression analysis showed that CSF1R and PI16 were significantly associated with ICUA (P < 0.001), with odds ratios (OR) of 0.210 (95% CI, 0.093–0.420) for PI16 and 0.274 (95% CI, 0.152–0.447) for CSF1R (Fig. 5 A). PI16 and CSF1R were included in multivariate logistic regression, which revealed that CSF1R was an IRF of ICUA (P < 0.001) with an OR of 0.353 (95% CI, 0.185–0.612) (Fig. 5 B). The VIFs of CSF1R and PI16 were 1.282 and 1.282, respectively, and the continuous independent variables met the linearity assumption. The Boruta algorithm showed that the order of importance of the indicators was CSF1R > PI16 > age > CS > Sex, according to the Z-score of each feature (Fig. 5C). Sensitivity analysis was performed by the OAT method, following the Boruta ranking. In the OAT method, adjusted indicators were added one by one to the CSF1R-based model, and the OR of CSF1R was always less than 1 with p < 0.001. This indicated that CSF1R was a robust IRF for ICUA of COVID-19 patients under different adjustments (Table 3).
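To make the reported odds ratio concrete, it can be converted back to the logistic regression coefficient, since the OR is the exponential of the coefficient. The worked conversion below uses only the multivariate values reported above; the subscript name is notation introduced here.

```latex
\mathrm{OR} = e^{\beta}
\;\Longrightarrow\;
\beta_{\mathrm{CSF1R}} = \ln(0.353) \approx -1.04,
\qquad
95\%\ \mathrm{CI}: [\ln 0.185,\ \ln 0.612] \approx [-1.69,\ -0.49]
```

Read this way, each one-unit increase in CSF1R, on whatever scale it entered the model, multiplies the odds of ICUA by about 0.353, that is, roughly 65% lower odds with PI16 held fixed. The interval for the coefficient lies entirely below zero, which matches the OR interval lying entirely below 1.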
Logistic regression analysis and features ranking for ICUA of COVID-19 patients. (A) Univariate logistic regression analysis of the features showed that the statistically significant features were CSF1R and PI16. (B) Statistically significant features screened from the univariate logistic regression analysis were included in the multivariate logistic regression analysis, and CSF1R was identified as an IRF of ICUA. (C) The Boruta algorithm ranked the importance of the features as CSF1R > PI16 > age > CS > Sex, according to the Z-score of each feature; The ordinate, labeled Importance, represents the Z-score of every feature. The blue boxplots (shadowMin, shadowMean and shadowMax) correspond to minimal, average and maximum Z-score of a shadow feature. The red boxplots and green boxplots represent Z-scores of rejected features (Sex and CS) and confirmed features (age, PI16 and CSF1R), respectively
The correlation between CSF1R and ICs in COVID-19 patients
The fraction matrix of 22 ICs and the cell-type specific gene expression profiles in blood of COVID-19 patients were transformed from the gene expression matrix by CIBERSORTx. The results showed the fraction of monocytes decreased significantly in COVID-19, cluster 2, MV and ICU groups (Supplementary Fig. 4A, B, C, D; Fig. 6A), and the log2FC of monocytes fraction between ICU and non-ICU group was − 0.79. CSF1R was expressed in monocytes (Fig. 6B). Correlation analysis indicated that CSF1R was positively correlated with the fraction of monocytes (ρSpearman = 0.65, p < 0.05) (Fig. 6C).
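The log2FC of −0.79 can be converted to a plain ratio for easier reading. This assumes that the log2FC is the base-2 logarithm of the ratio of a summary monocyte fraction in the ICU group to that in the non-ICU group, which is the usual definition but is not reproduced in the text.

```latex
\log_2 \mathrm{FC} = -0.79
\;\Longrightarrow\;
\mathrm{FC} = 2^{-0.79} \approx 0.58
```

Under that assumption, the monocyte fraction in ICU patients was about 58% of that in non-ICU patients, a reduction of roughly 42%.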
The correlation analysis of CSF1R and ICs in COVID-19 Patients. (A) Venn diagram identified monocytes have a similar differential fraction in those groups. (B) CSF1R was expressed in monocytes with Q < 0.05 and GEPs filtered expression value = 252.20. (C) Positive correlation between CSF1R and monocytes with ρSpearman = 0.65 and p < 0.05
Discussion
In this study, we adopted an integrated ML-based PPPM approach to analyze clinical-demographic and multi-omics data, develop and validate a nomogram prediction model, and screen IRF-related IC for targeted prevention. A nomogram model, based on two negative-regulatory SDEpcGs (CSF1R and PI16) in COVID-19 patients, was identified as an optimal model for ICUA prediction. Monocytes were related to CSF1R, which was an IRF of ICUA. It might be a potential target for ICUA monitoring and prevention.
Biological function annotation of the SDEpcGs in COVID-19
Two negatively regulated genes, PI16 and CSF1R, were identified as SDEpcGs with a threshold of q < 0.05 and synchronous log2 FC < -1. PI16 encodes a peptidase inhibitor. Although PI16 has been shown to be expressed in memory Treg cells [42], the role of PI16 in COVID-19 is unclear. A positive correlation has been reported between monocytes with the CD68-CSF1R-IL1BhiCD14 + immunophenotype and the severity of COVID-19, and a significant drop in membrane CSF1R is useful for stratifying COVID-19 patients [43, 44]. In our study, GO function enrichment analysis showed that CSF1R was located in the CSF1-CSF1R complex and had macrophage colony-stimulating factor receptor activity. It is mainly associated with monocyte-macrophage differentiation, proliferation and migration. KEGG analysis indicated that CSF1R might be involved in viral protein interactions with cytokines and cytokine receptors.
Significance of ML-based PPPM approach in ICUA prediction of COVID-19 patients
As a heterogeneous acute respiratory disease, COVID-19 can manifest a wide range of symptoms, from mild to severe [45]. With the pandemic spread of COVID-19 worldwide, public health resources have continued to be strained, and the shortage of ICU beds has increased waiting times as well as mortality rates [46]. Studies have shown that CS and aCCI could be used as predictors of severe clinical outcomes in COVID-19 patients [47, 48]. Although several studies have shown that age is one of the important clinical-demographic features for predicting ICUA in COVID-19 patients with complications [49, 50], the CS and aCCI gave lower AUCs and were inaccurate in predicting ICUA of COVID-19 patients in our study. To provide solutions to the shortage of ICU beds, accurate and interpretable predictive models and a specific prevention target for the ICUA of COVID-19 patients urgently need to be identified and defined. Currently, SARS-CoV-2 has spread widely around the world. This virus can induce immunologic complications, and immune responses play an important role in the occurrence and development of COVID-19 [51]. Omics have been used to analyze the immune response in COVID-19 and contribute to potential therapeutic strategies [52, 53]. As far as we know, there are no reports using multi-omics and ICs profiling to predict ICUA in COVID-19 patients. Multi-omics and ICs profiling could play an important role in personalized medicine of COVID-19 patients. PPPM is a new paradigm that focuses on the integrated prediction, prevention and individual treatment of disease in patients. Unlike the paradigm of reactive medicine, PPPM is based on new biological and computational techniques which could help to predict and prevent disease before symptoms appear and inform the choice of individualized treatment. 
With the development of artificial intelligence, various ML algorithms have been applied to make cost-effective predictions and to target preventive measures for individual patients in the practical implementation of PPPM. ML-based PPPM has been recommended for handling the problems caused by the COVID-19 pandemic [54]. Building the optimal model, analyzing the ICs profile and screening for the specific IRF-related IC with ML algorithms in the framework of PPPM have a profound effect on cost-effective prediction and targeted prevention of ICUA in COVID-19 patients.
A nomogram enables the accurate prediction of ICUA in COVID-19 patients
LASSO is a popular ML algorithm, often used in conjunction with logistic regression, that can screen characteristic variables and estimate the optimal model. Although some researchers have suggested that the random forest (RF) model predicts COVID-19 severity better than the logistic regression model [55], a systematic review noted that logistic regression is not always inferior to other ML algorithms [56]. Logistic regression also has better explanatory power than RF and similar algorithms, which are “black-box” ML algorithms [57]. The Boruta algorithm, although also a tree-based ML algorithm, is especially suitable for feature selection; it can be applied easily and models nonlinear relations well without much tuning [58]. Each ML algorithm has its own characteristics and applicable scenarios, and in most cases integrating these algorithms might give better performance. With the development of genomics and improvements in gene chips, studies have moved to the molecular level, and many researchers have begun to use RNA sequencing (RNA-seq) data to build prediction models [59,60,61]. However, RNA-seq does not provide accurate absolute measurements and often shows gene-specific biases across laboratories, platforms and analysis pipelines. In this study, we first adopted the method of the SEQC Consortium [62] and used FCij as a specific filter to obtain relative expression measures that are accurate and reproducible across platforms and laboratories. An ML-based PPPM approach with multi-omics was then used to predict ICUA in COVID-19 patients, and a nomogram model was built with the LASSO-logistic algorithm on the training set and validated on the testing set.
Following PPPM, we first identified two SDEpcGs (CSF1R and PI16) from multi-omics data, and two clusters were obtained through a consensus clustering algorithm based on the SDEpcGs expression matrix. Then, the LASSO-logistic algorithm was used to build the two-SDEpcGs-FCij-based nomogram model to predict ICUA in COVID-19 patients. This nomogram was optimal and reliable in further validations. It was more interpretable and, used alone, yielded a higher net benefit across the reasonable threshold probability range (0.2 to 0.8) than CS or aCCI, although both had been identified as predictors of the outcomes of COVID-19 patients. These findings suggest that the nomogram predicts ICUA in COVID-19 patients better than CS or aCCI and can support personalized treatment in secondary care. Before using the nomogram, each clinical laboratory should build an RNA-seq database to obtain the median log2(TPM + 1) value of non-ICU COVID-19 patients. Then, the FCij of CSF1R and PI16 for each evaluated COVID-19 patient is calculated by the formula mentioned above. One important future direction is to establish more standardized pipelines for calculating FCij in RNA-seq analysis. Future iterations of the analysis methods may therefore demonstrate even greater potency of the nomogram for personalized treatment.
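The net-benefit comparison can be made concrete with a minimal sketch of the standard decision curve analysis calculation, in which the net benefit at a threshold probability pt is TP/n - (FP/n) × pt/(1 - pt). The function names and the toy labels and risks below are hypothetical illustrations, not the study data or the authors' code.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of admitting every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who were admitted to the ICU.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    # False positives: flagged patients who were not admitted.
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = ICU admission, 0 = no ICU admission.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
risk = [0.9, 0.7, 0.4, 0.3, 0.1, 0.6, 0.2, 0.8, 0.05, 0.35]

# Evaluate a few thresholds inside the 0.2 to 0.8 range discussed in the text.
for pt in (0.2, 0.5, 0.8):
    model_nb = net_benefit(y_true, risk, pt)
    all_nb = treat_all_net_benefit(y_true, pt)
    print(f"pt={pt:.1f} model={model_nb:.3f} treat-all={all_nb:.3f}")
```

A model is preferred at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.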
CSF1R-related monocytes benefit targeted prevention of ICUA in COVID-19 patients
Recent studies indicated that ICs play an important role in fighting COVID-19 [43, 63], and that monocytes are associated with risk stratification and cytokine storms in COVID-19 patients [64, 65]. Previous research showed that circulating monocytes play a role in all stages of SARS-CoV-2 infection. CSF1R, a pan-monocyte marker expressed in all subsets of monocytes, can be used as a monitoring indicator of the level of monocyte activation in the blood of COVID-19 patients [66]. It might play an important role in regulating and maintaining monocytes in response to SARS-CoV-2 infection and correlates negatively with the severity of COVID-19. In our study, CSF1R expression was lower in cluster 2, whose patients were severely ill and needed ICU admission or MV, and the fraction of monocytes was significantly decreased in the COVID-19, cluster 2, MV and ICU groups. CSF1R was identified as a highly robust IRF of ICUA by the joint algorithms (Logistic-Boruta-OAT). CIBERSORTx analysis confirmed that CSF1R, as an IRF of ICUA, was mainly expressed in monocytes and was significantly positively correlated with the fraction of monocytes. These results indicate that CSF1R-related monocytes could be used as a potential individualized prevention target for ICUA in COVID-19 patients. In other words, CSF1R-related monocytes might be monitored in primary care through the log2FC of the monocyte fraction measured by flow cytometry. The 22 ICs plates were configured with staining cocktails according to published research [67]. Monocytes were identified by size via forward- and side-scatter properties, and CSF1R-related monocytes were stained with a liquid CSF1R antibody. A low log2FCIC (< -0.79) indicates a high probability of ICUA. Drugs that stimulate the proliferation of CSF1R-related monocytes and increase the fraction of monocytes might have broad research and application prospects in personalized medicine for COVID-19 patients with a decreased fraction of monocytes.
Strengths and limitations
There are several strengths worth mentioning. First, this study proposed and used the FCij of SDEpcGs to build a predictive model, and thereby avoided the potential heterogeneity seen in some previous research. Second, evaluating the fraction of monocytes with log2FC could simplify the forecasting process, reduce costs and save time in routine clinical laboratory flow cytometry in primary care. Third, the nomogram increased net benefit and could help physicians better predict ICUA for COVID-19 patients in secondary care. One limitation of our study, however, is that RNA-seq data might differ significantly between experimental and analysis pipelines. To use the nomogram, each clinical laboratory should establish its own “basal” level (the median log2(TPM + 1) value of non-ICU COVID-19 patients). How many biological replicates are needed in the RNA-seq experiment deserves further discussion: a large number of replicates increases the time and cost of obtaining the median log2(TPM + 1) value of non-ICU COVID-19 patients, whereas too few replicates will bias the estimates. In addition, the study was based solely on in silico analysis, and the research data came mainly from an online database, focused on adults, with a limited sample size; more patients of all ages and both sexes should be included in future prospective studies. Because CSF1R and PI16 expression is subject to intra-individual biological variation and the ICs profile was calculated by CIBERSORTx, the proper determination of the FCij of SDEpcGs and of the fraction of monocytes in real patient care should be further evaluated.
Conclusions and expert recommendations
COVID-19 is a contagious disease caused by SARS-CoV-2. Because COVID-19 is highly heterogeneous, accurate prediction and targeted prevention of ICUA have been challenging. We developed and validated a nomogram to predict ICUA and showed that it is an optimal model for personalized prediction of ICUA in COVID-19 patients. It could be used as an ICUA triaging approach in clinical practice in secondary care. This predictive approach is effective and accurate but needs further confirmation with real-world evidence. Before the nomogram is used, RNA-seq data from different clinical laboratories in the real world should be standardized. Each clinical laboratory therefore needs to establish its own “basal/healthy” level for the expression of each gene of interest in patients with COVID-19. We recommend the median log2(TPM + 1) value of non-ICU COVID-19 patients as the “basal/healthy” level. To establish this level, at least 12 biological replicates are considered necessary [68].
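As an illustration of the recommendation above, the following minimal sketch computes a laboratory's basal level per gene as the median log2(TPM + 1) of its non-ICU COVID-19 patients and refuses to do so with fewer than 12 replicates. The function names, the input layout (a dictionary of TPM lists) and the TPM values are hypothetical; the FCij formula itself is defined earlier in the paper and is not reproduced here.

```python
import math
from statistics import median

# Minimum number of biological replicates recommended in the text [68].
MIN_REPLICATES = 12


def log2_tpm(tpm):
    """Return log2(TPM + 1) for a single non-negative TPM value."""
    if tpm < 0:
        raise ValueError("TPM values cannot be negative")
    return math.log2(tpm + 1.0)


def basal_levels(non_icu_tpm):
    """Median log2(TPM + 1) per gene across non-ICU COVID-19 patients.

    non_icu_tpm maps a gene symbol to a list of TPM values,
    one value per non-ICU patient (hypothetical input layout).
    """
    levels = {}
    for gene, values in non_icu_tpm.items():
        if len(values) < MIN_REPLICATES:
            raise ValueError(
                f"{gene}: {len(values)} replicates, need at least {MIN_REPLICATES}"
            )
        levels[gene] = median(log2_tpm(v) for v in values)
    return levels


# Hypothetical TPM values for 12 non-ICU patients per gene.
example = {
    "CSF1R": [30, 38, 42, 44, 47, 49, 52, 55, 58, 61, 66, 70],
    "PI16": [2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 11],
}
for gene, level in basal_levels(example).items():
    print(f"{gene}: basal log2(TPM + 1) = {level:.3f}")
```

Using the median rather than the mean keeps the basal level stable when a few patients have unusually high or low expression.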
In this study, we also identified CSF1R as the IRF of ICUA in COVID-19 patients. It is expressed in monocytes and is significantly positively correlated with the fraction of monocytes. Preliminary ICUA triage could therefore be performed in primary care according to the fraction of monocytes. We suggest -0.79, which was calculated by the formula above, as the threshold for the log2FC of the fraction of monocytes. The log2FC of an individual COVID-19 patient is calculated by the following formula:
where the index I stands for the individual patient. If log2FCI < -0.79, the COVID-19 patient has a high probability of ICUA.
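The triage rule can be sketched as follows. This is only an illustration under a stated assumption: it uses the conventional definition of a log2 fold change, the base-2 logarithm of the patient's monocyte fraction divided by a reference fraction. Which reference fraction the authors use is not shown in this section, so `reference_fraction` and the function names are hypothetical; only the threshold of -0.79 comes from the text.

```python
import math

# Threshold for the log2FC of the monocyte fraction suggested in the text.
LOG2FC_THRESHOLD = -0.79


def monocyte_log2fc(patient_fraction, reference_fraction):
    """Conventional log2 fold change of a patient's monocyte fraction.

    reference_fraction is a hypothetical baseline chosen by the laboratory;
    it is an assumption of this sketch, not a value given in the text.
    """
    if patient_fraction <= 0 or reference_fraction <= 0:
        raise ValueError("fractions must be positive")
    return math.log2(patient_fraction / reference_fraction)


def high_icua_probability(patient_fraction, reference_fraction):
    """True when the patient's log2FC falls below the suggested threshold."""
    return monocyte_log2fc(patient_fraction, reference_fraction) < LOG2FC_THRESHOLD


# Hypothetical monocyte fractions compared against a reference of 0.10.
for fraction in (0.09, 0.06, 0.05):
    flagged = high_icua_probability(fraction, 0.10)
    print(f"fraction={fraction:.2f} flagged={flagged}")
```

Under this assumption, a log2FC of -0.79 corresponds to a monocyte fraction of roughly 58% of the reference value.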
These tools have promising application prospects in monitoring ICUA of COVID-19 patients and in targeted prevention for individualized treatment. The integrated approach, comprising the nomogram and the evaluation of the log2FCI of the monocyte fraction, could provide an optimal, value-adding solution for ICUA triage in the framework of PPPM. The added value of the findings is to fill the research gap in ICUA prediction and targeted prevention in individualized medicine for COVID-19 patients, and we propose the integrated approach for further PPPM development and practical application in ICUA prediction. The integrated approach promotes cost-effective ICUA prediction and benefits targeted monitoring and prevention for individualized treatment of COVID-19 patients. All of these contribute to the paradigm shift towards proactive PPPM and should be highlighted as a potential proactive medical approach for COVID-19 patients.
Data availability
Publicly available datasets were analyzed in this study. The data can be found here:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7543711/bin/mmc2.xlsx;
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157103;
Code availability
The software applications and custom code used for data analysis are available upon request.
Abbreviations
- aCCI: Age-adjusted Charlson Comorbidity Index
- AIC: Akaike information criterion
- AUC: Area under curve
- BIC: Bayesian information criterion
- BP: Biological processes
- CC: Cell composition
- CDF: Cumulative distribution function
- CI: Confidence interval
- COVID-19: Coronavirus disease 2019
- CRISP3: Cysteine Rich Secretory Protein 3
- CS: Charlson score
- CSF1R: Colony stimulating factor 1 receptor
- DCA: Decision curve analysis
- DEGs: Differentially expressed genes
- DEPs: Differentially expressed proteins
- EEF1A1: Eukaryotic Translation Elongation Factor 1 Alpha 1
- EEF2: Eukaryotic Translation Elongation Factor 2
- EPMA: European Association for Predictive, Preventive and Personalized Medicine
- FCij: Each fold change
- FDR: False discovery rate
- FTL: Ferritin Light Chain
- GEO: Gene Expression Omnibus
- GEPs: Gene expression profiles
- GiViTI: Gruppo italiano per la Valutazione degli interventi in Terapia Intensiva
- GO: Gene Ontology
- HSPA1A: Heat Shock Protein Family A (Hsp70) Member 1A
- HSPA1B: Heat Shock Protein Family A (Hsp70) Member 1B
- HSPD1: Heat Shock Protein Family D (Hsp60) Member 1
- ICU: Intensive care unit
- ICUA: Intensive care unit admissions
- IRF: Independent risk factor
- ICs: Immune cells
- KDE: Kernel density estimation
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- LASSO: Least absolute shrinkage and selection operator
- log2FC: Log2 fold change
- MF: Molecular function
- ML: Machine learning
- MV: Mechanical ventilation
- No.: Number
- NSD: No significant difference
- NUCB1: Nucleobindin 1
- OAT: One-at-a-time
- OR: Odds ratio
- PAC: Proportion of ambiguous clustering
- PCP: Percentage of correct predictions
- PI16: Peptidase inhibitor 16
- PPPM: Predictive, Preventive and Personalized Medicine
- Pr: Predicted risk
- RF: Random forest
- RMSE: Root mean square error
- RNA-seq: RNA sequencing
- ROC: Receiver operating characteristic
- SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
- SDEpcGs: Synchronous differentially expressed protein-coding genes
- SFTPB: Surfactant Protein B
- ρSpearman: Spearman’s rank correlation coefficient
- STOM: Stomatin
- TKT: Transketolase
- TPM: Transcripts per million
- Tregs: Regulatory T cells
- UniProtKB: UniProt Knowledgebase
- VIF: Variance inflation factor
- VNN1: Vanin 1
- WMW: Wilcoxon-Mann-Whitney
- WRS: Wilcoxon rank-sum
- VS.: Versus
References
Sohrabi C, Alsafi Z, O’Neill N, Khan M, Kerwan A, Al-Jabir A, et al. World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19). Int J Surg. 2020;76:71–6.
Shim RS, Starks SM. COVID-19, Structural Racism, and Mental Health Inequities: Policy Implications for an Emerging Syndemic. Psychiatr Serv. 2021;72(10):1193–8.
Georgakopoulou VE, Gkoufa A, Damaskos C, Papalexis P, Pierrakou A, Makrodimitri S, et al. COVID-19-associated acute appendicitis in adults. A report of five cases and a review of the literature. Exp Ther Med. 2022;24(1):482.
Kloka JA, Blum LV, Old O, Zacharowski K, Friedrichson B. Characteristics and mortality of 561,379 hospitalized COVID-19 patients in Germany until December 2021 based on real-life data. Sci Rep. 2022;12(1):11116.
Kumar A, Kumar N, Kumar A, Kumar A. COVID-19 pandemic and the need for objective criteria for ICU admissions. J Clin Anesth. 2020;66:109945.
Joynt GM, Leung AKH, Ho CM, So D, Shum HP, Chow FL, et al. Admission triage tool for adult intensive care unit admission in Hong Kong during the COVID-19 outbreak. Hong Kong Med J. 2022;28(1):64–72.
Bouwmans P, Brandts L, Hilbrands LB, Duivenvoorden R, Vart P, Franssen CFM, et al. The clinical frailty scale as a triage tool for ICU admission of dialysis patients with COVID-19: An ERACODA analysis. Nephrol Dial Transplant. 2022;37(11):2264–74.
Guan W, Ni Z, Hu Y, Liang W, Ou C, He J, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020;382(18):1708–20.
Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395(10223):497–506.
Golubnitschaja O, Baban B, Boniolo G, Wang W, Bubnov R, Kapalla M, et al. Medicine in the early twenty-first century: paradigm and anticipation - EPMA position paper 2016. EPMA J. 2016;7(1):23.
Chaari L (ed). Digital Health in Focus of Predictive, Preventive and Personalised Medicine. Advances in Predictive, Preventive and Personalised Medicine. Cham, Switzerland: Springer Nature Switzerland AG; 2020.
Golubnitschaja O, Costigliola V. Predictive, Preventive and Personalised Medicine as the Medicine of the Future: Anticipatory Scientific Innovation and Advanced Medical Services. In: Nadin M, editor. Anticipation and Medicine. Cham, Switzerland: Springer; 2017. pp. 69–85.
Nazir A, Ampadu HK. Interpretable deep learning for the prediction of ICU admission likelihood and mortality of COVID-19 patients. PeerJ Comput Sci. 2022;8:e889.
Famiglini L, Campagner A, Carobene A, Cabitza F. A robust and parsimonious machine learning method to predict ICU admission of COVID-19 patients. Med Biol Eng Comput. 2022. https://doi.org/10.1007/s11517-022-02543-x
Li X, Ge P, Zhu J, Li H, Graham J, Singer A, et al. Deep learning prediction of likelihood of ICU admission and mortality in COVID-19 patients using clinical variables. PeerJ. 2020;8:e10337.
Tachalov VV, Orekhova LY, Kudryavtseva TV, Loboda ES, Pachkoriia MG, Berezkina IV, et al. Making a complex dental care tailored to the person: population health in focus of predictive, preventive and personalised (3P) medical approach. EPMA J. 2021;12(2):129–40.
Vassiliou AG, Keskinidou C, Jahaj E, Gallos P, Dimopoulou I, Kotanidou A, et al. ICU Admission Levels of Endothelial Biomarkers as Predictors of Mortality in Critically Ill COVID-19 Patients. Cells. 2021;10(1):186.
Adamik B, Ambrożek-Latecka M, Dragan B, Jeznach A, Śmiechowicz J, Gożdzik W, et al. Inflammasome-related Markers upon ICU Admission do not Correlate with Outcome in Critically Ill COVID-19 Patients. Shock. 2022;57(5):672–9.
Bellos I, Tavernaraki K, Stefanidis K, Michalopoulou O, Lourida G, Korompoki E, et al. Chest CT severity score and radiological patterns as predictors of disease severity, ICU admission, and viral positivity in COVID-19 patients. Respir Investig. 2021;59(4):436–45.
Aguersif A, Sarton B, Bouharaoua S, Gaillard L, Standarovski D, Faucoz O, et al. Lung Ultrasound to Assist ICU Admission Decision-Making Process of COVID-19 Patients with Acute Respiratory Failure. Crit Care Explor. 2022;4(6):e0719.
Schultze JL, Aschenbrenner AC. COVID-19 and the human innate immune system. Cell. 2021;184(7):1671–92.
Song L, Liang E, Wang H, Shen Y, Kang C, Xiong Y, et al. Differential diagnosis and prospective grading of COVID-19 at the early stage with simple hematological and biochemical variables. Diagn Microbiol Infect Dis. 2021;99(2):115169.
Markovic SS, Jovanovic M, Gajovic N, Jurisevic M, Arsenijevic N, Jovanovic M, et al. IL 33 Correlates With COVID-19 Severity, Radiographic and Clinical Finding. Front Med (Lausanne). 2021;8:749569.
Wen W, Su W, Tang H, Le W, Zhang X, Zheng Y, et al. Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing. Cell Discov. 2020;6:31.
Stephenson E, Reynolds G, Botting RA, Calero-Nieto FJ, Morgan MD, Tuong ZK, et al. Single-cell multi-omics analysis of the immune response in COVID-19. Nat Med. 2021;27(5):904–16.
Unterman A, Sumida TS, Nouri N, Yan X, Zhao AY, Gasque V, et al. Single-cell multi-omics reveals dyssynchrony of the innate and adaptive immune system in progressive COVID-19. Nat Commun. 2022;13(1):440.
Overmyer KA, Shishkova E, Miller IJ, Balnis J, Bernstein MN, Peters-Clarke TM, et al. Large-Scale Multi-omic Analysis of COVID-19 Severity. Cell Syst. 2021;12(1):23–40.
Ma J, Li R, Wang J. Characterization of a prognostic four-gene methylation signature associated with radiotherapy for head and neck squamous cell carcinoma. Mol Med Rep. 2019;20(1):622–32.
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.
Zhang W, Zhang Y, Min Z, Mo J, Ju Z, Guan W, et al. COVID19db: a comprehensive database platform to discover potential drugs and targets of COVID-19 at whole transcriptomic scale. Nucleic Acids Res. 2022;50(D1):D747–57.
Reimand J, Arak T, Adler P, Kolberg L, Reisberg S, Peterson H, et al. g:Profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016;44(W1):W83–9.
Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;26(12):1572–3.
Charlson M, Szatrowski TP, Peterson J, Gold J. Validation of a combined comorbidity index. J Clin Epidemiol. 1994;47(11):1245–51.
Bergmann R, Ludbrook J, Spooren WPJM. Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages. Am Stat. 2000;54(1):72–7. https://doi.org/10.1080/00031305.2000.10474513.
Lüdecke D, Ben-Shachar M, Patil I, Waggoner P, Makowski D. performance: An R Package for Assessment, Comparison and Testing of Statistical Models. J Open Source Softw. 2021;6(60):3139.
Nattino G, Finazzi S, Bertolini G. A new test and graphical tool to assess the goodness of fit of logistic regression models. Stat Med. 2016;35(5):709–20.
Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26(6):565–74.
Hamby DM. A review of techniques for parameter sensitivity analysis of environmental models. Environ Monit Assess. 1994;32(2):135–54.
Rusk N. Expanded CIBERSORTx. Nat Methods. 2019;16(7):577. https://doi.org/10.1038/s41592-019-0486-8.
Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37(7):773–82.
Box GEP, Tidwell PW. Transformation of the independent variables. Technometrics. 1962;4(4):531–50.
Nicholson IC, Mavrangelos C, Bird DRG, Bresatz-Atkins S, Eastaff-Leung NG, Grose RH, et al. PI16 is expressed by a subset of human memory Treg with enhanced migration to CCL17 and CCL20. Cell Immunol. 2012;275(1–2):12–8.
Wang X, Bai H, Ma J, Qin H, Zeng Q, Hu F, et al. Identification of Distinct Immune Cell Subsets Associated with Asymptomatic Infection, Disease Severity, and Viral Persistence in COVID-19 Patients. Front Immunol. 2022;13:812514.
Combes TW, Orsenigo F, Stewart A, Mendis ASJR, Dunn-Walters D, Gordon S, et al. CSF1R defines the mononuclear phagocyte system lineage in human blood in health and COVID-19. Immunother Adv. 2021;1(1):ltab003. https://doi.org/10.1093/immadv/ltab003.
Trevisan C, Remelli F, Fumagalli S, Mossello E, Okoye C, Bellelli G, et al. COVID-19 as a Paradigmatic Model of the Heterogeneous Disease Presentation in Older People: Data from the GeroCovid Observational Study. Rejuvenation Res. 2022;25(3):129–40.
Arora P, Shankar T, Joshi S, Pillai A, Kabi A, Arora RK, et al. Prognostication of COVID-19 patients using ROX index and CURB-65 score - A retrospective observational study. J Family Med Prim Care. 2022;11(10):6006–14.
Barış SA, Boyacı H, Akhan S, Mutlu B, Deniz M, Başyiğit İ. Charlson Comorbidity Index in Predicting Poor Clinical Outcomes and Mortality in Patients with COVID-19. Turk Thorac J. 2022;23(2):145–53.
Kim DH, Park HC, Cho A, Kim J, Yun K, Kim J, et al. Age-adjusted Charlson comorbidity index score is the best predictor for severe clinical outcome in the hospitalized patients with COVID-19 infection. Medicine (Baltimore). 2021;100(18):e25900.
Lei M, Lin K, Pi Y, Huang X, Fan L, Huang J, et al. Clinical Features and Risk Factors of ICU Admission for COVID-19 Patients with Diabetes. J Diabetes Res. 2020;2020:5237840.
Solmaz I, Özçaylak S, Alakuş ÖF, Kılıç J, Kalın BS, Güven M, et al. Risk factors affecting ICU admission in COVID-19 patients; Could air temperature be an effective factor? Int J Clin Pract. 2021;75(3):e13803.
Azkur AK, Akdis M, Azkur D, Sokolowska M, Veen WVD, Brüggen M, et al. Immune response to SARS-CoV-2 and mechanisms of immunopathological changes in COVID-19. Allergy. 2020;75(7):1564–81.
Hao M, Wang D, Xia Q, Kan S, Chang L, Liu H, et al. Pathogenic Mechanism and Multi-omics Analysis of Oral Manifestations in COVID-19. Front Immunol. 2022;13:879792.
Milani D, Caruso L, Zauli E, Owaifeer AMA, Secchiero P, Zauli G, et al. p53/NF-kB Balance in SARS-CoV-2 Infection: From OMICs, Genomics and Pharmacogenomics Insights to Tailored Therapeutic Perspectives (COVIDomics). Front Pharmacol. 2022;13:871583.
Wang LY, Cui JJ, OuYang QY, Zhan Y, Wang Y, Xu X, et al. Complex analysis of the personalized pharmacotherapy in the management of COVID-19 patients and suggestions for applications of predictive, preventive, and personalized medicine attitude. EPMA J. 2021;12:307–24.
Xiong Y, Ma Y, Ruan L, Li D, Lu C, Huang L, et al. Comparing different machine learning techniques for predicting COVID-19 severity. Infect Dis Poverty. 2022;11(1):19.
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Calster BV. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12–22.
Beranová L, Joachimiak MP, Kliegr T, Rabby G, Sklenák V. Why was this cited? Explainable machine learning applied to COVID-19 research literature. Scientometrics. 2022;127(5):2313–49.
Kursa MB, Rudnicki WR. Feature Selection with the Boruta Package. J Stat Softw. 2010;36(11):1–13.
Wang J, Tu W, Qiu J, Wang D. Predicting prognosis and immunotherapeutic response of clear cell renal cell carcinoma. Front Pharmacol. 2022;13:984080.
Shen N, Zhu S, Zhang Z, Yong X. High Expression of COL10A1 Is an Independent Predictive Poor Prognostic Biomarker and Associated with Immune Infiltration in Advanced Gastric Cancer Microenvironment. J Oncol. 2022;2022:1463316.
Wang D, Chen B, Bai S, Zhao L. Screening and identification of tissue-infiltrating immune cells and genes for patients with emphysema phenotype of COPD. Front Immunol. 2022;13:967357.
SEQC Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium. Nat Biotechnol. 2014;32(9):903–14.
Burnett CE, Okholm HTL, Tenvooren I, Marquez DM, Tamaki S, Sandoval PM, et al. Mass cytometry reveals a conserved immune trajectory of recovery in hospitalized COVID-19 patients. Immunity. 2022;55(7):1284–98.
Ligi D, Lo Sasso B, Henry BM, Ciaccio M, Lippi G, Plebani M, et al. Deciphering the role of monocyte and monocyte distribution width (MDW) in COVID-19: an updated systematic review and meta-analysis. Clin Chem Lab Med. 2023. https://doi.org/10.1515/cclm-2022-0936.
Guo C, Li B, Ma H, Wang X, Cai P, Yu Q, et al. Single-cell analysis of two severe COVID-19 patients reveals a monocyte-associated and tocilizumab-responding cytokine storm. Nat Commun. 2020;11(1):3924.
Martinez FO, Combes TW, Orsenigo F, Gordon S. Monocyte activation in systemic Covid-19 infection: Assay and rationale. EBioMedicine. 2020;59:102964.
Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression Profiles. Nat Methods. 2015;12(5):453–7.
Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA. 2016;22(6):839–51.
Funding
This research was supported by the following funds:
Zhejiang University special scientific research fund for COVID-19 prevention and control, (Q.S.); National Natural Science Foundation of China, 82272191 (Q.S.); National Natural Science Foundation of China, 81901989 (D.L.); Natural Science Foundation of Zhejiang Province, LY21H150005 (D.L.); Foundation for The Top-Notch Youth Talent Cultivation Project of Independent Design Project of National Clinical Research Center for Child Health, Q21B0007 (D.L.); Special Fund for the Incubation of Young Clinical Scientist, The Children’s Hospital of Zhejiang University School of Medicine, CHZJU2022YS002 (D.L.).
Author information
Contributions
K.Z., Z.C., and Y.X. designed the study, curated the data, and wrote the first draft of the manuscript. K.Z. and X.W. performed the statistical analysis and visualization. D.L. wrote, reviewed and edited the manuscript. X.F. and Q.S. supervised the project. All authors contributed to manuscript revision, and read and approved the submitted version.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors gave their consent for publication.
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Zhu, K., Chen, Z., Xiao, Y. et al. Multi-omics and immune cells’ profiling of COVID-19 patients for ICU admission prediction: in silico analysis and an integrated machine learning-based approach in the framework of Predictive, Preventive, and Personalized Medicine. EPMA Journal 14, 101–117 (2023). https://doi.org/10.1007/s13167-023-00317-5
DOI: https://doi.org/10.1007/s13167-023-00317-5