Enhancing Non-Small Cell Lung Cancer Survival Prediction through Multi-Omics Integration Using Graph Attention Network
Figure 1. Pipeline for NSCLC survival prediction using GAT.
Figure 2. The omics dataset visualization: (a) mRNA, (b) miRNA, (c) DNA methylation.
Figure 3. Kaplan–Meier survival curves (a) using all of the features and (b) using the significant features selected by the chi-square test.
Figure 4. Predictive performance of the GAT-based method using different feature combinations.
Figure 5. Average ROC curve for cancer survival prediction using mRNA, miRNA, methylation, and clinical information (AUC = 0.82).
Figure 6. KEGG pathways analysis.
Abstract
1. Introduction
2. Materials and Methods
2.1. The Omics Data
2.2. The Clinical Information Data
2.3. Chi-Square Test for Feature Selection
2.4. Kaplan–Meier Survival Curves
2.5. The Cox Proportional Hazards (Cox PH)
2.6. Graph Attention Network (GAT)
2.6.1. Node Features
2.6.2. Attention Mechanism
2.6.3. Normalization
2.6.4. Aggregation
2.6.5. Multi-Head Attention
2.7. Synthetic Minority Over-Sampling Technique (SMOTE)
3. Experimental Setup
4. Results
4.1. Exploratory Analysis by Evaluating the Survival Differences between High-Risk and Low-Risk Groups
4.2. Comparative Analysis of Predictive Models for Non-Small Cell Lung Cancer
4.3. Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Almuayqil, S.N.; Elbashir, M.K.; Ezz, M.; Mohammed, M.; Mostafa, A.M.; Alruily, M.; Hamouda, E. An Approach for Cancer-Type Classification Using Feature Selection Techniques with Convolutional Neural Network. Appl. Sci. 2023, 13, 10919.
2. Chaudhary, K.; Poirion, O.B.; Lu, L.; Garmire, L.X. Deep Learning–Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin. Cancer Res. 2018, 24, 1248–1259.
3. Ritchie, M.D.; Holzinger, E.R.; Li, R.; Pendergrass, S.A.; Kim, D. Methods of integrating data to uncover genotype–phenotype interactions. Nat. Rev. Genet. 2015, 16, 85–97.
4. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Aparicio, S.A.J.R.; Behjati, S.; Biankin, A.V.; Bignell, G.R.; Bolli, N.; Borg, A.; Børresen-Dale, A.; et al. Signatures of mutational processes in human cancer. Nature 2013, 500, 415–421.
5. Snyder, M.; Wang, Z.; Gerstein, M. RNA-Seq: A revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009, 10, 57–63.
6. Nicholson, J.K.; Lindon, J.C.; Holmes, E. ‘Metabonomics’: Understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 1999, 29, 1181–1189.
7. Jones, P.A. Functions of DNA methylation: Islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 2012, 13, 484–492.
8. Sidorova, J.; Lozano, J.J. Review: Deep Learning-Based Survival Analysis of Omics and Clinicopathological Data. Inventions 2024, 9, 59.
9. Tong, L.; Mitchel, J.; Chatlin, K.; Wang, M.D. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med. Inform. Decis. Mak. 2020, 20, 225.
10. Chen, R.; Mias, G.; Li-Pook-Than, J.; Jiang, L.; Lam, H.K.; Chen, R.; Miriami, E.; Karczewski, K.; Hariharan, M.; Dewey, F.; et al. Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes. Cell 2012, 148, 1293–1307.
11. Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34, i457–i466.
12. Wang, Q.; Armenia, J.; Zhang, C.; Penson, A.V.; Reznik, E.; Zhang, L.; Minet, T.; Ochoa, A.; Gross, B.E.; Iacobuzio-Donahue, C.A.; et al. Unifying cancer and normal RNA sequencing data from different sources. Sci. Data 2018, 5, 180061.
13. Ellen, J.G.; Jacob, E.; Nikolaou, N.; Markuzon, N. Autoencoder-based multimodal prediction of non-small cell lung cancer survival. Sci. Rep. 2023, 13, 15761.
14. Zhang, J.; Zhang, J. Prognostic factors and survival prediction of resected non-small cell lung cancer with ipsilateral pulmonary metastases: A study based on the Surveillance, Epidemiology, and End Results (SEER) database. BMC Pulm. Med. 2023, 23, 413.
15. She, Y.; Jin, Z.; Wu, J.; Deng, J.; Zhang, L.; Su, H.; Jiang, G.; Liu, H.; Xie, D.; Cao, N.; et al. Development and Validation of a Deep Learning Model for Non–Small Cell Lung Cancer Survival. JAMA Netw. Open 2020, 3, e205842.
16. Zhang, D.; Lu, B.; Liang, B.; Li, B.; Wang, Z.; Gu, M.; Jia, W.; Pan, Y. Interpretable deep learning survival predictive tool for small cell lung cancer. Front. Oncol. 2023, 13, 1162181.
17. Zheng, S.; Guo, J.; Langendijk, J.A.; Both, S.; Veldhuis, R.N.J.; Oudkerk, M.; van Ooijen, P.M.A.; Wijsman, R.; Sijtsema, N.M. Survival prediction for stage I-IIIA non-small cell lung cancer using deep learning. Radiother. Oncol. 2023, 180, 109483.
18. Guo, Y.; Li, L.; Zheng, K.; Du, J.; Nie, J.; Wang, Z.; Hao, Z. Development and validation of a survival prediction model for patients with advanced non-small cell lung cancer based on LASSO regression. Front. Immunol. 2024, 15, 1431150.
19. Li, Q.; Zhao, Y.; Xu, Z.; Ma, Y.; Wu, C.; Shi, H. Development and validation of prognostic models for small cell lung cancer patients with liver metastasis: A SEER population-based study. BMC Pulm. Med. 2024, 24, 13.
20. Hasin, Y.; Seldin, M.; Lusis, A. Multi-omics approaches to disease. Genome Biol. 2017, 18, 83.
21. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
22. Colaprico, A.; Silva, T.C.; Olsen, C.; Garofano, L.; Cava, C.; Garolini, D.; Sabedot, T.S.; Malta, T.M.; Pagnotta, S.M.; Castiglioni, I.; et al. TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016, 44, 71.
23. Jiang, G.; Zheng, J.; Ren, S.; Yin, W.; Xia, X.; Li, Y.; Wang, H. A comprehensive workflow for optimizing RNA-seq data analysis. BMC Genom. 2024, 25, 631.
24. Ritchie, M.E.; Phipson, B.; Wu, D.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies. Nucleic Acids Res. 2015, 43, e47.
25. Kuhn, M.; Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models: Hardback; CRC Press: Boca Raton, FL, USA, 2020.
26. Elamin, A.M.K.; Mohmmed, A.O.A. The Cox regression and Kaplan-Meier for time-to-event of survival data patients with renal failure. World J. Adv. Eng. Technol. Sci. 2023, 8, 97–109.
27. Meira-Machado, L. The Kaplan-Meier Estimator: New Insights and Applications in Multi-state Survival Analysis. In Computational Science and Its Applications—ICCSA 2023 Workshops; Springer Nature: Cham, Switzerland, 2023; pp. 129–139.
28. Koletsi, D.; Pandis, N. Survival analysis, part 2: Kaplan-Meier method and the log-rank test. Am. J. Orthod. Dentofac. Orthop. 2017, 152, 569–571.
29. Negash Terefe, A. Modeling Time-to-Recovery of Adult Diabetic Patients Using Cox-Proportional Hazards Model. Int. J. Stat. Distrib. Appl. 2017, 3, 67.
30. Veličković, P.; Casanova, A.; Liò, P.; Cucurull, G.; Romero, A.; Bengio, Y. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings, Vancouver, BC, Canada, 30 April–3 May 2018.
31. Zhang, X.; Zitnik, M. GNNGuard: Defending Graph Neural Networks against Adversarial Attacks. arXiv 2020, arXiv:2006.08149.
32. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
33. Bamber, D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol. 1975, 12, 387–415.
34. Austin, P.C.; Steyerberg, E.W. Interpreting the concordance statistic of a logistic regression model: Relation to the variance and odds ratio of a continuous explanatory variable. BMC Med. Res. Methodol. 2012, 12, 82.
35. Wu, T.; Hu, E.; Xu, S.; Chen, M.; Guo, P.; Dai, Z.; Feng, T.; Zhou, L.; Tang, W.; Zhan, L.; et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2021, 2, 100141.
36. Kim, N.E.; Kang, E.; Ha, E.; Lee, J.; Lee, J.H. Association of type 2 diabetes mellitus with lung cancer in patients with chronic obstructive pulmonary disease. Front. Med. 2023, 10, 1118863.
37. Lu, Y.; Hu, Y.; Zhao, Y.; Xie, S.; Wang, C. Impact of Type 2 Diabetes Mellitus on the Prognosis of Non-Small Cell Lung Cancer. J. Clin. Med. 2022, 12, 321.
38. Garmendia, I.; Varthaman, A.; Marmier, S.; Angrini, M.; Matchoua, I.; Darbois-Delahousse, A.; Josseaume, N.; Foy, P.; Roumenina, L.T.; Naouar, N.; et al. Acute Influenza Infection Promotes Lung Tumor Growth by Reprogramming the Tumor Microenvironment. Cancer Immunol. Res. 2023, 11, 530–545.
39. Weng, C.; Chen, L.; Lin, C.; Chen, H.; Lee, H.H.; Ling, T.; Hsiao, F. Association between the risk of lung cancer and influenza: A population-based nested case-control study. Int. J. Infect. Dis. 2019, 88, 8–13.
40. Chen, Y.; Liu, T.; Xu, Z.; Dong, M. Association of Epstein-Barr virus (EBV) with lung cancer: Meta-analysis. Front. Oncol. 2023, 13, 1177521.
41. Osorio, J.C.; Blanco, R.; Corvalán, A.H.; Muñoz, J.P.; Calaf, G.M.; Aguayo, F. Epstein–Barr Virus Infection in Lung Cancer: Insights and Perspectives. Pathogens 2022, 11, 132.
42. Bi, G.; Yao, G.; Bian, Y.; Xue, L.; Zhang, Y.; Lu, T.; Fan, H. The Effect of Diabetes Mellitus on Prognosis of Patients with Non-Small-Cell Lung Cancer: A Systematic Review and Meta-Analysis. Ann. Thorac. Cardiovasc. Surg. 2020, 26, 1–12.
43. Gyamfi, J.; Kim, J.; Choi, J. Cancer as a Metabolic Disorder. Int. J. Mol. Sci. 2022, 23, 1155.
44. Elkhalifa, A.M.E.; Nabi, S.U.; Shah, O.S.; Bashir, S.M.; Muzaffer, U.; Ali, S.I.; Wani, I.A.; Alzerwi, N.A.N.; Elderdery, A.Y.; Alanazi, A.; et al. Insight into Oncogenic Viral Pathways as Drivers of Viral Cancers: Implication for Effective Therapy. Curr. Oncol. 2023, 30, 1924–1944.

Characteristics | Statistics | Missing |
---|---|---|
Total Number | 624 | |
Gender | | 0 |
Male | 370 (59.29%) | |
Female | 254 (40.71%) | |
Age | | 12 |
Average (std) | 66.19 (9.32) | |
Age range | 39–88 | |
Prior Malignancy | | 1 not reported |
Yes | 86 (13.80%) | |
No | 537 (86.20%) | |
Synchronous malignancy | | 43 not reported |
Yes | 13 (2.24%) | |
No | 568 (97.76%) | |
Prior Treatment | | 0 |
Yes | 3 (0.48%) | |
No | 621 (99.52%) | |
Primary diagnosis | | 0 |
Adenocarcinoma | 348 (55.77%) | |
Squamous cell carcinoma | 276 (44.23%) | |
Tumor stage | | 0 |
T1, T1a, T1b | 209 (33.49%) | |
T2, T2a, T2b | 319 (51.12%) | |
T3 | 78 (12.50%) | |
T4 | 15 (2.40%) | |
TX | 3 (0.48%) | |
Lymph node stage | | 1 |
N0 | 433 (69.50%) | |
N1 | 127 (20.38%) | |
N2 | 51 (8.19%) | |
N3 | 11 (1.77%) | |
NX | 1 (0.16%) | |
Metastasis stage | | 4 |
M0 | 428 (69.03%) | |
M1, M1a, M1b | 12 (1.93%) | |
MX | 180 (29.03%) | |
Tissue organ | | 0 |
Upper lobe, lung | 361 (57.85%) | |
Lower lobe, lung | 214 (34.29%) | |
Middle lobe, lung | 21 (3.37%) | |
Lung, NOS | 14 (2.24%) | |
Overlapping lesion of lung | 8 (1.28%) | |
Main bronchus | 6 (0.96%) | |
No. of pack-years smoked | | 154 |
Average (std) | 47.09 (28.43) | |

Data Type | Number of Significant Features | Min p-Value | Max p-Value |
---|---|---|---|
mRNA Data | 2945 | 3.7131 × 10⁻⁷ | 0.04998 |
miRNA Data | 77 | 0.0002 | 0.0315 |
DNA Methylation Data | 13,046 | 6.9051 × 10⁻⁶ | 0.0315 |
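
The feature counts above result from the chi-square selection step of Section 2.3, and the largest retained p-value (0.04998) is consistent with a 0.05 cutoff. How the continuous omics values were discretised for the test is not shown here, so the following is only a minimal sketch under stated assumptions: each feature is split at its median, cross-tabulated against a binary outcome, and tested with a Pearson chi-square test on the resulting 2×2 table (1 degree of freedom, no continuity correction). The function names, the median split, and the toy data are illustrative and are not taken from the paper.

```python
import math
from statistics import median


def chi2_2x2_pvalue(table):
    """Pearson chi-square test of independence for a 2x2 table (1 dof)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    if 0 in row or 0 in col:
        return 1.0  # degenerate table: no evidence of association
    stat = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row[i] * col[j] / n
            stat += (obs - expected) ** 2 / expected
    # Survival function of chi-square with 1 dof: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def select_features(X, y, alpha=0.05):
    """X maps feature name -> list of values; y is a list of 0/1 labels.

    Assumption: features are binarised at their median before testing.
    Returns {feature name: p-value} for features with p < alpha.
    """
    selected = {}
    for name, values in X.items():
        cut = median(values)
        table = [[0, 0], [0, 0]]
        for value, label in zip(values, y):
            table[int(value > cut)][label] += 1
        p = chi2_2x2_pvalue(table)
        if p < alpha:
            selected[name] = p
    return selected


# Toy data (hypothetical): gene_a tracks the outcome, gene_b does not.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
X = {
    "gene_a": [1, 2, 1, 3, 2, 1, 9, 8, 7, 9, 8, 9],
    "gene_b": [5, 1, 4, 2, 6, 3, 5, 1, 4, 2, 6, 3],
}
selected = select_features(X, y)
print(sorted(selected))
```

On this toy input only `gene_a` passes the cutoff. With thousands of features tested per omics layer, a multiple-testing correction would normally be worth considering; whether one was applied is not stated in this section.
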

Data Combination | Number of Features | Average C-Index |
---|---|---|
mRNA-miRNA-Meth-ClinicInfo | 16,079 | 0.81693 |
mRNA-miRNA-Meth | 16,068 | 0.81439 |
mRNA-miRNA | 3022 | 0.86322 |
mRNA-Meth | 15,991 | 0.81750 |
miRNA-Meth | 13,123 | 0.79529 |
mRNA-ClinicInfo | 2956 | 0.86056 |
miRNA-ClinicInfo | 88 | 0.84738 |
Meth-ClinicInfo | 13,057 | 0.79437 |
mRNA-miRNA-ClinicInfo | 3033 | 0.86092 |
mRNA-Meth-ClinicInfo | 16,002 | 0.82184 |
miRNA-Meth-ClinicInfo | 13,134 | 0.79463 |
mRNA | 2945 | 0.85676 |
miRNA | 77 | 0.83605 |
Meth | 13,046 | 0.79091 |
ClinicInfo | 11 | 0.62480 |
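
The table above ranks feature combinations by average C-index. The exact estimator used by the authors is not shown in this section, so the sketch below assumes Harrell's concordance index: among all comparable patient pairs, the fraction in which the patient with the shorter observed survival time received the higher predicted risk, with ties in risk counted as one half. The function name and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical): four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 2))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.62 for clinical information alone and the 0.86 for mRNA-miRNA should be read.
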

Author | Method | Features | Accuracy | C-index |
---|---|---|---|---|
Our Method | Graph attention network | Multi-omics data + Clinical Information | 0.75 | 0.82 |
Ellen et al. [13] | Autoencoders | Omics data + Clinical Information | | LUAD 0.69; LUSC 0.62 |
Zhang and Zhang [14] | Nomogram model | Clinical Information | | 0.71 |
She et al. [15] | Deep learning-based algorithm | Clinical Information | | 0.74 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Elbashir, M.K.; Almotilag, A.; Mahmood, M.A.; Mohammed, M. Enhancing Non-Small Cell Lung Cancer Survival Prediction through Multi-Omics Integration Using Graph Attention Network. Diagnostics 2024, 14, 2178. https://doi.org/10.3390/diagnostics14192178