HSSG: Identification of Cancer Subtypes Based on Heterogeneity Score of A Single Gene
Figure 1. The overall process framework of HSSG.
Figure 2. The process for constructing the sample-similarity network of a single gene.
Figure 3. Change in clustering accuracy as genes are added when identifying two subtypes. (a) Accuracy of three different gene screening methods as a function of the number of added genes. (b) Accuracy as a function of the number of added genes when gene modules are added.
Figure 4. Simple heat maps of gene expression for the genes selected in identifying two cancer subtypes. (a) Expression of the 644 genes. (b) Expression of the 67 genes.
Figure 5. Survival rate and enrichment analysis of the 67 genes and the 644 selected genes. (a) Survival rate plot for the clustering based on the 67 genes. (b) Survival rate plot for the clustering based on the 644 genes. (c) GO enrichment analysis of the 67 genes. (d) GO and KEGG pathway enrichment analysis of the 644 genes.
Figure 6. Heat maps of gene expression clustering for genes selected by six different methods. (a) HSSG. (b) Variance. (c) Entropy. (d) Kruskal test. (e) Differential expression. (f) Random Forest.
Figure 7. GO and KEGG pathway enrichment analysis of the genes screened by differential expression.
Figure 8. Change in accuracy of three different methods in identifying three cancer subtypes.
Figure 9. Simple heat maps of gene expression for the genes selected in identifying multiple subtypes. (a) The 500 genes selected by pseudo-F statistics. (b) The 196 genes selected by network module mining.
Figure 10. GO and KEGG pathway enrichment analysis of the 196 obtained genes.
Figure 11. Change in clustering accuracy for two different ways of adding genes.
Figure 12. PCA plot of three different cancer samples.
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Sources
2.2. Construction of Sample-Similarity Network Based on a Single Gene
2.3. Calculating Heterogeneity Score of Single Gene Based on Pseudo-F Statistics
2.4. Construction of Gene–Gene Network Based on Single Gene Sample-Similarity Network
2.5. Network Analysis and Module Mining
2.6. Performance Evaluation Metrics
3. Results
3.1. Identification of Cancer Subtypes Based on Single-Omics Data
3.1.1. Result of Identifying Two Cancer Subtypes
3.1.2. Result of Identifying Multiple Cancer Subtypes
3.2. The Performance in Feature Selection of Multi-Omics Data
3.3. The Effectiveness and Stability of the Selected Genes
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
GO | Gene Ontology |
KEGG | Kyoto Encyclopedia of Genes and Genomes |
logFC | Log2 fold change |
ACC | Accuracy |
MCC | Matthews correlation coefficient |
RI | Rand index |
ARI | Adjusted Rand index |
References
- Turajlic, S.; Sottoriva, A.; Graham, T.; Swanton, C. Resolving genetic heterogeneity in cancer. Nat. Rev. Genet. 2019, 20, 404–416.
- Yang, Y.; Tian, S.; Qiu, Y.; Zhao, P.; Zou, Q. MDICC: Novel method for multi-omics data integration and cancer subtype identification. Brief. Bioinform. 2022, 23, bbac132.
- Prat, A.; Parker, J.S.; Karginova, O.; Fan, C.; Livasy, C.; Herschkowitz, J.I.; He, X.; Perou, C.M. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 2010, 12, R68.
- Jahid, M.J.; Huang, T.H.; Ruan, J. A personalized committee classification approach to improving prediction of breast cancer metastasis. Bioinformatics 2014, 30, 1858–1866.
- Parker, J.S.; Mullins, M.; Cheang, M.C.; Leung, S.; Voduc, D.; Vickery, T.; Davies, S.; Fauron, C.; He, X.; Hu, Z.; et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 2009, 27, 1160.
- Cleary, A.S.; Leonard, T.L.; Gestl, S.A.; Gunther, E.J. Tumour cell heterogeneity maintained by cooperating subclones in Wnt-driven mammary cancers. Nature 2014, 508, 113–117.
- Panchy, N.; Azeredo-Tseng, C.; Luo, M.; Randall, N.; Hong, T. Integrative transcriptomic analysis reveals a multiphasic epithelial–mesenchymal spectrum in cancer and non-tumorigenic cells. Front. Oncol. 2020, 9, 1479.
- Roider, T.; Seufert, J.; Uvarovskii, A.; Frauhammer, F.; Bordas, M.; Abedpour, N.; Stolarczyk, M.; Mallm, J.P.; Herbst, S.A.; Bruch, P.M.; et al. Dissecting intratumour heterogeneity of nodal B-cell lymphomas at the transcriptional, genetic and drug-response levels. Nat. Cell Biol. 2020, 22, 896–906.
- Curtis, C.; Shah, S.P.; Chin, S.F.; Turashvili, G.; Rueda, O.M.; Dunning, M.J.; Speed, D.; Lynch, A.G.; Samarajiwa, S.; Yuan, Y.; et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012, 486, 346–352.
- Guo, L.Y.; Wu, A.H.; Wang, Y.X.; Zhang, L.P.; Chai, H.; Liang, X.F. Deep learning-based ovarian cancer subtypes identification using multi-omics data. BioData Min. 2020, 13, 10.
- Wu, Y.; Wang, H.; Li, Z.; Cheng, J.; Fang, R.; Cao, H.; Cui, Y. Subtypes identification on heart failure with preserved ejection fraction via network enhancement fusion using multi-omics data. Comput. Struct. Biotechnol. J. 2021, 19, 1567–1578.
- Tang, K.L.; Li, T.H.; Xiong, W.W.; Chen, K. Ovarian cancer classification based on dimensionality reduction for SELDI-TOF data. BMC Bioinform. 2010, 11, 109.
- Liu, S.; Zhang, Y.; Shang, X.; Zhang, Z. ProTICS reveals prognostic impact of tumor infiltrating immune cells in different molecular subtypes. Brief. Bioinform. 2021, 22, bbab164.
- Wasito, I.; Istiqlal, A.N.; Budi, I. Data integration model for cancer subtype identification using Kernel Dimensionality Reduction-Support Vector Machine (KDR-SVM). In Proceedings of the 2012 7th International Conference on Computing and Convergence Technology (ICCCT), Seoul, Korea, 3–5 December 2012; pp. 876–880.
- Zhu, Z.; Ong, Y.S.; Dash, M. Wrapper–filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man, Cybern. Part B (Cybernetics) 2007, 37, 70–76.
- Liu, J.; Su, R.; Zhang, J.; Wei, L. Classification and gene selection of triple-negative breast cancer subtype embedding gene connectivity matrix in deep neural network. Brief. Bioinform. 2021, 22, bbaa395.
- Jung, J.; Seol, H.S.; Chang, S. The generation and application of patient-derived xenograft model for cancer research. Cancer Res. Treat. Off. J. Korean Cancer Assoc. 2018, 50, 1–10.
- Inda, M.d.M.; Bonavia, R.; Seoane, J. Glioblastoma multiforme: A look inside its heterogeneous nature. Cancers 2014, 6, 226–239.
- Allison, K.H.; Sledge, G.W. Heterogeneity and cancer. Oncology 2014, 28, 772.
- Dagogo-Jack, I.; Shaw, A.T. Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 2018, 15, 81–94.
- Chen, Y.; Gu, Y.; Hu, Z.; Sun, X. Sample-specific perturbation of gene interactions identifies breast cancer subtypes. Brief. Bioinform. 2021, 22, bbaa268.
- Yuanyuan, Z.; Ziqi, W.; Shudong, W.; Chuanhua, K. SSIG: Single-Sample Information Gain Model for Integrating Multi-Omics Data to Identify Cancer Subtypes. Chin. J. Electron. 2021, 30, 303–312.
- Nakazawa, M.A.; Tamada, Y.; Tanaka, Y.; Ikeguchi, M.; Higashihara, K.; Okuno, Y. Novel cancer subtyping method based on patient-specific gene regulatory network. Sci. Rep. 2021, 11, 23653.
- UCSC Xena. UCSC Xena International Centre; 2022; Available online: https://xenabrowser.net (accessed on 20 July 2022).
- Rogers-Soeder, T.S.; Blackwell, T.; Yaffe, K.; Ancoli-Israel, S.; Redline, S.; Cauley, J.A.; Ensrud, K.E.; Paudel, M.; Barrett-Connor, E.; LeBlanc, E.; et al. Rest-activity rhythms and cognitive decline in older men: The osteoporotic fractures in men sleep study. J. Am. Geriatr. Soc. 2018, 66, 2136–2143.
- Gasch, A.P.; Eisen, M.B. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol. 2002, 3, 1–22.
- Qin, G.; Dang, M.; Gao, H.; Wang, H.; Luo, F.; Chen, R. Deciphering the protein–protein interaction network regulating hepatocellular carcinoma metastasis. Biochim. Biophys. Acta Proteins Proteom. 2017, 1865, 1114–1122.
- Mondragón, R. Estimating degree–degree correlation and network cores from the connectivity of high–degree nodes in complex networks. Sci. Rep. 2020, 10, 1–24.
- Csardi, G.; Nepusz, T. The igraph software package for complex network research. Inter J. Complex Syst. 2006, 1695, 1–9.
- Matthews, B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta Protein Struct. 1975, 405, 442–451.
- Hullermeier, E.; Rifqi, M.; Henzgen, S.; Senge, R. Comparing Fuzzy Partitions: A Generalization of the Rand Index and Related Measures. IEEE Trans. Fuzzy Syst. 2012, 20, 546–556.
- Yu, S.; Wang, M.; Pang, S.; Song, L.; Qiao, S. Intelligent fault diagnosis and visual interpretability of rotating machinery based on residual neural network. Measurement 2022, 196, 111228.
- Sato, S.; Nakamura, Y.; Tsuchiya, E. Difference of allelotype between squamous cell carcinoma and adenocarcinoma of the lung. Cancer Res. 1994, 54, 5652–5655.
- McKight, P.E.; Najab, J. Kruskal-Wallis test. In The Corsini Encyclopedia of Psychology; Wiley: Hoboken, NJ, USA, 2010; p. 1.
- Hoadley, K.A.; Yau, C.; Wolf, D.M.; Cherniack, A.D.; Tamborero, D.; Ng, S.; Leiserson, M.D.; Niu, B.; McLellan, M.D.; Uzunangelov, V.; et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 2014, 158, 929–944.
- Wu, D.; Wang, D.; Zhang, M.Q.; Gu, J. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: Application to cancer molecular classification. BMC Genom. 2015, 16, 1022.
- Monti, S.; Tamayo, P.; Mesirov, J.; Golub, T. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 2003, 52, 91–118.
- Mo, Q.; Shen, R.; Guo, C.; Vannucci, M.; Chan, K.S.; Hilsenbeck, S.G. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data. Biostatistics 2018, 19, 71–86.
- Chalise, P.; Fridley, B.L. Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm. PLoS ONE 2017, 12, e0176278.
- Huang, W.; Jin, Y.; Yuan, Y.; Bai, C.; Wu, Y.; Zhu, H.; Lu, S. Validation and target gene screening of hsa-miR-205 in lung squamous cell carcinoma. Chin. Med. J. 2014, 127, 272–278.
- Lebanony, D.; Benjamin, H.; Gilad, S.; Ezagouri, M.; Dov, A.; Ashkenazi, K.; Gefen, N.; Izraeli, S.; Rechavi, G.; Pass, H.; et al. Diagnostic assay based on hsa-miR-205 expression distinguishes squamous from nonsquamous non–small-cell lung carcinoma. J. Clin. Oncol. 2009, 27, 2030–2037.
- Berghmans, T.; Ameye, L.; Willems, L.; Paesmans, M.; Mascaux, C.; Lafitte, J.J.; Meert, A.P.; Scherpereel, A.; Cortot, A.; CsToth, I.; et al. Identification of microRNA-based signatures for response and survival for non-small cell lung cancer treated with cisplatin-vinorelbine A ELCWP prospective study. Lung Cancer 2013, 82, 340–345.
- Jin, L.; Li, Y.; Liu, J.; Yang, S.; Gui, Y.; Mao, X.; Nie, G.; Lai, Y. Tumor suppressor miR-149-5p is associated with cellular migration, proliferation and apoptosis in renal cell carcinoma. Mol. Med. Rep. 2016, 13, 5386–5392.
- Monteleone, N.J.; Lutz, C.S. miR-708-5p enhances erlotinib/paclitaxel efficacy and overcomes chemoresistance in lung cancer cells. Oncotarget 2020, 11, 4699.
- Chen, Q.; Kong, J.; Zou, S.; Gao, H.; Wang, F.; Qin, S.; Wang, W. LncRNA LINC00342 regulated cell growth and metastasis in non-small cell lung cancer via targeting miR-203a-3p. Eur. Rev. Med. Pharmacol. Sci. 2019, 23, 7408–7418.
- Geng, Y.; Deng, L.; Su, D.; Xiao, J.; Ge, D.; Bao, Y.; Jing, H. Identification of crucial microRNAs and genes in hypoxia-induced human lung adenocarcinoma cells. Oncotargets Ther. 2016, 9, 4605.
- Sun, C.; Huang, C.; Li, S.; Yang, C.; Xi, Y.; Wang, L.; Zhang, F.; Fu, Y.; Li, D. Hsa-miR-326 targets CCND1 and inhibits non-small cell lung cancer development. Oncotarget 2016, 7, 8341.
- Wang, M.; Sun, X.; Yang, Y.; Jiao, W. Long non-coding RNA OIP5-AS1 promotes proliferation of lung cancer cells and leads to poor prognosis by targeting miR-378a-3p. Thorac. Cancer 2018, 9, 939–949.
- Gayosso-Gómez, L.; Zárraga-Granados, G.; Paredes-Garcia, P.; Falfán-Valencia, R.; Vazquez-Manríquez, M.; Martinez-Barrera, L.; Castillo-Gonzalez, P.; Rumbo-Nava, U.; Guevara-Gutierrez, R.; Rivera-Bravo, B.; et al. Identification of circulating miRNAs profiles that distinguish malignant pleural mesothelioma from lung adenocarcinoma. EXCLI J. 2014, 13, 740.
ID | Name | Data Type | Tumor Samples | Normal Samples | Total Genes |
---|---|---|---|---|---|
LUAD | Lung Adenocarcinoma | RNA-seq | 517 | 56 | 2050 |
LUSC | Lung Squamous Cell Carcinoma | RNA-seq | 502 | 54 | |
STAD | Stomach Cancer | RNA-seq | 350 | 46 | |
COAD | Colon Cancer | RNA-seq | 288 | 41 | |
THCA | Thyroid Cancer | RNA-seq | 513 | 59 | |
LUAD | Lung Adenocarcinoma | miRNA-seq | 518 | 46 | 1881 |
LUSC | Lung Squamous Cell Carcinoma | miRNA-seq | 478 | 45 | |
Method | Accuracy | Sensitivity | Specificity |
---|---|---|---|
Random selection | 70.5% | 68.6% | 73.1% |
Differential Expression selection | 92.8% | 97.5% | 89.8% |
Variance-based score | 93.0% | 97.9% | 89.1% |
Kruskal–Wallis Test | 93.6% | 98.2% | 89.9% |
Entropy-based score | 93.4% | 98.2% | 89.6% |
Random Forest | 93.9% | 98.0% | 90.5% |
HSSG | 93.91% | 98.5% | 90.2% |
Method | Accuracy | Sensitivity | Specificity |
---|---|---|---|
Random selection | 90.8% | 92.6% | 88.6% |
Differential Expression selection | 93.2% | 98.4% | 89.1% |
Variance-based score | 94.3% | 97.8% | 91.2% |
Kruskal–Wallis Test | 94.4% | 98.2% | 91.2% |
Entropy-based score | 94.3% | 98.4% | 90.7% |
Random Forest | 94.5% | 98.4% | 91.2% |
HSSG | 94.7% | 98.9% | 91.2% |
Method | Gene Number | Accuracy | Overlap Number with HSSG | Overlap Rate |
---|---|---|---|---|
Random selection | 67 | 70.5% | 0.6 * | 0.89% |
Differential Expression | 79 | 93.2% | 40 | 50.6% |
Variance-based score | 71 | 93.1% | 9 | 12.6% |
Entropy-based score | 176 | 93.7% | 39 | 22.1% |
Kruskal–Wallis Test | 68 | 93.7% | 18 | 26.0% |
Random Forest | 81 | 93.91% | 24 | 29.6% |
HSSG | 67 | 93.91% | 67 | 100% |
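Reading the table above, the overlap rate appears to be the overlap number divided by the method's own gene number (for example, 40 / 79 ≈ 50.6% for Differential Expression). That is an inference from the reported values, not a definition given in the source. A minimal sketch under that assumption, with hypothetical gene identifiers:

```python
def overlap_rate(selected, reference):
    """Fraction of `selected` genes that also appear in `reference`."""
    selected, reference = set(selected), set(reference)
    if not selected:
        raise ValueError("selected gene set is empty")
    return len(selected & reference) / len(selected)


# Toy gene sets (hypothetical identifiers, not genes from the paper).
hssg_genes = {"G1", "G2", "G3", "G4"}
other_method_genes = {"G2", "G4", "G7", "G8", "G9"}
rate = overlap_rate(other_method_genes, hssg_genes)
print(f"{rate:.1%}")

# The same ratio applied to the Differential Expression row of the table.
de_rate = 40 / 79
print(f"{de_rate:.1%}")
```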
Method | Gene Number | Accuracy |
---|---|---|
Random selection | 500 | 81.4% |
Differential Expression selection | 500 | 95.4% |
Variance-based score | 500 | 96.5% |
Kruskal–Wallis Test | 500 | 97.4% |
Entropy-based score | 500 | 96.6% |
Random Forest | 500 | 99.6% |
HSSG | 500 | 99.8% |
Random selection | 196 | 80.0% |
Differential Expression selection | 196 | 96.1% |
Variance-based score | 196 | 95.9% |
Kruskal–Wallis Test | 196 | 97.1% |
Entropy-based score | 196 | 97.8% |
Random Forest | 196 | 99.6% |
HSSG | 196 | 99.8% |
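With more than two subtypes, accuracy depends on how cluster labels are matched to the true subtypes, whereas the RI and ARI listed in the abbreviations compare partitions directly. The following is a minimal pair-counting sketch of the standard Rand index and adjusted Rand index; the source does not show how its values were computed, so the function name and the example labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def rand_indices(labels_true, labels_pred):
    """Return (RI, ARI) for two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairs of samples grouped together in both, in truth, and in prediction.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)

    # RI: share of pairs on which the two partitions agree.
    ri = (total + 2 * both - same_true - same_pred) / total

    # ARI: agreement corrected for the value expected by chance.
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    ari = 1.0 if max_index == expected else (both - expected) / (max_index - expected)
    return ri, ari


# Hypothetical labels: same partition with permuted cluster names.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [1, 1, 1, 0, 0, 0, 2, 2, 2]
ri_perfect, ari_perfect = rand_indices(truth, clusters)
print(ri_perfect, ari_perfect)
```

Both indices are invariant to renaming clusters, so no label matching step is needed before calling the function.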
Method | Accuracy without Feature Selection | Accuracy after Feature Selection | Accuracy Increase |
---|---|---|---|
COCA [35] | 66.8% | 92.7% | +25.9% |
LRAcluster [36] | 92.8% | 94.6% | +1.8% |
ConsensusClustering [37] | 94.1% | 94.8% | +0.7% |
iClusterBayes [38] | 94.2% | 94.8% | +0.6% |
IntNMF [39] | 93.9% | 95.1% | +1.2% |
Number | Name of miRNA | Heterogeneity Score | Verified |
---|---|---|---|
1 | hsa-mir-205 [40,41] | 1847.826056 | Yes |
2 | hsa-mir-149 [42,43] | 856.438976 | Yes |
3 | hsa-mir-708-5p [44] | 684.3413755 | Yes |
4 | hsa-mir-203a-3p [45] | 542.132398 | Yes |
5 | hsa-mir-769-5p [46] | 457.3321947 | Yes |
6 | hsa-mir-326 [47] | 449.3204298 | Yes |
7 | hsa-mir-6510 | 424.1914996 | No |
8 | hsa-mir-6512 | 387.6342542 | No |
9 | hsa-mir-378a-3p [48] | 375.8590445 | Yes |
10 | hsa-mir-1271-5p [49] | 356.288151 | Yes |
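The heterogeneity scores above are based on pseudo-F statistics (Section 2.3), but the exact formula is not reproduced in this text. The sketch below is therefore an assumption: it uses the common distance-based pseudo-F, computed from Euclidean distances between samples on a single gene's expression values, which in that special case equals the one-way ANOVA F statistic. The paper's score is built on single-gene sample-similarity networks and may differ in detail. The function name and the toy values are hypothetical.

```python
def pseudo_f(values, labels):
    """Distance-based pseudo-F for one gene across labelled samples.

    values: expression of a single gene, one number per sample.
    labels: group (e.g. subtype) of each sample.
    Assumes Euclidean distance between samples on this one gene.
    """
    n = len(values)
    if n != len(labels):
        raise ValueError("values and labels must have equal length")
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    a = len(groups)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")

    def scaled_pairwise_ss(xs):
        # Sum of squared pairwise distances divided by the number of samples.
        total = sum(
            (xs[i] - xs[j]) ** 2
            for i in range(len(xs))
            for j in range(i + 1, len(xs))
        )
        return total / len(xs)

    ss_total = scaled_pairwise_ss(list(values))
    ss_within = sum(scaled_pairwise_ss(xs) for xs in groups.values())
    ss_among = ss_total - ss_within
    if ss_within == 0:
        return float("inf")  # groups are perfectly homogeneous
    return (ss_among / (a - 1)) / (ss_within / (n - a))


# Toy expression values: one well-separated gene, one uninformative gene.
labels = ["A", "A", "A", "B", "B", "B"]
separated = pseudo_f([1, 2, 3, 7, 8, 9], labels)
uninformative = pseudo_f([1, 2, 3, 1, 2, 3], labels)
print(separated, uninformative)
```

A larger value means the gene separates the groups more strongly relative to its within-group spread, which is the sense in which a high score marks a heterogeneous gene.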
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pang, S.; Wu, W.; Zhang, Y.; Wang, S.; Niu, M.; Zhang, K.; Yin, W. HSSG: Identification of Cancer Subtypes Based on Heterogeneity Score of A Single Gene. Cells 2022, 11, 2456. https://doi.org/10.3390/cells11152456