Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes
Abstract
Protein tertiary structure determines the molecular function, interactions, and stability of a protein; therefore, the distribution of mutations in the tertiary structure can facilitate the identification of new driver genes in cancer. To analyze mutation distribution in protein tertiary structures, we applied a novel three dimensional permutation test to the mutation positions. We analyzed somatic mutation datasets of 21 types of cancers obtained from exome sequencing conducted by the TCGA project. Of the 3,622 genes that had ≥3 mutations in the regions with tertiary structure data, 106 genes showed significant skew in mutation distribution. Known tumor suppressors and oncogenes were significantly enriched in these identified cancer gene sets. Physical distances between mutations in known oncogenes were significantly smaller than those of tumor suppressors. Twenty-three genes were detected in multiple cancers. Candidate genes with significant skew of the 3D mutation distribution included kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1) and transcription factors (KLF5 and EEF1B2). Our study suggests that systematic analysis of mutation distribution in the tertiary protein structure can help identify cancer driver genes.
Identification of driver genes from the mutation catalog is one of the most important issues in cancer genetics research. The identified driver genes can lead to the development of new drugs and elucidate mechanisms of carcinogenesis1,2. One of the strongest signatures of cancer driver genes is the frequency of mutated samples and/or the number of mutations. In recent studies, statistical comparison between the number of observed and expected mutations, i.e. the gene burden test, has been performed and many driver genes have been discovered3,4. Although this framework has been used to successfully identify new driver genes and pathways, recent studies reveal that most cancers are very heterogeneous, and many low-frequency driver genes exist in the long-tailed mutated gene lists3,4. To discover low-frequency driver genes from huge lists of mutations, new algorithms that focus on other aspects of driver genes are required.
Driver genes are generally classified into oncogenes and tumor suppressor genes (TSGs). Mutations in oncogenes render the gene constitutively active (gain of function) or confer a different function (change of function)2. On the other hand, mutations in TSGs reduce activity of the gene2. Since the amino acid residues that can activate protein function tend to be limited, mutations in oncogenes are likely to be tightly clustered to a few functionally important codons2. In the TSGs, missense mutations would be distributed in functionally important domains, such as the DNA binding domain in the TP53 gene5. This expectation predicts that mutations in driver genes are not randomly distributed, and analysis of mutation position in a gene can identify driver genes6,7,8.
In this study, we examined the skew of mutation distributions in the tertiary structures of the proteins. Our analysis revealed a different pattern of mutation distribution between oncogenes and TSGs, and identified driver genes and pathways. We consider that our method can complement traditional gene burden tests and contribute to a better interpretation of mutations in cancer genomes.
Results
Analysis method
In the present study, we developed a novel method to analyze mutation position in the tertiary (3 dimensional [3D]) structure of proteins. Since biochemical activities of proteins depend on their structure in vivo, it is reasonable to analyze mutation positions in the context of their 3D structure. To evaluate 3 dimensional accumulation of mutations, we applied a permutation test (3D permutation test) (Fig. 1a and Supplementary Fig. 1). Protein 3D structures were downloaded from the Protein Data Bank (PDB), and the amino acid sequence of the protein structure was aligned to that of the reference genome using MAFFT9. Based on this alignment, mutations in the reference genome were mapped onto the 3D structure. For each gene with n mutations in the sequence covered by the 3D structure, the average pairwise Euclidean distance between the n mutations (observed average distance) was calculated.
To obtain the null distribution, n positions were randomly selected from the amino acid residues, and the average distance (simulated average distance) was calculated. This step was repeated 10^4 to 10^6 times to generate the null distribution of the simulated average distances. The significance of the observed distance was obtained from the null distribution (Fig. 1a, and Supplementary Fig. 1). Thus, the skew of mutation position was evaluated by the p-value. In addition to the 3D permutation test, we performed a simple permutation test for the observed average distance on the protein primary structure (1D permutation test).
Analysis of cancer somatic mutation data
We applied this method to exome sequence data from 7,215 samples of 21 types of cancer released from the TCGA project (Supplementary Table 1)10,11,12,13,14,15,16,17,18,19,20,21,22,23. To remove possible germline SNVs, we examined two independent filters: Filter 1, removal of somatic mutation candidates that were present in the 1000genome common SNVs set, and Filter 2, removal of somatic mutation candidates that were found in dbSNP with a validation status of “YES”. Filter 2 was more conservative but removed known cancer hotspots; therefore, we adopted the result from filter 1. Results from filter 2 and a comparison of the two filters can be found in Supplementary Tables 2 and 3 and the Supplementary information.
Using the filtered SNV set, we analyzed 3,622 genes with ≥3 missense mutations in the 3D structure using the 3D permutation test. AML did not have a sufficient number of somatic mutations, and no genes passed the analysis criteria. Analysis with different minimum missense mutation cutoffs (≥4, 5, and 6) is provided in the Supplementary information. Quantile-quantile (Q-Q) plots of the permutation test did not show a systematic inflation of test statistics (Supplementary Fig. 2). After adjustment for multiple testing on the number of analyzed genes, 188 genes (106 unique genes) had mutations in >3% of the samples with a false discovery rate (FDR) q-value ≤0.1 (Supplementary Table 2). Of the 21 evaluated cancer types, significant genes were identified in 20 cancer types. Of the 106 significant genes, 31 were significant across multiple cancers (Fig. 1b). To examine the reliability of the result, we examined whether known cancer genes were identified by the 3D permutation method. We adopted the annotation of tumor suppressor genes (TSGs) or oncogenes as defined by Vogelstein et al.24, and the COSMIC cancer gene census. The set of significant genes that were mutated in more than 3% of the samples was enriched for TSGs, oncogenes and COSMIC cancer genes (TSGs: Fisher’s exact test p-value=7.1×10^−10, odds ratio=11.8; oncogenes: p-value=7.4×10^−18, odds ratio=29.5; COSMIC cancer gene census genes: p-value=5.9×10^−19, odds ratio=9.0) (Fig. 2a–c), suggesting that our analysis can successfully identify driver genes.
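As a rough illustration of the enrichment test reported above, the sketch below computes an odds ratio and a one-sided Fisher's exact p-value from a 2×2 table using only the Python standard library. The function name, the choice of a one-sided alternative, and the example counts are assumptions made for illustration; the counts are not taken from this study, and the text does not state which alternative was used.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: significant genes that are known cancer genes
    b: significant genes that are not known cancer genes
    c: non-significant genes that are known cancer genes
    d: non-significant genes that are not known cancer genes
    Returns (sample odds ratio, P(X >= a) under the hypergeometric null).
    """
    n_sig = a + b              # row total: significant genes
    n_known = a + c            # column total: known cancer genes
    total = a + b + c + d
    denom = comb(total, n_sig)
    upper = min(n_sig, n_known)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    numer = sum(comb(n_known, k) * comb(total - n_known, n_sig - k)
                for k in range(a, upper + 1))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, numer / denom


# Hypothetical counts, for illustration only.
odds_ratio, p_value = fisher_enrichment(3, 1, 1, 3)
```

An odds ratio well above 1 together with a small p-value would indicate that known cancer genes are over-represented among the significant genes.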
We then analyzed the frequency of mutated samples and the observed average distance between mutations among significant oncogenes, TSGs, other significant genes and non-significant genes. Known TSGs were found to be mutated in a significantly larger number of samples than oncogenes (Fig. 2d). Known TSGs and oncogenes were mutated in a significantly larger number of samples than the other genes (Fig. 2d). Since most of the TSGs and oncogenes were previously detected by frequency of mutated samples, it is reasonable that their frequencies are higher than the other genes. When conditioned on the length of coding sequences (CDS) of the respective genes, the adjusted average distances of the oncogenes were significantly lower than those of the TSGs (p-value=7.2×10^−6), indicating that the mutations in the oncogenes were more tightly clustered than those of the TSGs (Fig. 2e). The adjusted average distances of the other significant genes were significantly smaller than those of the TSGs (p-value=2.8×10^−8), but not more than those of the oncogenes. These results suggest that the other significant gene set likely includes potential oncogenes, which were difficult to identify using conventional frequency based methods.
Comparison with other methods
We compared our method with the gene burden test and tests for mutation clustering. We applied the 3D permutation method to the mutation lists of BRCA, KIRC, LUSC and UCEC, which were analyzed by Lawrence et al.22, and compared the results for the genes testable by the 3D permutation method (the number of mutations in the 3D structure ≥3) (Supplementary Table 4, and Supplementary Fig. 3). First, we compared our result with that from MutSigCV, which is a gene burden test (Supplementary Fig. 3a). In the four cancer types, 16 genes were significant by both methods. Nineteen genes were found only by MutSigCV, likely due to the wide distribution of mutations in these genes. Fifty-nine genes were uniquely identified by our method. Second, we compared the 3D permutation method with results for LUSC and UCEC by MutSigCL22 and OncodriveCLUST6, which are methods to detect mutation clusters in the primary structure. In the four cancer types, sixty-eight genes were significant by only one method, and 19 and 9 genes were detected by two and three methods, respectively (Supplementary Fig. 3b). Thirty-five genes were identified as significant only by the 3D permutation method, which included the FGFR3, HRAS, and KEAP1 genes. Third, we compared the 3D permutation method with the combined MutSig result, which was generated by merging p-values of three methods22 (Supplementary Fig. 3c). In the four cancer types, 27 genes were significant by both methods. Twenty-two genes were found only by MutSig. Fifty-one genes were uniquely identified by our method, which included the FGFR2, HRAS, NFE2L2 and DICER1 genes. These results suggest that the 3D permutation method can complement the gene burden test and the previous methods to detect mutation clusters, and identify new candidates, although analysis is restricted to genes with 3D structures available.
Genes with skew in mutation distribution
The genes TP53, PIK3CA, CTNNB1, KRAS, PTEN, HRAS, BRAF, CDKN2A, PIK3R1, NFE2L2, SPOP, and IDH1 showed a significant skew in the observed mutation distribution in 3 or more different types of cancers. Most of them are known TSGs or oncogenes, suggesting they have universal roles in carcinogenesis. While the mutation distribution for most genes showed similar patterns across cancers, the CDKN2A gene in the HNSC samples had an additional hotspot (Supplementary Fig. 4). Additionally, we made an important observation for BRAF. In LUAD, the distribution of mutations in BRAF was significantly skewed even after the removal of the known hotspot (codon 600) (Fig. 3a), suggesting functional effects of the non-hotspot mutations.
A significant skew was observed for eighteen genes in exactly 2 different types of cancers and 6 of them (NRAS, PPP2R1A, EGFR, FGFR3, ERBB2 and SMAD4) are classified as either a TSG or an oncogene. Additionally, RAC1 and KEAP1 are known to be important driver genes25,26 (Fig. 3b). KMT2C, also called MLL3, encodes a chromatin regulator27. DHX9 and PARG had the same mutational hotspot across cancer groups (Supplementary Table 2). PARG, a poly (ADP-ribose) glycohydrolase, is known to remove ADP-ribosylation of some DNA repair proteins recruited to DNA damage sites and can influence the activity of the DNA repair system along with PARP128. DHX9 encodes a DEAH-box DNA/RNA helicase that catalyzes the ATP-dependent unwinding of double-stranded RNA and DNA-RNA complexes29,30. DHX9 plays important roles in multiple cellular pathways, including protein translation, RNA splicing, maintenance of genomic stability and apoptosis29,30. Since DHX9 and PARG were mutated in less than 5% of the samples, they would have been difficult to identify through conventional frequency based methods. In addition, known cancer driver genes that are often mutated at low frequency among samples, such as BRAF in LUAD, NFE2L2 in HNSC, CDKN2A in LUAD and PIK3CA in GBM, were identified by the 3D permutation test. This result strongly suggests the important role of these genes in certain cancers.
Seventy-five genes were identified in only one type of cancer. Of these, 58 genes were neither classified as an oncogene nor a TSG, and 54 were not in the COSMIC cancer gene census. The seventy-five genes included TSGs (VHL, BAP1, HNF1A, PBRM1, and RB1), oncogenes (AKT1, FGFR2, GNAS, MAP2K1, and MET), kinases (MAPK1, EPHA5, ERBB3, and ERBB4), an apoptosis related gene (APP), an RNA splicing factor (SF1), a miRNA processing factor (DICER1), an E3 ubiquitin ligase (CUL1) and transcription factors (KLF5 and EEF1B2) (Supplementary Table 2) (DICER1 and FAS are shown as examples in Fig. 3c,d).
Pathway analysis
To identify pathways related to the putative driver genes, we analyzed pathway enrichment in the Reactome database, and identified 326 pathways with an FDR <0.1 (Supplementary Table 5). Since most of these pathways were already known critical pathways, we excluded genes annotated as oncogenes and TSGs, and carried out the same analysis for the remaining 71 genes. One hundred seventy-two pathways were significantly enriched (Supplementary Table 6), including pathways related to “axon guidance” and “semaphorin”. Genes related to the axon guidance pathway were identified in pancreatic and liver cancers31,32. Semaphorin is involved in tumor growth, invasiveness and metastasis, and our analysis provides supportive evidence of the importance of the semaphorin pathway in carcinogenesis33.
Comparison between the 3D and 1D permutation tests
We then compared the results between the 3D and 1D permutation tests (note that the 1D permutation was done for entire coding regions, but the 3D permutation analysis was limited to the coding region covered by the 3D protein structure) (Supplementary Fig. 5). For the 1D permutation tests, 141 genes (80 unique genes) had a significantly skewed observed average distance between mutations, 44 of which were commonly identified by both the 3D and 1D permutation tests (Supplementary Fig. 6). The number of significant genes found in the 3D analysis was larger than that of the 1D analysis. We compared the proportion of known TSGs, oncogenes and COSMIC cancer genes among the genes identified by both methods, only the 1D test, and only the 3D test. As expected, commonly identified genes had the highest proportion of these cancer-related genes (Supplementary Fig. 6). Genes identified by only the 3D test had a larger proportion of oncogenes, TSGs and COSMIC cancer genes than those identified by only the 1D test (COSMIC cancer genes; P-value=0.028, oncogenes and TSGs were not significantly different) (Supplementary Fig. 6).
Focusing on each gene, driver genes, such as ERBB2, KIT, HNF1A and GNAS, were not identified in the 1D analysis (Supplementary Fig. 7a–d). In addition to the known driver genes, our 3D analysis identified CUL1 as a candidate driver gene (Supplementary Fig. 7e). CUL1, a known cell cycle regulator, is a member of the Skp1-Cul1-F-box ubiquitin ligase complex, which functions as both a tumor suppressor and oncogene through degradation of its target proteins34,35. Accordingly, 3D permutation analysis should be favored over 1D analysis as it confers several advantages including the efficient identification of oncogenes and TSGs.
We manually reviewed the genes identified only by the 3D permutation test, and found interesting examples. Several recurrently mutated codons, whose frequency was not high enough to be identified by the 1D analysis, were clustered in the 3D structure in RAC1 of HNSC, DICER1 of UCEC and KIT of SKCM (Fig. 3b,c and Supplementary Fig. 7b). One recurrently mutated codon and positionally close mutation(s) were found in KRAS of BLCA, ERBB2 of CESC and GNAS of SKCM (the recurrently mutated codons were codon 12 of KRAS, codon 279 of ERBB2 and codon 844 of GNAS; codon 12 of KRAS and codon 844 of GNAS are known hotspots of these genes) (Fig. 3e and Supplementary Fig. 7a,e). In KRAS, one mutation (codon 61) was close to the known hotspot (codon 12) in the 3D structure (Fig. 3e). Mutations at codon 61 were also observed in other cancers (COAD, READ, STAD and UCEC), and the mutation was known to have a functional effect36. These examples show that the analysis of mutation distribution in the 3D structure merges mutated codons that would not be detected as a mutation cluster by 1D analysis, and thus increases detection power for driver gene identification.
Discussion
The 3D permutation method successfully identified many known oncogenes, TSGs and promising candidates of cancer driver genes. Interestingly, our analysis identified a number of TSGs, in addition to the oncogenes, suggesting that skew of mutation distribution is a general feature of TSGs. Although the mutations in the oncogenes are tightly clustered and the sensitivity of the 3D permutation method for oncogenes should be higher than that for TSGs, analysis of the mutation distribution can identify TSGs. The functional analysis of close mutations in the TSGs could help define new motifs and critical regions. Previously, Ryslik et al. proposed methods to identify mutation clusters in the 3D structure and successfully applied them7,8. Their methods focused on small mutation clusters, which can be candidates for drug development, and the majority of the identified genes were oncogenes. In this study, we examined the skew of mutation positions in the proteins, and identified both oncogenes and TSGs. We consider that the 3D permutation method we implemented is more straightforward and, as evident above, can identify TSGs.
In addition to identifying genes that showed a significant skew in mutation distributions, we were also able to identify skews in domains known to be important. For example, the RNase III domain of DICER1, which is a known mutational hotspot domain37, showed significant skew of mutation distribution (Fig. 3b). This suggests that mutation distributions are not uniform within a domain, and such analyses could contribute to the identification of functionally important amino acid positions within the domain.
We note that our method has several limitations to be assessed in the future. First, this analysis is limited to mutations in genes with available 3D protein structures, and therefore genes/proteins without 3D information cannot be analyzed. It is also difficult to obtain 3D structures for the entire protein; for example, structures of transmembrane regions are difficult to examine. In our analysis, only 3,622 genes had 3 or more mutations in regions for which protein structures were available. Second, protein structures in PDB may not always accurately reflect protein structure in vivo. Third, errors in mutation calls from sequencing analysis can affect these results, especially if common germline SNVs are not called in multiple normal samples but are identified in their matched cancer samples. Such germline SNVs would be identified as clustered mutations. Although such errors should be rare, we excluded mutations found in the 1000 genomes common set and focused on genes that are mutated in ≥3% of the samples. However, a small number of false positives caused by germline SNVs may still be included in our list. Fourth, our method implicitly assumes that the mutation rate in a gene is uniform across the protein structure. This assumption should be appropriate for most genes, but may fail for some larger genes. For example, TTN, one of the largest genes, was significant in two cancers, and this may result from a non-uniform mutation rate across the TTN gene region. Although analysis of the skew of the distribution of synonymous mutations may make it possible to test the mutation rate variation in a gene, the number of synonymous mutations in the downloaded data is too small to analyze. In the future, we will be able to obtain a larger number of mutations, and test the mutation rate variation with synonymous mutations. The contamination of germline SNVs and non-uniform mutation rates in a gene may explain the slight inflation seen in the Q-Q plots (Supplementary Fig. 2).
Together, our analysis systematically identified skewed mutation distributions in the 3D protein structure of 106 genes, which include many strong candidate driver genes with various biological functions. Further functional analysis of these candidates and the mutations should elucidate the mechanism of carcinogenesis and their status as driver genes. Since an accumulation of mutations in a protein can also indicate a good drug target, our methodology can be applied to future drug development, in addition to the identification of low frequency driver genes.
Methods
Data
Mutation data were downloaded from the TCGA web site. MAF files with a data status of “No restrictions; all data available without limitations” were downloaded on December 19th, 2015 (Supplementary Table 1). Redundant SNVs in the same sample were removed. To remove possible germline variations, we applied two kinds of SNV filters: Filter 1, removal of somatic mutation candidates that were found in the 1000genome common SNV set, and Filter 2, removal of somatic mutation candidates that were found in dbSNP with a validation status of “YES”. The number of germline SNVs in dbSNP is much larger than that in the 1000genome set; filter 2 is therefore the more conservative filter, and it removes important hotspots in cancer, as suggested by Jung et al.39. Indeed, known hotspots in BRAF, KRAS and GNAS were removed by filter 2. Therefore, we discuss the results of filter 1 in this paper. All files in a cancer type were merged and annotated with gene information of GENCODE v19 (http://www.gencodegenes.org). Remaining SNVs that resulted in missense mutations were selected for analysis. Commands to visualize mutation positions in the 3D structure in the Chimera software are provided in Supplementary Table 2.
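The filtering steps described above (removing redundant SNVs within a sample, applying filter 1, and keeping missense mutations) can be sketched as follows. This is a minimal sketch under stated assumptions: the record keys (`sample`, `chrom`, `pos`, `ref`, `alt`, `effect`) and the function name are hypothetical and do not correspond to the MAF column names or to the authors' program.

```python
def filter_somatic(records, common_snvs):
    """Keep unique, non-germline missense SNVs.

    records: iterable of dicts with the hypothetical keys
             sample, chrom, pos, ref, alt, effect.
    common_snvs: set of (chrom, pos, ref, alt) tuples standing in for the
                 common SNV set used by filter 1.
    """
    seen = set()
    kept = []
    for rec in records:
        site = (rec["chrom"], rec["pos"], rec["ref"], rec["alt"])
        key = (rec["sample"],) + site
        if key in seen:
            continue            # redundant SNV in the same sample
        seen.add(key)
        if site in common_snvs:
            continue            # filter 1: likely germline variant
        if rec["effect"] != "missense":
            continue            # only missense mutations are analyzed
        kept.append(rec)
    return kept
```

Keeping the germline set as a set of tuples makes each lookup constant time, which matters when the mutation lists of all samples in a cancer type are merged.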
Protein 3D structures were downloaded from PDB on August 7th, 2014.
Annotation of oncogenes and TSGs was based on Table S2A in Vogelstein et al.24. The cancer gene census was obtained from COSMIC.
Comparison with other methods
We compared our method with other methods: MutSigCV, MutSigCL, the combined result of MutSig, and OncodriveCLUST. MutSigCV is a gene burden test, and MutSigCL and OncodriveCLUST are methods to identify mutation clusters in the primary structure. The P-values of MutSigCV and MutSigCL were obtained from Lawrence et al.22. In that paper, 21 cancers were analyzed, and we selected four cancer types (BRCA, KIRC, LUSC, and UCEC), because the mutation lists used in the study on GRCh37 were available from the TCGA database. P-values for MutSigCV and MutSigCL were obtained from Lawrence et al.3, and correction for multiple testing was done by Benjamini and Hochberg’s method40 with n=18,388 according to Lawrence et al.3. We also compared the results of MutSigCL and OncodriveCLUST for LUSC and UCEC, whose results were available from previous publications6,22.
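For readers unfamiliar with the correction mentioned above, the following is a minimal sketch of Benjamini and Hochberg's adjustment in which the total number of tests can be set explicitly (as with n=18,388). The function name and the `n_tests` parameter are assumptions for illustration. When `n_tests` exceeds the number of supplied p-values, the sketch additionally assumes that every unreported p-value is larger than the reported ones.

```python
def bh_qvalues(pvalues, n_tests=None):
    """Benjamini-Hochberg q-values for a list of p-values.

    n_tests: total number of hypotheses tested; defaults to len(pvalues).
    """
    m = len(pvalues)
    n = m if n_tests is None else n_tests
    if n < m:
        raise ValueError("n_tests cannot be smaller than the number of p-values")
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that q-values are
    # monotone in the p-value ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values, for illustration only.
q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```

Genes whose q-value falls at or below the chosen FDR threshold would then be called significant.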
Statistical methods
To quantify the skew of the mutation distribution, we applied a permutation test (Supplementary Fig. 1). Not all protein structures are human in origin, and most do not cover the full-length coding region. Because of this, we aligned the amino acid sequence of the protein structure to the translated reference genome using MAFFT9. Amino acid sequences that aligned to the human sequence with a ratio of gaps to alignment length <0.5 and a ratio of aligned sequence length to total sequence length ≥0.5 were used for the analysis. Using the alignment, we converted the positions of the mutations to those in the 3D structures.
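The coordinate conversion described above can be illustrated with a small sketch that walks a pairwise alignment and records which reference residue corresponds to which residue of the structure sequence. The function name and the use of `-` as the gap character are assumptions; the sketch covers only the position conversion and leaves out the alignment-quality thresholds. The returned indices count residues along the structure sequence and are not PDB residue numbers.

```python
def map_positions(aligned_ref, aligned_struct):
    """Map 1-based reference positions to 1-based structure-sequence positions.

    Both arguments are rows of the same pairwise alignment, with '-' for gaps.
    Reference residues aligned to a gap are left out of the mapping.
    """
    if len(aligned_ref) != len(aligned_struct):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    struct_pos = 0
    for r, s in zip(aligned_ref, aligned_struct):
        if r != "-":
            ref_pos += 1
        if s != "-":
            struct_pos += 1
        if r != "-" and s != "-":
            mapping[ref_pos] = struct_pos
    return mapping


# Hypothetical alignment, for illustration only.
mapping = map_positions("MKTAYIAK", "--TAY-AK")
```

A mutation at a reference position that is missing from the mapping falls outside the structure and would not contribute to the 3D test.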
Using the positions in the 3D proteins, we tested the skew in mutation position distribution. We calculated the pairwise Euclidean distances between mutations in the 3D protein structure, and averaged the distances (observed average distance). Given a gene with n mutations in the 3D structure, the average pairwise Euclidean distance between the n mutations was calculated as follows:
$$\bar{d}_{3D} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$
where (x_i, y_i, z_i) is the location in the 3D structure of the amino acid residue carrying mutation i.
To test the significance of the observed average distance, we applied a permutation test. For a gene with n mutations, n positions were randomly selected from the amino acid residues, and the average distance (simulated average distance) was calculated in the same way as the observed average distance. This step was repeated m times, and the distribution of the simulated average distances was generated and used as the null distribution for the gene (Fig. 1a, and Supplementary Fig. 1). We initially performed 10,000 permutations for each gene. To accurately estimate small p-values, genes with p-value≤0.01 were reanalyzed using 1,000,000 permutations. If one gene had multiple 3D structures, the 3D structure with the lowest p-value was selected. Correction for multiple testing was done with Benjamini and Hochberg’s method40.
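The permutation procedure described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' program: the function names are hypothetical, residues are drawn with replacement, the p-value is taken from the lower tail (small distances indicate clustering), and an add-one estimate is used so that the p-value is never exactly zero. None of these three choices is specified in the text.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords):
    """Average Euclidean distance over all unordered pairs of 3D points."""
    pairs = list(itertools.combinations(coords, 2))
    if not pairs:
        raise ValueError("need at least two positions")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def permutation_pvalue(residue_coords, mutated_idx, n_perm, rng):
    """Lower-tail permutation p-value for the observed average distance."""
    observed = mean_pairwise_distance([residue_coords[i] for i in mutated_idx])
    n = len(mutated_idx)
    hits = 0
    for _ in range(n_perm):
        # Assumption: residues are drawn with replacement, so recurrent
        # mutations at one residue are possible under the null as well.
        sample = rng.choices(residue_coords, k=n)
        if mean_pairwise_distance(sample) <= observed:
            hits += 1
    # Assumption: add-one estimate, which never returns exactly zero.
    return (hits + 1) / (n_perm + 1)


def two_stage_pvalue(residue_coords, mutated_idx, rng,
                     first=10_000, second=1_000_000, threshold=0.01):
    """Rerun with more permutations when the first estimate is small."""
    p = permutation_pvalue(residue_coords, mutated_idx, first, rng)
    if p <= threshold:
        p = permutation_pvalue(residue_coords, mutated_idx, second, rng)
    return p


# Hypothetical structure: 50 residues along a line, three adjacent mutations.
coords = [(float(i), 0.0, 0.0) for i in range(50)]
p = two_stage_pvalue(coords, [0, 1, 2], random.Random(0), first=200, second=2000)
```

The two-stage design spends the expensive 1,000,000-permutation run only on genes that already look promising, which keeps the total cost manageable across thousands of genes.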
We also analyzed the primary structure. We calculated the distance between mutations in the amino acid sequence. The observed average distance was calculated as follows:
$$\bar{d}_{1D} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} |x_i - x_j|$$
where x_i is the position of mutation i in the primary structure. The calculation, selection of a p-value and correction for multiple testing were carried out as for the 3D permutation.
Software
Program for the 3D permutation method is available (3D permutation; https://github.com/afujimoto/3Dpermutation.git).
URLs and software
PDB; http://www.rcsb.org/pdb/home/home.do
Source of the 3D structures; ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/
Conversion of gene symbol to PDB id was based on ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.
MAFFT; http://mafft.cbrc.jp/alignment/software/
TCGA; http://cancergenome.nih.gov
COSMIC; http://cancer.sanger.ac.uk/cosmic
DAVID; http://david.abcc.ncifcrf.gov
UniProt; http://www.uniprot.org
VMD; http://www.ks.uiuc.edu/Research/vmd/
Chimera; http://www.cgl.ucsf.edu/chimera/
REACTOME; http://www.reactome.org/
Additional Information
How to cite this article: Fujimoto, A. et al. Systematic analysis of mutation distribution in three dimensional protein structures identifies cancer driver genes. Sci. Rep. 6, 26483; 10.1038/srep26483 (2016).
Acknowledgments
We appreciate the data released by the TCGA project. The super-computing resource “SHIROKANE” was provided by Human Genome Center, The University of Tokyo (http://sc.hgc.jp/shirokane.html). This work was supported partially by Takeda Science Foundation.
Footnotes
Author Contributions A.F. designed study, developed the algorithm and wrote program. A.F. and Y.O. analyzed the data. A.F., Y.O., K.A.B., H.T. and H.N. wrote the paper. T.T. advised on the data analysis.
References
- Wheeler D. A. & Wang L. From human genome to cancer genome: the first decade. Genome Res 23, 1054–1062, 10.1101/gr.157602.113 (2013).
- Vogelstein B. & Kinzler K. W. Cancer genes and the pathways they control. Nat Med 10, 789–799, 10.1038/nm1087 (2004).
- Lawrence M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218, 10.1038/nature12213 (2013).
- Tamborero D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 3, 2650, 10.1038/srep02650 (2013).
- Olivier M., Hollstein M. & Hainaut P. TP53 mutations in human cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect Biol 2, a001008 (2010).
- Tamborero D., Gonzalez-Perez A. & Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244, 10.1093/bioinformatics/btt395 (2013).
- Ryslik G. A., Cheng Y., Cheung K. H., Modis Y. & Zhao H. Utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 14, 190, 10.1186/1471-2105-14-190 (2013).
- Ryslik G. A., Cheng Y., Cheung K. H., Modis Y. & Zhao H. A graph theoretic approach to utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 15, 86, 10.1186/1471-2105-15-86 (2014).
- Katoh K., Misawa K., Kuma K. & Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059–3066 (2002).
- The Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 368, 2059–2074, 10.1056/NEJMoa1301689 (2013).
- The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315–322, 10.1038/nature12965 (2014).
- The Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70, 10.1038/nature11412 (2012).
- The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068, 10.1038/nature07385 (2008).
- Davis C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330, 10.1016/j.ccr.2014.07.014 (2014).
- The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49, 10.1038/nature12222 (2013).
- The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550, 10.1038/nature13385 (2014).
- The Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525, 10.1038/nature11404 (2012).
- The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615, 10.1038/nature10166 (2011).
- The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202–209, 10.1038/nature13480 (2014).
- The Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690, 10.1016/j.cell.2014.09.050 (2014).
- The Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73, 10.1038/nature12113 (2013).
- Lawrence M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501, 10.1038/nature12912 (2014).
- Kandoth C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339, 10.1038/nature12634 (2013).
- Vogelstein B. et al. Cancer genome landscapes. Science 339, 1546–1558, 10.1126/science.1235122 (2013).
- Krauthammer M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat Genet 44, 1006–1014, 10.1038/ng.2359 (2012).
- Jaramillo M. C. & Zhang D. D. The emerging role of the Nrf2-Keap1 signaling pathway in cancer. Genes Dev 27, 2179–2191, 10.1101/gad.225680.113 (2013).
- Ford D. J. & Dingwall A. K. The cancer COMPASS: navigating the functions of MLL complexes in cancer. Cancer Genet 208, 178–191, 10.1016/j.cancergen.2015.01.005 (2015).
- Schiewer M. J. & Knudsen K. E. Transcriptional roles of PARP1 in cancer. Mol Cancer Res 12, 1069–1080, 10.1158/1541-7786.MCR-13-0672 (2014).
- Bernards R. Unlikely suspects identified in neuroblastoma conspiracy. Cancer Discov 4, 392–393, 10.1158/2159-8290.CD-14-0170 (2014).
- Jain A. et al. DHX9 helicase is involved in preventing genomic instability induced by alternatively structured DNA in human cells. Nucleic Acids Res 41, 10345–10357, 10.1093/nar/gkt804 (2013).
- Fujimoto A. et al. Whole-genome mutational landscape of liver cancers displaying biliary phenotype reveals hepatitis impact and molecular diversity. Nat Commun 6, 6120, 10.1038/ncomms7120 (2015).
- Biankin A. V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399–405, 10.1038/nature11547 (2012). [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
- Tamagnone L. Emerging role of semaphorins as major regulatory signals and potential therapeutic targets in cancer. Cancer Cell 22, 145–152, 10.1016/j.ccr.2012.06.031 (2012). [Abstract] [CrossRef] [Google Scholar]
- Frescas D. & Pagano M. Deregulated proteolysis by the F-box proteins SKP2 and beta-TrCP: tipping the scales of cancer. Nat Rev Cancer 8, 438–449, 10.1038/nrc2396 (2008). [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
- Ji S. et al. ERK kinase phosphorylates and destabilizes the tumor suppressor FBW7 in pancreatic cancer. Cell Res 25, 561–573, 10.1038/cr.2015.30 (2015). [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
- Stolze B., Reinhart S., Bullinger L., Frohling S. & Scholl C. Comparative analysis of KRAS codon 12, 13, 18, 61 and 117 mutations using human MCF10A isogenic cell lines. Sci Rep 5, 8535, 10.1038/srep08535 (2015).
- Foulkes W. D., Priest J. R. & Duchaine T. F. DICER1: mutations, microRNAs and mechanisms. Nat Rev Cancer 14, 662–672, 10.1038/nrc3802 (2014).
- Wang L. et al. The Fas-FADD death domain complex structure reveals the basis of DISC assembly and disease mutations. Nat Struct Mol Biol 17, 1324–1329, 10.1038/nsmb.1920 (2010).
- Jung H., Bleazard T., Lee J. & Hong D. Systematic investigation of cancer-associated somatic point mutations in SNP databases. Nat Biotechnol 31, 787–789, 10.1038/nbt.2681 (2013).
- Benjamini Y. & Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
Read article at publisher's site: https://doi.org/10.1038/srep26483