Compositional dynamics and codon usage pattern of BRCA1 gene across nine mammalian species

2018, Genomics

The BRCA1 gene is located on human chromosome 17q21.31 and plays an important role in biological processes. The aminoacyl-tRNA synthetases (AARS) are a family of heterogeneous enzymes responsible for protein synthesis, whose secondary functions include a role in autoimmune myositis. Our findings reveal that compositional constraint and a preference for A/T-ending codons determine the codon usage pattern in the BRCA1 gene, while a preference for G/C-ending codons influences the codon usage pattern of the AARS gene among mammals. The codon usage bias in BRCA1 and AARS genes is low. The codon CGC encoding arginine and the codon TTA encoding leucine were uniformly distributed in BRCA1 and AARS genes, respectively, in mammals including human. Natural selection might have played a major role, and mutation pressure a minor role, in shaping the codon usage pattern of BRCA1 and AARS genes.

Genomics xxx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/ygeno

Supriyo Chakraborty a,⁎, Tarikul Huda Mazumder a, Arif Uddin b,⁎
a Department of Biotechnology, Assam University, Silchar 788011, Assam, India
b Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, Hailakandi 788150, Assam, India

Keywords: Cancer; Breast; Molecular genetics; Codon usage bias

1. Introduction

The genetic code is degenerate, meaning that more than one codon can encode the same amino acid. The unequal usage of synonymous codons for the same amino acid during translation of a gene transcript into a protein is a well-established phenomenon known as codon usage bias (CUB). It is species specific and differs significantly among the genes of the same taxa [3,12,31,35].

Codon usage patterns have been analyzed ever since the first molecular sequence databases were created [12]. Grantham and his co-workers demonstrated that the genes of a given species share similar patterns of synonymous codon usage frequency, as stated by the “genome hypothesis” [11,12]. Scanning the codon usage patterns of all the genes in an organism may, however, obscure the underlying heterogeneity [2], and hence it is better to identify the trends of codon usage within the genes of a species or between closely related species. Various factors responsible for codon usage bias in organisms ranging from lower prokaryotes to higher eukaryotes have been discussed by researchers across the globe, but to date the codon usage patterns within the genes of an organism during the course of evolution have been given varied explanations. In general, compositional constraints under mutation pressure or natural selection are considered the major factors involved in codon usage variation among different organisms [8,20,26,48].

The human BRCA1 gene is located on chromosome 17q21.31 and comprises 24 exons; its coding region encodes a protein of 1863 amino acids [33]. The multiple functions of BRCA1 attributed to its tumor activity include progression of the cell cycle, the DNA damage repair process and regulation of a specific set of pathways, as well as germ line mutations in its sequence. The role of BRCA1 in predisposing affected individuals to breast and ovarian cancer [36] has been discussed earlier, but a comparative analysis of the synonymous codon usage influencing codon bias in the BRCA1 gene among mammals, with reference to human, has not been done so far.

Housekeeping genes are typically constitutive genes that maintain basic cellular functions and are expressed in all cells of an organism under normal and patho-physiological conditions [9,22]. The AARS gene encodes the enzyme alanyl-tRNA synthetase, which catalyzes the binding of the amino acid alanine to the appropriate tRNA. The aminoacyl-tRNA synthetases are a family of heterogeneous enzymes responsible for protein synthesis, and their secondary functions include a role in autoimmune myositis [18]. In this study, we analyzed the codon bias and codon context patterns in the coding sequences of BRCA1 and compared them with those of one housekeeping gene (AARS), each gene having coding sequences of the same length across the selected mammals, using codon bias measures such as the effective number of codons (ENC), relative synonymous codon usage (RSCU) and the relative abundance of dinucleotides. Further, to understand the extent of selection pressure acting on the protein-coding BRCA1 and AARS genes, we measured the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site (dN/dS ratio) between human and closely related mammals. Our study provides an insight into the patterns of codon usage, offering clues for codon optimization to alter translational efficiency, for the functional conservation of gene expression, and for the significance of nucleotide composition in BRCA1 and the housekeeping (AARS) gene within mammals.

⁎ Corresponding author. E-mail addresses: supriyoch_2008@rediffmail.com (S. Chakraborty), arif.uddin29@gmail.com (A. Uddin).
https://doi.org/10.1016/j.ygeno.2018.01.013
Received 1 September 2017; Received in revised form 22 December 2017; Accepted 22 January 2018
0888-7543/ © 2018 Elsevier Inc. All rights reserved.

2. Methodology

2.1. Sequence data

Coding sequences (CDS) of the BRCA1 gene for nine mammals (n = 9), of equal length, with proper start and stop codons, devoid of any unknown bases (N) and an exact multiple of three bases, were retrieved with their accession numbers (Table 1) from the GenBank database of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov). In addition, the CDS of a housekeeping gene, the alanyl-tRNA synthetase (AARS) gene, of equal length for six mammals (n = 6) were retrieved with their accession numbers (Table 1) for comparative analysis with BRCA1.

Table 1
Coding sequence accession number and length (bp) of BRCA1 and AARS (alanyl-tRNA synthetase) genes across mammals.

Sl. no.  Mammal                  BRCA1 accession no.  BRCA1 length a (bp)  AARS accession no.  AARS length a (bp)
1        Homo sapiens            gi|353441748         5589                 NM_001605.2        2904
2        Pan troglodytes         gi|113865840         5589                 XM_016930106.1     2904
3        Pongo pygmaeus abelii   gi|667713698         5589                 NM_001131919.1     2904
4        Nomascus leucogenys     gi|667713708         5589                 –                  –
5        Nomascus gabriellae     gi|667713690         5589                 –                  –
6        Macaca fascicularis     gi|672890339         5589                 XM_005592535.2     2904
7        Macaca mulatta          gi|169234601         5589                 XM_015126512.1     2904
8        Papio anubis            gi|692110329         5589                 XM_003918580.4     2904
9        Miopithecus talapoin    gi|667713714         5589                 –                  –

a Coding sequence length excludes the stop codon.

2.2. Nucleotide composition analysis

The contents (%) of the nucleotides A, T, G and C, the overall frequencies of the four nucleotides at the third codon position, i.e. A3, T3, G3 and C3 (%), and the GC and AT contents (%) at different positions of synonymous codons were estimated in the coding sequences of BRCA1 and AARS genes across the selected mammals in order to examine the extent of base compositional bias.

2.3. Effective number of codons

The observed effective number of codons (ENC) for each coding sequence of BRCA1 and AARS gene was calculated using the formula given by Wright [47]:

$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$

where F_k (k = 2, 3, 4 or 6) is the average of the F values for k-fold degenerate amino acids. The F value denotes the probability that two randomly chosen codons for an amino acid are identical. The ENC value generally ranges from 20 to 61. A low ENC value (< 35) indicates high codon usage bias, and a higher ENC value reveals low codon usage bias in the gene [47]. The ENC value thus shows an inverse relationship with the degree of codon bias.

2.4. Relative synonymous codon usage

The relative synonymous codon usage (RSCU) values of different codons in the coding sequences of BRCA1 and AARS genes were calculated as per Comeron and Aguade [7] using the formula:

$$\mathrm{RSCU} = \frac{g_{ij}}{\sum_{j}^{n_i} g_{ij}} \, n_i$$

where g_ij is the relative codon usage frequency of the ith codon for the jth amino acid, which is encoded by n_i synonymous codons [7]. In our analysis, an RSCU value > 1.0 represents positive codon usage bias while an RSCU value < 1.0 indicates negative codon usage bias for the corresponding amino acid. Moreover, synonymous codons with RSCU > 1.6 are considered over-represented and those with RSCU < 0.6 under-represented [46].

2.5. Relative dinucleotide abundance

The relative abundance of sixteen dinucleotides in the coding sequences of BRCA1 and AARS genes across mammals was determined using the approach of Chiusano et al. [6]. The odds ratio of each dinucleotide was computed using the formula:

$$P_{xy} = \frac{f_{xy}}{f_x f_y}$$

where f_x and f_y denote the frequencies of the nucleotides X and Y, respectively, and f_xy denotes the frequency of the dinucleotide XY. In our analysis, P_xy > 1.23 was considered over-represented and P_xy < 0.78 under-represented in terms of relative abundance.

2.6. Analysis of selection pressures on the coding sequence of BRCA1 and AARS genes

The degree of nonsynonymous substitution per nonsynonymous site (dN), synonymous substitution per synonymous site (dS) and their ratio, dN/dS, were estimated as per Nielsen and Yang [30] to assess the effect of natural selection that acted on BRCA1 and AARS genes during the course of evolution.

2.7. Correspondence analysis

Correspondence analysis is generally used to investigate the major trend in codon usage variation among genes [34,44]. To explore the variation in codon usage in BRCA1 and AARS genes among nine mammalian species, the RSCU values of codons for all CDS selected in this study were used for correspondence analysis, which reduces the influence of amino acid composition. Each CDS was represented as a 59-dimensional vector, and each dimension corresponded to the RSCU value of one sense codon, with the exception of ATG (methionine), TGG (tryptophan) and the three stop codons. The major trends in codon usage variation can be determined with relative inertia, according to which the coding sequences are analyzed to investigate the major factors affecting the codon usage pattern. COA was done using XLSTAT Pro software.

2.8. Neutrality plot

Mutations at the 3rd codon position mostly result in synonymous changes, whereas mutations at the 1st and 2nd codon positions lead to nonsynonymous (amino acid changing) changes. Nonsynonymous mutations occur less frequently since they may affect gene functionality. Theoretically, mutations should occur randomly at the three codon positions in a DNA molecule if there is no external pressure. In the presence of selection pressure, the preference of bases at the three codon positions would not be the same [37]. The neutrality plot, a graphical plot of GC12 against GC3, depicts the role of directional mutational pressure and natural selection. In this plot, the regression coefficient of GC12 on GC3 represents the equilibrium state of mutation and selection [37].

2.9. Software used

A PERL program was developed to estimate the codon usage bias indices and the selection pressure on the coding sequences of BRCA1 and AARS genes. Statistical analysis was carried out using IBM SPSS version 21.0, and the heat map (cluster analysis) was generated with NetWalker software version 1.0 [17]. Phylogenetic analysis based on nonsynonymous substitution was performed with MEGA 6.0 software [39].

2.10. Correlation analysis

The correlation coefficient between any two parameters was estimated by Karl Pearson's product moment method to assess the presence and degree of relationship between the parameters. The significance of the correlation coefficient was tested by t-test with (n − 2) degrees of freedom at p < 0.01 or p < 0.05.

2.11. Skewness analysis of nucleotides

The skewness of any two nucleotides (x, y) was estimated as (x − y)/(x + y) to understand the compositional dynamics of nucleotides in the coding sequences. A positive skewness value indicates the preponderance of nucleotide x over y, while a negative value reveals a lower abundance of x than y in the coding sequence. A skewness value deviating from zero indicates unequal usage of the two nucleotides in the transcript.

3. Results and discussion

3.1. Nucleotide compositions in BRCA1 and AARS genes across mammals

Nucleotide compositions in the coding sequences of BRCA1 and AARS genes among the selected mammals were analyzed (Table 2). In the BRCA1 gene, the overall AT content (58.94%) was higher than the GC content (41.05%). In the AARS gene, by contrast, the overall GC content (53.76%) was higher than the AT content (46.23%). It is well known that the nucleotide at the third codon position varies considerably owing to the wobble hypothesis, which allows the cell to read all 61 sense codons on the mRNA with a few tRNA molecules. In our analysis, the bases T and C were the most frequently used ones at the 3rd position of codons in the coding sequences of BRCA1 and AARS genes, respectively. Moreover, in the BRCA1 gene A1 showed a significant positive correlation with C2 but a significant negative correlation with G1 and G2. However, both A3 and T3 were negatively correlated (p < 0.01) with C3 and G3, respectively. In addition, C3 had a significant positive correlation with G3 in the CDS of BRCA1 gene (Table 3) [49]. In contrast, in the CDS of AARS gene C1 had a significant negative correlation (p < 0.01) with G1, while T3 was strongly negatively correlated with C1 but positively correlated with G1. Similarly, A1 showed a significant negative correlation with T3 and G1 (Table 3) [49]. Similar to the BRCA1 gene, the members of the albumin superfamily also exhibited low GC content (< 44.63%) [25]. Uddin and Chakraborty [42] also reported low GC content in mitochondrial CYB genes. Similar to the AARS gene, the overall GC content was higher in the GATA2 gene across mammals [24].

Table 2
Nucleotide composition (%) at three codon positions and AT-GC contents (%) of synonymous codons in the coding sequences of BRCA1 and AARS genes across mammals.

No.   A      T      G      C      A3     T3     G3     C3     AT     GC     AT3    GC3
BRCA1 gene
1     34.7   24.1   21.8   19.4   31.4   34.1   17.7   16.8   58.8   41.2   65.5   34.5
2     34.8   24.0   21.7   19.5   31.5   34.0   17.6   16.9   58.8   41.2   65.5   34.5
3     34.9   24.0   21.6   19.5   31.6   33.8   17.7   16.9   58.9   41.1   65.4   34.6
4     34.8   24.0   21.6   19.6   31.4   33.7   17.8   17.1   58.8   41.2   65.2   34.8
5     34.8   24.1   21.6   19.5   31.3   33.8   17.8   17.1   58.8   41.2   65.1   34.9
6     34.9   24.2   21.5   19.4   31.5   34.3   17.7   16.5   59.1   40.9   65.8   34.2
7     34.9   24.2   21.4   19.5   31.6   34.2   17.5   16.7   59.1   40.9   65.8   34.2
8     34.9   24.3   21.5   19.3   31.7   34.2   17.6   16.5   59.2   40.8   65.9   34.1
9     34.8   24.2   21.6   19.4   31.7   34.2   17.7   16.4   59.0   41.0   65.9   34.1
M     34.83  24.12  21.59  19.46  31.52  34.0   17.68  16.77  58.94  41.05  65.57  34.43
SD    0.071  0.109  0.117  0.088  0.139  0.218  0.097  0.259  0.159  0.159  0.300  0.300
AARS gene
1     25.0   21.3   28.6   25.1   15.3   24.6   28.7   31.4   46.4   53.6   39.9   60.1
2     25.1   21.3   28.5   25.1   15.3   24.6   28.6   31.5   46.4   53.6   39.9   60.1
3     25.2   21.0   28.3   25.5   15.8   23.9   28.3   32.0   46.2   53.8   39.7   60.3
4     25.2   20.9   28.5   25.4   15.8   23.6   29.0   31.6   46.1   53.9   39.3   60.7
5     25.3   21.0   28.5   25.2   16.1   23.8   28.6   31.5   46.2   53.8   39.9   60.1
6     25.3   20.8   28.5   25.4   15.9   23.2   28.9   32.0   46.1   53.9   39.1   60.9
M     25.18  21.05  28.48  25.28  15.70  23.95  28.68  31.67  46.23  53.76  39.63  60.37
SD    0.117  0.208  0.098  0.172  0.329  0.558  0.248  0.266  0.137  0.137  0.350  0.350

M: mean; SD: standard deviation.

Table 3
Correlation coefficients among nucleotide compositions at three codon positions in the coding sequences of BRCA1 gene (below the diagonal) and AARS gene (above the diagonal) across mammals.

     A1       A2       A3       T1       T2       T3       C1       C2       C3       G1       G2       G3
A1   0        -0.71    0.79     -0.71    0.76     -0.96**  0.89*    0.18     0.66     -0.94**  0.72     0.67
A2   0.27     0        -0.82*   0.54     -0.70    0.62     -0.71    -0.10    -0.18    0.61     -0.94**  -0.41
A3   -0.03    -0.27    0        -0.80    0.46     -0.84*   0.95**   0.52     0.63     -0.88*   0.70     0.16
T1   0.45     0.04     -0.48    0        -0.31    0.77     -0.90*   -0.54    -0.74    0.83*    -0.32    -0.04
T2   0.53     -0.48    0.58     0.22     0        -0.67    0.48     -0.47    0.29     -0.51    0.79     0.81*
T3   0.15     -0.35    0.60     -0.51    0.33     0        -0.93**  -0.27    -0.83*   0.97**   -0.60    -0.50
C1   -0.29    0.04     0.57     -0.49    0.22     -0.10    0        0.53     0.75     -0.97**  0.60     0.28
C2   0.67*    -0.30    0.02     0.10     0.53     0.50     -0.40    0        0.39     -0.48    -0.07    -0.44
C3   -0.02    0.54     -0.76*   0.56     0.49     -0.93**  -0.16    0.41     0        -0.80    0.09     0.09
G1   -0.71*   -0.23    -0.10    -0.57    -0.64    0.31     -0.25    -0.18    -0.23    0        -0.56    -0.40
G2   -0.93**  0.15     -0.18    -0.22    -0.64    0.31     0.12     -0.80*   0.22     0.64     0        0.62
G3   -0.20    -0.03    -0.82**  0.45     -0.35    -0.78*   -0.17    -0.18    0.72*    -0.04    0.35     0

* p < 0.05, ** p < 0.01.

Table 4
Overall relative synonymous codon usage (RSCU) values in the coding sequences of BRCA1 and AARS genes among mammals.

AA    Codon   BRCA1 N   BRCA1 RSCU a   AARS N   AARS RSCU a
Ala   GCA**   305       1.63           98       0.73
Ala   GCC*    139       0.75           228      1.74
Ala   GCG     28        0.15           19       0.14
Ala   GCT*    280       1.49           181      1.37
Arg   CGT     43        0.38           8        0.16
Arg   CGC     18        0.17           40       0.80
Arg   CGA*    42        0.37           64       1.28
Arg   CGG*    28        0.26           96       1.94
Arg   AGA**   317       2.80           35       0.70
Arg   AGG**   233       2.05           55       1.10
Asn   AAC*    355       0.65           117      1.15
Asn   AAT*    733       1.35           87       0.85
Asp   GAT*    523       1.37           199      0.88
Asp   GAC*    237       0.63           252      1.12
Cys   TGC*    95        0.48           39       1.05
Cys   TGT*    297       1.52           35       0.95
Gln   CAA     359       0.80           31       0.25
Gln   CAG*    539       1.20           220      1.75
Glu   GAA*    1194      1.36           154      0.80
Glu   GAG*    561       0.64           232      1.20
Gly   GGA*    267       1.43           95       0.81
Gly   GGC*    140       0.74           172      1.45
Gly   GGG     143       0.76           106      0.91
Gly   GGT*    200       1.07           97       0.83
His   CAT*    313       1.36           29       0.57
His   CAC*    149       0.64           73       1.43
Ile   ATA*    252       1.08           10       0.10
Ile   ATC*    182       0.78           150      1.50
Ile   ATT*    270       1.16           140      1.40
Leu   TTA*    299       1.27           24       0.24
Leu   TTG*    291       1.23           69       0.75
Leu   CTA     214       0.90           28       0.29
Leu   CTC*    137       0.58           115      1.24
Leu   CTG*    260       1.11           270      2.91
Leu   CTT     211       0.90           49       0.54
Lys   AAA*    710       1.17           140      0.71
Lys   AAG     507       0.83           256      1.29
Phe   TTT*    279       1.27           129      1.16
Phe   TTC     160       0.74           92       0.84
Pro   CCA*    335       1.60           47       0.87
Pro   CCC     127       0.60           88       1.61
Pro   CCG     5         0.02           12       0.21
Pro   CCT**   371       1.77           72       1.31
Ser   TCA     336       0.97           33       0.66
Ser   TCC     159       0.45           67       1.36
Ser   TCG     10        0.01           10       0.20
Ser   TCT**   596       1.75           70       1.43
Ser   AGC*    384       1.13           86       1.77
Ser   AGT**   571       1.67           28       0.56
Thr   ACA*    340       1.36           88       1.11
Thr   ACC     209       0.84           70       0.89
Thr   ACG     36        0.15           54       0.68
Thr   ACT**   417       1.67           103      1.31
Tyr   TAC     127       0.90           81       1.04
Tyr   TAT*    154       1.10           75       0.96
Val   GTA     206       0.88           35       0.35
Val   GTC     133       0.57           111      1.09
Val   GTG*    262       1.12           213      2.11
Val   GTT*    332       1.43           44       0.44

a Mean values of RSCU based on the synonymous codon usage frequency. AA: amino acid; N: total number of codons; *RSCU > 1.0; **RSCU > 1.6.

Fig. 1. Heat map representation of the correlation coefficient between codon usage and GC3s in the coding sequences of BRCA1 and AARS genes among mammals.

Fig. 2. Graphical representation of nucleotide skewness values for BRCA1 and AARS genes among mammals.

Fig. 3. Correspondence analysis of RSCU values in BRCA1 and AARS genes. Each point in the plot represents the distribution of a gene corresponding to the coordinates of the primary and secondary axes of variation. Black color indicates the codons while blue color indicates different species. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Neutrality plot analysis of GC12 versus GC3 in BRCA1 and AARS genes.

Fig. 5. Line diagram showing the relative abundance of 16 dinucleotides in BRCA1 and AARS genes.

Fig. 6. Heat map representation of amino acid usage in BRCA1 and AARS across mammals. Each rectangular box with color bar represents the frequency of occurrence of an amino acid (shown in columns) in [A] BRCA1 gene and [B] AARS gene across mammals (shown in rows). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Variation of synonymous and nonsynonymous mutation for amino acid in BRCA1 at different positions across nine mammals.

3.2. Codon usage bias of BRCA1 and AARS genes

The observed ENC value in the coding sequences of BRCA1 and AARS genes among mammals ranged from 49 to 51, indicating low bias in synonymous codon usage; that is, the synonymous codons were used almost equally for the corresponding amino acid in both BRCA1 and AARS genes of mammals [49]. The low codon usage bias of a gene might be helpful for efficient replication in vertebrates, where different cell types have different codon preferences [14,43]. Moreover, low codon bias of a gene indicates the presence of greater genetic variability for synonymous codon usage in the gene. High codon bias arises in a gene when one or a few codons within a synonymous family are preferred in the mRNA.
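
The two codon bias measures defined in the Methodology (ENC, Section 2.3, and RSCU, Section 2.4) can be sketched in Python as follows. This is only an illustrative sketch and not the PERL program used in the study: the function names `count_codons`, `rscu` and `enc` are hypothetical, the standard genetic code is assumed, the ENC value is capped at 61, and the code simply raises an error when a degeneracy class has no usable data instead of estimating the missing F value.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Synonymous codon families keyed by amino acid; stop codons are excluded.
FAMILIES = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def count_codons(cds):
    """Count sense codons of an in-frame CDS; stops and ambiguous codons are skipped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if CODE.get(c, "*") != "*")


def rscu(counts):
    """RSCU = codon count * family size / total count of the synonymous family."""
    values = {}
    for fam in FAMILIES.values():
        total = sum(counts[c] for c in fam)
        if len(fam) > 1 and total > 0:  # Met and Trp have no synonymous codons
            for c in fam:
                values[c] = counts[c] * len(fam) / total
    return values


def enc(counts):
    """Wright's ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    f_by_class = defaultdict(list)
    for fam in FAMILIES.values():
        n = sum(counts[c] for c in fam)
        if len(fam) > 1 and n > 1:
            homozygosity = sum((counts[c] / n) ** 2 for c in fam)
            f_by_class[len(fam)].append((n * homozygosity - 1) / (n - 1))
    total = 2.0
    for k, weight in ((2, 9), (3, 1), (4, 5), (6, 3)):
        f_values = f_by_class.get(k, [])
        mean_f = sum(f_values) / len(f_values) if f_values else 0.0
        if mean_f <= 0:
            raise ValueError(f"no usable data for {k}-fold degenerate amino acids")
        total += weight / mean_f
    return min(61.0, total)
```

Calling `enc(count_codons(cds))` on a full-length coding sequence gives a value between 20 (one codon per amino acid) and 61 (all synonymous codons used equally), while `rscu(count_codons(cds))` returns a dictionary of the 59 RSCU values for the amino acids present in the sequence.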
(r = −0.853, p < 0.01) between ENC and AT contents were observed for BRCA1 gene across mammals. However, for AARS gene, ENC and GC content were negatively correlated but ENC and AT content were positively correlated. The ENC value was negatively correlated with A3 and T3 but positively correlated with G3 and C3 for the BRCA1 gene across mammals. While for the CDS of AARS gene, the ENC value was negatively correlated with A3 and G3 but positively correlated with T3 and C3. These findings suggest that, compositional constraints were one of the important factors in determining the codon usage patterns in BRCA1 as well as AARS genes across mammals [5]. Relatively weak codon bias exists in the coding sequence of BRCA1 and AARS genes across mammals as reflected by high ENC values (49–51). Similarly, the mean ENC value of GATA2 gene was 41.60 ± 7.33 suggesting the existence of relatively weak codon bias across mammals [24]. ENC value ranged from 51.65 to 56.62 in albumin superfamily genes in human. The overall ENC value of these genes was > 50 which suggested that the synonymous codons were used equally in albumin superfamily genes and hence showed less codon usage bias [25]. The ENC values for CYB gene in different species of pisces, aves and mammals were 58.33, 59.66, and 58.33 respectively [43]. 3.5. Relative synonymous codon usage (RSCU) of BRCA1 and AARS genes among mammals 3.3. Pattern of codon usage To elucidate the relationship of the codon usage variation with GC constraints among the selected coding sequences of BRCA1 and AARS genes, we analyzed the correlation coefficients of codon usage with GC3 using heat map (Fig. 1). We observed that some of the codons displayed positive correlation while some other codons showed negative correlation with GC3. The codon CGC encoding arginine amino acid and the codon TTA encoding leucine were uniformly distributed in BRCA1 and AARS genes respectively in mammals including human. 
These results from our analysis suggest that the usage frequency of positively correlated codons will increase with the increase of GC bias and that of negatively correlated codons will decrease with the increase of GC bias [24,32]. In GATA2 gene, it was reported that most of the codons with G/C-ending base in the coding sequences were positively correlated with GC3 indicating that codon usage had been influenced by the GC bias but little by the A/T -ending base [24]. The total RSCU values of 59 sense codons excluding the codon ATG and TGG encoding the amino acid methionine and tryptophan, respectively and three stop codons were analyzed in the coding sequences of BRCA1 and AARS genes among mammals (Table 4). We observed that 29 codons (A–ending 9, T–ending 14, G–ending 5, C–ending 1) were more frequently used (RSCU > 1.0) where in T/A-ending codons were predominant over G/C-ending codons. Among the 29 more frequently used codons, four T-ending (CCT, TCT, AGT and ACT) codons, two A-ending (GCA, AGA) codons and one G-ending (AGG) codon were highly used (RSCU > 1.6) in BRCA1 gene. In contrast, for AARS gene, 28 codons (A–ending 2, T–ending 6, G–ending 7, C–ending 13) were more frequently used (RSCU > 1.0) where C–ending codons were more predominant over other codons. Furthermore, from our analysis it was revealed that the codons AGA and AGG encoding arginine amino acid for BRCA1 gene and the codons CTG, GTG encoding leucine and valine amino acid respectively for the CDS of AARS gene across mammals had the highest RSCU value i.e. > 2.0. Our analysis revealed that T-ending codon was mostly favored in the coding sequences of BRCA1 gene but C–ending codon was mostly favored in the coding sequences of AARS gene across mammals [42]. For GATA2 gene the relative codon usage frequency revealed that C-ending codons were mostly preferred to G-ending codons across mammals. The codon ATT encoding 3.4. 
Relationship of codon usage bias with compositional properties We performed correlation analysis between codon usage bias and compositional properties to understand the effect of base composition on codon usage bias. Significant positive correlation (r = 0.853, p < 0.01) between ENC and GC contents but negative correlation 7 Genomics xxx (xxxx) xxx–xxx S. Chakraborty et al. Fig. 8. Phylogenetic tree based on nonsynonymous (dN) distance using codon alignment [A] BRCA1 gene and [B] AARS gene across mammals. The tree was drawn to infer the evolutionary history using neighbor-joining method with 1000 bootstrap replicates, conducted in MEGA6. significantly in exons and introns of human genes [21]. In order to find out the relationship between the nucleotide skewness with codon usage bias, we calculated the skew value from the variation in base composition within each coding sequence of BRCA1 and AARS genes. Skewness values for the GC, AT, keto, amino, purine and pyrimidine bases revealed that base composition bias is linked to transcription processes [10,41]. In our analysis, positive skew value was observed (Fig. 2) in case of GC, AT, GC3, keto, amino, purine and pyrimidine bases [27] in the coding sequence of BRCA1 gene across mammals. However, in the coding sequence of AARS gene across mammals, we observed positive skew value in GC and AT bases but negative skew value for keto, amino, purine and pyrimidine bases [43]. Table 5 Comparisons between human and other mammals for the ratio of the number of nonsynonymous substitution per nonsynonymous site to the number of synonymous substitution per synonymous site (dN/dS) in the coding sequences of BRCA1 and AARS genes. Homo Homo Homo Homo Homo Homo Homo Homo sapiens sapiens sapiens sapiens sapiens sapiens sapiens sapiens vs. vs. vs. vs. vs. vs. vs. vs. 
Pan troglodytes Pongo pygmaeus abelii Nomascus leucogenys Nomascus gabriellae Macaca fascicularis Macaca mulatta Papio anubis Miopithecus talapoin BRCA1 AARS dN/dS dN/dS 1.600 0.667 0.750 0.833 5.500 5.810 5.500 6.000 0.093 0.099 – – 0.059 0.068 0.069 – 3.7. Correspondence analysis To determine the trends in codon usage variation of BRCA1 and AARS genes, we performed correspondence analysis (COA) based on the RSCU values of 59 synonymous codons. We observed that the first principal axis (f1) accounted for 44.91% of total variation, whereas the second axis (f2) accounted for only 33.37% of variation (Fig. 3) in BRCA1 gene. However, in case of AARS gene, the first principal axis (f1) accounted for 62.57% of total variation, whereas the second axis (f2) accounted for only 23.92% (Fig. 3). In these plots, the positions of the isoleucine showed the RSCU value zero because nature might have disfavored this codon in GATA2 gene across the mammals [24]. 3.6. Relationship between nucleotide skewness and codon usage It was reported that due to differential mutational pressure, the usage of nucleotide frequency varies across the genes and differs 8 Genomics xxx (xxxx) xxx–xxx S. Chakraborty et al. codons are more close to axes, indicating that the base composition for mutation bias might correlate to the codon usage of BRCA1 and AARS genes [15,45]. dinucleotides were over-represented in BRCA1 and AARS genes across mammals which might be due to the effect of CpG dinucleotide as reported earlier in different organisms [4]. 3.8. Neutrality plot analysis 3.10. Amino acid usage and codon bias We performed a neutrality plot analysis of GC12 versus GC3 in order to understand the influence of selection and mutation pressure on codon usage bias of BRCA1 and AARS genes [37]. 
In neutrality plot analysis, when a gene is located in the figure on the slope of unity there exists a significant correlation between GC12 and GC3, indicating that the gene is under neutral mutation pressure through random selective pressure. But if the gene is under directional mutation pressure, the gene would fall below the slope of unity, i.e. closer to X-axis and farther from the Y-axis. Therefore, a regression line with a slope < 1 indicates that a non-neutral mutation pressure affects the codon usage in the gene within the same genome [28,38]. In our analysis, we observed nonsignificant (p > 0.05) negative correlation (Pearson r = −0.156, r = −0.194) between GC12 and GC3 for BRCA1 and AARS genes respectively. Further, we estimated the magnitude of natural selection and mutation pressure using regression coefficient. The regression coefficient of GC12 on GC3 in BRCA1 gene was 0.409, indicating that the relative neutrality was 40.9%, while the relative constraint was 59.1% for GC3 (Fig. 4). Similarly for AARS gene, the regression coefficient of GC12 on GC3 was 0.054 which indicated the relative neutrality was 5.4% and the relative constraint was 94.6%. These results made us believe that natural selection played a major role while mutation pressure played a minor role in the codon usage bias of BRCA1 and AARS genes across mammals [13,43]. The frequency distributions of amino acid usage in BRCA1 and AARS proteins in the selected mammals were grouped using hierarchical clustering with Euclidean distance and represented in a heat map (Fig. 6). The outcome of the results showed that amino acids serine (S), glutamate (E), leucine (L), lysine (K), asparagine (N) and to a lesser extent threonine (T) and valine (V) were the most over-represented amino acids in BRCA1 protein. Similarly, the amino acids leucine (L), alanine (A), glycine (G), aspartate (D) and valine (V) were the overrepresented amino acids in AARS protein. 
Multiple amino acid sequence alignment of 1863 amino acid residues for BRCA1 protein across mammals was implemented in Mega6 under CodonW alignment (Fig. 7). Our results showed that amino acid at different positions in BRCA1 protein radically changed in human when compared with other mammals during evolution [24]. In GATA2 gene, four amino acids, namely alanine (A), glycine (G), proline (P) and serine (S) were more widely used and two amino acids isoleucine (I) and tryptophan (W) were least used [24]. 3.11. Non-synonymous substitution in BRCA1 and AARS genes corresponds to phylogeny of mammals The coding sequences of BRCA1 and AARS genes from the selected mammals were aligned in clustalW2 program [40]. Phylogenetic trees based on nonsynonymous substitution in the complete coding sequences of BRCA1 and AARS genes were constructed (Fig. 8) by neighbor-joining method using MEGA6 [39]. The evolutionary distances were computed using the Nei-Gojobori method [29] and were in the units of the number of nonsynonymous substitutions per nonsynonymous site. In our analysis, the tree indicates that closely related mammals were distinctly separated with different clades. The bootstrap values are shown above the branches [24]. In GATA2 gene, there was a close relationship between the rate of nonsynonymous substitution of GATA2 gene in M. musculus and R. norvegicus but distinctly different from H. sapiens [24]. 3.9. Relative abundance of dinucleotide and codon usage patterns Literature suggested that dinucleotide bias can influence the overall codon usage patterns in a variety of organisms [6,16]. We computed the relative abundance of 16 dinucleotides in the coding sequences of BRCA1 and AARS genes in order to assess the effect of dinucleotides on the codon usage patterns of these genes across selected mammals under study. 
Our analysis showed that CpG dinucleotides were under-represented (mean ± SD = 0.18 ± 0.011 and 0.46 ± 0.025) whereas GpC dinucleotides were over-represented (mean ± SD = 1.04 ± 0.016 and 1.01 ± 0.009) in the CDS of BRCA1 and AARS genes, respectively. Moreover, the codons containing the CpG dinucleotide (TCG, CCG, ACG, GCG, CGA, CGC, CGG and CGT; Table 4) had RSCU values < 1.0, indicating their under-representation for the corresponding amino acids in the BRCA1 gene. A similar result was observed in the AARS gene, except that the codons CGA and CGG had RSCU values > 1.0. However, the codons containing the GpC dinucleotide (i.e. TGC, CGC, GGC, GCC, GCG) had RSCU values < 1.0 in the BRCA1 gene, except AGC, GCA and GCT. Again, in the AARS gene the codons containing GpC (i.e. TGC, CGC, GGC, GCC, AGC, GCA) had RSCU values < 1.0, except the codons AGC and GCT. Similarly, four TpG-containing codons (TTG, CTG, TGT, GTG) and five CpA-containing codons (CCA, CAT, CAG, ACA and GCA) were over-represented in the CDS of the BRCA1 gene, and most of them were also the preferred codons for their corresponding amino acids (Fig. 5). In addition, the TpG-containing codon CTG and the CpA-containing codon CAG were over-represented in the CDS of the AARS gene. A previous study reported that CpA and TpG are over-represented in different organisms and that this could be due to the role of the CpG dinucleotide. Spontaneous deamination (mutation) of methylated cytosine in a CpG dinucleotide yields a thymine (T) residue, forming the dinucleotide TpG on the same strand and CpA on the opposite strand of DNA after replication [4]. The over-representation of TpG and CpA dinucleotides in BRCA1 and AARS genes across mammals suggests that mutation through spontaneous deamination might have affected the codon usage pattern of these genes during evolution. Our analysis also revealed that CpA and TpG

3.12. Selection pressure acting on the protein-coding BRCA1 and AARS genes

The ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site (dN/dS ratio) is a good indicator of the extent of selective pressure acting on a protein-coding gene [19]. A dN/dS ratio greater than unity indicates positive or Darwinian selection, in which amino acid change in the protein is favoured. A ratio less than one indicates purifying selection, in which amino acid alteration is suppressed, whereas a ratio equal to unity points towards neutral evolution [1]. In our analysis (Table 5), the dN/dS ratio for the coding sequences of the BRCA1 gene was greater than one when Homo sapiens was compared with P. troglodytes, M. fascicularis, M. mulatta, P. anubis and M. talapoin, indicating positive or Darwinian selection during their evolution. However, a dN/dS ratio less than one was observed between Homo sapiens and each of P. pygmaeus, N. leucogenys and N. gabriellae, suggesting that the BRCA1 gene has undergone purifying selection to preserve its protein functionality in these organisms [23]. A similar result (dN/dS < 1) was also observed in the CDS of the AARS gene when Homo sapiens was compared with other mammals such as P. troglodytes, P. pygmaeus abelii, M. fascicularis, M. mulatta and P. anubis. In the GATA2 gene, the mean rate of synonymous substitution per synonymous site (dS) was higher in H. sapiens, M. musculus, S. scrofa and B. taurus; however, these data showed no statistically significant difference between the groups. The rate of nonsynonymous substitution per site (dN) for the GATA2 gene was also higher in H. sapiens, M. musculus and B. taurus, but relatively lower in S. scrofa and R. norvegicus, and all the nonsynonymous substitutions between the groups showed strong, statistically significant differences (p < 0.001) [24].

Acknowledgements

We are thankful to Assam University, Silchar, Assam, India, for providing the necessary lab facilities in carrying out this research work.

Ethics statement: Not applicable. The study is based on analysis of DNA sequences available in public databases accessible to everyone.

Conflict of interest

There is no conflict of interest in this research work.

References

[1] M. Anisimova, D.A. Liberles, The quest for natural selection in the age of comparative genomics, Heredity (Edinb) 99 (2007) 567–579.
[2] S. Aota, T. Gojobori, F. Ishibashi, T. Maruyama, T. Ikemura, Codon usage tabulated from the GenBank genetic sequence data, Nucleic Acids Res. 16 (Suppl) (1988) r315–402.
[3] S.K. Behura, D.W. Severson, Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes, PLoS One 7 (2012) e43111.
[4] A.P. Bird, DNA methylation and the frequency of CpG in animal DNA, Nucleic Acids Res. 8 (1980) 1499–1504.
[5] A.M. Butt, I. Nasrullah, Y. Tong, Genome-wide analysis of codon usage and influencing factors in chikungunya viruses, PLoS One 9 (2014) e90905.
[6] M.L. Chiusano, F. Alvarez-Valin, M. Di Giulio, G. D'Onofrio, G. Ammirato, G. Colonna, G. Bernardi, Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code, Gene 261 (2000) 63–69.
[7] J.M. Comeron, M. Aguade, An evaluation of measures of synonymous codon usage bias, J. Mol. Evol. 47 (1998) 268–274.
[8] L. Duret, D. Mouchiroud, Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis, Proc. Natl. Acad. Sci. U. S. A. 96 (1999) 4482–4487.
[9] E. Eisenberg, E.Y. Levanon, Human housekeeping genes, revisited, Trends Genet. 29 (2013) 569–574.
[10] S. Fujimori, T. Washio, M. Tomita, GC-compositional strand bias around transcription start sites in plants and fungi, BMC Genomics 6 (2005) 26.
[11] R. Grantham, C. Gautier, M. Gouy, R. Mercier, A. Pave, Codon catalog usage and the genome hypothesis, Nucleic Acids Res. 8 (1980) r49–r62.
[12] R. Grantham, C. Gautier, M. Gouy, M. Jacobzone, R. Mercier, Codon catalog usage is a genome strategy modulated for gene expressivity, Nucleic Acids Res. 9 (1981) r43–74.
[13] B. He, H. Dong, C. Jiang, F. Cao, S. Tao, L.-a. Xu, Analysis of codon usage patterns in Ginkgo biloba reveals codon usage tendency from A/U-ending to G/C-ending, Sci. Rep. 6 (2016).
[14] G.M. Jenkins, E.C. Holmes, The extent of codon usage bias in human RNA viruses and its evolutionary origin, Virus Res. 92 (2003) 1–7.
[15] X. Jia, S. Liu, H. Zheng, B. Li, Q. Qi, L. Wei, T. Zhao, J. He, J. Sun, Non-uniqueness of factors constraint on the codon usage in Bombyx mori, BMC Genomics 16 (2015) 356.
[16] S. Karlin, C. Burge, Dinucleotide relative abundance extremes: a genomic signature, Trends Genet. 11 (1995) 283–290.
[17] K. Komurov, S. Dursun, S. Erdin, P.T. Ram, NetWalker: a contextual network analysis tool for functional genomics, BMC Genomics 13 (2012) 282.
[18] M.A. Kron, M. Petridis, M. Haertlein, B. Libranda-Ramirez, L.E. Scaffidi, Do tissue levels of autoantigenic aminoacyl-tRNA synthetase predict clinical disease? Med. Hypotheses 65 (2005) 1124–1127.
[19] S. Kryazhimskiy, J.B. Plotkin, The population genetics of dN/dS, PLoS Genet. 4 (2008) e1000304.
[20] W.H. Li, Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons, J. Mol. Evol. 24 (1987) 337–345.
[21] E. Louie, J. Ott, J. Majewski, Nucleotide frequency variation across human genes, Genome Res. 13 (2003) 2594–2601.
[22] M. Mahadevappa, J.A. Warrington, Housekeeping Genes, eLS, 2002.
[23] T.H. Mazumder, S. Chakraborty, Gaining insights into the codon usage patterns of TP53 gene across eight mammalian species, PLoS One 10 (2015) e0121709.
[24] T.H. Mazumder, A. Uddin, S. Chakraborty, Transcription factor gene GATA2: association of leukemia and nonsynonymous to the synonymous substitution rate across five mammals, Genomics 107 (2016) 155–161.
[25] H. Mirsafian, A. Mat Ripen, A. Singh, P.H. Teo, A.F. Merican, S.B. Mohamad, A comparative analysis of synonymous codon usage bias pattern in human albumin superfamily, Sci. World J. 2014 (2014).
[26] R.R. Nair, M.B. Nandhini, T. Sethuraman, G. Doss, Mutational pressure dictates synonymous codon usage in freshwater unicellular alpha-cyanobacterial descendant Paulinella chromatophora and beta-cyanobacterium Synechococcus elongatus PCC6301, SpringerPlus 2 (2013) 492.
[27] A. Necşulea, J.R. Lobry, A new method for assessing the effect of replication on DNA base composition asymmetry, Mol. Biol. Evol. 24 (2007) 2169–2179.
[28] A. Necşulea, J.R. Lobry, Revisiting the directional mutation pressure theory: the analysis of a particular genomic structure in Leishmania major, Gene 385 (2006) 28–40.
[29] M. Nei, T. Gojobori, Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions, Mol. Biol. Evol. 3 (1986) 418–426.
[30] R. Nielsen, Z. Yang, Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA, Mol. Biol. Evol. 20 (2003) 1231–1239.
[31] M. Nirenberg, P. Leder, M. Bernfield, R. Brimacombe, J. Trupin, F. Rottman, C. O'Neal, RNA codewords and protein synthesis, VII. On the general nature of the RNA code, Proc. Natl. Acad. Sci. U. S. A. 53 (1965) 1161–1168.
[32] G.A. Palidwor, T.J. Perkins, X. Xia, A general model of codon bias due to GC mutational bias, PLoS One 5 (2010) e13431.
[33] A. Pavlicek, V.N. Noskov, N. Kouprina, J.C. Barrett, J. Jurka, V. Larionov, Evolution of the tumor suppressor BRCA1 locus in primates: implications for cancer predisposition, Hum. Mol. Genet. 13 (2004) 2737–2751.
[34] G. Perriere, J. Thioulouse, Use and misuse of correspondence analysis in codon usage studies, Nucleic Acids Res. 30 (2002) 4548–4555.
[35] Y. Prat, M. Fromer, N. Linial, M. Linial, Codon usage is associated with the evolutionary age of genes in metazoan genomes, BMC Evol. Biol. 9 (2009) 285.
[36] E.M. Rosen, S. Fan, R.G. Pestell, I.D. Goldberg, BRCA1 gene in breast cancer, J. Cell. Physiol. 196 (2003) 19–41.
[37] N. Sueoka, Directional mutation pressure and neutral molecular evolution, Proc. Natl. Acad. Sci. U. S. A. 85 (1988) 2653–2657.
[38] N. Sueoka, Y. Kawanishi, DNA G+C content of the third codon position and codon usage biases of human genes, Gene 261 (2000) 53–62.
[39] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) 2725–2729.
[40] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680.
[41] M. Touchon, E.P. Rocha, From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data, Biochimie 90 (2008) 648–659.
[42] A. Uddin, S. Chakraborty, Synonymous codon usage pattern in mitochondrial CYB gene in pisces, aves, and mammals, Mitochondrial DNA (2015) 1–10.
[43] A. Uddin, S. Chakraborty, Codon usage trend in mitochondrial CYB gene, Gene 586 (2016) 105–114.
[44] H.C. Wang, D.A. Hickey, Rapid divergence of codon usage patterns within the rice genome, BMC Evol. Biol. 7 (Suppl. 1) (2007) S6.
[45] L. Wei, J. He, X. Jia, Q. Qi, Z. Liang, H. Zheng, Y. Ping, S. Liu, J. Sun, Analysis of codon usage bias of mitochondrial genome in Bombyx mori and its relation to evolution, BMC Evol. Biol. 14 (2014) 262.
[46] E.H. Wong, D.K. Smith, R. Rabadan, M. Peiris, L.L. Poon, Codon usage bias and the evolution of influenza A viruses. Codon usage biases of influenza virus, BMC Evol. Biol. 10 (2010) 253.
[47] F. Wright, The ‘effective number of codons’ used in a gene, Gene 87 (1990) 23–29.
[48] C. Xu, X. Cai, Q. Chen, H. Zhou, Y. Cai, A. Ben, Factors affecting synonymous codon usage bias in chloroplast genome of Oncidium Gower Ramsey, Evol. Bioinformatics Online 7 (2011) 271–278.
[49] Z. Zhang, W. Dai, D. Dai, Synonymous codon usage in TTSuV2: analysis and comparison with TTSuV1, PLoS One 8 (2013) e81469.