Europe PMC
Nothing Special   »   [go: up one dir, main page]

Europe PMC requires Javascript to function effectively.

Either your web browser doesn't support Javascript or it is currently turned off. In the latter case, please turn on Javascript support in your web browser and reload this page.

This website requires cookies, and the limited processing of your personal data in order to function. By using the site you are agreeing to this as outlined in our privacy notice and cookie policy.

Abstract 


The "inactive" X chromosome (Xi) has been assumed to have little impact, in trans, on the "active" X (Xa). To test this, we quantified Xi and Xa gene expression in individuals with one Xa and zero to three Xis. Our linear modeling revealed modular Xi and Xa transcriptomes and significant Xi-driven expression changes for 38% (162/423) of expressed X chromosome genes. By integrating allele-specific analyses, we found that modulation of Xa transcript levels by Xi contributes to many of these Xi-driven changes (≥121 genes). By incorporating metrics of evolutionary constraint, we identified 10 X chromosome genes most likely to drive sex differences in common disease and sex chromosome aneuploidy syndromes. We conclude that human X chromosomes are regulated both in cis, through Xi-wide transcriptional attenuation, and in trans, through positive or negative modulation of individual Xa genes by Xi. The sum of these cis and trans effects differs widely among genes.

Free full text 


Logo of cgenLink to Publisher's site
Cell Genom. 2023 Feb 8; 3(2): 100259.
Published online 2023 Feb 8. https://doi.org/10.1016/j.xgen.2023.100259
PMCID: PMC9932992
PMID: 36819663

The human inactive X chromosome modulates expression of the active X chromosome

Associated Data

Supplementary Materials
Data Availability Statement

Summary

The “inactive” X chromosome (Xi) has been assumed to have little impact, in trans, on the “active” X (Xa). To test this, we quantified Xi and Xa gene expression in individuals with one Xa and zero to three Xis. Our linear modeling revealed modular Xi and Xa transcriptomes and significant Xi-driven expression changes for 38% (162/423) of expressed X chromosome genes. By integrating allele-specific analyses, we found that modulation of Xa transcript levels by Xi contributes to many of these Xi-driven changes (≥121 genes). By incorporating metrics of evolutionary constraint, we identified 10 X chromosome genes most likely to drive sex differences in common disease and sex chromosome aneuploidy syndromes. We conclude that human X chromosomes are regulated both in cis, through Xi-wide transcriptional attenuation, and in trans, through positive or negative modulation of individual Xa genes by Xi. The sum of these cis and trans effects differs widely among genes.

Keywords: sex chromosomes, X chromosome inactivation, gene expression, sex differences, aneuploidy, turner syndrome, Klinefelter syndrome, dosage sensitivity

Introduction

The X chromosome of eutherian mammals exists in two distinct epigenetic states that are referred to as “active” (Xa) and “inactive” (Xi).1,2,3 The “n−1” rule (where n is the number of X chromosomes per cell) states that all diploid human somatic cells possess one X chromosome in the active state (Xa), while all other (i.e., n−1) copies of chromosome (Chr) X4 are transcriptionally repressed through a mechanism known as X chromosome inactivation (XCI). Despite the name, Xi is functionally active, making critical contributions to human fitness and viability. For example, 99% of fetuses with only one sex chromosome (45,X) abort spontaneously, suggesting that viability hinges on gene expression from a second sex chromosome—either Xi or Y.5,6 The rare survivors likely have a mixture of 45,X cells and cells with a second sex chromosome, and they display a constellation of anatomic features known as Turner syndrome.7,8

Studies have revealed that as many as a quarter of X-linked genes are expressed from Xi in humans; such genes are said to “escape” X inactivation.9 Early studies demonstrated the expression of certain Chr X genes on Xi (“escape”) in human-rodent hybrid cell lines that had retained human Xi but had lost human Xa (for example, Mohandas et al., 1980; Brown et al., 1997; and Carrel et al., 1999).10,11,12 Subsequent allele-specific methods distinguished transcripts from Xa and Xi in human cell lines that exhibited skewed XCI or in single cells.13,14,15,16,17,18 While conceptually superior to hybrid cell lines, allele-specific methods yielded sparse data because they require the presence of heterozygous single-nucleotide polymorphisms (SNPs) to differentiate between alleles. Other studies approximated the contributions of Xi to X-linked gene expression by comparing samples with varying Xi copy numbers: in some cases, between 46,XY and 46,XX samples, and in others, between sex chromosome aneuploid and euploid samples.15,19,20,21,22,23,24,25,26 These studies employed analytic methods that made it difficult to separate the effect of Xi copy number from the potentially confounding effects of correlated factors such as Chr Y copy number, hormonal differences, or tissue composition. More importantly—as underscored by this study—previous work assumed, without directly testing, the independence and additivity of Xi and Xa expression. In particular, these studies assumed that any increase in expression observed with additional copies of Xi was due to expression from Xi, which may not always be the case. Given these limitations, we hypothesized that revisiting Xi gene expression with alternative experimental and analytic methods would reveal new insights.

Here, we used a series of quantitative approaches to investigate gene expression from Xi and Xa. Inspired by previous studies, we took advantage of the natural occurrence of diverse sex chromosome aneuploidies in the human population. We performed RNA sequencing (RNA-seq) on two cell types (lymphoblastoid cell lines and primary skin fibroblasts) from 176 individuals spanning 11 different sex chromosome constitutions—from 45,X (Turner syndrome) to 49,XXXXY. We analyzed the resulting data from these 176 individuals using linear regression models to identify significant changes in Chr X gene expression in identically cultured cells with zero, one, two, or three copies of Xi. 38% of Chr X genes displayed significant Xi-driven expression changes, which we quantified on a gene-by-gene basis using a novel metric that we developed called ΔEX. By combining ΔEX findings with allele-specific analyses performed in the same cell lines and comparing our results with published, independent annotations of genes subject to XCI, we found that Xi positively or negatively modulates steady-state levels of transcripts of at least 121 genes on Xa, in trans. Thus, Xi and Xa expression are highly interdependent. By combining ΔEX with published gene-wise metrics of evolutionary constraint, we identified a set of 10 Chr X genes most likely to drive phenotypes that are associated with natural variation in Xi copy number. These 10 candidate “drivers” can now be prioritized in studies of sex differences in common disease and in explorations of sex chromosome aneuploidy syndromes.

Results

Sampling gene expression across sex chromosome constitutions

To conduct a robust, quantitative analysis of Xi’s impacts on X-linked gene expression, we recruited individuals with a wide range of sex chromosome constitutions to provide blood samples and/or skin biopsies (Figure 1A). We generated or received Epstein Barr virus-transformed B cell lines (lymphoblastoid cell lines [LCLs]) and/or primary dermal fibroblast cultures from 176 individuals with one to four X chromosomes and zero to four Y chromosomes. After culturing cells under identical conditions, we profiled gene expression by RNA-seq in LCLs from 106 individuals and fibroblast cultures from 99 individuals (some individuals contributed both blood and skin samples; Tables 1 and S1). To enable analysis at both the gene and transcript isoform levels, we generated 100-bp paired-end RNA-seq reads to a median depth of 74 million reads per sample. A resampling (bootstrapping) analysis of our dataset indicated that including more individuals with sex chromosome aneuploidy would only marginally increase the number of differentially expressed genes detected in our analyses (Figure S1; STAR Methods).

An external file that holds a picture, illustration, etc.
Object name is gr1.jpg

Gene expression analysis of cells from across the spectrum of sex chromosome constitution

(A) Collection and processing of samples from individuals with variation in sex chromosome constitution.

(B) Schematic of the sex chromosomes featuring the X-Y-shared pseudoautosomal regions, PAR1 and PAR2, and the diverged regions, NPX and NPY.

(C) Linear modeling strategy for analyzing RNA-seq data from individuals with one to four copies of Chr X (zero to three copies of Xi).

See also Table S1.

Table 1

Samples included in sex chromosome aneuploidy analysis

Karyotype# LCLs# Fibroblast cultures
45,X1723
46,XX2220
46,XY1714
47,XXX74
47,XXY1130
47,XYY105
48,XXXX10
48,XXXY41
48,XXYY30
49,XXXXY121
49,XYYYY21
Total:10699

A metric for the impact of Xi on gene expression

To leverage the full power of our datasets, we compiled all RNA-seq data for each cell type into a single analysis. We included protein-coding and well-characterized long non-coding RNA (lncRNA) genes with a median expression in either 46,XX or 46,XY samples of at least 1 transcript per million (TPM). This resulted in 357 Chr X genes expressed in LCLs and 393 expressed in fibroblasts. Combining these two gene lists, we analyzed 423 Chr X genes in all. These genes reside within structurally and evolutionarily distinct regions (Figure 1B): two pseudoautosomal regions (PAR1 and PAR2), which are identical in sequence between Chr X and Y, and the non-pseudoautosomal region of the X (NPX), which has diverged in structure and gene content from the non-pseudoautosomal region of the Y (NPY).27,28 Despite this divergence, 17 homologous “NPX-NPY pair genes” with varying degrees of X-Y similarity in sequence and function remain.27,29

We hypothesized that each copy of Xi would incrementally increase expression of some Chr X genes, and therefore, for each gene, we modeled expression as a linear function of Xi copy number, controlling for Chr Y copy number and batch (Figure 1C; STAR Methods). To assess whether expression of each Chr X gene changed linearly per Xi, we fit non-linear least square regression models to the expression data using power functions (STAR Methods). Most NPX and PAR1 genes previously annotated as escaping XCI were best fit by linear models in which expression increases by a fixed amount per Xi, while most genes previously annotated as subject to XCI were best fit by models with no change in expression per Xi (Figure S2; see STAR Methods for the derivation of XCI status annotations from published studies). These results validate the “n−1” rule at the transcriptomic level, indicating that each cell has a single copy of Xa and n−1 copies of Xi. Moreover, linear modeling revealed that contributions by Xi to Chr X gene expression are strikingly modular, meaning that each Xi is more or less equivalent.

Linear models allowed us to identify genes whose expression changed significantly with additional copies of Xi and to quantify the absolute changes in expression (i.e., changes in read counts per Xi). To compare genes expressed at different levels, we also quantified the relative changes in expression per Xi. Specifically, we divided the change in expression per Xi (slope of regression, βX) by the expression from the single Xa (average intercept across batches, β0)—a metric we refer to as ΔEX (Figure 2A). ΔEX = 0 indicates that adding one or more copies of Xi does not affect the level of expression (e.g., PRPS2; Figures 2B and S3); ΔEX > 0 indicates that expression increases under these circumstances (e.g., KDM5C; Figures 2C and S4), ΔEX = 1 indicates that Xa and Xi contribute equally; and ΔEX < 0 indicates that expression decreases (e.g., F8; Figures 2D and S5). XIST, the lncRNA that acts in cis to transcriptionally repress X chromosomes from which it is expressed,30,31 was the only gene without detectable expression in cells with one copy of Chr X (Xa) that was expressed robustly in cells with one or more inactive copies (Xi). Considering samples with two or more X chromosomes, we found that XIST expression increased linearly with each additional copy of Xi (Figures 2E and S4).

An external file that holds a picture, illustration, etc.
Object name is gr2.jpg

Quantitative assessment of Xi contributions to X chromosome gene expression

(A) Schematic scatterplot, linear regression line, and ΔEX calculation for a hypothetical Chr X gene. Each point represents the expression level for an individual sample with the indicated number of copies of Chr X. The calculated coefficients from the linear model in Figure 1C are used to derive ΔEX.

(B–E) Actual scatterplots and regression lines with confidence intervals for selected Chr X genes in LCLs, representing a range of ΔEX values. Adjusted p values (FDR) < 0.05 indicate that ΔEX values are significantly different from 0.

(F and G) Scatterplots of ΔEX versus significance for all Chr X genes expressed in LCLs (F) and fibroblasts (G) illustrate variation in Xi contributions to Chr X gene expression; genes with FDR < 0.05 and |ΔEX| ≥ 0.2 are labeled; genes depicted in (B)–(E) are underlined.

(H) Scatterplot comparing ΔEX in LCLs and fibroblasts for 327 Chr X genes expressed in both cell types. Colors as in (F) and (G). Deming regression line and Pearson correlation are indicated.

See also Table S2.

ΔEX values vary widely among Chr X genes but not between cell types

Analyzing ΔEX values across Chr X genes revealed that Xi contributions to expression varied widely. Of 357 Chr X genes expressed in LCLs, 235 (66%) showed no significant change in expression level with additional copies of Xi (ΔEX ≈ 0), and the same was true for 304 (77%) of 393 Chr X genes expressed in fibroblasts (Figures 2F and 2G; full results in Table S2). This is consistent with these genes being expressed—in the respective cell types—from a cell’s first X chromosome (Xa) and silencing on all others (Xi). The remaining 122 (34%) Chr X genes expressed in LCLs and 89 (23%) Chr X genes expressed in fibroblasts had significantly negative or positive ΔEX values. Combining the results in LCLs and fibroblasts, Xi copy number significantly impacts gene expression levels for 162 of 423 (38%) Chr X genes expressed in one or both cell types. NPX genes’ ΔEX values ranged from −0.39 to 1.2 (Figures 2F and 2G). PAR1 genes had ΔEX values near one, while PAR2 genes had values near zero (Table S2). The stark difference between PAR1 and PAR2 likely reflects their evolutionary origins: PAR1 was preserved on Chr X and Y through sex chromosome evolution and retains autosome-like features, while PAR2 evolved later through a transposition from Chr X to Chr Y.32 For nearly all Chr X genes, the change in expression per Xi falls short of that contributed by Xa (ΔEX<1), similar to previous studies using allelic ratio analysis.13,14 Only two NPX genes—XIST and PUDP—and three PAR1 genes—DHSRX, PLCXD1, and PPP2R3B—showed ΔEX values approaching or exceeding one in both LCLs and fibroblasts.

We assessed whether these Chr X expression dynamics were influenced by factors apart from Xi count. We found few differences between cell types; genes expressed in both LCLs and fibroblasts displayed concordant ΔEX values (Figure 2H). This is consistent with studies of differential expression between 46,XY and 46,XX tissues that found correlated expression changes for Chr X genes across diverse tissues.15 To control for any effects of gonadal sex or Y chromosome copy number on our results, we reanalyzed the data from samples with zero (45,X; 46,XX; 47,XXX; 48,XXXX) or one copy of Chr Y (46,XY; 47,XXY; 48,XXXY; 49,XXXXY), modeling expression as a function of Xi copy number and batch. ΔEX values were unaffected by the presence or absence of a Y chromosome (Figure S6). Because of its design, our study reveals that these consistent Chr X expression dynamics derive from direct, cell-autonomous contributions of Xi rather than systemic effects of hormones or environmental factors.

For genes with multiple transcript isoforms (alternative transcripts), we asked whether ΔEX values were consistent between isoforms (STAR Methods). For most genes, transcript isoforms displayed concordant ΔEX values. However, for genes with multiple transcript isoforms, 33 (19%) in LCLs and 25 (13%) in fibroblasts had discordant ΔEX values: at least one isoform’s ΔEX differed significantly from zero (false discovery rate [FDR] < 0.05), while another isoform’s ΔEX did not (Figure S7; Table S3). The most striking case is that of UBA1, where alternative transcription start sites, separated by a CTCF binding site, display divergent behaviors (Figure S8).

To assess reproducibility, we compared our results with those from an independent dataset that used microarrays to assay gene expression in LCLs across diverse sex chromosome constitutions.24 Reanalyzing this dataset using linear models, we found that the resulting microarray ΔEX values correlated well with the ΔEX values calculated from our RNA-seq data (Figure S9; STAR Methods).

Supernumerary copies of Chr Y and 21 show little attenuation of gene expression

To determine whether the attenuated expression observed with extra copies of Chr X also occurs with additional copies of other chromosomes, we analyzed cells from individuals with additional copies of Chr Y or with trisomy 21, a common autosomal aneuploidy and the cause of Down syndrome.33

For Chr Y, we used the same linear model as for Chr X: modeling expression as a function of Chr Y copy number, Xi copy number, and batch (Figure 3A; STAR Methods). We calculated ΔEY values separately for NPY and PAR genes because NPY genes are not expressed in samples with zero Y chromosomes, while PAR genes are expressed in all samples.

An external file that holds a picture, illustration, etc.
Object name is gr3.jpg

Contributions of Chr Y or 21 copy number to gene expression

(A) Chr Y copy number series with zero to four copies.

(B) Each point shows the expression of NPY gene KDM5D in one LCL sample across the Chr Y copy-number series, with the regression line and its confidence interval plotted. The formula for calculating ΔEY from the regression coefficients is indicated.

(C and D) Scatterplot of ΔEY versus significance for all Chr Y genes expressed in LCLs (C) or fibroblasts (D); all NPY genes are labeled; KDM5D, depicted in (B), is shown in black.

(E) Chr 21 copy-number series with two to three copies.

(F) Each point shows the expression of CCT8 in one LCL sample across the Chr 21 copy-number series, with the regression line and its confidence interval plotted. The formula for calculating ΔE21 from the regression coefficients is indicated.

(G) Scatterplot of ΔE21 versus significance for all Chr 21 genes expressed in LCLs. CCT8, depicted in (F), is shown in black.

(H) Violin plots with median and interquartile range for ΔE values of NPX (without or with an NPY homolog), PAR, NPY, and Chr 21 genes. p values are listed for comparisons referenced in the text. ΔEX values for NPX genes with and without a Y homolog were compared using Wilcoxon rank-sum test. ΔEX and ΔEY values for PAR1 genes were compared using paired t test.

See also Tables S4 and S5.

For NPY genes, we analyzed samples with one to four Y chromosomes to quantify expression differences, if any, between the first and additional Y chromosomes. Expression of all NPY genes increased significantly, with ΔEY values close to 1, consistent with near-equal expression from each copy of Chr Y (e.g., KDM5D; Figures 3B–3D and S10; full results in Table S4).

For PAR genes, we analyzed samples with zero to four Y chromosomes. As with Chr X, PAR1 gene expression increased with additional copies of Chr Y, yielding ΔEY values close to one, whereas PAR2 genes had ΔEY values near zero (Figures 3C and 3D). This implies, first, that PAR1 genes are expressed on each copy of Chr X or Y, while PAR2 genes are only expressed on the first copy of Chr X (Xa), and, second, that PAR1 gene expression from each additional Chr X or Y is roughly equal to expression from the first.

Finally, we examined Chr 21 gene expression as a function of Chr 21 copy number (Figures 3E and S11; STAR Methods). Nearly three-quarters of expressed Chr 21 genes significantly (FDR < 0.05) increased in expression with an additional copy of Chr 21 (e.g., CCT8; Figure 3F), and none significantly decreased (Figure 3G; Table S5). These results align well with independent studies of Chr 21 gene expression.34

In sum, unlike genes on Chr X, our analysis reveals that most genes on Chr Y and Chr 21 are expressed similarly on each copy of their respective chromosomes. The median ΔE values for Chr Y (including NPY and PAR1) and Chr 21 genes range from 0.74 to 1. By comparison, NPX genes without NPY homologs had median ΔEX ≈ 0, while NPX genes with NPY homologs had modestly higher median ΔEX values (LCLs: 0.26, fibroblasts: 0.17; Figure 3H). Even PAR1 genes, which as a group had the highest median ΔEX values, were modestly attenuated on Xi compared with Chr Y, especially in LCLs (Figure 3H). This Y-vs.-X effect was most pronounced for CD99, located near the PAR1-NPX/Y boundary (Tables S2 and S4), consistent with suggestions that PAR1 gene expression on Xi is modestly attenuated by spreading of heterochromatin.15 These differences highlight the absence of a chromosome-wide mechanism attenuating (or otherwise altering) gene expression on supernumerary copies of Chr Y and Chr 21, in contrast to Chr X.

Xi modulation of Xa transcript levels revealed by divergence of ΔEX and allelic ratio

ΔEX conveys the change in a gene’s expression due to an additional Xi regardless of the mechanism(s) responsible for this change. We hypothesized that a Chr X gene’s ΔEX value could reflect the combined effects of two mechanisms: (1) transcription of Xi allele(s) and (2) modulation of the Xa allele by Xi in trans.

Seeking evidence of these mechanisms, we searched gene by gene for agreements and disagreements between our calculated ΔEX values and published descriptions of the genes as “escaping” XCI (being expressed from Xi) or being subject to it (silenced on Xi). For this purpose, we curated annotations of each expressed gene’s XCI status from studies of allele-specific expression13,14,15,16,18 (STAR Methods; Table S6). Many genes with ΔEX > 0 were classified as expressed from Xi (50/74 in LCLs and 43/61 in fibroblasts), indicating that transcription from Xi alleles underlies their ΔEX values (Figure 4A; Table S6). Genes with these characteristics overlapped significantly between fibroblasts and LCLs (Figure 4B).

An external file that holds a picture, illustration, etc.
Object name is gr4.jpg

Comparison of ΔEX values with allelic ratios (ARs) reveals that Xi modulates Xa expression

(A) Stacked barplots for genes with ΔEX values greater than, less than, or approximately equal to zero, apportioned by their annotated XCI status from published studies (see STAR Methods and Table S6 for newly compiled XCI status calls).

(B and C) Venn diagrams comparing LCLs and fibroblasts for genes with ΔEX values that are either (B) explained or (C) not explained by published XCI status. Genes expressed in both cell types were included in the Venn diagrams, and genes with cell-type-specific expression are noted below.

(D and E) Each point shows the mean adjusted AR for an informative gene (with heterozygous SNPs in at least 3 samples with skewed XCI) and whether AR is significantly greater than zero in (D) LCLs or (E) fibroblasts.

(F and G) Each point denotes AR and ΔEX values for an AR-informative gene in (F) LCLs or (G) fibroblasts. The color of the point indicates whether the gene’s AR value is significantly greater than zero (blue) or not (gray); the shape indicates whether the gene’s ΔEX value is significantly different from zero (circles) or not (squares); and an orange outline indicates that ΔEX differs significantly from AR. Black diagonal line, AR = ΔEX. Pearson correlation coefficients (r) and p values are indicated.

(H) Venn diagram comparing LCLs and fibroblasts for genes with ΔEX values not equal to their AR values. Genes expressed and informative in both cell types are depicted in the Venn diagram, with genes that are cell-type specific or informative in only one cell type indicated below.

(I) Venn diagram comparing all modulated genes in LCLs and fibroblasts (the union of figures, C and H). All Venn diagram p values, hypergeometric test.

See also Table S6.

For 102 (24%) of the 423 Chr X genes we evaluated in LCLs or fibroblasts, our calculated ΔEX values were at odds with expectations arising from the published annotations of XCI status (Figures 4A and 4C; Table S6). For example, among genes with ΔEX > 0, 22 in LCLs and 14 in fibroblasts were described as silenced on Xi. Additionally, previous models offer no explanation for the 48 genes in LCLs and 28 genes in fibroblasts with ΔEX < 0, most of which were described as silenced on Xi (Figures 4A and 4C; Table S6). Genes with these characteristics did not overlap significantly between LCLs and fibroblasts, even though most are expressed in both cell types, indicating that this regulation is largely cell-type specific (Figure 4C). These unanticipated findings are unlikely to reflect experimental error in the previous or current studies. Instead, they suggest that, for many Chr X genes whose Xi allele(s) are silent, the Xa allele is nonetheless upregulated (ΔEX > 0) or downregulated (ΔEX < 0) by Xi.

To corroborate these findings in our own dataset, we performed an allele-specific analysis in our LCL and fibroblast samples with two X chromosomes. To distinguish between Chr X alleles, we identified heterozygous SNPs in expressed genes (STAR Methods). We then identified samples in our dataset with skewed XCI (21 LCL and 10 fibroblast samples; Figures S12–S16) and used these samples to compute the average ratio of Xi to Xa expression (the allelic ratio [AR]) for each gene. To calculate the AR, we required heterozygous SNPs in at least three samples, resulting in AR values for 151 genes in LCLs and 119 in fibroblasts (Table S6; Figure S17). In LCLs and fibroblasts, respectively, 38 (25%) and 22 (18%) of these genes had AR values significantly greater than zero, indicating that they are expressed from Xi (Figures 4D and 4E); these results agreed well with published AR values (Figure S18).

We next compared each gene’s AR and ΔEX values. If Xi and Xa expression are fully independent of each other, and therefore additive, we would expect the AR for a given gene to approximate its ΔEX value. However, if Xi modulates the gene’s Xa transcript levels in trans, then independence and additivity will not be observed, and instead the gene’s ΔEX and AR values will differ. Most X-linked genes, e.g., EIF2S3, had AR values that approximate their ΔEX values, and AR and ΔEX were highly correlated among many informative genes in both LCLs and fibroblasts (Figures 4F and 4G). For these genes, the ΔEX value may directly reflect the level of transcription from Xi.

However, for 33 informative genes in LCLs or fibroblasts, AR and ΔEX were significantly different, indicating that Xi modulated Xa transcript levels upward or downward in trans (Figures 4F–4H; STAR Methods). Some genes, like MPP1, were not expressed from Xi (AR ≈ 0) but nonetheless had ΔEX values significantly different from zero: 0.24 in LCLs and 0.21 in fibroblasts, indicating that levels of Xa-derived transcripts are positively regulated by Xi. Other genes, like DDX3X and PUDP, had significant expression from Xi (AR > 0, FDR < 0.05) and evidence of Xi regulation of steady-state expression levels. DDX3X had an AR (LCLs: 0.55, fibroblasts: 0.42) that is significantly higher than its ΔEX value (LCLs: 0.26, fibroblasts: 0.16) in both LCLs and fibroblasts, indicating both that DDX3X is expressed on Xi and that its steady-state transcript levels are negatively regulated by Xi. Conversely, PUDP had an AR (LCLs: 0.71, fibroblasts: 0.73) that is significantly lower than its ΔEX value (LCLs: 1, fibroblasts: 1.2), indicating both that PUDP is expressed on Xi and that the gene’s steady-state transcript levels are positively regulated by Xi.

These analyses, combining ΔEX with published or newly derived AR data, provide a rich portrait of X-linked gene regulation. They show that Xi can impact expression levels of an X-linked gene through two mechanisms: transcription of the Xi allele and modulation of steady-state transcript levels by Xi in trans. These mechanisms can operate independently of each other, or together, on a gene-by-gene basis, and each of the two mechanisms affects a sizable fraction of all X chromosome genes. Of 423 X chromosome genes expressed in LCLs and/or fibroblasts, at least 121 genes (29%) are modulated on Xa by Xi in one or both cell types (Figure 4I). This represents the union of the 102 genes for which the public AR data cannot explain the ΔEX values (Figure 4C) and the 33 genes with AR values significantly different from ΔEX (Figure 4H). The observed modulation of steady-state transcript levels suggests that Xi regulates the expression of genes on Xa in trans.

Combining ΔEX and expression constraint metrics identifies likely drivers of Xi-associated phenotypes

While the somatic cells of all diploid individuals have one Xa, the number of Xis varies in the human population from zero to four. This variation is associated with many important differences in phenotypes and disease predispositions, for example those observed between 45,X (Turner syndrome) and 46,XX individuals, between 47,XXY (Klinefelter syndrome) and 46,XY individuals, or even between 46,XY males and 46,XX females. We hypothesized that phenotypes and predispositions associated with Xi copy number are due to changes in the copy numbers of some of the Chr X genes where we found positive or negative ΔEX values. We reasoned that phenotypically critical genes would be “dosage sensitive”, i.e., their expression levels would be tightly constrained by natural selection, while the expression levels of genes whose dosage is not phenotypically critical could vary with little consequence.

To gauge the constraints that selection has imposed on each gene’s expression level, we turned to metrics derived from population and evolutionary genetic studies. We assessed tolerance of under-expression using (1) loss of function observed/expected upper fraction (LOEUF), the ratio of observed to expected loss-of-function (LoF) variants in human populations,35 (2) RVIS, the residual variation intolerance score,36 and (3) pHI, the probability of haploinsufficiency.37 Both LOEUF and RVIS use large-scale human genomic sequencing data to evaluate selection against LoF variants, while pHI is based on evolutionary and functional metrics. LoF variants should be culled from the population in genes whose under-expression is deleterious, while they may accumulate in genes whose under-expression has little effect on fitness.

To assess tolerance of over-expression, we examined conservation of targeting by microRNAs (miRNAs; PCT score38), which repress expression by binding to a gene’s 3′ untranslated region.39 Genes sensitive to over-expression have maintained their miRNA binding sites across vertebrate evolution, while genes whose over-expression has little or no effect on fitness show less conservation of these sites.40

To weigh these four metrics simultaneously, we calculated each gene’s percentile rank for each metric, from most constrained (high percentile) to least constrained (low percentile). We calculated percentiles separately for autosomal (including PAR1) and NPX genes and then, for each gene, averaged percentile rankings across the four metrics.

We first examined expression constraints for PAR1 genes whose high ΔEX values suggested that they may drive phenotypes associated with Xi copy number. Compared with autosomal genes, PAR1 genes are less constrained on average (p = 5.5e−5, Wilcoxon rank-sum test), with most ranking in the least-constrained quartile (Figures 5A and 5B; Table S7). This indicates that altering their expression levels has little impact on human fitness. Indeed, homozygous LoF mutations have been reported for 3 of 15 PAR1 genes, demonstrating dispensability.35 Only two PAR1 genes, SHOX and SLC25A6, rank in the more constrained half of the comparison group (Table 2). SHOX copy number contributes to variation in height in individuals with sex chromosome anomalies,41,42,43,44,45 while SLC25A6 has not yet been linked to any phenotype. Apart from these two genes, the high tolerance of under- and over-expression for most PAR1 genes argues against prominent roles in phenotypes associated with Chr X (or X + Y) copy number.

An external file that holds a picture, illustration, etc.
Object name is gr5.jpg

Combining ΔEX with metrics of constraint on expression levels identifies genes likely to contribute to phenotypes associated with Xi copy number

Scatterplots of ΔEX versus gene constraint percentile ranking for PAR1 (A and B) or NPX (C and D) genes. Each point represents an expressed gene with scores for at least two of the four expression constraint metrics evaluated, excluding ampliconic genes. Dashed lines indicate |ΔEX| thresholds of 0.1 for genes to be considered likely contributors to phenotypes driven by Xi copy number; labeled genes include (A and B) SLC25A6, the only PAR1 gene to score above the 50th percentile for autosomal and PAR genes, and (C and D) among NPX genes with |ΔEX| > 0.1, the 10 genes with the highest constraint percentile rankings in LCLs or fibroblasts.

See also Table S7.

Table 2

X chromosome genes that may drive the phenotypic impacts of variation in Xi copy number

RegionGene symbolGene nameNPY gene symbolΔEX
Gene constraint (average % ranking)aDisease associations
LCLFib.PhenotypeInheritancebMIM #
NPXKDM6Alysine demethylase 6AUTY0.830.4593.3Kabuki syndromeXLD300867
KDM5Clysine demethylase 5CKDM5D0.730.5890.3Claes-Jensen syndromeXLR300534
SMC1Astructural maintenance of chromosomes 1A0.580.4387.6Cornelia de Lange syndrome; developmental and epileptic encephalopathyXLD300590, 301044
ZFXzinc finger protein X-linkedZFY0.450.4783.0
RBBP7RB-binding protein 7, chromatin remodeling factor0.010.2982.5
DDX3XDEAD-box helicase 3 X-linkedDDX3Y0.260.1689.2syndromic IDD,c Snijders Blok typeXLD, XLR300958
CDK16Cyclin dependent kinase 160.090.2483.8
DLG3discs large MAGUK scaffold protein 30.180.0782.8IDDXLR300580
USP9Xubiquitin-specific protease 9 X-linkedUSP9Y0.140.1794.4IDDXLR, XLD300919, 300968
BCORBCL6 corepressor0.120.0191.1oculofaciocardiodental syndromeXLD300166
PAR1SLC25A6solute carrier family 25 member 6N/A1.00.7467.4
SHOXshort stature homeoboxN/AN/AdN/A58.4Leri-Weill dyschondrosteosis; Langer mesomelic dysplasia; short stature idiopathic familialPD, PR127300, 249700, 300582
aGene constraint percentile ranking is calculated for NPX genes relative to all annotated NPX genes and for PAR1 genes relative to all PAR and autosomal genes.
bXLD, X-linked dominant; XLR, X-linked recessive; PD, pseudoautosomal dominant; PR, pseudoautosomal recessive.
cIDD, intellectual developmental disorder.
dSHOX is not expressed in fibroblasts or LCLs but is included because its dosage has been conclusively linked to height in individuals with sex chromosome aneuploidy.

Turning to the much larger set of NPX genes, we found that their widely ranging constraint metrics correlated poorly with their ΔEX values (Figures 5C and 5D; Table S7). Thus, ΔEX alone does not predict dosage sensitivity among NPX genes. To identify the NPX genes most likely to drive Xi-copy-number-dependent phenotypes, we selected those with |ΔEX| ≥ 0.1 (FDR < 0.05) in LCLs or fibroblasts and ranked these by their average constraint metrics. Five of the top 10 genes by these criteria (Table 2) had NPY homologs, a significant enrichment (p = 7.0e−4, hypergeometric test), and all five had ΔEX values > 0.1 in both cell types. Of the five genes without NPY homologs, only two had ΔEX values significantly greater than zero in both cell types: SMC1A and CDK16. The remaining three genes had ΔEX values significantly different from zero in only one of the two cell types analyzed, and they tended to have lower absolute ΔEX values.

If these 10 genes are dosage-sensitive drivers of Xi-dependent phenotypes—even when harboring no mutations—then one might expect mutant phenotypes to be pronounced and to display distinctive modes of inheritance. Accordingly, we searched OMIM for disease annotations. Germline mutations in seven of the 10 genes are reported to cause severe developmental disorders, including well-characterized childhood syndromes for five of the genes (Tables S2 Table S7). Indeed, five of the seven mutation-bearing genes are reported to display dominant inheritance (affected heterozygous females)—a significant enrichment among X-linked genes (p = 0.0037, hypergeometric test) and consistent with extraordinary dosage sensitivity. Extrapolating from these findings, we speculate that some of the genes without annotated OMIM phenotypes may have important roles in disease; in the case of ZFX, no LoF mutations are reported in gnomAD even though the gene’s roles in regulating stem cell self-renewal and cancer cell proliferation are well documented.46,47,48 Taken together, these 10 genes represent good candidates for driving Xi-dependent phenotypes characteristic of individuals with sex chromosome aneuploidies—as well as differences in disease risks between ordinary (euploid) females and males.

Discussion

We analyzed Chr X gene expression quantitatively in two types of cells cultured from individuals with one to four X chromosomes, e.g., 45,X to 49,XXXXY (Figure 1). Folding this diversity of sex chromosome constitutions into a single linear model (Figure 2) yielded advantages over previous studies, which compared sex chromosome constitutions in pairwise fashion, most frequently 45,X vs. 46,XX; 46,XY vs. 47,XXY; or 46,XX vs. 46,XY. First, our linear model embodied, tested, and confirmed—at the level of the X transcriptome—the “n−1” rule,4 whereby diploid somatic cells with a given number (n = 1, 2, 3, 4) of X chromosomes have a single Xa and n−1 Xi’s. Second, linear modeling provided the power needed to detect and precisely quantify increases or decreases in expression of individual Chr X genes as a function of Chr X copy number. Third, linear modeling revealed that the expression contributions made by each copy of Xi are modular, indicating that each copy of Xi is equivalent, or nearly so, even among unrelated individuals. Fourth, by comparing samples that vary in Xi copy number with and without a Y chromosome, we found that expression from Xa is quantitatively indistinguishable in phenotypic males and females—as is expression from Xi (Figure S6). Thus, both Xi and Xa make modular contributions to Chr X gene expression—contributions independent of and unaffected by the presence of the NPY or the gonadal sex of the individual.

Finally, linear modeling of gene expression as a function of Chr X copy number yielded the metric ΔEX, which captures the positive or negative impact of Xi(s) on steady-state transcript levels for each gene, normalized to account for gene-to-gene variation in expression level (Figure 2A). Fully 38% (162/423) of expressed Chr X genes in LCLs or fibroblasts displayed a statistically significant positive or negative ΔEX value, indicating that their expression is impacted by the presence of one or more copies of Xi. This is nearly double what would be expected based on the prior literature’s estimates of escape, which—based upon our re-analysis using the broadest definition of escape—includes only 20% (86/423) of expressed Chr X genes in these cell types (Table S6). ΔEX values varied widely among Chr X genes, from −0.39 to 1.2 (Figures 2F and 2G), but showed much less variation between the two cell types studied (Figure 2H), suggesting the possibility that the ΔEX “settings” for each gene were established prior to the embryonic divergence of the hematopoietic and skin fibroblast lineages and subsequently maintained through development.

We extended the utility of the ΔEX metric by cross-referencing and comparing it, one gene at a time, with an orthogonal metric: the AR of Xi and Xa transcripts in cells with skewed XCI and SNP heterozygosity. AR values significantly greater than zero unambiguously identify Chr X genes that are expressed from both Xi and Xa (and therefore “escape” XCI).49 By comparing ΔEX and AR values, we discovered that Xi up- or downmodulates Xa expression of at least 121 genes, or nearly 29% of the 423 Chr X genes that are demonstrably expressed in either LCLs or fibroblasts. This modulation is manifest whenever a gene’s AR and ΔEX values differ significantly, and it is most starkly apparent when the gene is not expressed from Xi (i.e., when AR approximates zero) but nonetheless displays a significantly positive or negative ΔEX value. While “escape” from XCI has been well documented over the past four decades,12,13,14,15 the novel combination of the AR and ΔEX metrics reported here was required to observe modulation, explaining why it has previously been unappreciated.

Thus, combined analysis of ΔEX and AR reveals a nuanced, gene-by-gene tapestry of Xi-driven changes in expression of Chr X genes. For some genes, ΔEX was explained entirely by expression from the Xi allele, while for others, ΔEX was explained entirely by modulation—positive or negative—of steady-state RNA levels derived from the Xa allele. For a third set of genes, ΔEX was explained by the combined effects of expression from Xi and modulation of steady-state RNA levels. Proposals of uniform, chromosome-wide “X chromosome upregulation” (XCU) during mammalian development or evolution50,51 will need to be revisited in light of this unforeseen diversity of gene-by-gene responses to variation in Chr X copy number.

Finally, we paired the ΔEX metric with population and evolutionary measures of constraint on expression levels to identify 10 NPX genes that are most likely (among the 423 Chr X genes expressed in LCLs and/or fibroblasts) to drive Xi-associated phenotypes (Figures 5C and 5D; Table 2). Despite their high ΔEX values, most PAR genes did not exhibit the constraints on expression levels that we required for inclusion in this select group of candidate drivers (Figures 5A and 5B). We propose the 10 NPX genes—five of which have divergent NPY homologs—as potential drivers of (1) differences in health and disease between 46,XY and 46,XX cohorts and (2) the distinctive phenotypes associated with sex chromosome aneuploidies, including Turner syndrome (45,X) and Klinefelter syndrome (47,XXY). We speculate that one or more of these 10 NPX genes, which include transcriptional and epigenetic regulators, may also drive the modulation of Xa genes by Xi.

Limitations of the study

The human individuals sampled here are mostly of European ancestry; it will be important to validate these findings in a more ancestrally diverse set of individuals. Our findings in LCLs and fibroblasts were largely concordant, but they may not generalize to all somatic tissues and cell types. Our study focused on 423 Chr X genes that are expressed in LCLs and/or fibroblasts; our conclusions may not generalize to Chr X genes that are not expressed in these cell types. Our list of Chr X genes likely to drive Xi-dependent phenotypes is incomplete, as it is biased toward genes expressed in LCLs and fibroblasts and toward genes with long open reading frames well suited to expression constraint analysis; future studies will add to this list. In addition to these caveats regarding our current findings, several topics remain unexplored in this article and should be addressed in future studies; these include the molecular mechanisms by which Xi modulates gene expression on Xa, whether these mechanisms are direct or indirect, and whether these mechanisms also affect gene expression on autosomes.

STAR[large star]Methods

Key resources table

REAGENT or RESOURCESOURCEIDENTIFIER
Experimental models: Cell lines

Lymphoblastoid cell lines and primary fibroblast cell culturesThis paperN/A
Lymphoblastoid cell linesColorado Children’s Hospital BiobankN/A
Lymphoblastoid cell lines and primary fibroblast cell culturesCoriell Cell RepositorySee Table S1
EBV-producing lymphoblastsCoriell Cell RepositoryB95-8 (RRID:CVCL_1953)

Chemicals, peptides, and recombinant proteins

PercollCytivaCat# 17-0891-01
RPMI 1640GibcoCat# 31800-089
HEPESSAFCCat# RES6008H-A702X
FBSHycloneCat# SH30071
Amphotericin BGibcoCat# 15290-018
GentamicinGibcoCat# 15710-072
Penicillin-StreptomycinLonzaCat# 11140-076
CyclosporineLC LaboratoriesCat# C-6000
DMEM/F12GibcoCat# 11320-033
DMEM High GlucoseGibcoCat# 11960-069
L-GlutamineMP BiomedicalsCat# IC10180683
MEM Non-essential amino acidsGibcoCat# 15140-163
GelatinSigmaCat# G2500
TRIzolThermoFisherCat# 15596026
RNAprotect Cell ReagentQiagenCat# 76526

Critical commercial assays

Vacutainer ACD TubesBD BiosciencesCat# 364606
MycoAlert KitLonzaCat# LT07-318
SapphireAmp Fast PCR Master MixTakaraCat# RR350A
RNeasy Plus Mini KitQiagenCat# 74134
QIAshredder ColumnsQiagenCat# 79654
Qubit RNA HS Assay KitThermoFisherCat# Q32855
Fragment Analyzer RNA KitAgilentCat# DNF-471
HS NGS Fragment KitAgilentCat # DNF-474
TruSeq RNA Library Preparation Kit v2IlluminaCat# RS-122-2001
KAPA mRNA Hyper-Prep KitRocheCat# KK8581
PippinHT 2% Agarose Gel CassettesSage ScienceCat# HTC2010

Deposited data

Raw, de-identified RNA-seq dataThis paperdbGaP: phs002481.v2.p1
Processed dataThis paperhttps://doi.org/10.5281/zenodo.7504743
Custom GENCODE v24 transcriptome annotationGodfrey et al., 202052https://doi.org/10.5281/zenodo.3627233
GTEx Expression DataGTEx Consortium et al., 201753https://gtexportal.org
CTCF ChIP-seq GM12878ENCODEENCFF852CRG
CTCF ChIP-seq GM12864ENCODEENCFF593YIG
CTCF ChIP-seq AG09309ENCODEENCFF640EZJ
CTCF ChIP-seq AG10803ENCODEENCFF694SQA
CTCF ChIA-PET GM12878ENCODEENCFF80PGS, ENCFF847QOE
LOEUF scores and homozygous LoF gene listKarczewski et al., 202035https://gnomad.broadinstitute.org/
RVIS scoresPetrovski et al., 201336http://genic-intolerance.org/data/RVIS_Unpublished_ExAC_May2015.txt
Pct scoresFriedman et al., 200938N/A
pHI scoresHuang et al., 201037N/A
X chromosome inactivation consensus callsBalaton et al., 20159N/A
Microarray dataset of sex chromosome aneuploidy samplesRaznahan et al., 201824N/A
Allelic expression data in fibroblasts and hybrid cell linesCarrel et al., 200513N/A
Allelic expression data in LCLs and fibroblasts from paired genomic and cDNA SNP-chipsCotton et al., 201314N/A
Allelic expression data from bulk and single LCLs in GTExTukiainen et al., 201715N/A
Allelic expression data in single fibroblastsGarieri et al., 201816N/A
Allelic expression data in bulk LCLsSauteraud et al., 202118N/A
Phenotype associations with Chr X genesOMIMhttps://www.omim.org/

Oligonucleotides

ERCC RNA Sike-In MixInvitrogenCat#: 4456740
Mycoplasma primer, F: CTT CWT CGA CTT YCA GAC CCA AGG CATThis paperN/A
Mycoplasma primer, R: ACA CCA TGG GAG YTG GTA ATThis paperN/A
hGAPDH primer, F: TGT CGC TGT TGA AGT CAG AGG AGAThis paperN/A
hGAPDH primer, R: AGA ACA TCA TCC CTG CCT CTA CTGThis paperN/A

Software and algorithms

Custom code to process RNA-seq data and generate figuresThis paperhttps://doi.org/10.5281/zenodo.7504743
R v3.6.3The R Foundationhttps://www.r-project.org
Kallisto v0.42.5Bray et al., 201654https://pachterlab.github.io/kallisto/
DESeq2 v1.26.0Love et al., 201455https://bioconductor.org/packages/release/bioc/html/DESeq2.html
Sleuth v0.30.0Pimentel et al., 201756https://pachterlab.github.io/sleuth/
BEDTools v2.26.0Quinlan et al., 201057https://bedtools.readthedocs.io/
IGVRobinson et al., 201158https://software.broadinstitute.org/software/igv/
Best Practices workflow for identifying short variants in RNA-seq dataBroad Institutehttps://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-
IllustratorAdobehttps://www.adobe.com/products/illustrator.html

Other

QuBit 4 FluorometerThermoFisherN/A
5200 Fragment Analyzer SystemAgilentN/A
PippinHT systemSage SciencesN/A
HiSeq 2500IlluminaN/A
NovaSeq 6000IlluminaN/A

Resource availability

Lead contact

Further information and request for resources and reagents should be directed to and will be fulfilled by lead contact, David C. Page (dcpage@wi.mit.edu).

Materials availability

Cell lines are available upon request to the lead contact.

Experimental model and subject details

Human subjects

Adults (18+ years of age) with sex chromosome aneuploidies or euploid controls were recruited through an IRB-approved study at the NIH Clinical Center (12-HG-0181) and Whitehead Institute/MIT (Protocol #1706013503). Informed consent was obtained from all study participants. Individuals with a previous karyotype showing non-mosaic sex chromosome aneuploidy were included in the study. From these individuals, blood samples and skin biopsies were collected at the NIH Clinical Center and shipped to the Page lab for derivation of cell lines. In addition, blood samples from individuals with sex chromosome aneuploidies, and euploid family members, ranging in age from 4-44 years were contributed by the Focus Foundation. Additional LCLs and fibroblast cultures were obtained from the Colorado Children’s Hospital Biobank and Coriell Research Institute, and cultured in the Page laboratory for at least two passages prior to collection for RNA-sequencing. Karyotyping of peripheral blood and fibroblast cell cultures was performed at the National Human Genome Research Institute Cytogenetics and Microscopy Core. To reduce the impact of sex chromosome mosaicism on our sex chromosome aneuploidy analysis, we excluded individuals with >15% mosaicism for other karyotypes. Metadata for cell lines represented in the RNA-sequencing dataset are provided in Table S1.

Method details

Cell culture

Lymphoblastoid cell lines

Blood was collected in BD Vacutainer ACD tubes and shipped at room temperature to the Page Lab for processing 1-3 days after collection. The buffy coat was resolved by centrifuging blood at 3300 rpm for 10 min, transferred to a new tube with PBS, and subjected to density gradient centrifugation in 50% Percoll (Cytiva) at 3300 rpm for 10 min. Lymphocytes were transferred to a new tube and washed twice with PBS. Lymphocytes were resuspended in 3 mL complete RPMI medium (RPMI 1640 (Gibco), 25mM HEPES (SAFC), 15% FBS (Hyclone), Fungizone (Amphotericin B, Gibco), Gentamicin (Gibco), Penicillin/Streptomycin (Lonza), pH 7.2) per tube of blood and transferred to a T25 flask, supplemented with 0.25mL EBV (produced by B95-8 marmoset lymphoblasts), and 0.2 mL of 1 mg/mL cyclosporine (LC Laboratories). They were incubated for one week at 37°C, fed 1-2 mL complete RPMI, and incubated for another week at 37°C. Once the media began to turn yellow (acidified), cultures were “half-fed” by removing half of the media and replacing it with double the volume. When cultures reached 15 mL, they were transferred to T75 flasks, and gradually expanded to 30 mL, while maintaining a concentration of <1 million cells/mL to ensure viability. Cells were viably frozen for future use by mixing with freezing media (LCL culture media + 5% DMSO), 1 million cells per vial. Cells were also preserved for RNA, DNA, and protein extraction (see below).

Primary fibroblast cultures

Our protocol for generating primary skin fibroblast cultures from a skin biopsy is based on Vangipuram et al.59 From adults (18+ years of age) at the NIH Clinical Center we obtained two 4-mm skin punch biopsies from the upper arm, which were immediately placed into a 15 mL conical tube with 10 mL of media (DMEM/F12 (Gibco), 20% FBS, and 100 IU/mL Penicillin-Streptomycin. Tubes were shipped to the Page lab overnight on ice for processing. Each biopsy was used to generate a separate skin fibroblast culture. Biopsies were cut into 18 pieces of equal size and placed 3/well in gelatinized 6-well plates with 1 mL media (High Glucose DMEM (Gibco), 20% FBS, L-Glutamine (MP Biomedicals), MEM Non-Essential Amino Acids (Gibco), 100 IU/mL Penicillin/Streptomycin (Lonza)). Plates were gelatinized by incubating 1 mL sterile 0.1% gelatin (Sigma) solution per well for 30 min at room temperature.

Plates were incubated for 1 week at 37°C without disturbance to allow biopsies to attach to the plate and begin to grow out. During week 2, we added 200 μL of fresh media per well every 2-3 days, being careful not to disturb the biopsies. The following week (week 3), we aspirated the media and replaced with 1 mL fresh media per well every 2-3 days. During week 4, we aspirated the media and replaced with 2 mL fresh media per well every 2-3 days. At this point, the fibroblasts generally reached the edges of the wells and were expanded to two T75 gelatinized flasks per 6 well plate. After two days, we combined the cells from the two T75 flasks and split them to three T175 gelatinized flasks. After two days, cells were viably frozen with 1 million cells per vial in freezing media (fibroblast culture media + 5% DMSO). Cells were also preserved for RNA extraction (see below). During optimization of the protocol, cell culture purity was confirmed by immunofluorescence of SERPINH1, a fibroblast marker.

Cell collection for subsequent analysis

Cells were collected when LCL cultures reached 30mL, and fibroblasts were ~80% confluent in three T175 plates. All cell counting was performed using the Countess II cell counter (Life Technologies) and Trypan Blue exclusion. Cultures with >85% cell viability were used in subsequent experiments. To preserve cells for subsequent RNA extraction, 1 million cells were washed in PBS, pelleted, and resuspended in 500 μL TRIzol (Invitrogen) or 200 μL RNAprotect Cell Reagent (Qiagen). Cell suspensions were then frozen at −80°C. Cell cultures were maintained at low passage number; RNA-sequencing experiments were performed on samples at or below passage 4.

Periodically, and on each passage used for experiments, cell cultures were confirmed negative for mycoplasma contamination using either the MycoAlert Kit (Lonza) following the manufacturer’s instructions, or PCR using SapphireAmp Fast PCR Master Mix (Takara) and the following primers:

Myco2(cb): 5′ CTTCWTCGACTTYCAGACCCAAGGCAT-3′

Myco11(cb): 5′ ACACCATGGGAGYTGGTAAT-3′

PCR for GAPDH was performed on the same sample, using the following primers:

hGAPDH-F: TGT CGC TGT TGA AGT CAG AGG AGA

hGAPDH-R: AGA ACA TCA TCC CTG CCT CTA CTG.

Known mycoplasma positive and negative samples were used as a reference.

RNA extraction, library preparation, and sequencing

RNA was extracted from 1 million cells per experiment using the RNeasy Plus Mini Kit (Qiagen) following the manufacturer’s instructions, with the following modifications: Cells in RNAprotect Cell Reagent were thawed on ice, pelleted, and lysed in buffer RLT supplemented with 10μL β-mercaptoethanol per mL. For most samples, ERCC control RNAs were added to the lysate based on the number of cells: 10μL of 1:100 dilution of ERCC control RNAs was added per 1 million cells. The lysate was then homogenized using QIAshredder columns (Qiagen), and transferred to a gDNA eliminator column. All subsequent optional steps in the protocol were performed, and RNA was eluted in 30 μL RNase-free water. RNA levels were measured using a Qubit fluorometer and the Qubit RNA HS Assay Kit (ThermoFisher). Before we switched to the per-cell spike-in protocol, we prepared 18 samples in which ERCC control RNAs were added based on amount of RNA after isolation: 2 μL of a 1:100 dilution of ERCC control RNAs was added per 1 μg of RNA. These samples are: #2237, 2245, 6312, 711, 4032, 706, 3429, 3430, 3442, 2690, 2703, 3107, 5297, 5566, 5755, 6029, 2547, and 525. RNA quality control was performed using the 5200 Fragment Analyzer System (Agilent); we consistently purified high-quality RNA with RNA integrity numbers (RIN) near 10. We randomized the samples by karyotype into batches for RNA extraction, library preparation, and sequencing.

RNA sequencing libraries were prepared using the TruSeq RNA Library Preparation Kit v2 (Illumina) with modifications as detailed in Naqvi et al,60 or using the KAPA mRNA Hyper-Prep Kit V2 (Roche). In both cases, libraries were size selected using the PippinHT system (Sage Science) and 2% agarose gels with a capture window of 300-600 bp. Paired-end 100x100 bp sequencing was performed on a HiSeq 2500 or NovaSeq 6000 (Illumina). Table S1 lists the library preparation kit and sequencing platform for each sample.

RNA-seq data processing and analysis

All analyses were performed using human genome build hg38, and a custom version of the comprehensive GENCODE v24 transcriptome annotation.52 This annotation represents the union of the “GENCODE Basic” annotation and transcripts recognized by the Consensus Coding Sequence project.61 Importantly, the GENCODE annotation lists the PAR gene annotations twice – once on Chr X and once on Chr Y– which complicates analysis. We removed these annotations from Chr Y so the PAR genes are only listed once in our annotation, on Chr X. To analyze samples in which ERCC spike-ins were added, we merged our custom transcript annotation with the ERCC Control annotation.

Reads were pseudoaligned to the transcriptome annotation, and expression levels of each transcript were estimated using kallisto software v0.42.5.54 We included the “--bias” flag to correct for sequence bias. The resulting count data (abundance.tsv file) were imported into R with the tximport package v1.14.062 for normalization using DESeq2 v1.26.0.55 For downstream analysis, we used only protein-coding genes (as annotated in ensembl v104) with the following exceptions: we included genes annotated as pseudogenes on Chr Y that are members of X-Y pairs (TXLNGY, PRKY) and well-characterized long non-coding RNAs (lncRNAs) involved in X-inactivation or other processes (XIST, JPX, FTX, XACT, FIRRE, TSIX). We annotated genes distal to XG, which spans the pseudoautosomal boundary on Xp and is truncated on Chr Y, as part of PAR1 - 15 genes in total. PAR2 comprised the four most distal genes on Xq and Yq. Annotations of non-pseudoautosomal region of the X (NPX) genes with homologs on the non-pseudoautosomal region of the Y (NPY) were derived from Bellott et al.63 224 protein-coding genes on Chr 21 (ensembl v104) were used as a starting point for our analyses. We excluded 21 annotated genes in several regions with high homology between the long and short arms of Chr 21 because the assembly was not fully validated in these regions (https://www.ncbi.nlm.nih.gov/grc/human/issues?filters=chr:21).

Identifying genes affected by changes in chr X, Y, or 21 copy number

We first defined lists of expressed NPX, NPY, PAR, or Chr 21 genes as those with median TPM of at least 1 in 46,XX or 46,XY samples. To ensure that no genes with robust expression were excluded, we also analyzed LCL and fibroblast expression data from GTEx,53 and included several genes that were just below our TPM cutoff but had median TPM of at least 1 in those datasets.

For each expressed NPX, NPY, or PAR gene we performed linear modeling using the lm() function in R. These calculations suppose that each additional chromosome adds a consistent and equal increment to the total expression level of the gene in question.

For NPX and PAR genes we used the following equation:

E=β0+βX(#chrXi)+βY(#chrY)+βB(batch)+ε

E represents the expression (read counts) per gene, β0 represents the intercept, βX and βY are the coefficients of the effect of additional copies of Chr Xi or Y, respectively, and is an error term. For this equation, the intercept represents the 45,X samples.

For NPY genes we employed the following equation, analyzing only those samples with one or more copies of Chr Y:

E=β0+βX(#chrXi)+βY(#chrY1)+βB(batch)+ε

For this equation, the intercept represents the 46,XY samples.

For Chr 21 genes we employed the following equation, analyzing only those samples with 46,XX; 46,XX; 47,XY,+21; or 47,XX,+21 karyotypes:

E=β0+β21(#chr212)+βSex(Sex)+βB(batch)+ε

β21 and βSex are the coefficients of the effect of an additional copy of Chr 21 and sex (XY vs XX), respectively. For this equation, the intercept represents the 46,XX samples.

The resulting p values were adjusted for multiple hypothesis testing using the p.adjust() function in R, specifying the Benjamini Hochberg method. Genes with a false discovery rate (FDR) < 0.05 were considered significant. To compute the normalized expression change per Chr Xi (ΔEX) or Y (ΔEY), we divided the coefficient of interest (βX or βY) by the average intercept across batches, which corresponds to the baseline expression of the gene in samples with only one X chromosome (for NPX and PAR genes) or one Y chromosome (in the case of NPY genes). For Chr 21, we computed ΔE21 by dividing the coefficient (β21) by the average intercept across batches divided by two to obtain the average expression from one copy of Chr 21.

ΔEX=βXβ0ΔEY=βYβ0ΔE21=β21β0/2

In the case of XIST, which is only expressed when two or more copies of Chr X are present, we used the following equations:

ΔEX=βXβ0+βXΔEY=βYβ0+βX

We calculated the standard error (SE) of ΔEX, ΔEY, and ΔE21 using the following equations:

SΔEX=βX2β02[SβX2βX2+Sβ02β02]SΔEY=βY2β02[SβY2βY2+Sβ02β02]SΔ21=β212(β0/2)2[Sβ212β212+Sβ02(β0/2)2]

To confirm the validity of our approach, we used bootstrapping to sample our dataset with replacement 1000 times and obtained similar results. BEX1 was removed from downstream analyses in fibroblasts because two samples (one 45,X and one 49,XXXXY) had high expression values for this gene resulting in >25 times higher error values for ΔEX and ΔEY compared to all other genes.

Saturation analysis for sex chromosome-encoded genes

For LCLs and fibroblasts, size-n subsets of available RNA-seq libraries were sampled randomly without replacement, 100 times for each sample size, n. After confirming that the model matrix would be full rank in each sampling (for example, that samples would not all be of the same karyotype or batch), we performed linear modeling on NPX, PAR, NPY, and Chr 21 genes as described above to identify genes whose expression changes significantly (FDR < 0.05) with copy number of Chr X, Y or 21.

Assessing linearity of sex-chromosome gene expression changes

To assess whether sex-chromosome gene expression changed linearly (i.e., by a fixed amount) with additional X or Y chromosomes, their expression levels across the LCL or fibroblast samples were fit by non-linear least squares to the power curves shown below, using the “nlsLM” function from the R package “minpack.lm”.NPX genes:

yj=1+b(xcountj1)a,whereyj=yjyk

PAR genes:

yj=1+b(xcountj+ycountj1)a,whereyj=yjyk

NPY genes:

yj=1+b(ycountj1)a,whereyj=yjyk

In each of the equations above, yj is the normalized RNA-seq read count for a given gene in sample j, given by the raw read count in sample j divided by the average read count in the set of samples S{·} with only one chromosome of the relevant type: for NPX genes, 1 copy of Chr X (and any number of Y chromosomes); for PAR genes, 45,X samples; for NPY genes, 1 copy of Chr Y (and any number of X chromosomes). b=0.5 and a=1 were used as initial parameter values. Fitted values of a1 indicate a linear relationship between expression and sex-chromosome count. Fitted values of a0 or b0 indicate no change in expression with X or Y count.

ΔEX calculations in samples with 0 Y chromosomes (females) and 1 Y chromosome (males)

We took subsets of the samples with either zero Y chromosomes (females) and with one Y chromosome (males) and performed the same linear modeling and ΔEX calculations as above. We removed MAP7D2 in female LCLs, IL13RA2 in female fibroblasts, and FHL1 in male fibroblasts because their error values (likely due to smaller sample size) were much higher than those of other genes. To compare the linear modeling results, we performed Pearson correlations between the results using all samples, and those from male-only or female-only samples.

Reanalysis of array data and comparison to RNA-seq data

A previous study performed gene expression analysis, using Illumina oligonucleotide BeadArrays, of LCLs from 68 individuals of the following karyotypes: 45,X; 46,XX; 46,XY; 47,XXX; 47,XXY; 47,XYY; and 48,XXYY.24 Since this microarray dataset was generated from an independent set of samples, we sought to validate our results through a reanalysis of the data.

The raw data from the microarrays was not publicly available, but the authors provided us pre-processed data upon request, which we used to perform our analysis. To identify genes that cleared a minimum signal threshold to be considered expressed in the microarray data, we assessed the median signal in 46,XY samples for all Chr Y genes annotated on the microarray. We focused on Chr Y genes in this analysis because many are known to be expressed exclusively in testes, and therefore could provide us with an appropriate sense of the background signal expected for genes not expressed in LCLs. From this analysis, we concluded that a signal threshold of 111 would be appropriate for identifying expressed genes (Figure S9A). We used this threshold to identify 278 expressed Chr X genes (including PAR and NPX genes) in the microarray dataset. This was fewer than the 341 expressed Chr X genes identified in our LCL RNA-seq data, but more than double the 121 expressed Chr X genes reported in Raznahan et al. (Table S4 in Raznahan et al.).24 This discrepancy could not be resolved by simply increasing the signal threshold in our analysis, as TMSB4X, one of the most highly expressed genes in LCLs, was excluded from the previously reported list of expressed genes.

Using our list of 278 expressed genes from the Raznahan et al. dataset, we analyzed the microarray signal values (in place of RNA-seq read counts) using linear models as a function of Xi copy number, controlling for Chr Y copy number. We calculated ΔEX values from the microarray data and compared these to our RNA-seq dataset using a Pearson correlation, which revealed that the results were generally concordant (Figure S9B). For genes that were lowly expressed, however, ΔEX values tended to be much lower in the microarray dataset, consistent with the higher sensitivity of RNA-seq data.

Isoform-specific analysis of RNA-seq data

After estimating counts for each transcript using kallisto software (as described above, with 100 bootstraps), we used sleuth v0.30.0 to normalize those transcript counts.56 X chromosome transcripts were called expressed if their corresponding gene was on the list of expressed genes (above) and median transcript counts were >200. Linear regressions and ΔEX calculations for transcripts were performed as for genes (above) to identify transcripts whose abundance changes significantly with additional copies of Chr X.

The following ENCODE datasets were used for visualization in IGV software58 at the UBA1 locus.

AssayCell typeCell lineKaryotypeAccession
CTCF
ChIP-seq
LCLGM1287846,XXENCFF644EEX
GM1286446,XYENCFF070FTG
FibroblastAG0930946,XXENCFF233THH
AG1080346,XYENCFF080HIA
CTCF
ChIA-PET
LCLGM1287846,XXENCFF80PGS
ENCFF847QOE

Gene constraint analysis

To investigate sensitivity to a reduction in gene dosage, we used three metrics: LOEUF, RVIS, and pHI. We downloaded LOEUF (loss-of-function observed/expected upper fraction) scores from gnomAD (v2.1.1.lof_metris.by_gene.txt; https://gnomad.broadinstitute.org/), and only used scores with a minimum of 10 expected LoF variants. Updated RVIS (residual variation intolerance scores)36 including the ExAC dataset were downloaded from http://genic-intolerance.org/data/RVIS_Unpublished_ExAC_May2015.txt. Updated probability of haploinsufficiency (pHI) scores37 were downloaded from https://www.deciphergenomics.org/files/downloads/HI_Predictions_Version3.bed.gz. To complement these data, we obtained a list of genes with observed homozygous loss-of-function variants.35 For sensitivity to an increase in gene dosage, we used the per-gene average probability of conserved miRNA targeting scores (PCT).38

For each metric, we computed a percentile rank score, ranking from most-to least-constrained. Because several of the metrics calculated scores separately for autosomal (including PAR genes) and NPX genes, we ranked autosomal (and PAR) genes separately from NPX genes. All annotated genes, regardless of expression status in LCLs or fibroblasts, were included in the rankings, with the following exceptions: 1) NPX genes previously annotated as “ampliconic”,64,65 since constraint metrics cannot be accurately applied to these highly similar genes, and 2) genes with <2 annotations across all metrics.

To obtain an aggregate sense of a gene’s expression constraint across multiple metrics, we calculated the average. Among NPX genes, we considered those with |ΔEX ≥ 0.1| (FDR < 0.05) to be most likely to contribute to phenotypes mediated by Xi copy number, prioritizing the top ten genes by the average gene-constraint metric. For PAR1 genes, we prioritized genes with an average gene constraint percentile ranking of at least 50%. To assess the phenotypic roles of highly constrained genes, we annotated them for disease phenotypes with known molecular basis from Online Mendelian Inheritance in Man (OMIM).66

Comparisons to published annotations of X-inactivation status

We re-compiled XCI status annotations of individual genes from four studies of Chr X allelic ratios.14,15,16,18 Previous XCI status compilations9 incorporated DNA methylation data, which we excluded because it does not directly measure Xi transcription. Previous compilations also incorporated information about expression in human-rodent hybrid cell lines carrying a human Xi13; we incorporated this information only where allelic ratios in human cells were not available. Our final XCI status annotations are listed in Table S6, with the workflow for generating these annotations explained below.

The first dataset that we incorporated was derived from paired genomic and cDNA SNP-chips in skewed LCL and fibroblast cell cultures (Additional file 7 from Cotton et al.).14 We used the AR values provided (average Xi expression column) for genes informative in at least 5 samples, resulting in AR values for 424 genes. Using the provided numbers of informative samples and standard deviations of AR values, we computed 95% confidence intervals for the AR values. We considered a gene “Subject” to XCI if the AR 95% confidence interval included zero or the AR value was <0.1; otherwise we considered the gene to “Escape”.

The second dataset that we incorporated was derived from bulk or single cell RNA-seq of LCLs.15 The bulk RNA-seq was from an individual in the GTEx dataset with 100% skewed XCI across the body (Table S5 from Tukiainen et al.).15 The single-cell RNA-seq in LCLs was from three individuals (Table S8 from Tukiainen et al.; we excluded data from one dendritic cell sample).15 For each dataset, we calculated an AR for each gene using read counts from the more lowly and highly expressed alleles in each sample, and used the provided adjusted p-values to identify genes with significant Xi expression (padj < 0.05). For a gene to be considered informative, we required data from at least two individuals in the single cell dataset, or one individual in the single cell dataset and informative in the bulk RNA-seq dataset, resulting in 82 informative genes. We called a gene as “Subject” to XCI if there was no significant expression from Xi in either the bulk or single-cell datasets, and “Escape” if one or both of the datasets showed evidence of Xi expression.

The third dataset that we incorporated was derived from single-cell allelic expression in fibroblasts.16 The dataset includes five individuals (Dataset 3 from Garieri et al.)16 and we required data from at least two samples to be considered informative for a given gene, resulting in 203 genes. We converted their reported values (Xa reads/total reads) to AR values using the following formula: AR=1Xareadstotalreads1. We used the AR threshold calculated in the previous study16 to consider a gene significantly expressed from Xi in each sample (AR > 0.0526). If a gene had no samples with significant expression from Xi or a mean AR value < 0.1 across samples, we considered it “Subject” to XCI; otherwise, it was judged to “Escape” XCI.

The fourth dataset that we incorporated was derived from allele-specific bulk RNA-seq performed on 136 samples with skewed XCI from the set of GEUVADIS LCLs (Tables S4 and S5 from Sauteraud et al.).18 For a gene to be considered scorable, we required that it be informative in at least 10 samples, resulting in 215 genes. We calculated an AR for each gene in each sample using the read counts from the more lowly and highly expressed alleles in each sample, adjusting for the level of skewing in each sample. To identify genes that were significantly expressed from Xi across samples, we performed paired, two-sample, one-sided t tests using the t.test function in R, asking whether the raw (pre-adjusted for skewing) AR values were greater than the baseline AR given the level of skewing in each sample (baselineAR=1skewingcoefficientskewingcoefficient); we corrected the resulting p values for multiple comparisons with the p.adjust function in R using the Benjamini-Hochberg method. Genes were considered “Escape” if they had padj < 0.01, and “Subject” otherwise.

Next, we synthesized the calls from these four datasets. We assigned a gene as “Subject” if: 1) all studies were “Subject” or 2) most (>50%) studies were “Subject” and the average AR across all studies was <0.1. We assigned “Escape” if 1) most (>50%) studies were “Escape” or 2) 50% or fewer (but more than 0) studies were “Escape” and either i) there was more than one study with evidence of escape or ii) the average AR across all studies was ≥1. Finally, we assigned “No call” if the gene was not informative in any of the four datasets. For these genes, we investigated whether there were any calls using hybrids from Carrel et al.13 as compiled in Balaton et al.9 If a gene had no call in any of the four AR datasets, but had a proportion of expression in Xi hybrids <0.22, we considered the gene “Subject”; genes with a greater proportion were called “Escape.”

To compare our calls with previous XCI consensus calls, we made the following modifications to the Balaton list: XG had been listed as a PAR gene, but we excluded it from our list of PAR genes because it is located at the PAR boundary and truncated on the Y chromosome. We updated its annotation to escape (“E”) since the Balaton table lists evidence for escape. The Balaton table lists XIST as “mostly subject” to XCI, but given its exclusive expression from Xi, we updated its status to escape (“E”). We manually examined all genes on our list that were not found in the Balaton list to make sure that genes were not misclassified due to differences in official gene names. For those genes still not present in the Balaton list after this correction, we list “No call”. To compare with our annotations, we grouped the Balaton calls into “Escape” if they were annotated as “PAR”, “Escape”, “Mostly escape”, “Variable Escape”, “Mostly Variable Escape”, or “Discordant”. We grouped Balaton calls into “Subject” if they were annotated as “Mostly subject” or “Subject”.

We compared our new calls with the Balaton calls for the 423 genes expressed in fibroblasts or LCLs, finding 48 where they differed. Of these, nine had a call in Balaton, but no call in the newer datasets. For two genes (TCEAL3, TMSB4X), there was no call in Balaton, but newer data enabled a call to be made (both “Subject”). Nineteen genes were called “Subject” in Balaton, but new data indicates that they have expression from Xi and we categorize them as “Escape.” The final eighteen genes were called “Escape” in Balaton, but new data suggested they have no expression from Xi. In total, our classification found 86 genes that “Escape”, 315 genes that are “Subject” to XCI, and 22 genes with “No call.”

Allele-specific expression analysis

This workflow is diagrammed in Figure S13.

SNP calling

We called SNPs in each RNA-seq sample with two X chromosomes (46,XX, 46,XX,+21, 47,XXY, 48,XXYY) following the Broad Institute’s “Best Practices” workflow for identifying short variants in RNA-seq data (https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-). To perform our skewing analysis, we filtered for SNPs with the following properties: 1) annotated in the dbSNP database, 2) located in an exon of an expressed gene, 3) displaying a minimum coverage of 10 reads, and 4) heterozygous with at least three reads mapping to each of the reference and alternative alleles. We excluded SNPs where the presence of two alleles likely represented technical artifacts rather than biallelic expression, including in WASH6P (SNPs map to multiple near-identical autosomal paralogs), ATRX (SNP in a mutation-prone stretch of Ts), and APOOL (SNPs within an inverted repeat). For samples with a copy of Chr Y, we excluded SNPs mapping to PAR genes, to avoid measuring allelic contributions of Chr Y.

Identifying cell lines with skewed X chromosome inactivation

We classified genes as “Xa only” (only expressed from Xa) if previously characterized as “silenced” and found here to have ΔEX < 0.05 (FDR > 0.5); see Table S6. We expect that in skewed cell lines, reads from Xa-only genes should be near or completely monoallelic (Figure S12). For each SNP in Xa-only genes, we calculated the “skewing coefficient” by dividing the number of reads from the dominant allele by the total number of reads covering the SNP (Figure S14). These coefficients range from 0.5 (equal expression of two alleles) to 1 (expression from a single allele). For each sample, we computed the median skewing coefficient across all SNPs in Xa-only genes, requiring a threshold of 0.8 to classify as skewed. Using simulations, we find that this level of skewing is unlikely to occur by chance (P < 1x10−6), and we do not find evidence of such skewing for SNPs on Chr 8, an autosome with a similar number of expressed genes (Figure S16).

Several samples had few (≤5) informative SNPs in Xa-only genes, but many SNPs in other genes (Figure S15). We interpret this to mean that these samples are highly skewed and that we do not observe enough RNA reads covering both alleles to count SNPs in Xa-only genes as informative. Between these highly-skewed samples and the samples with skewing coefficients of at least 0.8, we identified 21 LCLs and 10 fibroblast cultures with skewed XCI.

Determining allelic ratios for X chromosome genes

After identifying the skewed cell lines, we identified genes with informative SNPs values in at least three skewed samples of a given cell type. We then computed the allelic ratio (AR) at each informative SNP by dividing the number of reads from the more lowly expressed allele by the number of reads from the more highly expressed allele. In cell cultures that are partially skewed, genes will appear more biallelic than in completely skewed cell cultures since there are two populations of cells with different active X chromosomes present – the “major” and “minor” cell populations. Using our skewing estimates, we adjusted the AR on a per-sample basis using the following formula:

AR=ARARtt1tARt

Where t is the estimated percentage of cells in the “minor” population (i.e., with the other X chromosome active compared to the “major” cell population), calculated by: 1 – skewing coefficient. For highly-skewed samples, we were unable to calculate a stringent skewing coefficient due to too few SNPs, so we set skewing coefficient = 1. As a result, it is possible that allelic ratios in these samples may be slightly overestimated if the skewing coefficients are in fact <1. Within each sample, we obtained the average AR for each gene by averaging across all informative SNPs in that gene’s exons (Figure S17; Table S6) and then calculated the mean AR across skewed samples to obtain a final per-gene AR estimate (Table S6).

To assess whether AR values for each gene were significantly greater than zero, we performed one-sided t tests using the t.test function in R, asking whether the AR values were greater than zero; we corrected the resulting p values for multiple comparisons with the p.adjust function in R using the Benjamini-Hochberg method (Table S6). We also repeated this analysis excluding highly skewed samples, since the skewing coefficients cannot be stringently determined. This removed some informative genes but did not significantly affect the AR values (Figure S19; Table S6).

To identify genes whose AR and ΔEX values differ significantly, we performed one-sample, two-sided t tests for the AR values across samples, setting mu = ΔEX (Table S6). We selected genes with Benjamini-Hochberg adjusted p values < 0.1 as having significantly different AR and ΔEX values. From this list we excluded genes for which the 95% confidence interval of ΔEX values (1.96*SE) included the mean AR value, and those for which both ΔEX and AR were not significantly different from zero (FDR ≥ 0.05).

We compared our AR values derived from LCLs or fibroblasts with the four published allelic-ratio datasets described in the above methods on generating XCI status calls (Figure S18).

Quantification and statistical analyses

Various statistical tests were used to calculate p values as indicated in the methods section, figure legend, or text, where appropriate. Results were considered statistically significant when p < 0.05 or FDR<0.05 when multiple hypothesis correction was applied, unless stated otherwise. Data are shown as median and interquartile range, unless stated otherwise. All statistics were calculated using R software, version 3.6.3.67

Acknowledgments

We thank members of the Page laboratory, especially Lukas Chmatal, for helpful comments on the manuscript, Sahin Naqvi for advice on RNA-seq and analysis, and Jorge Adarme and Susan Tocio for laboratory support. We thank the Whitehead Institute Genome Technology Core facility for library preparation and sequencing. This work was funded by the following: National Institutes of Health grants F32HD091966 (A.K.S.R.), U01HG0007587 (D.C.P. and M.M.), and K23HD092588 (S.M.D.); Schmidt Science Fellows (A.F.G.); Lallage Feazel Wall Damon Runyon Cancer Research Foundation Fellowship (L.V.B.); Howard Hughes Medical Institute (D.C.P.); National Human Genome Research Institute Intramural Research Program (M.M.); and NIH/NCATS Colorado CTSA grant UL1 TR002535 (N.R.T.). The contents are the authors’ sole responsibility and do not necessarily represent official NIH views. We are also thankful for philanthropic gifts from Brit and Alexander d’Arbeloff, Arthur W. and Carol Tobin Brill, Matthew Brill, and Charles Ellis.

Author contributions

Conceptualization, A.K.S.R. and D.C.P.; data curation, A.K.S.R. and H.S.; formal analysis, A.K.S.R., A.K.G., H.S., D.W.B., A.F.G., H.L.H., L.V.B., and J.F.H.; funding acquisition, A.K.S.R., A.F.G., L.V.B., S.M.D., N.R.T., M.M., and D.C.P.; investigation, A.K.S.R., S.P., L.B., A.D., and E.P.; methodology, A.K.S.R., A.K.G., H.S., A.F.G., and H.L.H.; project administration, A.K.S.R., J.F.H., L.B., A.B., P.K., N.B., P.C.L., C.K., and S.M.D.; resources, A.B., P.K., N.B., P.C.L., C.K., S.M.D., N.R.T., C.S.-S., and M.M.; software, A.K.S.R., A.K.G., H.S., A.F.G., H.L.H., and L.V.B.; supervision, A.K.S.R., J.F.H., N.R.T., C.S.-S., M.M., and D.C.P.; validation, A.K.G., H.S., D.W.B., and L.V.B.; visualization, A.K.S.R., A.K.G., and H.L.H.; writing – original draft preparation, A.K.S.R. and D.C.P.; writing – review and editing, A.K.S.R., A.K.G., D.W.B., A.F.G., H.L.H., L.V.B., J.F.H., S.M.D., and D.C.P.

Declaration of interests

The authors declare no competing interests.

Inclusion and diversity

We support inclusive, diverse, and equitable conduct of research.

Notes

Published: February 8, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.xgen.2023.100259.

Supplemental information

Document S1. Figures S1–S19:

Table S1. Metadata for euploid and aneuploid samples with RNA sequencing, related to Figure 1 and Table 1:

Table S2. Linear regression results for expressed PAR and NPX genes, related to Figures 2 and S3–S5:

Table S3. Linear regression results for expressed PAR and NPX transcript isoforms, related to Figures 2 and S7:

Table S4. Linear regression results for expressed NPY genes, related to Figures 3 and S10:

Table S5. Linear regression results for expressed Chr 21 genes, related to Figures 3 and S11:

Table S6. Allelic ratios and statistics for informative Chr X genes, related to Figures 4 and S12–S19:

Table S7. Expression constraint metrics for PAR1 and NPX genes, related to Figure 5 and Table 2:

Document S2. Article plus supplemental information:

Data and code availability

  • • Raw, de-identified RNA-sequencing data from human cell cultures has been deposited to dbGaP under accession number phs002481.v2.p1, and processed data has been deposited at Zenodo under accession number https://doi.org/10.5281/zenodo.7504743.
  • • This paper analyzes existing, publicly available data. Accession numbers for these datasets are listed in the key resources table.
  • • Original code has been deposited at Zenodo under accession number https://doi.org/10.5281/zenodo.7504743 and is publicly available as of the date of publication.
  • • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

References

1. Ohno S., Hauschka T.S. Allocycly of the X-chromosome in tumors and normal tissues. Cancer Res. 1960;20:541–545. [Abstract] [Google Scholar]
2. Lyon M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.) Nature. 1961;190:372–373. 10.1038/190372a0. [Abstract] [CrossRef] [Google Scholar]
3. Lyon M.F. Sex chromatin and gene action in the mammalian X-chromosome. Am. J. Hum. Genet. 1962;14:135–148. [Europe PMC free article] [Abstract] [Google Scholar]
4. Harnden D.G. Nuclear sex in triploid XXY human cells. Lancet. 1961;2:488. 10.1016/s0140-6736(61)92457-6. [Abstract] [CrossRef] [Google Scholar]
5. Hook E.B., Warburton D. The distribution of chromosomal genotypes associated with Turner's syndrome: livebirth prevalence rates and evidence for diminished fetal mortality and severity in genotypes associated with structural X abnormalities or mosaicism. Hum. Genet. 1983;64:24–27. 10.1007/BF00289473. [Abstract] [CrossRef] [Google Scholar]
6. Hook E.B., Warburton D. Turner syndrome revisited: review of new data supports the hypothesis that all viable 45,X cases are cryptic mosaics with a rescue cell line, implying an origin by mitotic loss. Hum. Genet. 2014;133:417–424. 10.1007/s00439-014-1420-x. [Abstract] [CrossRef] [Google Scholar]
7. Turner H.H. A syndrome of infantilism, congenital webbed neck, and cubitus valgus. Endocrinology. 1938;23:566–574. 10.1210/endo-23-5-566. [CrossRef] [Google Scholar]
8. Ford C.E., Jones K.W., Polani P.E., Dealmeida J.C., Briggs J.H. A sex-chromosome anomaly in a case of gonadal dysgenesis (turners syndrome) Lancet. 1959;1:711–713. 10.1016/s0140-6736(59)91893-8. [Abstract] [CrossRef] [Google Scholar]
9. Balaton B.P., Cotton A.M., Brown C.J. Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biol. Sex Differ. 2015;6:35. 10.1186/s13293-015-0053-7. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
10. Mohandas T., Sparkes R.S., Hellkuhl B., Grzeschik K.H., Shapiro L.J. Expression of an X-linked gene from an inactive human X chromosome in mouse-human hybrid cells: further evidence for the noninactivation of the steroid sulfatase locus in man. Proc. Natl. Acad. Sci. USA. 1980;77:6759–6763. 10.1073/pnas.77.11.6759. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
11. Brown C.J., Carrel L., Willard H.F. Expression of genes from the human active and inactive X chromosomes. Am. J. Hum. Genet. 1997;60:1333–1343. 10.1086/515488. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
12. Carrel L., Cottle A.A., Goglin K.C., Willard H.F. A first-generation X-inactivation profile of the human X chromosome. Proc. Natl. Acad. Sci. USA. 1999;96:14440–14444. 10.1073/pnas.96.25.14440. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
13. Carrel L., Willard H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005;434:400–404. 10.1038/nature03479. [Abstract] [CrossRef] [Google Scholar]
14. Cotton A.M., Ge B., Light N., Adoue V., Pastinen T., Brown C.J. Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biol. 2013;14:R122. 10.1186/gb-2013-14-11-r122. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
15. Tukiainen T., Villani A.C., Yen A., Rivas M.A., Marshall J.L., Satija R., Aguirre M., Gauthier L., Fleharty M., Kirby A., et al. Landscape of X chromosome inactivation across human tissues. Nature. 2017;550:244–248. 10.1038/nature24265. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
16. Garieri M., Stamoulis G., Blanc X., Falconnet E., Ribaux P., Borel C., Santoni F., Antonarakis S.E. Extensive cellular heterogeneity of X inactivation revealed by single-cell allele-specific expression in human fibroblasts. Proc. Natl. Acad. Sci. USA. 2018;115:13015–13020. 10.1073/pnas.1806811115. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
17. Wainer Katsir K., Linial M. Human genes escaping X-inactivation revealed by single cell expression data. BMC Genom. 2019;20:201. 10.1186/s12864-019-5507-6. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
18. Sauteraud R., Stahl J.M., James J., Englebright M., Chen F., Zhan X., Carrel L., Liu D.J. Inferring genes that escape X-Chromosome inactivation reveals important contribution of variable escape genes to sex-biased diseases. Genome Res. 2021;31:1629–1637. 10.1101/gr.275677.121. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
19. Sudbrak R., Wieczorek G., Nuber U.A., Mann W., Kirchner R., Erdogan F., Brown C.J., Wöhrle D., Sterk P., Kalscheuer V.M., et al. X chromosome-specific cDNA arrays: identification of genes that escape from X-inactivation and other applications. Hum. Mol. Genet. 2001;10:77–83. [Abstract] [Google Scholar]
20. Craig I.W., Mill J., Craig G.M., Loat C., Schalkwyk L.C. Application of microarrays to the analysis of the inactivation status of human X-linked genes expressed in lymphocytes. Eur. J. Hum. Genet. 2004;12:639–646. 10.1038/sj.ejhg.5201212. [Abstract] [CrossRef] [Google Scholar]
21. Talebizadeh Z., Simon S.D., Butler M.G. X chromosome gene expression in human tissues: male and female comparisons. Genomics. 2006;88:675–681. 10.1016/j.ygeno.2006.07.016. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
22. Johnston C.M., Lovell F.L., Leongamornlert D.A., Stranger B.E., Dermitzakis E.T., Ross M.T. Large-scale population study of human cell lines indicates that dosage compensation is virtually complete. PLoS Genet. 2008;4:e9. 10.1371/journal.pgen.0040009. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
23. Trolle C., Nielsen M.M., Skakkebæk A., Lamy P., Vang S., Hedegaard J., Nordentoft I., Ørntoft T.F., Pedersen J.S., Gravholt C.H. Widespread DNA hypomethylation and differential gene expression in Turner syndrome. Sci. Rep. 2016;6 10.1038/srep34220. 34220–34214. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
24. Raznahan A., Parikshak N.N., Chandran V., Blumenthal J.D., Clasen L.S., Alexander-Bloch A.F., Zinn A.R., Wangsa D., Wise J., Murphy D.G.M., et al. Sex-chromosome dosage effects on gene expression in humans. Proc. Natl. Acad. Sci. USA. 2018;115:7398–7403. 10.1073/pnas.1802889115. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
25. Zhang X., Hong D., Ma S., Ward T., Ho M., Pattni R., Duren Z., Stankov A., Bade Shrestha S., Hallmayer J., et al. Integrated functional genomic analyses of Klinefelter and Turner syndromes reveal global network effects of altered X chromosome dosage. Proc. Natl. Acad. Sci. USA. 2020;117:4864–4873. 10.1073/pnas.1910003117. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
26. Nielsen M.M., Trolle C., Vang S., Hornshøj H., Skakkebaek A., Hedegaard J., Nordentoft I., Pedersen J.S., Gravholt C.H. Epigenetic and transcriptomic consequences of excess X-chromosome material in 47,XXX syndrome-A comparison with Turner syndrome and 46,XX females. Am. J. Med. Genet. C Semin. Med. Genet. 2020;184:279–293. 10.1002/ajmg.c.31799. [Abstract] [CrossRef] [Google Scholar]
27. Skaletsky H., Kuroda-Kawaguchi T., Minx P.J., Cordum H.S., Hillier L., Brown L.G., Repping S., Pyntikova T., Ali J., Bieri T., et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003;423:825–837. 10.1038/nature01722. [Abstract] [CrossRef] [Google Scholar]
28. Ross M.T., Grafham D.V., Coffey A.J., Scherer S., McLay K., Muzny D., Platzer M., Howell G.R., Burrows C., Bird C.P., et al. The DNA sequence of the human X chromosome. Nature. 2005;434:325–337. 10.1038/nature03440. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
29. Bellott D.W., Hughes J.F., Skaletsky H., Brown L.G., Pyntikova T., Cho T.-J., Koutseva N., Zaghlul S., Graves T., Rock S., et al. Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature. 2014;508:494–499. 10.1038/nature13206. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
30. Brown C.J., Ballabio A., Rupert J.L., Lafreniere R.G., Grompe M., Tonlorenzi R., Willard H.F. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature. 1991;349:38–44. 10.1038/349038a0. [Abstract] [CrossRef] [Google Scholar]
31. Penny G.D., Kay G.F., Sheardown S.A., Rastan S., Brockdorff N. Requirement for Xist in X chromosome inactivation. Nature. 1996;379:131–137. 10.1038/379131a0. [Abstract] [CrossRef] [Google Scholar]
32. Ciccodicola A., D'Esposito M., Esposito T., Gianfrancesco F., Migliaccio C., Miano M.G., Matarazzo M.R., Vacca M., Franzè A., Cuccurese M., et al. Differentially regulated and evolved genes in the fully sequenced Xq/Yq pseudoautosomal region. Hum. Mol. Genet. 2000;9:395–401. 10.1093/hmg/9.3.395. [Abstract] [CrossRef] [Google Scholar]
33. Mégarbané A., Ravel A., Mircher C., Sturtz F., Grattau Y., Rethoré M.O., Delabar J.M., Mobley W.C. The 50th anniversary of the discovery of trisomy 21: the past, present, and future of research and treatment of Down syndrome. Genet. Med. 2009;11:611–616. 10.1097/GIM.0b013e3181b2e34c. [Abstract] [CrossRef] [Google Scholar]
34. Sullivan K.D., Lewis H.C., Hill A.A., Pandey A., Jackson L.P., Cabral J.M., Smith K.P., Liggett L.A., Gomez E.B., Galbraith M.D., et al. Trisomy 21 consistently activates the interferon response. Elife. 2016;5:e16220. 10.7554/eLife.16220. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
35. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P., et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. 10.1038/s41586-020-2308-7. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
36. Petrovski S., Wang Q., Heinzen E.L., Allen A.S., Goldstein D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9:e1003709. 10.1371/journal.pgen.1003709. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
37. Huang N., Lee I., Marcotte E.M., Hurles M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 2010;6:e1001154. 10.1371/journal.pgen.1001154. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
38. Friedman R.C., Farh K.K.H., Burge C.B., Bartel D.P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19:92–105. 10.1101/gr.082701.108. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
39. Bartel D.P. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–233. 10.1016/j.cell.2009.01.002. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
40. Naqvi S., Bellott D.W., Lin K.S., Page D.C. Conserved microRNA targeting reveals preexisting gene dosage sensitivities that shaped amniote sex chromosome evolution. Genome Res. 2018;28:474–483. 10.1101/gr.230433.117. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
41. Ogata T., Matsuo N. Sex chromosome aberrations and stature: deduction of the principal factors involved in the determination of adult height. Hum. Genet. 1993;91:551–562. 10.1007/BF00205079. [Abstract] [CrossRef] [Google Scholar]
42. Rao E., Weiss B., Fukami M., Rump A., Niesler B., Mertz A., Muroya K., Binder G., Kirsch S., Winkelmann M., et al. Pseudoautosomal deletions encompassing a novel homeobox gene cause growth failure in idiopathic short stature and Turner syndrome. Nat. Genet. 1997;16:54–63. 10.1038/ng0597-54. [Abstract] [CrossRef] [Google Scholar]
43. Clement-Jones M., Schiller S., Rao E., Blaschke R.J., Zuniga A., Zeller R., Robson S.C., Binder G., Glass I., Strachan T., et al. The short stature homeobox gene SHOX is involved in skeletal abnormalities in Turner syndrome. Hum. Mol. Genet. 2000;9:695–702. 10.1093/hmg/9.5.695. [Abstract] [CrossRef] [Google Scholar]
44. Ottesen A.M., Aksglaede L., Garn I., Tartaglia N., Tassone F., Gravholt C.H., Bojesen A., Sørensen K., Jørgensen N., Rajpert-De Meyts E., et al. Increased number of sex chromosomes affects height in a nonlinear fashion: a study of 305 patients with sex chromosome aneuploidy. Am. J. Med. Genet. 2010;152A:1206–1212. 10.1002/ajmg.a.33334. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
45. Fukami M., Seki A., Ogata T. SHOX haploinsufficiency as a cause of syndromic and nonsyndromic short stature. Mol. Syndromol. 2016;7:3–11. 10.1159/000444596. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
46. Galan-Caridad J.M., Harel S., Arenzana T.L., Hou Z.E., Doetsch F.K., Mirny L.A., Reizis B. Zfx controls the self-renewal of embryonic and hematopoietic stem cells. Cell. 2007;129:345–357. 10.1016/j.cell.2007.03.014. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
47. Zhou Y., Su Z., Huang Y., Sun T., Chen S., Wu T., Chen G., Xie X., Li B., Du Z. The Zfx gene is expressed in human gliomas and is important in the proliferation and apoptosis of the human malignant glioma cell line U251. J. Exp. Clin. Cancer Res. 2011;30:114. 10.1186/1756-9966-30-114. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
48. Fang Q., Fu W.H., Yang J., Li X., Zhou Z.S., Chen Z.W., Pan J.H. Knockdown of ZFX suppresses renal carcinoma cell growth and induces apoptosis. Cancer Genet. 2014;207:461–466. 10.1016/j.cancergen.2014.08.007. [Abstract] [CrossRef] [Google Scholar]
49. Carrel L., Willard H.F. Heterogeneous gene expression from the inactive X chromosome: an X-linked gene that escapes X inactivation in some human cell lines but is inactivated in others. Proc. Natl. Acad. Sci. USA. 1999;96:7364–7369. 10.1073/pnas.96.13.7364. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
50. Okamoto I., Nakamura T., Sasaki K., Yabuta Y., Iwatani C., Tsuchiya H., Nakamura S.I., Ema M., Yamamoto T., Saitou M. The X chromosome dosage compensation program during the development of cynomolgus monkeys. Science. 2021;374:eabd8887. 10.1126/science.abd8887. [Abstract] [CrossRef] [Google Scholar]
51. Ohno S. Springer Berlin Heidelberg; 1967. [CrossRef] [Google Scholar]
52. Godfrey A.K., Naqvi S., Chmátal L., Chick J.M., Mitchell R.N., Gygi S.P., Skaletsky H., Page D.C. Quantitative analysis of Y-Chromosome gene expression across 36 human tissues. Genome Res. 2020;30:860–873. 10.1101/gr.261248.120. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
53. GTEx Consortium. Laboratory, Data Analysis &Coordinating Center LDACC—Analysis Working Group. Statistical Methods groups—Analysis Working Group. Enhancing GTEx eGTEx groups. NIH Common Fund. NIH/NCI. NIH/NHGRI. NIH/NIMH. NIH/NIDA. Biospecimen Collection Source Site—NDRI Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. 10.1038/nature24277. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
54. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016;34:525–527. 10.1038/nbt.3519. [Abstract] [CrossRef] [Google Scholar]
55. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. 10.1186/s13059-014-0550-8. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
56. Pimentel H., Bray N.L., Puente S., Melsted P., Pachter L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat. Methods. 2017;14:687–690. 10.1038/nmeth.4324. [Abstract] [CrossRef] [Google Scholar]
57. Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. 10.1093/bioinformatics/btq033. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
58. Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. 10.1038/nbt.1754. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
59. Vangipuram M., Ting D., Kim S., Diaz R., Schüle B. Skin punch biopsy explant culture for derivation of primary human fibroblasts. JoVE. 2013:e3779. 10.3791/3779. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
60. Naqvi S., Godfrey A.K., Hughes J.F., Goodheart M.L., Mitchell R.N., Page D.C. Conservation, acquisition, and functional impact of sex-biased gene expression in mammals. Science. 2019;365:eaaw7317. 10.1126/science.aaw7317. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
61. Pruitt K.D., Harrow J., Harte R.A., Wallin C., Diekhans M., Maglott D.R., Searle S., Farrell C.M., Loveland J.E., Ruef B.J., et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009;19:1316–1323. 10.1101/gr.080531.108. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
62. Soneson C., Love M.I., Robinson M.D. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4:1521. 10.12688/f1000research.7563.2. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
63. Bellott D.W., Page D.C. Dosage-sensitive functions in embryonic development drove the survival of genes on sex-specific chromosomes in snakes, birds, and mammals. Genome Res. 2021;31:198–210. 10.1101/gr.268516.120. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
64. Mueller J.L., Skaletsky H., Brown L.G., Zaghlul S., Rock S., Graves T., Auger K., Warren W.C., Wilson R.K., Page D.C. Independent specialization of the human and mouse X chromosomes for the male germ line. Nat. Genet. 2013;45:1083–1087. 10.1038/ng.2705. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
65. Jackson E.K., Bellott D.W., Cho T.J., Skaletsky H., Hughes J.F., Pyntikova T., Page D.C. Large palindromes on the primate X Chromosome are preserved by natural selection. Genome Res. 2021;31:1337–1352. 10.1101/gr.275188.120. [Europe PMC free article] [Abstract] [CrossRef] [Google Scholar]
66. McKusick-Nathans Institute of Genetic Medicine . OMIM; 2022. https://omim.org/ [Google Scholar]
67. R Development Core Team . R Foundation for Statistical Computing; 2020. [Google Scholar]

Articles from Cell Genomics are provided here courtesy of Elsevier

Citations & impact 


Impact metrics

Jump to Citations

Citations of article over time

Alternative metrics

Altmetric item for https://www.altmetric.com/details/142325842
Altmetric
Discover the attention surrounding your research
https://www.altmetric.com/details/142325842

Smart citations by scite.ai
Smart citations by scite.ai include citation statements extracted from the full text of the citing article. The number of the statements may be higher than the number of citations provided by EuropePMC if one paper cites another multiple times or lower if scite has not yet processed some of the citing articles.
Explore citation contexts and check if this article has been supported or disputed.
https://scite.ai/reports/10.1016/j.xgen.2023.100259

Supporting
Mentioning
Contrasting
4
39
0

Article citations


Go to all (17) article citations

Data 


Data behind the article

This data has been text mined from the article, or deposited into data resources.

Similar Articles 


To arrive at the top five similar articles we use a word-weighted algorithm to compare words from the Title and Abstract of each citation.


Funding 


Funders who supported this work.

NCATS NIH HHS (1)

NHGRI NIH HHS (1)

NICHD NIH HHS (2)