Nothing Special   »   [go: up one dir, main page]

Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Integrative approaches for large-scale transcriptome-wide association studies

Abstract

Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance of one or multiple proteins. Here we introduce a powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated with complex traits. We leverage expression imputation from genetic data to perform a transcriptome-wide association study (TWAS) to identify significant expression-trait associations. We applied our approaches to expression data from blood and adipose tissue measured in 3,000 individuals overall. We imputed gene expression into GWAS data from over 900,000 phenotype measurements to identify 69 new genes significantly associated with obesity-related traits (BMI, lipids and height). Many of these genes are associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Figure 1: Schematic of the TWAS approach.
Figure 2: Modes of expression causality.
Figure 3: Accuracy of individual-level expression imputation algorithms.
Figure 4: The number of genes with significant cis heritability observed at varying sample sizes.
Figure 5: Power of summary-based expression imputation algorithms.

Similar content being viewed by others

References

  1. Visscher, P.M., Brown, M.A., McCarthy, M.I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375, S1–S3 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Lee, D., Bigdeli, T.B., Riley, B.P., Fanous, A.H. & Bacanu, S.A. DIST: direct imputation of summary statistics for unmeasured SNPs. Bioinformatics 29, 2925–2927 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Pasaniuc, B. et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).

  6. Locke, A.E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Wood, A.R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Zhang, X. et al. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat. Genet. 47, 345–352 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Albert, F.W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).

    Article  CAS  PubMed  Google Scholar 

  13. Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Letourneau, A. et al. Domains of genome-wide gene expression dysregulation in Down's syndrome. Nature 508, 345–350 (2014).

    Article  CAS  PubMed  Google Scholar 

  15. Davis, L.K. et al. Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genet. 9, e1003864 (2013).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  16. Gamazon, E.R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Teslovich, T.M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Wray, N.R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Nuotio, J. et al. Cardiovascular risk factors in 2011 and secular trends since 2007: the Cardiovascular Risk in Young Finns Study. Scand. J. Public Health 42, 563–571 (2014).

    Article  PubMed  Google Scholar 

  23. Raitakari, O.T. et al. Cohort profile: the cardiovascular risk in Young Finns Study. Int. J. Epidemiol. 37, 1220–1226 (2008).

    Article  PubMed  Google Scholar 

  24. Wright, F.A. et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46, 430–437 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Nicolae, D.L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  27. Torres, J.M. et al. Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. Am. J. Hum. Genet. 95, 521–534 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Buil, A. et al. Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins. Nat. Genet. 47, 88–91 (2015).

    Article  CAS  PubMed  Google Scholar 

  29. Nica, A.C. et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 6, e1000895 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  30. Robinson, G.K. That BLUP is a good thing: the estimation of random effects. Stat. Sci. 6, 15–32 (1991).

    Article  Google Scholar 

  31. Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–405, e1–e3 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Brown, C.D., Mangravite, L.M. & Engelhardt, B.E. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet. 9, e1003649 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Wen, X., Luca, F. & Pique-Regi, R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 11, e1005176 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  36. Wood, A.R. et al. Another explanation for apparent epistasis. Nature 514, E3–E5 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

    PubMed  PubMed Central  Google Scholar 

  39. Lee, D. et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 31, 1176–1182 (2015).

    Article  PubMed  Google Scholar 

  40. Pritchard, J.K. & Cox, N.J. The allelic architecture of human disease genes: common disease–common variant...or not? Hum. Mol. Genet. 11, 2417–2423 (2002).

    Article  CAS  PubMed  Google Scholar 

  41. Wen, W. et al. Meta-analysis of genome-wide association studies in East Asian–ancestry populations identifies four new loci for body mass index. Hum. Mol. Genet. 23, 5492–5504 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Pers, T.H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).

    Article  CAS  PubMed  Google Scholar 

  43. Smith, G.D. & Ebrahim, S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).

    Article  PubMed  Google Scholar 

  44. Pickrell, J. Fulfilling the promise of Mendelian randomization. bioRxiv doi:10.1101/018150 (16 April 2015).

  45. Pierce, B.L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 178, 1177–1184 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  47. Dimas, A.S. et al. Common regulatory variation impacts gene expression in a cell type–dependent manner. Science 325, 1246–1250 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Montgomery, S.B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773–777 (2010).

    Article  CAS  PubMed  Google Scholar 

  49. Pickrell, J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  51. Finucane, H.K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Stancáková, A. et al. Hyperglycemia and a common variant of GCKR are associated with the levels of eight amino acids in 9,369 Finnish men. Diabetes 61, 1895–1902 (2012).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  53. Stancáková, A. et al. Changes in insulin sensitivity and insulin release in relation to glycemia and glucose tolerance in 6,414 Finnish men. Diabetes 58, 1212–1221 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  54. Turchin, M.C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Boomsma, D.I. et al. Netherlands Twin Register: from twins to twin families. Twin Res. Hum. Genet. 9, 849–857 (2006).

    Article  PubMed  Google Scholar 

  56. Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  57. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Bolormaa, S. et al. A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet. 10, e1004198 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  59. Zhu, X. et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am. J. Hum. Genet. 96, 21–36 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Zaitlen, N., Pas¸aniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Han, B., Kang, H.M. & Eskin, E. Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet. 5, e1000456 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

Download references

Acknowledgements

We thank the individuals who participated in the study. We also acknowledge L. Yang for helpful discussions that have improved the quality of this manuscript. We also thank K. Mohlke, M. Boehnke and F. Collins for help with the METSIM data. This work was funded in part by US National Institutes of Health (NIH) grants F32 GM106584 (A.G.), R01 GM053725 (B.P.), R01 GM105857 (A.L.P., A.G. and G.B.), HL-28481 (P.P., A.J.L. and M.C.) and HL-095056 (P.P. and B.P.) and by the US NIH training grant in Genomic Analysis and Interpretation T32 HG002536 (A.K.).

Author information

Authors and Affiliations

Authors

Contributions

A.G. and B.P. conceived and designed the experiments. A.G., A.K. and H.S. performed the experiments and analyzed the data. G.B., W.C., B.W.J.H.P., R.J., E.J.C.d.G., D.I.B., F.A.W., P.F.S., E.N., M.A., M.C., A.J.L., T.L., E.R., M.K., I.S., O.T.R., J.K. and M.L. generated data, reagents, materials and analysis tools. A.G., A.L.P., P.P. and B.P. wrote the manuscript. All authors reviewed, revised and wrote feedback for the manuscript.

Corresponding authors

Correspondence to Alexander Gusev or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Distribution of cis and trans SNP-heritability estimates cross three cohorts.

Cis (left) and trans (right) density plots shown for three cohorts investigated. Black line corresponds to all converged genes; red line corresponds to genes with cis-SNP-heritability >> 0 by LRT. Dotted lines show respective means.

Supplementary Figure 2 Scaled Venn Diagram of overlap in genes with significant cis SNP-heritability across the METSIM, NTR and YFS data.

Supplementary Figure 3 Cross-validation prediction accuracy across three expression cohorts.

Using 10-fold cross-validation, prediction R^2 (divided by corresponding cis-h2g) was compuated in the METSIM, YFS and NTR (Wright et al.) cohorts by three methods: best eQTL in the gene, BLUP and BSLMM. Left panels show accuracy as a functino of cis h2g; right panels show accuracy as a function of LRT P-value for non-zero cis-h2g.

Supplementary Figure 4 Histogram of BSLMM prediction gains in three cohorts.

For each of the METSIM, YFS and NTR (Wright et al.), the difference between BSLMM R^2 and eQTL R^2 (computed by cross-validation) is plotted as histogram.

Supplementary Figure 5 Correlation of summary-based TWAS and individual-level association Z scores.

Genes with significant cis-h2g in YFS were directly tested for association with height (Z-score on y-axis) and plotted against corresponding Z-score from TWAS using only height summary association data (x-axis). Left panel shows test of height against total expression on the y-axis (\rho = 0.415); right panel shows test of height against BLUP genetic component of expression on the y-axis (\rho = 0.998). Right panel demonstrates that summary-based TWAS is essentially identical to individual-level TWAS when using in-sample LD.

Supplementary Figure 6 GWAS summary simulations over diverse disease architectures.

Power to detect a genome-wide significant association is shown for three methods (GWAS; eGWAS computed from best eQTL, and TWAS computed from summary statistics) over different disease architectures. Colors represent number of causal variants in each simulated gene. Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).

Supplementary Figure 7 TWAS power with expression and genetic component of expression.

Power to detect a genome-wide significant association is shown for summary-statistic TWAS using observed expression as well BLUP genetic value of expression. Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).

Supplementary Figure 8 GWAS summary simulations with heritable expression independent of phenotype.

Expression and trait were simulated as having the same causal variants but independent effect-sizes and power evaluated. For a single causal variant, this model is statistically identical to a true causal model. Colors indicate the method used (see Supplementary Fig. 4). Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).

Supplementary Figure 9 TWAS/eGWAS power with increased sample reference size.

: Results from simulations matching Supplementary Figure 4 (top left) model with variable reference panel size from 100-2,000 individuals. Phenotype generated under to the untyped, high effect variant model where expression explains 0.001 of trait variance. TWAS matches or outperforms eGWAS at 1,000 samples, with little additional power gained subsequently. GWAS power is unaffected by expression reference panel and not shown (identical to Supplementary Fig. 4).

Supplementary Figure 10 TWAS/COLOC power under causal model.

Single locus power comparison for causal variant hidden (left) and typed (right), high effect model where expression explains 0.001 of trait variance. Unlike other power simulations, results are not adjusted for genome/transcriptome-wide multiple testing. TWAS and COLOC significance thresholds set in a null expression simulation (with realistic heritable GWAS, see Methods) to achieve 5% FDR.

Supplementary Figure 11 GWAS summary simulations using haplotype-copying model of population.

Results from simulations matching Supplementary Figure 4 re-analyzed using HAPGEN2 to generate all GWAS sub-study individuals. After holding out 1,000 samples for the expression reference, SHAPEIT2 was used to phase 5,000 individuals in a cis-block for each of 100 random genes, yielding 100 random reference panels. For each reference panel we used HAPGEN to generate 60 sub-studies of 5,000 individuals using its haplotype-copying model (under neutrality). Phenotypes were generated with the same causal variants and effects for each sub-study, and the corresponding GWAS summary statistics were computed by meta-analyzing across these batches. Default parameters for SHAPEIT and HAPGEN were always used, and the appropriate genetic map was downloaded from the SHAPEIT2 website.

Supplementary Figure 12 TWAS/COLOC power under independent effects model.

Single locus power comparison for model where expression and trait have the same causal variants with independent effects. GWAS generated from high effect model where expression explains 0.001 of trait variance and causal variants are hidden (left) or typed (right). Unlike other power simulations, results are not adjusted for genome/transcriptome-wide multiple testing. TWAS and COLOC significance thresholds set in a null expression simulation (with realistic heritable GWAS, see Methods) to achieve 5% FDR. Single causal variant scenario is statistically identical to the causal expression model and has equivalent power.

Supplementary Figure 13 LD Score regression (LDSC) provides a noisier local estimate than TWAS for quantifying correlation between complex trait and expression.

Left panel shows in-sample and summary-based estimate of association between height and cis genetic component of expression (correlation = 0.998). Right panel shows in-sample estimate of correlation between height and cis genetic component of expression (y-axis) and LDSC estimate of genetic correlation (x-axis); correlation between the two statistics was 0.7.

Supplementary Figure 14 Distribution of TWAS statistics for height as a function of expression panel size in the merged YFS + NTR (blood) cohorts.

To evaluate the effect of expression panel size on power in real data, we re-ran the TWAS for height using the merged YFS+NTR cohort with down-sampling. The histogram of TWAS Z^2 scores is shown for training from randomly sampled half of the cohort (1,200 samples, top) and the full cohort (2,400 samples, bottom). The mean TWAS Z^2 was 5.8 in the combined cohort and 5.6 in the down-sampled cohort, with the overall distribution of Z-scores largely consistent across the two studies. Importantly, there were four genes that were not previously significant in the YFS/NTR cohorts individually and were significant in the combined study, but all four had been detected by the omnibus test. This is consistent with our simulations showing that TWAS power is saturated beyond 1,000 expression samples. Only 652 genes that were significantly cis-heritable in both YFS and NTR were analyzed.

Supplementary Figure 15 Distance between nearby genes and GWAS index SNP at known GWAS risk loci.

Each line shows the histogram of number of GWAS risk loci overlapping a gene as a function of distance from the index SNP. Black line shows the closest gene. Red denotes genes with significant cis SNP heritability. Green shows the TWAS significant gene.

Supplementary Figure 16 Correlation of TWAS Z-scores from summary-level and individual-level data in METSIM.

Joint distribution of summary-level and individual-level TWAS Z-scores shown across all significant genes for four traits in the METSIM GWAS cohort.

Supplementary Figure 17 Evidence for allelic heterogeneity of expression

Novel TWAS genes were evaluated for allelic heterogeneity of expression using three different tests (left SKAT gene-based test; middle conditional analysis; right LASSO model selection). For each test, genes were categorized into those not reaching TWAS significance in a cohort, reaching TWAS significance but failing the permutation test, and reaching TWAS significance and passing the permutation test. The % of genes considered significant (SKAT/conditional) or the average number of associated variants selected (LASSO) are reported by each bar.For each of the 69 genes x 3 expression cohorts we ran the SKAT gene-based test with expression as outcome to look for evidence of allelic heterogeneity. As with TWAS, all SNPs in the 1-Mb locus around the gene were used, and SKAT was run in common+rare mode (Ionita-Laza et al. Am. J. Hum. Genet. 2013), which was most appropriate for array SNP data. Default parameters were used throughout. Genes were considered significant if they surpassed the multiple-testing burden of 69 genes. The conditional test was evaluated by including the top SNP as a covariate and testing for any secondary eQTLs at the locus that were significant after Bonferroni correction for the number of SNPs at the locus. The LASSO was run on all SNPs in the locus, penalized by the cis-h2g of expression.

Supplementary Figure 18 Expression accuracy (by cross-validation) in 100 random genes.

Average of prediction R^2 divided by corresponding gene cis-h2g shown from randomly selected genes for three methods and two cohorts.

Supplementary Figure 19 Correlation of MuTHER TWAS Z-scores using different LD reference samples.

MuTHER expression data was used to train TWAS predictors using LD from 600 METSIM individuals (x-axis) or 6,000 METSIM individuals (y-axis). Joint distribution of TWAS Z-scores and trend are shown.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1– 19 and Supplementary Note. (PDF 2393 kb)

Supplementary Tables 1– 17

Supplementary Tables 1– 17. (XLSX 127 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gusev, A., Ko, A., Shi, H. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 48, 245–252 (2016). https://doi.org/10.1038/ng.3506

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/ng.3506

This article is cited by

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing