Article
Published: 08 February 2016

Integrative approaches for large-scale transcriptome-wide association studies

Alexander Gusev^1,2,3,
Arthur Ko ORCID: orcid.org/0000-0002-1523-7225^4,5,
Huwenbo Shi^6,
Gaurav Bhatia^1,2,3,
Wonil Chung ORCID: orcid.org/0000-0002-5766-6247^1,
Brenda W J H Penninx^7,
Rick Jansen ORCID: orcid.org/0000-0002-3333-6737^7,
Eco J C de Geus ORCID: orcid.org/0000-0001-6022-2666^8,
Dorret I Boomsma^8,
Fred A Wright^9,
Patrick F Sullivan^10,11,12,
Elina Nikkola^4,
Marcus Alvarez^4,
Mete Civelek^13,
Aldons J Lusis^4,13,
Terho Lehtimäki^14,
Emma Raitoharju^14,
Mika Kähönen^15,
Ilkka Seppälä^14,
Olli T Raitakari^16,17,
Johanna Kuusisto^18,
Markku Laakso^18,
Alkes L Price^1,2,3,
Päivi Pajukanta^4,5 &
…
Bogdan Pasaniuc^4,6,19

Nature Genetics volume 48, pages 245–252 (2016)

Abstract

Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance of one or multiple proteins. Here we introduce a powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated with complex traits. We leverage expression imputation from genetic data to perform a transcriptome-wide association study (TWAS) to identify significant expression-trait associations. We applied our approaches to expression data from blood and adipose tissue measured in ∼3,000 individuals overall. We imputed gene expression into GWAS data from over 900,000 phenotype measurements to identify 69 new genes significantly associated with obesity-related traits (BMI, lipids and height). Many of these genes are associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.
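The abstract describes combining genetic weights for a gene's cis-regulated expression with GWAS summary statistics. A minimal sketch of the summary-based TWAS statistic, in its commonly used form Z = w'z / sqrt(w' L w), is shown below. It assumes the per-SNP weights w have already been trained in an expression panel and that L is a SNP correlation (LD) matrix taken from a reference panel; the function name `twas_z` and the toy numbers are hypothetical and are not values from the study.

```python
import numpy as np


def twas_z(weights, z_scores, ld):
    """Summary-based TWAS Z-score for one gene: w'z / sqrt(w' L w).

    weights  -- per-SNP expression weights (trained in an expression panel)
    z_scores -- GWAS marginal association Z-scores for the same SNPs
    ld       -- SNP-by-SNP correlation (LD) matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(z_scores, dtype=float)
    L = np.asarray(ld, dtype=float)
    numerator = w @ z            # weighted sum of GWAS Z-scores
    null_variance = w @ L @ w    # variance of w'z when z ~ N(0, L)
    return numerator / np.sqrt(null_variance)


# Hypothetical three-SNP locus (illustrative values only)
weights = [0.5, 0.3, 0.0]
gwas_z = [4.0, 3.5, 1.0]
ld = [[1.0, 0.6, 0.1],
      [0.6, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
gene_z = twas_z(weights, gwas_z, ld)
```

Dividing by sqrt(w'Lw) rescales the weighted sum so that it is standard normal under the null, which is why LD among the weighted SNPs has to be supplied alongside the GWAS Z-scores.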

**Figure 1: Schematic of the TWAS approach.**

**Figure 2: Modes of expression causality.**

**Figure 3: Accuracy of individual-level expression imputation algorithms.**

**Figure 4: The number of genes with significant *cis* heritability observed at varying sample sizes.**

**Figure 5: Power of summary-based expression imputation algorithms.**

References

Visscher, P.M., Brown, M.A., McCarthy, M.I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375, S1–S3 (2012).
Lee, D., Bigdeli, T.B., Riley, B.P., Fanous, A.H. & Bacanu, S.A. DIST: direct imputation of summary statistics for unmeasured SNPs. Bioinformatics 29, 2925–2927 (2013).
Pasaniuc, B. et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914 (2014).
Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Locke, A.E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Wood, A.R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Zhang, X. et al. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat. Genet. 47, 345–352 (2015).
Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Albert, F.W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).
Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).
Letourneau, A. et al. Domains of genome-wide gene expression dysregulation in Down's syndrome. Nature 508, 345–350 (2014).
Davis, L.K. et al. Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genet. 9, e1003864 (2013).
Gamazon, E.R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Teslovich, T.M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Wray, N.R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).
Nuotio, J. et al. Cardiovascular risk factors in 2011 and secular trends since 2007: the Cardiovascular Risk in Young Finns Study. Scand. J. Public Health 42, 563–571 (2014).
Raitakari, O.T. et al. Cohort profile: the cardiovascular risk in Young Finns Study. Int. J. Epidemiol. 37, 1220–1226 (2008).
Wright, F.A. et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 46, 430–437 (2014).
Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
Nicolae, D.L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Torres, J.M. et al. Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. Am. J. Hum. Genet. 95, 521–534 (2014).
Buil, A. et al. Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins. Nat. Genet. 47, 88–91 (2015).
Nica, A.C. et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 6, e1000895 (2010).
Robinson, G.K. That BLUP is a good thing: the estimation of random effects. Stat. Sci. 6, 15–32 (1991).
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).
Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).
Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–405, e1–e3 (2013).
Brown, C.D., Mangravite, L.M. & Engelhardt, B.E. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet. 9, e1003649 (2013).
Wen, X., Luca, F. & Pique-Regi, R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 11, e1005176 (2015).
Wood, A.R. et al. Another explanation for apparent epistasis. Nature 514, E3–E5 (2014).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Lee, D. et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 31, 1176–1182 (2015).
Pritchard, J.K. & Cox, N.J. The allelic architecture of human disease genes: common disease–common variant...or not? Hum. Mol. Genet. 11, 2417–2423 (2002).
Wen, W. et al. Meta-analysis of genome-wide association studies in East Asian–ancestry populations identifies four new loci for body mass index. Hum. Mol. Genet. 23, 5492–5504 (2014).
Pers, T.H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Smith, G.D. & Ebrahim, S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).
Pickrell, J. Fulfilling the promise of Mendelian randomization. bioRxiv doi:10.1101/018150 (16 April 2015).
Pierce, B.L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 178, 1177–1184 (2013).
Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).
Dimas, A.S. et al. Common regulatory variation impacts gene expression in a cell type–dependent manner. Science 325, 1246–1250 (2009).
Montgomery, S.B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773–777 (2010).
Pickrell, J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
Finucane, H.K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Stancáková, A. et al. Hyperglycemia and a common variant of GCKR are associated with the levels of eight amino acids in 9,369 Finnish men. Diabetes 61, 1895–1902 (2012).
Stancáková, A. et al. Changes in insulin sensitivity and insulin release in relation to glycemia and glucose tolerance in 6,414 Finnish men. Diabetes 58, 1212–1221 (2009).
Turchin, M.C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).
Boomsma, D.I. et al. Netherlands Twin Register: from twins to twin families. Twin Res. Hum. Genet. 9, 849–857 (2006).
Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Bolormaa, S. et al. A multi-trait, meta-analysis for detecting pleiotropic polymorphisms for stature, fatness and reproduction in beef cattle. PLoS Genet. 10, e1004198 (2014).
Zhu, X. et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am. J. Hum. Genet. 96, 21–36 (2015).
Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).
Han, B., Kang, H.M. & Eskin, E. Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet. 5, e1000456 (2009).

Acknowledgements

We thank the individuals who participated in the study. We also acknowledge L. Yang for helpful discussions that have improved the quality of this manuscript. We also thank K. Mohlke, M. Boehnke and F. Collins for help with the METSIM data. This work was funded in part by US National Institutes of Health (NIH) grants F32 GM106584 (A.G.), R01 GM053725 (B.P.), R01 GM105857 (A.L.P., A.G. and G.B.), HL-28481 (P.P., A.J.L. and M.C.) and HL-095056 (P.P. and B.P.) and by the US NIH training grant in Genomic Analysis and Interpretation T32 HG002536 (A.K.).

Author information

Authors and Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Alexander Gusev, Gaurav Bhatia, Wonil Chung & Alkes L Price
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
Alexander Gusev, Gaurav Bhatia & Alkes L Price
Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
Alexander Gusev, Gaurav Bhatia & Alkes L Price
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Arthur Ko, Elina Nikkola, Marcus Alvarez, Aldons J Lusis, Päivi Pajukanta & Bogdan Pasaniuc
Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, USA
Arthur Ko & Päivi Pajukanta
Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, California, USA
Huwenbo Shi & Bogdan Pasaniuc
Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands
Brenda W J H Penninx & Rick Jansen
Department of Biological Psychology, VU University, Amsterdam, the Netherlands
Eco J C de Geus & Dorret I Boomsma
Department of Statistics, Department of Biological Sciences, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
Fred A Wright
Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
Patrick F Sullivan
Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, USA
Patrick F Sullivan
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Patrick F Sullivan
Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Mete Civelek & Aldons J Lusis
Department of Clinical Chemistry, Fimlab Laboratories and University of Tampere School of Medicine, Tampere, Finland
Terho Lehtimäki, Emma Raitoharju & Ilkka Seppälä
Department of Clinical Physiology, Pirkanmaa Hospital District and University of Tampere School of Medicine, Tampere, Finland
Mika Kähönen
Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
Olli T Raitakari
Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
Olli T Raitakari
Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
Johanna Kuusisto & Markku Laakso
Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
Bogdan Pasaniuc

Contributions

A.G. and B.P. conceived and designed the experiments. A.G., A.K. and H.S. performed the experiments and analyzed the data. G.B., W.C., B.W.J.H.P., R.J., E.J.C.d.G., D.I.B., F.A.W., P.F.S., E.N., M.A., M.C., A.J.L., T.L., E.R., M.K., I.S., O.T.R., J.K. and M.L. generated data, reagents, materials and analysis tools. A.G., A.L.P., P.P. and B.P. wrote the manuscript. All authors reviewed, revised and wrote feedback for the manuscript.

Corresponding authors

Correspondence to Alexander Gusev or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Distribution of cis and trans SNP-heritability estimates across three cohorts.

Cis (left) and trans (right) density plots are shown for the three cohorts investigated. The black line corresponds to all converged genes; the red line corresponds to genes with cis-SNP-heritability significantly greater than zero by likelihood-ratio test (LRT). Dotted lines show the respective means.

Supplementary Figure 2 Scaled Venn Diagram of overlap in genes with significant cis SNP-heritability across the METSIM, NTR and YFS data.

Supplementary Figure 3 Cross-validation prediction accuracy across three expression cohorts.

Using 10-fold cross-validation, prediction R^2 (divided by the corresponding cis-h2g) was computed in the METSIM, YFS and NTR (Wright et al.) cohorts by three methods: best eQTL in the gene, BLUP and BSLMM. Left panels show accuracy as a function of cis-h2g; right panels show accuracy as a function of the LRT P-value for non-zero cis-h2g.
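A minimal sketch of this cross-validation comparison on simulated data is shown below, under several assumptions: genotypes are standardized, a ridge regression with penalty p(1 - h2)/h2 and a known cis-h2g stands in for BLUP, BSLMM is not implemented, and R^2 is the squared correlation between held-out expression and its out-of-fold prediction. The function names `kfold_r2`, `best_eqtl` and `ridge_blup`, and all simulation settings, are hypothetical.

```python
import numpy as np


def kfold_r2(X, y, fit_predict, k=10, seed=0):
    """Cross-validated R^2: squared correlation between held-out expression
    and its out-of-fold prediction, pooled across k folds."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    preds = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        preds[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return np.corrcoef(preds, y)[0, 1] ** 2


def best_eqtl(X_tr, y_tr, X_te):
    """Predict from the single SNP most correlated with expression."""
    Xc = X_tr - X_tr.mean(axis=0)
    yc = y_tr - y_tr.mean()
    corr = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc @ yc))
    j = int(np.argmax(corr))
    slope = (Xc[:, j] @ yc) / (Xc[:, j] @ Xc[:, j])
    return y_tr.mean() + slope * (X_te[:, j] - X_tr[:, j].mean())


def ridge_blup(X_tr, y_tr, X_te, h2=0.3):
    """BLUP-style ridge on all cis SNPs; the penalty assumes h2 is known."""
    p = X_tr.shape[1]
    lam = p * (1 - h2) / h2
    mu = X_tr.mean(axis=0)
    Xc = X_tr - mu
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_tr - y_tr.mean()))
    return y_tr.mean() + (X_te - mu) @ coef


# Simulated cis locus: 500 people, 50 SNPs, 5 causal SNPs, cis-h2g = 0.3
rng = np.random.default_rng(1)
n, p, h2 = 500, 50, 0.3
maf = rng.uniform(0.1, 0.5, size=p)
G = rng.binomial(2, maf, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)
effects = np.zeros(p)
effects[rng.choice(p, size=5, replace=False)] = rng.normal(size=5)
genetic = X @ effects
genetic *= np.sqrt(h2) / genetic.std()
y = genetic + rng.normal(scale=np.sqrt(1 - h2), size=n)

# Accuracy reported on the figure's scale: cross-validated R^2 / cis-h2g
acc_eqtl = kfold_r2(X, y, best_eqtl) / h2
acc_blup = kfold_r2(X, y, ridge_blup) / h2
```

Dividing by cis-h2g puts genes with different heritabilities on a common 0 to 1 scale, where 1 would mean the predictor captures all of the cis-genetic signal.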

Supplementary Figure 4 Histogram of BSLMM prediction gains in three cohorts.

For each of the METSIM, YFS and NTR (Wright et al.) cohorts, the difference between BSLMM R^2 and eQTL R^2 (computed by cross-validation) is plotted as a histogram.

Supplementary Figure 5 Correlation of summary-based TWAS and individual-level association Z scores.

Genes with significant cis-h2g in YFS were directly tested for association with height (Z-score on y-axis) and plotted against corresponding Z-score from TWAS using only height summary association data (x-axis). Left panel shows test of height against total expression on the y-axis (ρ = 0.415); right panel shows test of height against BLUP genetic component of expression on the y-axis (ρ = 0.998). Right panel demonstrates that summary-based TWAS is essentially identical to individual-level TWAS when using in-sample LD.
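Why the two routes agree when LD is computed in-sample can be illustrated with a short simulation. The sketch assumes standardized genotypes and trait, and approximates each marginal Z-score as x_j'y / sqrt(n), which is accurate when single SNPs explain little variance. Under those assumptions the summary statistic w'z / sqrt(w'Lw) reduces algebraically to sqrt(n) times the correlation between predicted expression Xw and the trait. The weights and data are simulated and hypothetical.

```python
import numpy as np

# Simulated in-sample data: standardized genotypes X and standardized trait y
rng = np.random.default_rng(7)
n, p = 2000, 20
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)
y = 0.1 * X[:, 0] + rng.normal(size=n)
y = (y - y.mean()) / y.std()
w = rng.normal(size=p)          # hypothetical expression weights

# Summary-level route: marginal Z-scores plus in-sample LD
z = X.T @ y / np.sqrt(n)        # approximate marginal Z-scores
L = X.T @ X / n                 # in-sample SNP correlation matrix
z_summary = (w @ z) / np.sqrt(w @ L @ w)

# Individual-level route: correlate the trait with predicted expression
predicted = X @ w
r = np.corrcoef(predicted, y)[0, 1]
z_individual = np.sqrt(n) * r
```

A regression t-test of y on predicted expression gives r·sqrt(n - 2)/sqrt(1 - r^2), which differs from sqrt(n)·r only at second order when r is small; with an external LD panel, L no longer matches X'X/n exactly, so the agreement becomes approximate.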

Supplementary Figure 6 GWAS summary simulations over diverse disease architectures.

Power to detect a genome-wide significant association is shown for three methods (GWAS, eGWAS computed from the best eQTL, and TWAS computed from summary statistics) over different disease architectures. Colors represent the number of causal variants in each simulated gene. Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).
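The power curves in this caption come from simulation. For intuition only, a common analytic approximation (not the procedure used here) treats a 1-df association test as normal with noncentrality n·q/(1 - q), where q is the fraction of trait variance explained by the tested predictor. The sketch applies it at the conventional genome-wide threshold of 5 × 10^-8, which is assumed. Treating the predictor as capturing the gene's full 0.001 or 0.0005 of variance is also an assumption, so the values are an upper bound for any single predictor that captures only part of that signal. The function name `power_1df` and the sample sizes are hypothetical.

```python
from statistics import NormalDist


def power_1df(n, var_explained, alpha):
    """Approximate power of a two-sided 1-df association test.

    Assumes the test statistic is normal with unit variance and mean
    sqrt(ncp), where ncp = n * q / (1 - q) for a predictor explaining a
    fraction q of trait variance in n samples.
    """
    std_normal = NormalDist()
    ncp = n * var_explained / (1.0 - var_explained)
    shift = ncp ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z + shift| exceeds the critical value
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Variance-explained settings from the caption at an assumed alpha of 5e-8
for q in (0.001, 0.0005):
    for n in (10_000, 50_000, 100_000):
        print(q, n, round(power_1df(n, q, 5e-8), 3))
```

With q = 0 the function returns alpha itself, which is a quick sanity check that the test is calibrated under the null.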

Supplementary Figure 7 TWAS power with expression and genetic component of expression.

Power to detect a genome-wide significant association is shown for summary-statistic TWAS using observed expression as well as the BLUP genetic value of expression. Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).

Supplementary Figure 8 GWAS summary simulations with heritable expression independent of phenotype.

Expression and trait were simulated as having the same causal variants but independent effect sizes, and power was evaluated. For a single causal variant, this model is statistically identical to a true causal model. Colors indicate the method used (see Supplementary Fig. 4). Left/right panels correspond to simulations with causal variants hidden/shown. Top/bottom panels correspond to simulations with each gene explaining 0.001/0.0005 of trait variance. x-axis shows simulations at increasing GWAS sample sizes (with expression panel fixed at 1,000 samples).
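The statement that a single shared causal variant makes the independent-effects model indistinguishable from a causal one follows from a one-line rewriting. Here g is the causal variant and a, b are its independent effects on expression E and trait Y (symbols introduced for illustration), with a ≠ 0:

```latex
\begin{aligned}
E &= a\,g + \varepsilon_E, \qquad Y = b\,g + \varepsilon_Y \\
Y &= \frac{b}{a}\,(a\,g) + \varepsilon_Y = c\,(a\,g) + \varepsilon_Y, \qquad c = \frac{b}{a} \\
\text{two variants: } Y &= b_1 g_1 + b_2 g_2 + \varepsilon_Y = c\,(a_1 g_1 + a_2 g_2) + \varepsilon_Y \ \text{for some } c \iff \frac{b_1}{a_1} = \frac{b_2}{a_2}
\end{aligned}
```

With one variant, any trait effect b is absorbed into a causal effect c of the genetic component of expression, so both models generate the same data; with several variants, independent effects generally break the proportionality condition, so only then can the two models differ.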

Supplementary Figure 9 TWAS/eGWAS power with increasing reference panel size.

Results from simulations matching the Supplementary Figure 4 (top left) model, with reference panel size varied from 100 to 2,000 individuals. Phenotypes were generated under the untyped, high-effect variant model, in which expression explains 0.001 of trait variance. TWAS matches or outperforms eGWAS at 1,000 samples, with little additional power gained at larger panel sizes. GWAS power is unaffected by the expression reference panel and is not shown (identical to Supplementary Fig. 4).

Supplementary Figure 10 TWAS/COLOC power under causal model.

Single locus power comparison for causal variant hidden (left) and typed (right), high effect model where expression explains 0.001 of trait variance. Unlike other power simulations, results are not adjusted for genome/transcriptome-wide multiple testing. TWAS and COLOC significance thresholds set in a null expression simulation (with realistic heritable GWAS, see Methods) to achieve 5% FDR.

Supplementary Figure 11 GWAS summary simulations using a haplotype-copying population model.

Results from simulations matching Supplementary Figure 4 re-analyzed using HAPGEN2 to generate all GWAS sub-study individuals. After holding out 1,000 samples for the expression reference, SHAPEIT2 was used to phase 5,000 individuals in a cis-block for each of 100 random genes, yielding 100 random reference panels. For each reference panel we used HAPGEN to generate 60 sub-studies of 5,000 individuals using its haplotype-copying model (under neutrality). Phenotypes were generated with the same causal variants and effects for each sub-study, and the corresponding GWAS summary statistics were computed by meta-analyzing across these batches. Default parameters for SHAPEIT and HAPGEN were always used, and the appropriate genetic map was downloaded from the SHAPEIT2 website.
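The caption does not state which meta-analysis scheme was used to combine the sub-studies. As one plausible option, and purely as an assumption, a minimal sketch of the standard fixed-effect inverse-variance-weighted meta-analysis for a single SNP is shown below; the function name `ivw_meta` and the input values are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    betas -- per-batch effect estimates
    ses   -- matching standard errors
    Returns the pooled estimate, its standard error and its Z-score.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# 60 equally sized batches with identical estimates: SE shrinks by sqrt(60)
pooled_beta, pooled_se, pooled_z = ivw_meta([0.02] * 60, [0.01] * 60)
```

With 60 sub-studies of 5,000 individuals, the pooled statistics correspond to 300,000 individuals per simulated GWAS; for equal-sized batches with equal standard errors, the pooled SE is the per-batch SE divided by sqrt(60).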

Supplementary Figure 12 TWAS/COLOC power under independent effects model.

Single locus power comparison for model where expression and trait have the same causal variants with independent effects. GWAS generated from high effect model where expression explains 0.001 of trait variance and causal variants are hidden (left) or typed (right). Unlike other power simulations, results are not adjusted for genome/transcriptome-wide multiple testing. TWAS and COLOC significance thresholds set in a null expression simulation (with realistic heritable GWAS, see Methods) to achieve 5% FDR. Single causal variant scenario is statistically identical to the causal expression model and has equivalent power.

Supplementary Figure 13 LD Score regression (LDSC) provides a noisier local estimate than TWAS for quantifying correlation between complex trait and expression.

Left panel shows in-sample and summary-based estimate of association between height and cis genetic component of expression (correlation = 0.998). Right panel shows in-sample estimate of correlation between height and cis genetic component of expression (y-axis) and LDSC estimate of genetic correlation (x-axis); correlation between the two statistics was 0.7.

Supplementary Figure 14 Distribution of TWAS statistics for height as a function of expression panel size in the merged YFS + NTR (blood) cohorts.

To evaluate the effect of expression panel size on power in real data, we re-ran the TWAS for height using the merged YFS+NTR cohort with down-sampling. The histogram of TWAS Z^2 scores is shown for training from randomly sampled half of the cohort (1,200 samples, top) and the full cohort (2,400 samples, bottom). The mean TWAS Z^2 was 5.8 in the combined cohort and 5.6 in the down-sampled cohort, with the overall distribution of Z-scores largely consistent across the two studies. Importantly, there were four genes that were not previously significant in the YFS/NTR cohorts individually and were significant in the combined study, but all four had been detected by the omnibus test. This is consistent with our simulations showing that TWAS power is saturated beyond 1,000 expression samples. Only 652 genes that were significantly cis-heritable in both YFS and NTR were analyzed.

Supplementary Figure 15 Distance between nearby genes and GWAS index SNP at known GWAS risk loci.

Each line shows the histogram of number of GWAS risk loci overlapping a gene as a function of distance from the index SNP. Black line shows the closest gene. Red denotes genes with significant cis SNP heritability. Green shows the TWAS significant gene.

Supplementary Figure 16 Correlation of TWAS Z-scores from summary-level and individual-level data in METSIM.

Joint distribution of summary-level and individual-level TWAS Z-scores shown across all significant genes for four traits in the METSIM GWAS cohort.

Supplementary Figure 17 Evidence for allelic heterogeneity of expression

Novel TWAS genes were evaluated for allelic heterogeneity of expression using three different tests (left, SKAT gene-based test; middle, conditional analysis; right, LASSO model selection). For each test, genes were categorized into those not reaching TWAS significance in a cohort, those reaching TWAS significance but failing the permutation test, and those reaching TWAS significance and passing the permutation test. The percentage of genes considered significant (SKAT/conditional) or the average number of associated variants selected (LASSO) is reported for each bar.

For each of the 69 genes × 3 expression cohorts, we ran the SKAT gene-based test with expression as the outcome to look for evidence of allelic heterogeneity. As with TWAS, all SNPs in the 1-Mb locus around the gene were used, and SKAT was run in common+rare mode (Ionita-Laza et al. Am. J. Hum. Genet. 2013), which was most appropriate for array SNP data. Default parameters were used throughout. Genes were considered significant if they surpassed the multiple-testing burden of 69 genes. The conditional test was evaluated by including the top SNP as a covariate and testing for any secondary eQTLs at the locus that were significant after Bonferroni correction for the number of SNPs at the locus. The LASSO was run on all SNPs in the locus, penalized by the cis-h2g of expression.
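The conditional test described above can be sketched as follows, under stated assumptions: the top SNP is the one with the smallest marginal p-value, significance uses a normal approximation to the OLS t-test, and alpha = 0.05 is Bonferroni-divided by the number of SNPs at the locus. The SKAT and LASSO analyses are not reproduced, and the function names `snp_pvalue` and `secondary_eqtls`, like the simulated effects, are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def snp_pvalue(y, covariates, snp):
    """Two-sided p-value for `snp` in an OLS fit of y on [1, covariates, snp].
    Uses a normal approximation to the t distribution (fine for large n)."""
    n = len(y)
    design = np.column_stack([np.ones(n), covariates, snp])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = coef[-1] / np.sqrt(cov[-1, -1])
    return erfc(abs(z) / sqrt(2.0))


def secondary_eqtls(X, y, alpha=0.05):
    """Condition on the top eQTL and return SNPs that remain significant
    after Bonferroni correction for the number of SNPs at the locus."""
    n, p = X.shape
    no_covariates = np.empty((n, 0))
    marginal = [snp_pvalue(y, no_covariates, X[:, j]) for j in range(p)]
    top = int(np.argmin(marginal))
    threshold = alpha / p
    hits = [j for j in range(p)
            if j != top and snp_pvalue(y, X[:, [top]], X[:, j]) < threshold]
    return top, hits


# Simulated locus with two independent eQTLs (SNPs 3 and 10)
rng = np.random.default_rng(3)
n, p = 1000, 30
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)
expression = 0.5 * X[:, 3] + 0.3 * X[:, 10] + rng.normal(size=n)
top_snp, secondary = secondary_eqtls(X, expression)
```

A gene with any SNP in `secondary` shows evidence of more than one independent eQTL, which is the allelic heterogeneity the caption's middle panel summarizes.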

Supplementary Figure 18 Expression accuracy (by cross-validation) in 100 random genes.

Average of prediction R^2 divided by corresponding gene cis-h2g shown from randomly selected genes for three methods and two cohorts.

Supplementary Figure 19 Correlation of MuTHER TWAS Z-scores using different LD reference samples.

MuTHER expression data was used to train TWAS predictors using LD from 600 METSIM individuals (x-axis) or 6,000 METSIM individuals (y-axis). Joint distribution of TWAS Z-scores and trend are shown.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–19 and Supplementary Note. (PDF 2393 kb)

Supplementary Tables 1–17

Supplementary Tables 1–17. (XLSX 127 kb)

About this article

Cite this article

Gusev, A., Ko, A., Shi, H. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet 48, 245–252 (2016). https://doi.org/10.1038/ng.3506

Received: 22 June 2015
Accepted: 14 January 2016
Published: 08 February 2016
Issue Date: March 2016
DOI: https://doi.org/10.1038/ng.3506
