Letter
Published: 13 August 2018

Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Nature Genetics volume 50, pages 1219–1224 (2018)

Abstract

A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation¹. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature²,³,⁴,⁵, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk⁶. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

**Fig. 2: Risk for CAD according to GPS.**

**Fig. 3: Risk gradient for disease according to the GPS percentile.**

References

1. Green, E. D. & Guyer, M. S., National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature 470, 204–213 (2011).
2. Fisher, R. A. The correlation between relatives on the supposition of Mendelian inheritance. Proc. R. Soc. Edinb. 52, 399–433 (1918).
3. Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2012).
4. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).
5. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
6. Abul-Husn, N. S. et al. Genetic identification of familial hypercholesterolemia within a single U.S. health care system. Science 354, pii: aaf7000 (2016).
7. Nordestgaard, B. G. et al. Familial hypercholesterolaemia is underdiagnosed and undertreated in the general population: guidance for clinicians to prevent coronary heart disease: consensus statement of the European Atherosclerosis Society. Eur. Heart J. 34, 3478–3490a (2013).
8. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
9. Estrada, K. et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. JAMA 311, 2305–2314 (2014).
10. Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–405 (2013).
11. Zhang, Y. et al. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits and implications for the future. Preprint at https://www.biorxiv.org/content/early/2017/08/11/175406 (2017).
12. Ripatti, S. et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet 376, 1393–1400 (2010).
13. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic scores. Am. J. Hum. Genet. 97, 576–592 (2015).
14. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
15. Bycroft, C. et al. Genome-wide genetic data on ~500,000 UK Biobank participants. Preprint at https://www.biorxiv.org/content/early/2017/07/20/166298 (2017).
16. Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
17. Tada, H. et al. Risk prediction by genetic risk scores for coronary heart disease is independent of self-reported family history. Eur. Heart J. 37, 561–567 (2016).
18. Abraham, G. et al. Genomic prediction of coronary heart disease. Eur. Heart J. 37, 3267–3278 (2016).
19. Khera, A. V. et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N. Engl. J. Med. 375, 2349–2358 (2016).
20. Mega, J. L. et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 385, 2264–2271 (2015).
21. Natarajan, P. et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation 135, 2091–2101 (2017).
22. January, C. T. et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines and the Heart Rhythm Society. Circulation 130, e199–e267 (2014).
23. GBD 2015 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388, 1545–1602 (2016).
24. Knowler, W. C. et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N. Engl. J. Med. 346, 393–403 (2002).
25. Abraham, C. & Cho, J. H. Inflammatory bowel disease. N. Engl. J. Med. 361, 2066–2078 (2009).
26. Pharoah, P. D., Antoniou, A. C., Easton, D. F. & Ponder, B. A. Polygenes, risk prediction, and targeted prevention of breast cancer. N. Engl. J. Med. 358, 2796–2803 (2008).
27. Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
28. Khera, A. V. & Kathiresan, S. Is coronary atherosclerosis one disease or many? Setting realistic expectations for precision medicine. Circulation 135, 1005–1007 (2017).
29. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
30. Christophersen, I. E. et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat. Genet. 49, 946–952 (2017).
31. Scott, R. A. et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes 66, 2888–2902 (2017).
32. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
33. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
34. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
35. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
36. Ganna, A. et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat. Neurosci. 19, 1563–1565 (2016).

Acknowledgements

UK Biobank analyses were conducted via application 7089 using a protocol approved by the Partners HealthCare Institutional Review Board. The analysis was supported by a KL2/Catalyst Medical Research Investigator Training award from Harvard Catalyst funded by the National Institutes of Health (TR001100 to A.V.K.), a Junior Faculty Research Award from the National Lipid Association (to A.V.K.), the National Heart, Lung, and Blood Institute of the US National Institutes of Health under award numbers T32 HL007208 (to K.G.A.), K23HL114724 (to S.A.L.), R01HL139731 (to S.A.L.), R01HL092577 (to P.T.E.), R01HL128914 (to P.T.E.), K24HL105780 (to P.T.E.), and R01 HL127564 (to S.K.), the National Human Genome Research Institute of the US National Institutes of Health under award number 5UM1HG008895 (to E.S.L. and S.K.), the Doris Duke Charitable Foundation under award number 2014105 (to S.A.L.), the Foundation Leducq under award number 14CVD01 (to P.T.E.), and the Ofer and Shelly Nemirovsky Research Scholar Award from Massachusetts General Hospital (to S.K.). The authors thank D. Altshuler (Vertex Pharmaceuticals, Boston, MA) for comments on an earlier version of this manuscript.

Author information

These authors contributed equally: Amit V. Khera, Mark Chaffin.

Authors and Affiliations

Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Amit V. Khera, Krishna G. Aragam & Sekar Kathiresan
Cardiology Division of the Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Amit V. Khera, Krishna G. Aragam, Pradeep Natarajan, Steven A. Lubitz, Patrick T. Ellinor & Sekar Kathiresan
Harvard Medical School, Boston, MA, USA
Amit V. Khera, Krishna G. Aragam, Pradeep Natarajan, Steven A. Lubitz, Patrick T. Ellinor & Sekar Kathiresan
Cardiovascular Disease Initiative of the Broad Institute of Harvard and MIT, Cambridge, MA, USA
Amit V. Khera, Mark Chaffin, Krishna G. Aragam, Mary E. Haas, Carolina Roselli, Seung Hoan Choi, Pradeep Natarajan, Eric S. Lander, Steven A. Lubitz, Patrick T. Ellinor & Sekar Kathiresan

Authors

Amit V. Khera
Mark Chaffin
Krishna G. Aragam
Mary E. Haas
Carolina Roselli
Seung Hoan Choi
Pradeep Natarajan
Eric S. Lander
Steven A. Lubitz
Patrick T. Ellinor
Sekar Kathiresan

Contributions

A.V.K., M.C., and S.K. conceived and designed the study. A.V.K., M.C., K.G.A., M.E.H., C.R., S.H.C., and S.A.L. acquired, analyzed, and interpreted the data. A.V.K., M.C., E.S.L., and S.K. drafted the manuscript. A.V.K., M.C., P.N., E.S.L., P.T.E., and S.K. critically revised the manuscript for important intellectual content.

Corresponding author

Correspondence to Sekar Kathiresan.

Ethics declarations

Competing interests

A.V.K. and S.K. are listed as co-inventors on a patent application for the use of genetic risk scores to determine risk and guide therapy. S.K. and P.T.E. are supported by a grant from Bayer AG to the Broad Institute focused on the genetics and therapeutics of myocardial infarction and atrial fibrillation.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Risk gradient for coronary artery disease across the distribution of the genome-wide polygenic score and two previously published scores.

a–c, Three polygenic scores for coronary artery disease were calculated within the UK Biobank testing dataset of 288,978 participants: a previously published score comprising 50 variants that had achieved genome-wide levels of statistical significance in previous studies (Eur. Heart J. 37, 561–567, 2016) (a); a previously published score comprising 49,310 variants derived from a Metabochip GWAS (Eur. Heart J. 37, 3267–3278, 2016) (b); and the newly derived genome-wide polygenic score comprising 6,630,150 variants (c). For each score, the population was divided into 100 bins according to percentile of the score and prevalence of coronary artery disease within each bin plotted. The prevalence of coronary artery disease across score percentiles ranged from 1.4% to 5.9% for the 50-variant score, 1.0% to 7.2% for the 49,310-variant score, and 0.8% to 11.1% for the 6,630,150-variant genome-wide polygenic score.
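The binning step described in this caption can be sketched in a few lines of Python. This is only an illustration on synthetic data, not the authors' analysis code: the function name `prevalence_by_percentile`, the simulated scores, and the intercept and slope used to simulate disease status are all assumptions made for the example.

```python
import math
import random


def prevalence_by_percentile(scores, cases, n_bins=100):
    """Rank individuals by score, split them into n_bins equal-sized bins,
    and return the fraction of cases in each bin (lowest scores first).
    Assumes there are at least n_bins individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    result = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        result.append(sum(cases[i] for i in members) / len(members))
    return result


# Synthetic cohort (hypothetical values, chosen only for illustration):
# a standard-normal score and disease status drawn from a logistic model.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
cases = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(-3.0 + 0.5 * s))) else 0
    for s in scores
]

prevalence = prevalence_by_percentile(scores, cases)
print("lowest percentile bin: ", prevalence[0])
print("highest percentile bin:", prevalence[-1])
```

Plotting `prevalence` against bin index would give a curve of the same kind as the panels described above, with the range between the first and last bins playing the role of the reported prevalence ranges.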

Supplementary Figure 2 Predicted versus observed prevalence of coronary artery disease according to genome-wide polygenic score percentile.

For each individual within the UK Biobank testing dataset, the predicted probability of disease was calculated using a logistic regression model with only the genome-wide polygenic score (GPS) as a predictor. The predicted prevalence of disease within each percentile bin of the GPS distribution was calculated as the average predicted probability of all individuals within that bin. The shape of the predicted risk gradient was consistent with the empirically observed risk gradient, reflected by black and blue dots, respectively.
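A minimal sketch of this predicted-versus-observed comparison follows, again on synthetic data and not the authors' code. The single-predictor logistic regression is fitted here with a hand-written Newton-Raphson loop to keep the example dependency-free; the helper names (`fit_logistic`, `bin_means`) and the simulation parameters `TRUE_A` and `TRUE_B` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, n_iter=15):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson (one predictor)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def bin_means(order, values, n_bins=100):
    """Average `values` within equal-sized bins of the ranked individuals."""
    n = len(order)
    means = []
    for k in range(n_bins):
        members = order[k * n // n_bins:(k + 1) * n // n_bins]
        means.append(sum(values[i] for i in members) / len(members))
    return means


# Synthetic cohort; TRUE_A and TRUE_B are arbitrary illustrative values.
rng = random.Random(1)
TRUE_A, TRUE_B = -2.0, 0.5
gps = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
disease = [1 if rng.random() < sigmoid(TRUE_A + TRUE_B * s) else 0 for s in gps]

# Model with the score as the only predictor, then per-individual predictions.
a_hat, b_hat = fit_logistic(gps, disease)
predicted_prob = [sigmoid(a_hat + b_hat * s) for s in gps]

# Predicted prevalence per percentile bin = mean predicted probability in bin;
# observed prevalence per bin = fraction of cases in the same bin.
order = sorted(range(len(gps)), key=lambda i: gps[i])
predicted = bin_means(order, predicted_prob)
observed = bin_means(order, disease)
```

Overlaying `predicted` and `observed` against percentile gives the kind of comparison the caption describes. In this simulation the two curves agree by construction, because the data were generated from the same model form that is fitted.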

Supplementary Figure 3 Predicted versus observed prevalence of four diseases according to genome-wide polygenic score percentile.

a–d, For each individual within the UK Biobank testing dataset, the predicted probability of disease was calculated using a logistic regression model with only the genome-wide polygenic score (GPS) as a predictor. The predicted prevalence of disease within each percentile bin of the GPS distribution was calculated as the average predicted probability of all individuals within that bin. The shape of the predicted risk gradient was consistent with the empirically observed risk gradient, reflected by black and blue dots, respectively, for each of four diseases: atrial fibrillation (a), type 2 diabetes (b), inflammatory bowel disease (c), and breast cancer (d). Breast cancer analysis was restricted to female participants.

Supplementary Information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–10

Reporting Summary

About this article

Cite this article

Khera, A.V., Chaffin, M., Aragam, K.G. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 50, 1219–1224 (2018). https://doi.org/10.1038/s41588-018-0183-z

Received: 15 February 2018
Accepted: 21 June 2018
Published: 13 August 2018
Issue Date: September 2018
DOI: https://doi.org/10.1038/s41588-018-0183-z
