From genome-wide association studies to Mendelian randomization: novel opportunities for understanding cardiovascular disease causality, pathogenesis, prevention, and treatment

2018, Cardiovascular Research

SPOTLIGHT REVIEW
Cardiovascular Research (2018) 114, 1192–1208
doi:10.1093/cvr/cvy045

Marianne Benn1,2,3* and Børge G. Nordestgaard2,3,4,5

1 Department of Clinical Biochemistry, Rigshospitalet, Copenhagen University Hospital, 2100 Copenhagen, Denmark; 2 The Copenhagen General Population Study, Herlev and Gentofte Hospital, Copenhagen University Hospital, Denmark; 3 Faculty of Health and Medical Sciences, University of Copenhagen, Denmark; 4 Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Denmark; and 5 The Copenhagen City Heart Study, Frederiksberg Hospital, Copenhagen University Hospital, Denmark

Received 12 October 2017; revised 8 December 2017; editorial decision 22 December 2017; accepted 16 February 2018; online publish-ahead-of-print 19 February 2018

Abstract

The Mendelian randomization approach is an epidemiological study design incorporating genetic information into traditional epidemiological studies to infer causality of biomarkers, risk factors, or lifestyle factors on disease risk. Mendelian randomization studies often draw on novel information generated in genome-wide association studies on causal associations between genetic variants and a risk factor or lifestyle factor. Such information can then be used in a largely unconfounded study design free of reverse causation to understand if and how risk factors and lifestyle factors cause cardiovascular disease. If causation is demonstrated, an opportunity for prevention of disease is identified; importantly, however, before prevention or treatment can be implemented, randomized intervention trials altering risk factor levels or improving deleterious lifestyle factors need to document reductions in cardiovascular disease in a safe manner with few side effects. Documentation of causality can also inform on potential drug targets, which are more likely to be successful than those from prior approaches relying mainly on animal or cell studies.
The present review summarizes the history and background of Mendelian randomization, the study design, the assumptions for using the design, and the most common caveats, followed by a discussion of the advantages and disadvantages of different types of Mendelian randomization studies using one or more samples and different levels of information on study participants.

* Corresponding author. Tel: +45 3545 3040; fax: +45 3545 2880.
Published on behalf of the European Society of Cardiology. All rights reserved. © The Author(s) 2018.

1. Historical background

Accumulation of diseases in families and inheritance of traits in nature have been recognized for centuries. A specific mode of inheritance was first described by Gregor Mendel in 1865. Using pea plants, Mendel showed how each individual organism carries two alleles for each trait, each segregating to a daughter cell during cell division (Mendel’s first law, the law of segregation of alleles), and showed how alleles for separate traits are passed independently of one another and randomly from parent to offspring (Mendel’s second law, the law of independent assortment). Early disease mapping focused on monogenic diseases and relied on segregation and linkage analysis of genetic markers and a disease in large families.1 Due to technological limitations, studies only genotyped a few hundred markers throughout the genome, but from this it became clear that, unlike the rare monogenic diseases (one gene causes one disease), most diseases are caused by combinations of different genes and that environmental factors likewise play a role in the development and progression of common diseases (multiple common genetic variants and the environment combined cause multifactorial/complex diseases). It also became clear that to examine associations between genetic variation and disease, a reference sequence of the human genome was essential.

To obtain this, the Human Genome Project was initiated in 1990 and had by 2003 sequenced most of the human genome.2,3 In 2002, the International HapMap Project, aiming to develop a haplotype map of genetic variation in the human genome, was initiated.4 As the purpose was to map variation in the human genome, individuals from four different populations were included: trios (offspring and parents) from Yoruba, Nigeria; trios from Utah, USA of western European ancestry; unrelated individuals from Tokyo, Japan; and Han Chinese individuals from Beijing, China. The first results were published in 2005 and the project has since been extended to include genetic variation in individuals from populations other than the original four.5

Sequencing of the human genome and mapping of genetic variation have been the foundation for genome-wide association studies (GWAS) of the association between millions of genetic variants and biomarkers, lifestyle factors, and disease outcomes. A useful byproduct of the identification of genetic variants strongly associated with risk factors or lifestyles has been new opportunities to test which risk factors or lifestyles are directly causally related to diseases, and which simply represent markers of some known or unknown factors. These opportunities, as used in the so-called Mendelian randomization study design or approach, are the focus of the present review.

2. Observational studies and randomized trials

Use of biomarkers and lifestyle factors may help identify opportunities for understanding disease causality, pathogenesis, prevention, and treatment, but it is pivotal to know whether the biomarker or lifestyle factor of interest is causal and contributes to disease pathogenesis, or simply is an innocent bystander. The term ‘risk factor’ is often used without distinguishing between whether the factor is causal or simply an innocent bystander. Traditionally, risk factors have been identified through observational non-genetic epidemiological studies; however, these studies have shortcomings. An association between a risk factor and a disease may in the observational study design be due to confounding and/or reverse causation, variation in the measurement of the risk factor may cause misclassification, and risk estimates may be biased by regression dilution bias (Figure 1, left panel). For lifestyle risk factors like alcohol, coffee, or milk intake or smoking, information on the amount used may not be reliable or may be difficult to standardize due to variation in the products used.

To circumvent some of these shortcomings and to be able to infer causality between a biomarker or lifestyle risk factor and disease, randomized clinical intervention trials have traditionally been considered the ‘gold standard’. The randomized clinical intervention trial relies on a double-blinded randomization where confounders are evenly distributed between active drug (or other intervention) and placebo (or control). The study design excludes reverse causation and regression dilution bias, but may be biased by off-target effects (pleiotropy) of the active drug. Randomized clinical intervention trials are expensive and, to reduce cost, studies often include individuals with a high a priori risk of an event to increase the likelihood of demonstrating efficacy of the intervention. Together with the often limited duration of intervention trials, this limits the applicability of results to other settings, and information on potential long-term side effects may not be obtained (Figure 1, middle panel).

Figure 1 Comparison of observational studies, randomized trials, and Mendelian randomization studies to help understand causality from a risk factor, e.g. high low-density lipoprotein (LDL) cholesterol levels, to risk of cardiovascular disease. Left panel, observational study (association only): non-random distribution of low vs. high LDL-C; confounders unevenly distributed; single point association; subject to confounding, reverse causation, regression dilution bias, and misclassification. Middle panel, randomized trial (causal estimate): randomization method assigning placebo vs. drug (LDL-C ↓); confounders evenly distributed; 2-10 year effect; double blind; no reverse causation; pleiotropic (off-target) effects; no regression dilution bias. Right panel, Mendelian randomization (causal estimate): random distribution of alleles, normal allele vs. allele with LDL-C ↑; confounders evenly distributed; life-long effect; double blind; no reverse causation; pleiotropic effects; no regression dilution bias.

3. Mendelian randomization

Central in the Mendelian randomization study design is the use of instrumental variables. The concept originates from econometrics and was first introduced by Philip and Sewall Wright in 1928 and named instrumental variables by Reiersøl in 1941.6 An instrumental variable is a measurable variable (in Mendelian randomization a genotype) which is associated with the exposure of interest [e.g. low-density lipoprotein (LDL) cholesterol], but not with any other factors or confounders.
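As a hedged illustration of why an instrumental variable helps, the minimal Python sketch below simulates a genotype, an unmeasured confounder, an exposure, and an outcome. The allele frequency, all effect sizes, and the function name `simulate_mr` are assumptions chosen for illustration and are not taken from the review. The observational regression slope is biased by the confounder, whereas the ratio of the genotype-outcome association to the genotype-exposure association stays close to the simulated causal effect.

```python
import numpy as np


def simulate_mr(n=200_000, true_effect=0.5, seed=0):
    """Simulate one sample; return (observational slope, genetic ratio estimate).

    All numeric values are hypothetical and only serve the illustration.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n)              # genotype: 0, 1, or 2 copies of allele B
    u = rng.normal(size=n)                        # unmeasured confounder
    x = 0.4 * g + u + rng.normal(size=n)          # exposure, e.g. standardized LDL cholesterol
    y = true_effect * x + u + rng.normal(size=n)  # outcome on a continuous scale

    def slope(a, b):
        # Least-squares slope from regressing b on a.
        return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

    observational = slope(x, y)        # confounded by u
    ratio = slope(g, y) / slope(g, x)  # instrumental variable (ratio) estimate
    return observational, ratio


observational, ratio = simulate_mr()
print(f"observational slope:    {observational:.2f}")
print(f"genetic ratio estimate: {ratio:.2f}")
```

With these simulated values the observational slope lies well above the simulated causal effect of 0.5, while the ratio estimate is close to it. The sketch relies on the genotype being independent of the confounder, which is the property the instrumental variable definition requires.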
In the Mendelian randomization study design, genetic variants following Mendelian inheritance are used as such instrumental variables,7 which gives the study design its name, Mendelian randomization.8,9 In a given population, individuals can be divided into subgroups based on genetic variants. Some individuals will have the genetic allele A associated with, e.g., average LDL cholesterol levels, while others will have the allele B for the same genetic variant associated with high LDL cholesterol levels. As the genetic variant is distributed randomly with respect to other traits and to environmental and other risk factors in the population, risk of disease can be compared directly between those with allele A and allele B (Figure 1, right panel), similar to the treatment vs. placebo groups in the randomized clinical intervention trial. A difference in risk of disease between the two allele groups A and B will therefore indicate a causal effect of the risk factor (e.g. LDL cholesterol) on the disease.

A complete Mendelian randomization study design consists of four steps:

(1) Testing whether the risk factor of interest is associated with the disease of interest in an observational study design (Figure 2, #1).
(2) Testing whether the selected genetic variants are associated with the risk factor (Figure 2, #2).
(3) Testing whether the genetic variants associated with the risk factor also are associated with disease, as an indication of a causal effect of the risk factor on disease (Figure 2, #3).
(4) Combining the genetic estimates into a causal estimate of the effect of a specific unit change in the risk factor (Figure 2, #4).

Step 3 provides information on whether there is a causal relationship, and Step 4 an estimate of the magnitude of this causal effect on risk. Testing the observational association in Step 1 is not necessary for inferring causality in Steps 2, 3, and 4.

The Mendelian randomization design is largely free of confounding, free of reverse causation and regression dilution bias, and reflects life-long effects7; these life-long effects can, however, be attenuated if the individual develops compensatory mechanisms in response to the genetic risk factor, known as canalization.10 To use the Mendelian randomization principle and instrumental variable analysis to draw conclusions on causal effects, there are three key assumptions that must be fulfilled7:

(1) The genetic instruments used must be associated with the exposure of interest (risk factor). On Figure 2, arrow #2, this means that the PCSK9 genetic variants selected as instruments in the example must be associated with LDL cholesterol concentrations.
(2) The genetic instruments used must only have an effect on the outcome through the risk factor under study, and must not be confounded. On Figure 2, this means that the PCSK9 genetic variants must only be associated with risk of cardiovascular disease via LDL cholesterol, and not for example via diabetes.
(3) The genetic variants used must be independent of the outcome through other mechanisms, and not associated with confounders of the relationship between the risk factor and the outcome. On Figure 2, this means that the PCSK9 genetic variants used as instruments must be independent of cardiovascular disease except through LDL cholesterol, and must not be associated with, for example, diabetes, which is known to have an effect on both LDL cholesterol concentrations and risk of cardiovascular disease.7

Figure 2 Mendelian randomization design shown with a risk factor [e.g. low-density lipoprotein (LDL) cholesterol], an outcome (e.g. cardiovascular disease), and genotypes associated with the risk factor (e.g. genetic variants in PCSK9) as instruments. First step is to examine whether the risk factor, high LDL cholesterol, is observationally associated with cardiovascular disease (black arrow #1); whether LDL cholesterol increasing alleles are associated with high LDL cholesterol levels (black arrow #2); whether LDL cholesterol increasing alleles are associated with risk of cardiovascular disease as an indication of causality (black arrow #3); and finally, whether the causal effect of high LDL cholesterol level is consistent with the corresponding observational associations using instrumental variable analysis (black arrow #4). Information used in a two sample Mendelian randomization study to generate a summary estimate similar to that obtained by instrumental variable analysis is shown in grey (grey arrows #2, #3, and #4). Data for two sample Mendelian randomization studies are often obtained from genome-wide association studies (white arrows #2 and #3). Diagram nodes: SNPs associated with risk factor (Z; PCSK9), risk factor (X; LDL cholesterol), and outcome (Y; cardiovascular disease). Legend: observational association study; Mendelian randomization study (MR); 2 sample Mendelian randomization study; genome-wide association study (GWAS).

4. Selection of genetic variants as instruments in Mendelian randomization

Data from genome-wide association studies provide a rich source of information on millions of genetic variants and their association with biomarkers and diseases. Box 1 includes a suggestion for a strategy to select genetic variants for a Mendelian randomization study. Genetic variants selected as instrumental variables should be unconfounded, and their effect should be explained exclusively through an effect on the risk factor of interest (no pleiotropy). To be unconfounded a genetic variant must be randomly distributed in the population.
The necessary assumptions for this are random mating in the population and no selection effects relating to the variant violating Mendel’s second law. Departure from these assumptions can be assessed by testing for Hardy–Weinberg equilibrium, comparing the observed genotype distribution in the population with the distribution expected from the allele frequencies.

Box 1 Suggestions on how to select genetic variants to use as instrumental variables

Before genotyping own population
1. Identify genetic variants from candidate gene studies and genome-wide association studies (GWAS) in populations of similar ethnicity to your own study population. P thresholds for association of both 10⁻⁸ and 10⁻⁷ have been suggested. Information on many monogenic disorders, and compilations of summary-level findings from genetic association studies, are available in online databases.
2. Select genetic variants with effects on the biomarker or lifestyle factor with a clinically meaningful effect size. Use beta-coefficients and standard errors from GWAS or, if possible, from publicly available consortia data.
3. Selected genetic variants must be of a reasonable frequency, as allele frequency determines the size of subgroups to compare within the population. Also, rare variants often have large effect sizes on risk factors, while common variants often have small effect sizes.
4. Check if selected genetic variants are in linkage disequilibrium (tools: SNiPA, LD Hub). Genetic variants in major linkage disequilibrium should be excluded.
5. Check whether selected genetic variants have pleiotropic effects (tools: LD Hub, PhenoScanner, GWAS-Central; these can be searched by rs-number). Genetic variants can be annotated to genes using SNiPA, SNPedia, Ensembl, or the UCSC Genome Browser. Information on exclusion/inclusion of genetic variants should be included in the paper.

After genotyping own population
6. Check genetic variants for deviation from the Hardy–Weinberg expectations as an indication of a genotyping error or a violation of Mendel’s second law.
7. Check that allele frequencies largely correspond to frequencies reported in genomic databases.
8. Check the strength of the association between the genotype and the risk factor, e.g. by two-stage least squares regression. r2 is a measure of the genotype’s contribution to the variation in the phenotype and F is a measure of the strength of the genotype as an instrument for the phenotype.
9. If possible, confirm results in an independent sample. Tools for two-sample summary-level Mendelian randomization are available online.

Several potential limitations apply to the Mendelian randomization design, the most notable being weak instruments, pleiotropy of instruments, linkage disequilibrium between genetic variants used as instruments, and population stratification.

4.1 Weak instruments

In Mendelian randomization, genetic variants can be weak instruments if they only explain a small fraction of the variation in the risk factor (biomarker or lifestyle factor) they serve as instruments for (Figure 3).11 The validity of an instrument is the result of several (overlapping) factors: (i) study size; in a large study there will be a low variation of the regression of the genetic variant on the risk factor and the strength of the instrument will be high, (ii) frequency of the genetic variants used; a genetic variant occurring in very few individuals can only randomize a small proportion of the population into the specific risk category, (iii) magnitude of the effect size of the genetic variant on the risk factor, (iv) biological variation in the risk factor the genetic variants are instruments for; the association between genetic variants and, e.g., plasma glucose may be strong in the fasting state, but perhaps poor in the nonfasting state, and (v) additive or multiplicative per allele effect of the genetic variant on the risk factor; a multiplicative model may cause an overestimation of a potential causal effect, while an additive model may reduce this bias. Figure 3 shows percent variation in selected risk factors explained by the single strongest genetic variant, which represents the combined influence of the frequency of the genetic variant and its effect size on the risk factor. As can be seen, the genetic contribution to variation is very high for lipoprotein(a) (27%), while the genetic contribution to, for example, diastolic blood pressure is low (0.04%).

Figure 3 Examples of percent variation in causal and non-causal risk factors for cardiovascular disease explained by genetic variants. Data extracted from Refs. 27, 47, 61, 62, 67, 72, 81, 85, and 93–96.

Figure 4 Overview of vertical and horizontal pleiotropy in a Mendelian randomization study. Vertical pleiotropy refers to when genetic variants (red boxes) are associated with more than one risk factor (yellow boxes) under examination on the pathway from risk factor or lifestyle to disease (blue). Black arrows denote the direction of causality. Horizontal pleiotropy refers to when a genetic variant is associated with traits on other pathways that are also causal for the disease under study. When using multiple genetic variants in combination, horizontal pleiotropy can balance out and have no net effect on the association of the exposure and disease risk. Red arrows denote potential horizontal bias. Unbalanced horizontal pleiotropy biases the association between the exposure and the outcome, and the effect estimate from conventional Mendelian randomization can be exaggerated or diminished, depending on the direction of the pleiotropy. Using the Egger MR statistical method can to some extent adjust for this. Panels: vertical pleiotropy; horizontal pleiotropy, balanced; horizontal pleiotropy, unbalanced with positive bias; horizontal pleiotropy, unbalanced with negative bias. Each panel compares the true relationship, conventional MR, and Egger MR on the summary estimate for risk of cardiovascular disease (axis from 1.0 to 1.5). Figure adapted from White et al.14 and Holmes et al.13

Table 1 A targeted vs. a ‘shotgun’ approach for selecting genetic variants as instruments for Mendelian randomization studies

Method
  Targeted: Variants selected using biological knowledge on function.
  Shotgun: Variants selected from GWAS for a significant association with a biomarker.
Setting
  Targeted: Often a population study.
  Shotgun: Often a two sample Mendelian randomization with data from GWAS (case–control) studies.
Pros
  Targeted: Well-understood function of variants; risk of pleiotropy low. Linkage disequilibrium between variants manageable. Potential mediators and confounders may be examined. Validity of genetic instruments can be tested. May contribute new information on a biological pathway. May predict function of a specific pathway and potentially of a drug affecting the pathway.
  Shotgun: A causal relationship is more plausible if multiple genetic variants are associated with the outcome. Greater study power can be achieved.
Cons
  Targeted: Requires a large study population.
  Shotgun: Risk of population stratification. Risk of unbalanced pleiotropy. Linkage disequilibrium among variants can be difficult to manage. No information on potential confounders. Ascertainment effect because of case selection. Function of genetic variants may be poorly understood.

Weak instruments will bias the causal genetic estimates in the same direction as the observational association between the biomarker or lifestyle factor and the outcome.
The magnitude of the bias depends on the strength of the genotype-risk factor association.11 An indication of the strength of an instrument can be gained from the F statistic from the regression of the biomarker or lifestyle factor on the genetic variants (Figure 3).7,12 Strength of instrumental variables can be increased by increasing the sample size of the study, and precision of the instrument can be improved by using more than one genetic variant as instrument, provided that each extra genetic variant explains extra variation in the biomarker or lifestyle factor.11 To have statistical power to be conclusive, Mendelian randomization studies require a large sample size and use of genetic variants observed with a reasonable frequency in the population and with a large effect size on the risk factor. A large sample size can often only be obtained in consortia.

4.2 Pleiotropy of instruments

Genetic variants used as instruments in Mendelian randomization should only have an effect on a single pathway on which the risk factor or lifestyle exposure of interest lies. If genetic variants used as instruments have effects on factors other than the risk factor under examination, pleiotropy may occur; the parallel phenomenon in randomized clinical intervention trials is often referred to as an off-target effect of the intervention. Pleiotropy in Mendelian randomization can be vertical or horizontal.13,14 Vertical pleiotropy occurs when a genetic variant is associated with several risk factors on the same biological pathway from the variant to the disease under examination, and does not invalidate the findings of the study (Figure 4, left). Horizontal pleiotropy occurs when a genetic variant is associated with other pathways which are also causal for the disease of interest and can, if it is unbalanced, systematically bias the estimate of the causal association, either exaggerating or diminishing it13,14 (Figure 4, right).
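The difference between balanced and unbalanced horizontal pleiotropy can be sketched numerically. The example below is a minimal illustration that assumes per-variant summary associations and uses simulated, hypothetical effect sizes. It compares an inverse-variance weighted estimate, which forces the regression line through the origin, with an Egger-type weighted regression that also estimates an intercept (the approach referred to as Egger MR in Figure 4). The function names `ivw_and_egger` and `simulate_summary` and every numeric value are assumptions made for illustration, not values from the review.

```python
import numpy as np


def ivw_and_egger(beta_x, beta_y, se_y):
    """Return (IVW slope, Egger intercept, Egger slope) from summary data."""
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    # Orient every variant so that its effect on the risk factor is positive.
    sign = np.where(beta_x < 0, -1.0, 1.0)
    bx, by = beta_x * sign, beta_y * sign
    root_w = 1.0 / se_y  # square root of the inverse-variance weights
    # Weighted regression through the origin (no intercept).
    ivw = np.sum((root_w * bx) * (root_w * by)) / np.sum((root_w * bx) ** 2)
    # Weighted regression with an intercept capturing average pleiotropy.
    design = np.column_stack([np.ones_like(bx), bx])
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], by * root_w, rcond=None)
    return ivw, coef[0], coef[1]


def simulate_summary(k=200, true_effect=0.3, mean_pleiotropy=0.02, seed=1):
    """Hypothetical summary statistics for k independent variants."""
    rng = np.random.default_rng(seed)
    beta_x = rng.uniform(0.05, 0.20, size=k)                 # variant -> risk factor
    pleiotropy = rng.normal(mean_pleiotropy, 0.005, size=k)  # direct variant -> outcome
    se_y = np.full(k, 0.005)
    beta_y = true_effect * beta_x + pleiotropy + rng.normal(0.0, se_y)
    return beta_x, beta_y, se_y


ivw, intercept, egger_slope = ivw_and_egger(*simulate_summary())
print(f"IVW slope: {ivw:.2f}")
print(f"Egger intercept: {intercept:.3f}, Egger slope: {egger_slope:.2f}")
```

In this simulation the pleiotropic effects have a positive mean, so the estimate without an intercept is pushed above the simulated effect of 0.3, while the intercept absorbs the average pleiotropic effect. Setting `mean_pleiotropy=0` mimics balanced pleiotropy, where the two slopes should roughly agree. The sketch assumes that the pleiotropic effects are independent of the variant-risk factor effects, which may not hold in real data.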
Horizontal pleiotropy can be minimized by careful selection of genetic variants used as instruments and by using several variants in combination to balance out a potential negative or positive bias.

To reduce risk of horizontal pleiotropy, genetic instruments can either be selected by a targeted approach using genetic variants with known biological effect on the biomarker, e.g. variants in the LDLR as instruments for high LDL cholesterol, or be selected using a ‘shotgun’ approach including many genetic variants in an attempt to ‘balance out’ the pleiotropy. Advantages and potential limitations of the targeted and the ‘shotgun’ approach are listed in Table 1. The targeted approach has the advantage that selected variants are less susceptible to pleiotropy and more likely to reflect an effect of the biomarker via a specific biological pathway.15 A targeted approach may contribute new knowledge on how the effect of a risk factor on risk is mediated via this pathway and is a very powerful approach for prediction of pharmacological effects via specific pathways. Instruments can also be selected using information from GWAS, simply selecting genetic variants associated with the biomarker of interest. This approach has the advantage that the statistical power may be higher compared to the targeted approach; however, many genetic variants are not assigned to genes, information on functional effects is not necessarily available, and pleiotropy may be introduced.

4.3 Linkage disequilibrium

Genetic variants located close to each other on a chromosome may be in linkage disequilibrium and thus be inherited together. Genetic variants in linkage disequilibrium do not satisfy Mendel’s second law of independent assortment, and use of several genetic variants in linkage disequilibrium with each other may introduce bias in a Mendelian randomization study.
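Linkage disequilibrium between two variants is commonly summarized by the squared correlation r² between the alleles carried on the same haplotype. The sketch below is a minimal illustration, assuming phased haplotypes coded 0/1 and simulated, hypothetical data; the function name `ld_r2`, the allele frequency, and the 90% co-inheritance probability are assumptions, and no exclusion threshold from the review is implied.

```python
import numpy as np


def ld_r2(hap_a, hap_b):
    """Squared correlation r^2 between two biallelic variants (haplotypes coded 0/1)."""
    hap_a = np.asarray(hap_a, dtype=float)
    hap_b = np.asarray(hap_b, dtype=float)
    p_a, p_b = hap_a.mean(), hap_b.mean()
    d = (hap_a * hap_b).mean() - p_a * p_b  # disequilibrium coefficient D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


rng = np.random.default_rng(2)
n = 20_000
variant_1 = rng.binomial(1, 0.3, size=n)
# Hypothetical nearby variant: carries the same allele as variant_1 on 90% of haplotypes.
linked = rng.random(n) < 0.9
variant_2 = np.where(linked, variant_1, rng.binomial(1, 0.3, size=n))
# Hypothetical unlinked variant, e.g. on another chromosome.
variant_3 = rng.binomial(1, 0.3, size=n)

print(f"r2 between linked variants:   {ld_r2(variant_1, variant_2):.2f}")
print(f"r2 between unlinked variants: {ld_r2(variant_1, variant_3):.4f}")
```

A high r² indicates that two variants carry largely overlapping information and should not be counted as independent instruments, whereas an r² near zero is consistent with independent assortment. With unphased genotype data, r² is usually estimated with dedicated software rather than computed this directly.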
If the genetic variant under study is associated with the risk factor of .. .. interest, but also with a competing risk factor because of linkage with .. other genetic variants associated with the competing risk factor, a bias .. .. may be introduced with a similar effect to that seen with pleiotropy.10 .. Also, if several genetic variants which are associated with the same risk .. .. factor, but are in linkage disequilibrium, are used, their contributions to .. the causal estimate are correlated and may exaggerate the magnitude of .. . the causal estimate. Downloaded from by guest on 23 February 2023 Linkage disequilibrium between variants manageable. Potential mediators and confounders may be examined. 1198 4.4 Population stratification and ascertainment bias 5. Types of Mendelian randomization studies Mendelian randomization studies can be performed using several different strategies with combinations of either one or two study samples to gain information on gene-risk factor and gene-outcome associations and level of detail on study participants with individual-level, study-level, and summary-level data. The classic study design is a one-sample Mendelian randomization study using individual-level data. This corresponds to the study design shown in Figure 2 (black arrows) and is often carried out using data from a population study. 
Advantages of this design are that (i) detailed information on potential confounding and mediating factors may be available and can be examined and accounted for; (ii) the assumptions necessary for the validity of the genetic variants can be tested, including testing for potential pleiotropy; (iii) the use of a population with known ethnicity reduces the risk of population stratification; (iv) it can be tested whether an additive or multiplicative per allele model fits the data best, resulting in more precise causal estimates; and (v) valuable information on the observational association between the risk factor and the outcome may be included (Figure 2, black arrow #1). Two variants of the classic one-sample Mendelian randomization study using individual-level data are the two-step Mendelian randomization study where it is tested whether the effect of the biomarker or lifestyle factor under examination is mediated through other measured factors on a causal pathway; and the bi-directional Mendelian randomization to assess the direction of causation.10,17 It is often claimed that a potential limitation of the classic design is that the genotype-risk factor and genotype-outcome associations are correlated since they are obtained using the same individuals, and that this may bias a causal effect of the biomarker in the same direction as the observational estimate (Figure 2, arrows #2 and #3). However, use of the classic design in a large homogenous study cohort with its many other advantages including low risk of chance findings will out weight this minor issue. Mendelian randomization can also be performed using study-level data where causal estimates from several studies are meta-analysed, increasing the statistical power of the combined study. Potential limitations are similar to the limitations of conventional meta-analyses, which are publication bias, inclusion of small studies which tend to show larger causal effects, and heterogeneity among studies included. 
Also, variation in the risk factor or lifestyle factor under examination and in the .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . definition of endpoints between studies and affecting the causal estimate should be taken into account. Another study design is a two sample Mendelian randomization study using two independent samples (Figure 2, grey arrows #2 and #3). This design can be used on individual level data, but is often carried out using summary level data with a beta-coefficient and the standard error of the mean from the regression of the risk factor on the genotype from one study, and similar data for the regression of the outcome on the genotype from another study. Advantages of this study design are that (i) that causality can be inferred without information on the observational association (Figure 2, #1), (ii) the genotype-risk factor and genotype-outcome associations are not correlated and a bias will be in the direction of the null hypothesis, and (iii) data may be collected from very large GWAS where information on the genotype-outcome association comes from case–control studies with more cases than population studies, and thus a high statistical power; however, genetic variants identified in GWAS are common and often have small phenotypic effects, thus potentially introducing bias because of weak instruments. 
Requirements for the two-sample Mendelian randomization are that the samples included must not be overlapping, should be of similar age and gender distribution and ethnicity; and genetic variants should be completely independent and thus not in linkage disequilibrium.17,18 Limitations of the two sample Mendelian randomization study are that data from GWAS often use a case–control design, where selection of cases and controls may have introduced ascertainment bias and inclusion of individuals from several populations may introduce population stratification.16 Also, genetic variants genotyped using large chip genotyping platforms, have been included on the specific platform for a reason, i.e. a high likelihood for being associated with cardiovascular disease. This may potentially invalidate the data for other uses and definitely increases the likelihood for observing a causal effect for the outcome genetic variants were selected to detect. Ideally, publications should present information both from individuallevel data of own studies and if available, combined with summary-level data. This combination may provide information on confounding factors and mediating factors in biological pathways, with a high statistical power. 6. Extending the Mendelian randomization design Due to the complex linkage disequilibrium structure in the genome, an association between a genetic variant and a risk factor or disease identified in a GWAS does not necessarily mean that the genetic variant per se is causal or functional in the disease pathway. The observed association can either be due to causality, linkage disequilibrium with another causal variant, or pleiotropy of the variant. In the classic Mendelian randomization study, the genetic variants are simply used as un-confounded instruments for a risk factor of life-style, and whether the variants used are causal in the disease pathway, or simply in linkage disequilibrium with a causal gene, is irrelevant for the study assumptions. 
Population stratification occurs when there is genetic heterogeneity in the population under study, dividing the population into subgroups. This may occur if the population includes individuals of different ethnic origin or if there is non-random mating within the population. Presence of population stratification may be indicated by a departure of allele frequencies from the Hardy–Weinberg expectations; if such a departure is observed, it is, however, important first to exclude genotyping errors, as these are the most common cause.

Ascertainment bias may occur if data from case–control studies like those often employed in GWAS are used. If cases included in a GWAS have been selected to minimize phenotypic heterogeneity or to focus on extreme cases, disease-predisposing alleles will have been enriched in the study cohort,16 and causal estimates may be inflated. Risk of ascertainment bias can be reduced by using data from general population studies.

Figure 5 Association between gene expression and a risk factor through genotypes. A model of causality where a difference in genotype is mediated through gene expression (transcription), left panel; and three possible explanations for an observed association between a risk factor and gene expression through genotypes, right panel. Reproduced with permission from Zhu et al.20
(Panel labels: genotype (Z), transcription/expression (X), risk factor/phenotype (Y); causality, pleiotropy, linkage.)

In a relatively new approach, the two-sample summary-level Mendelian randomization design is applied to integrate summary-level GWAS data on the effect of genetic variants (Z) on a trait (Y) with data on the effects of the same genetic variants (Z) on gene expression levels (X) from expression quantitative trait loci (eQTL) studies,19 in order to identify genes whose expression levels are associated with a risk factor or disease (Figure 5, left panel). This may help identify the specific gene (or cis-regulatory variant) causal for an association observed in a GWAS.20,21 If the expression level of a gene (X; in the classic design corresponding to the risk factor) is influenced by a genetic variant (Z) (the eQTL), there will be differences in gene expression levels among individuals carrying different genotypes of the genetic variant (Figure 2, #2, and Figure 5, right panel). The novel approach tests whether an effect of a genetic variant (Z) on a trait (Y) identified in a GWAS is most likely due to causality mediated through the expression level of a gene (X), or due to linkage or pleiotropy (Figure 5, right panel). Using this approach, novel genes causally associated with coronary artery disease (EIF2B2 and ATP5G1), HDL cholesterol (GPR146), and LDL cholesterol (ERAL1) have been identified.21

The Mendelian randomization design has also been extended to examine heritable changes in gene function that are due not to changes in DNA sequence but to, for example, methylation of DNA (epigenetics), and to include information from metabolomics and proteomics.
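To make the expression-based design (Z, X, Y) described above concrete, the following sketch combines an eQTL estimate (Z on X) with a GWAS estimate (Z on Y) for a single variant into a ratio estimate of the effect of expression on the trait, with an approximate one-degree-of-freedom chi-square statistic obtained from the delta method. It assumes the two estimates come from independent samples; the function name and the input numbers are hypothetical. The sketch does not attempt the further step of separating causality from linkage or pleiotropy.

```python
import math


def expression_trait_ratio(b_zx, se_zx, b_zy, se_zy):
    """Ratio estimate of the effect of expression (X) on the trait (Y) via variant Z.

    b_zx, se_zx: eQTL effect of Z on X and its standard error
    b_zy, se_zy: GWAS effect of Z on Y and its standard error
    Returns (b_xy, chi2, p_value); the statistic uses the delta-method variance
    of a ratio of two independent estimates.
    """
    b_xy = b_zy / b_zx
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    chi2 = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b_xy, chi2, p_value


# Hypothetical summary statistics for one cis-eQTL variant (not real data).
b_xy, chi2, p_value = expression_trait_ratio(0.5, 0.05, 0.1, 0.02)
print(f"b_xy = {b_xy:.2f}, chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

Because the statistic is bounded by the weaker of the two associations, a strong eQTL cannot compensate for a weak GWAS signal, or vice versa.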
Recently, the MR-Base collaborators have developed an online analytical platform incorporating data from 1094 GWAS on disease and other complex traits, making it possible to infer causal relationships between phenotypes without the need for individual-level genotype or phenotype data (MR-Base). The homepage hosts an online tool for performing two-sample summary-level Mendelian randomization directly and provides links to analytical tools implemented in R.

7. Information on cardiovascular disease gained from Mendelian randomization

Multiple GWAS on cardiovascular disease have identified common genetic variants in very large case–control studies. GWAS have also examined genetic determinants of conventional risk factors for cardiovascular disease, including lipids and lipoproteins, markers of inflammation, haemostasis and thrombosis, arterial wall function, metabolism, antioxidants, and lifestyle factors. Combined, these studies have added to our understanding of cardiovascular disease pathogenesis and disease pathways, and in many cases form the basis of Mendelian randomization studies. Results from some of the Mendelian randomization studies are summarized below and in Table 2 and Figure 6.

7.1 Lipids and lipoproteins

Mendelian randomization has been used extensively to examine the causal role of lipids and lipoproteins. LDL cholesterol,14,23–26 triglyceride-rich lipoproteins27,28 (reviewed in Refs. 29–31), and lipoprotein(a)32,33 (reviewed in Ref. 34) concentrations have been shown to be causally associated with higher cardiovascular disease risk, whereas Mendelian randomization studies on high-density lipoprotein (HDL) cholesterol have failed to show a causal role for HDL cholesterol in cardiovascular disease risk.14,35,36 High HDL cholesterol concentration is associated with reduced risk of cardiovascular disease in observational
studies, but this may be due to confounding by other factors (e.g. physical activity, obesity, or diabetes) or by the low concentrations of triglyceride-rich lipoproteins that accompany high HDL cholesterol concentrations, as the two are inversely correlated (Figure 1, left panel, corresponding to Figure 2, #1). Several Mendelian randomization studies have addressed the association of both low HDL cholesterol concentration (via the genes LCAT (Ref. 35) and ABCA1 (Ref. 37)) and high HDL cholesterol concentration (via LIPC (Refs. 38–40)) with risk of cardiovascular disease, and found no causal effect of HDL cholesterol on risk when using genes associated only with HDL cholesterol. A large individual-level Mendelian randomization study used as instrument a rare variant in the endothelial lipase gene LIPG (minor allele frequency 2.6%), consistently associated with a 0.14 mmol/L higher HDL cholesterol concentration but not with LDL cholesterol or triglycerides (corresponding to #2 in Figure 2). The increase in HDL cholesterol was predicted to translate into a 13% lower risk of coronary heart disease, but in 20 913 cases and 95 407 controls, the observed odds ratio was 0.99 (95% confidence interval: 0.88–1.11), suggesting that a LIPG-mediated increase in HDL cholesterol also does not reduce risk of coronary artery disease40 (corresponding to #3 and #4 in Figure 2). Another large study used 140 genetic variants in a summary-level two-sample Mendelian randomization study. It combined information on the association of the genetic variants with HDL cholesterol concentration in 188 577 individuals from the Global Lipids Genetics Consortium (Figure 2, #2) with information on the association of the variants with risk of coronary artery disease from the Coronary Artery Disease Genome-wide Replication and Meta-analysis plus Coronary Artery Disease Genetics consortium, comprising 63 746 cases with coronary artery disease and 130 681 controls, and observed an odds ratio of 0.95 (0.85–1.06) for a 0.41 mmol/L higher HDL cholesterol14 (Figure 2, #3 and #4).

Table 2 Identification of strongest genetic instruments for Mendelian randomization studies through candidate gene approach and genome-wide association studies

Columns: Biomarker or lifestyle factor; Genetic instrument identified by (candidate gene approach, GWAS approach); Effect on cardiovascular disease examined in (candidate gene approach, GWAS approach, MR approach).

Rows by category:
Lipids and lipoproteins: LDL cholesterol; Triglyceride-rich lipoproteins; Lipoprotein(a); HDL cholesterol
Inflammation: Interleukin-1 receptor agonist; C-reactive protein; Type 1 interferon; YKL-40; Interleukin-6 receptor; Ceruloplasmin; Pentraxin 3
Haemostasis and thrombosis: Factor V; Factor VII; Prothrombin; Plasminogen activator inhibitor type 1; Platelet glycoprotein receptor GPIa, GPIbα and GPIIIa; Fibrinogen
Blood pressure and arterial wall: Blood pressure; Paraoxonase; Lipoprotein-associated phospholipase A2; Serum type secretory phospholipase A2
Metabolism: Non-fasting glucose; Type 2 diabetes; Body mass index; Adiponectin; Brain-derived neurotrophic factor; Fetuin-A
Antioxidants: Uric acid; Bilirubin; Vitamin C; Extracellular superoxide dismutase
Lifestyle factors: Smoking; Alcohol intake; Milk intake; Coffee intake
Other: Telomere length; Height; Homocysteine; 25-hydroxyvitamin D; Cystatin C; Chronic kidney disease; Iron; Celiac disease

Illustrative examples of biomarkers and lifestyle factors tested for causal effect on risk of cardiovascular disease using Mendelian randomization. –, no evidence for a causal effect of the risk factor; +, some evidence; ++, firm evidence; +++, very strong evidence of a causal effect of the risk factor on cardiovascular disease risk.

Several studies have also examined loss-of-function variants in the CETP gene, potentially mimicking the effect of CETP inhibition. These variants increase HDL cholesterol concentrations and reduce risk of coronary artery disease; however, they also lower LDL cholesterol concentrations, and attributing the protective effect solely to HDL may not be valid, taking the results above into account.40–42 The results summarized above have contributed to the departure from seeing HDL cholesterol as a therapeutic target.
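The comparison of predicted and observed risk in the LIPG example above can be illustrated with a short calculation. The sketch below recovers the standard error of the log odds ratio from the reported 95% confidence interval and expresses the distance between the observed and the predicted odds ratio as a z-score. It assumes that the predicted 13% lower risk corresponds to an odds ratio of 0.87 and treats that prediction as fixed, without uncertainty; both are simplifying assumptions of this example, and the function names are hypothetical.

```python
import math


def log_or_standard_error(ci_lower, ci_upper, z_crit=1.96):
    """Standard error of the log odds ratio implied by a 95% confidence interval."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)


def z_observed_vs_predicted(or_observed, ci_lower, ci_upper, or_predicted):
    """z-score for the difference between observed and predicted log odds ratios."""
    se = log_or_standard_error(ci_lower, ci_upper)
    return (math.log(or_observed) - math.log(or_predicted)) / se


# Values from the LIPG example; 0.87 is the assumed translation of "13% lower risk".
z = z_observed_vs_predicted(0.99, 0.88, 1.11, 0.87)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

A large z-score indicates that the observed odds ratio is hard to reconcile with the risk reduction expected if HDL cholesterol were causal, which is the logic behind the conclusion drawn from this study.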
The Mendelian randomization study design has also been used to predict the effects on cardiovascular disease risk of pharmacological interventions targeting other lipid and lipoprotein concentrations, and it predicted the effect of inhibiting the PCSK9 protein (reviewed in Ref. 15). It has further been used to examine side effects of LDL-lowering therapies, such as increased risk of new-onset diabetes,14,43,44 and to make unlikely the existence of other potential side effects of LDL lowering, such as cancer development45 and Alzheimer's disease, vascular dementia, any dementia, or Parkinson's disease.46

Figure 6 Overview of biomarkers and lifestyle factors examined for a causal effect on risk of cardiovascular disease using the Mendelian randomization design. Biomarkers marked with a red + have been shown to have a causal effect with higher cardiovascular disease risk; biomarkers marked with a green – have been examined, but did not show causal effects on risk; and biomarkers marked in yellow have been examined, but results have been conflicting. None of the biomarkers have shown a protective effect with lower cardiovascular disease risk. All these causal or non-causal risk factors are thought to act on the basic causal mechanism of elevated apolipoprotein B containing lipoproteins in plasma [low-density lipoproteins, triglyceride-rich lipoproteins, and lipoprotein(a)] leading to atherosclerosis and/or thrombosis superimposed on existing atherosclerosis.

7.2 Inflammation

High C-reactive protein (CRP) concentration is associated with increased risk of cardiovascular disease in observational studies,47,48 and treating high-risk patients with statins reduces CRP concentrations. However, several very large Mendelian randomization studies have failed to show a causal effect of CRP on cardiovascular disease risk,47,49 suggesting that pharmacological inhibition of CRP may not reduce risk of disease. Interleukin-6 has mainly pro-inflammatory effects, and reduced binding of interleukin-6 to the interleukin-6 receptor due to genetic variation in the IL6R gene, mimicking the effect of tocilizumab, a monoclonal antibody against the interleukin-6 receptor used to treat rheumatoid arthritis, has a causal effect with reduced cardiovascular disease risk.50

7.3 Haemostasis and thrombosis

Studies on haemostasis and thrombosis and risk of cardiovascular disease have shown that Factor V, a cofactor in the conversion of prothrombin to thrombin by Factor Xa, and prothrombin are causal factors for cardiovascular disease risk, but Factor VII is not. Also, plasminogen activator inhibitor type 1 (PAI-1), which inhibits the activation of tissue-type plasminogen activator and thus inhibits fibrinolysis, has a causal effect with increased cardiovascular disease risk. In contrast, the platelet glycoprotein receptors GPIa, a receptor for binding of platelets to collagen, and GPIbα and GPIIIa, both receptors for von Willebrand factor and thrombin, all of which are pro-coagulant proteins, have not been shown to have causal effects on cardiovascular disease risk.51–55

7.4 Blood pressure and arterial wall

Several GWAS have identified genetic variants associated with both systolic and diastolic blood pressure.56 The variants identified explain only a small proportion of the variation in blood pressure, but despite this, they have shown large effects on cardiovascular disease risk.57

Lipoprotein-associated phospholipase A2 (Lp-PLA2) is a marker of inflammation that accumulates in unstable atherosclerotic plaques. A genetic variant in the PLA2G7 gene disrupting the function of Lp-PLA2, mimicking the effect of the Lp-PLA2 inhibitor darapladib, has failed to show a causal effect on cardiovascular disease risk.58

Serum type secretory phospholipase A2 (sPLA2-IIa) hydrolyses phospholipids on lipoprotein particles, leading to an increased binding of LDL particles to proteoglycans in the arterial wall and potentially accelerating atherosclerosis. However, genetic variants in the PLA2G2A gene leading to lower concentrations and activity of sPLA2-IIa, and mimicking the effect of the sPLA2-IIa inhibitor varespladib, did not have a causal effect on cardiovascular disease risk.59

Paraoxonases are a family of antioxidative enzymes that hydrolyse lipid peroxides and prevent oxidation of LDL particles. Paraoxonase 1 has been suggested to be responsible for the antioxidative properties of HDL particles and for their ability to efflux and exchange cholesterol. Genetic variants in the PON1 gene, associated with lower plasma concentrations and activity of paraoxonase 1, have not shown causal effects on cardiovascular disease risk.60

7.5 Metabolism

Body mass index has been shown to have a causal effect on cardiovascular disease risk,61 a risk that is in part mediated through elevated levels of non-fasting remnant and LDL cholesterol, and through blood pressure.62

Adiponectin, which is reduced in obese individuals and inversely associated with insulin resistance, has been shown to have a causal effect on cardiovascular disease risk in two individual-level Mendelian randomization studies,63,64 but not in a large summary-level study.65

Type 2 diabetes has consistently been associated with cardiovascular disease in observational and in Mendelian randomization studies,66 and plasma glucose concentrations have also been shown to contribute causally to this risk.67,68

Circulating brain-derived neurotrophic factor is involved in regulation of body weight, physical activity, and endothelial integrity.69 Mice with loss-of-function mutations in the BDNF gene encoding brain-derived neurotrophic factor have an up to 50% higher food intake compared with wild-type mice. In a Mendelian randomization study in humans, high concentrations of brain-derived neurotrophic factor had a causal effect with lower cardiovascular risk.69 Whether this is due to a direct effect on the endothelium or to an effect mediated through body mass is not known.

Fetuin-A is a glycoprotein involved in free fatty acid induced insulin resistance and an inhibitor of vascular calcification. A causal effect of fetuin-A has been examined in two Mendelian randomization studies with conflicting results: one showed that low concentration of fetuin-A has a causal effect with increased cardiovascular disease risk,70 whereas a meta-analysis of seven studies observed no consistent association with cardiovascular disease risk.71

7.6 Antioxidants

Despite the general belief that oxidation of low-density lipoproteins is essential for development of atherosclerosis, and that antioxidants will consequently protect against cardiovascular disease, Mendelian randomization studies of high plasma levels of naturally occurring antioxidants like uric acid and bilirubin have not supported this hypothesis.72,73

Extracellular superoxide dismutase (ecSOD) is an antioxidative enzyme found in the arterial wall. To maintain its function, ecSOD is bound to the external membrane of endothelial cells. A genetic variant in the ECSOD gene that reduces the binding of ecSOD to the endothelial cell leads to 10-fold and 40-fold increased plasma levels in heterozygotes and homozygotes, respectively, and disrupts the protective effects of ecSOD. This variant appeared to have a causal effect with increased cardiovascular disease risk,74 although the effect was most pronounced in individuals with diabetes.75

7.7 Lifestyle factors

Mendelian randomization studies have confirmed that genetic variation increasing smoking amount and extent is a cause of cardiovascular disease risk76 and that genetically low alcohol intake is causally associated with less coronary heart disease.77

Also, despite a general belief that high milk intake leads to higher cholesterol levels and therefore to more cardiovascular events, studies using genetically determined lactose intolerance in whites, which leads to lower milk intake, have not been able to document a causal relationship between milk intake and cardiovascular disease risk.78 Finally, high coffee intake is strongly associated with low risk of cardiovascular disease and mortality. Surprisingly, however, Mendelian randomization studies of genetically higher coffee intake found no causal relationship to cardiovascular disease risk,79 or to strong cardiovascular risk factors like diabetes, obesity, or the metabolic syndrome.80

7.8 Other risk factors

Despite very strong epidemiological evidence for an association between low plasma vitamin D levels and increased cardiovascular [...]

8. Perspectives

Mendelian randomization studies often use information gained through genome-wide association studies, and, combined, these two study designs have provided novel information in cardiovascular medicine for more than 10 years. Much information on cardiovascular disease causality, pathogenesis, prevention, and treatment has been gained from Mendelian randomization studies; however, extensive knowledge about human genetics, interaction between genes, and gene function and regulation is required to select valid genetic variants to use as instruments in Mendelian randomization studies and to interpret the results appropriately.

Conflict of interest: none declared.

References

1. Terwilliger JD, Goring HH. Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design.
Hum Biol 2000;72:63–132. 2. 