QTL mapping in outbred populations: successes and challenges
Abstract
Quantitative trait locus (QTL) mapping in animal populations has been a successful strategy for identifying genomic regions that play a role in complex diseases and traits. When conducted in an F2 intercross or backcross population, the resulting QTL is frequently large, often encompassing 30 Mb or more and containing hundreds of genes. To narrow the locus and identify candidate genes, additional strategies are needed. Congenic strains have proven useful but work less well when there are multiple tightly linked loci, frequently resulting in loss of phenotype. As an alternative, we discuss the use of highly recombinant outbred models for directly fine-mapping QTL to only a few megabases. We discuss the use of several currently available models such as the advanced intercross (AI), heterogeneous stocks (HS), the diversity outbred (DO), and commercially available outbred stocks (CO). Once a QTL has been fine-mapped, founder sequence and expression QTL mapping can be used to identify candidate genes. In this regard, the large number of alleles found in outbred stocks can be leveraged to identify causative genes and variants. We end this review by discussing some important statistical considerations when analyzing outbred populations. Fine-resolution mapping in outbred models, coupled with full genome sequence, has already led to the identification of several underlying causative genes for many complex traits and diseases. These resources will likely lead to additional successes in the coming years.
Quantitative trait locus (QTL) mapping in rodents has been a successful strategy for identifying regions of the genome that play a role in many diseases and traits. Initial QTL studies used F2 intercrosses or backcrosses, in which two strains that differ phenotypically and genetically are bred for two generations (see Fig. 1). With this strategy, F2 progeny are phenotyped and genotyped, and genetic loci linked to the trait of interest are identified. Although this method works extremely well for detecting QTL, it provides only a broad localization, with identified regions tending to be very large (30–40 Mb) and often encompassing hundreds of genes. As a result, although F2 intercrosses and backcrosses have identified thousands of QTL over the past 30 yr, they have proven less useful for identifying the underlying causative genes and variants (see Ref. 30).
In an attempt to narrow the QTL regions and identify causative genes, multiple methods have been developed (see Ref. 32). One popular method is to create a congenic strain in which consecutively smaller portions of the QTL from the disease strain are bred onto the genetic background of the disease resistant strain. This strategy has successfully narrowed numerous QTL and has the potential to lead to underlying causative genes (e.g., 7, 43, 70, 75). Although this strategy has several advantages, including the ability to test underlying function once a gene has been identified, this method works less well when there are multiple tightly linked QTL with potentially different effects, often resulting in loss of phenotype when the QTL is moved onto the resistant background and subsequently narrowed.
An alternative approach is to perform linkage disequilibrium mapping in highly recombinant, or outbred, populations. The purpose of the current review is to discuss how highly recombinant rodent models can be used to unravel the underlying genetic basis of complex traits. Such models are advantageous because multiple alleles segregate with relatively low linkage disequilibrium, allowing fine-mapping of many traits to only a few megabases or less. In the following review, we discuss outbred models with known ancestry such as advanced intercross (AI) populations (created by breeding two inbred strains for many generations) (19) as well as heterogeneous stocks (HS) and the diversity outbred (DO), both created by breeding eight inbred strains for many generations (80, 88). These models are based on breeding strategies as depicted in Fig. 1 and summarized in Table 1. We also discuss the use of outbreds with unknown ancestry such as commercially available outbred (CO) populations (96). We follow this with a discussion of strategies that can be used to identify candidate genes that underlie these QTL and end this review by briefly discussing some of the cautions and concerns regarding statistical analysis in highly recombinant animal populations.
Table 1.
Model | Description | Advantages | Disadvantages |
---|---|---|---|
Advanced intercross (AI) | breeding of two inbred strains for many generations | • mapping resolution is a few Mb • parent genotypes are known • each marker is informative | • number of phenotypes that can be mapped is limited by founder strains |
Heterogeneous stocks (HS) | breeding of eight inbred strains for many generations | • mapping resolution is a few Mb • hundreds of different phenotypes can be mapped • founder strains have been fully sequenced and can be used to identify candidate variants • the HS rat may be useful for mapping traits that are hard to replicate in the mouse | • large numbers are needed • functional follow-up is challenging • statistical challenges including diplotype substitution plots |
Diversity outbred (DO) | breeding of eight inbred strains for many generations; founder strains are the same as those used to create the Collaborative Cross (CC); founder strains crossed multiple ways (funnels) | • mapping resolution is a few Mb and likely to improve beyond HS • more balanced design than HS strategy because multiple funnels were used • hundreds of different phenotypes can be mapped • founder strains have been fully sequenced and can be used to identify candidate variants • ability to tap into CC to test candidate genes | • large numbers are needed • statistical challenges including diplotype substitution plots |
Commercial outbreds (CO) | outbred models sold by commercial breeders | • potential to map to gene-level resolution • underlying genetic structure is similar to inbred strains, such that full sequence can be imputed | • choosing the correct strain becomes important, as not all traits will map within each strain |
WHY CONSIDER OUTBRED POPULATIONS FOR QTL MAPPING?
There are several advantages of using outbred, or highly recombinant, animal populations for the purpose of QTL mapping. One of the most obvious is the ability to map within only a few megabases or even less. In contrast to studies conducted in F2 intercross or backcrosses, in which there is 40–60 Mb between recombination events, distance between historical recombination events in outbred models is generally <5 Mb. With large sample sizes, the distance between recombination events within the population (as opposed to within an individual animal) is even smaller, leading to fine-mapping of QTL and rapidly narrowing the number of potential candidate genes in the region. Outbred populations also provide the ability to survey many different genetic combinations, thus minimizing the role of genetic background (as significant QTL are identified on a mix of genetic backgrounds). The large degree of genetic variability also results in extremely large phenotypic variability, such that outbred models can be used to fine-map hundreds of different phenotypic traits. Finally, with increased genetic and phenotypic variability, outbred populations offer a closer approximation to human populations than an F2 or backcross.
One may wonder why such animal models are necessary, particularly with the relative success of human genome-wide association studies over the past several years (91). Despite these successes, however, a large proportion of the heritable variance for most complex traits is still unknown (see Ref. 58). Although several methods are currently underway to identify the missing heritability in humans (e.g., meta-analyses to increase power, deep sequencing to identify rare variants, investigation of gene-gene and gene-environment interactions; see Ref. 38), we argue that genetic dissection of complex traits in animal models will complement the human work and aid in more quickly unraveling the genetic basis of this missing heritability (see also Ref. 66). Animal models offer important advantages over work in humans, including the ability to control both genetics and environment, investigate RNA expression levels in tissues that are not readily available in humans, and immediately test underlying function. Below, we discuss multiple outbred populations that will aid in this discovery.
ADVANCED INTERCROSS
The simplest outbred model is the AI. AI populations, first proposed by Darvasi and Soller (19), are created by breeding two inbred strains for many generations, resulting in the accumulation of many historical recombination events and allowing for much finer resolution mapping than an F2 (see Fig. 1). Studies have been conducted in populations after a moderate number of crosses such as F8 (e.g., Ref. 65) or after many more crosses such as F34 (e.g., Refs. 52, 72). Specifically, the LG/J × SM/J AI population has existed for several years and has thus been through many generations of breeding (24). Because the distance between recombination events decreases with each additional generation, populations developed from a larger number of crosses enable finer mapping resolution (see Ref. 26), with some studies achieving fine-mapping resolution <1 Mb (see Refs. 14, 52). It is also possible to integrate data from an AI with F2 data, thus achieving both power (from the F2) and mapping resolution (from the AI) (14). When analyzing AI populations, it is important to take into account the complex family relationships (14), and this can be done by using mixed modeling approaches within a recently developed R package, QTLRel (13). AI populations have been used to fine-map multiple traits including obesity (26, 64), metabolic syndrome (50), and methamphetamine sensitivity (10, 14, 65), among others (e.g., Refs. 25, 42, 51, 52). When coupled with haplotype analysis of the parent strains and/or RNA expression analysis, AI populations have the potential to lead to underlying candidate genes (e.g., Ref. 25). Although there are several advantages to this relatively simple outbred model (i.e., each marker is informative, founder genotypes are known), it is limited in that it can identify only those QTL that segregate between the two founder strains.
HETEROGENEOUS STOCKS
HS animals are similar to an AI, but instead of starting with only two inbred strains, the population is created by combining eight inbred strains and then outbreeding in a way that minimizes inbreeding (see Fig. 1). After 50 generations of outbreeding, the genetic make-up of the resulting progeny represents a random mosaic of the founding animals, with the average distance between recombination events approaching a single centimorgan (61), enabling the fine-mapping of QTL to only a few megabases. Unlike an AI, the underlying genetic architecture is more complex and the underlying ancestral haplotypes need to be determined prior to analysis. Determining this underlying structure provides more information than is obtained from the genotypes themselves and leads to improved genetic mapping (61). This is done by determining ancestral probabilities with a dynamic programming algorithm initially developed by Mott and colleagues (61). Regression modeling is then conducted on the underlying mosaic structure to identify QTL. Similar to AI populations, it is important to take family structure into account to avoid detecting false QTL, and this can be done either through mixed modeling (14, 46) or resample model averaging (87). Once QTL are identified, founder sequence can be used to narrow potentially causative variants (48).
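To make the idea of ancestral probabilities concrete, the following Python sketch computes the posterior probability of each founder along a single haploid chromosome using a forward-backward pass over a hidden Markov model. It is a simplified illustration and not the published algorithm of Mott and colleagues (61): the function name `founder_posteriors`, the uniform prior over founders, the single error rate `err`, and the toy founder alleles are all assumptions made for this example, and real HS animals are diploid.

```python
def founder_posteriors(obs, founder_alleles, recomb, err=0.01):
    """Posterior probability of each founder at each marker (haploid toy model).

    obs             -- observed alleles (0 or 1), one per marker
    founder_alleles -- founder_alleles[f][m] is founder f's allele at marker m
    recomb          -- recomb[m] is the switch probability between markers m and m+1
    err             -- assumed genotyping error rate (hypothetical value)
    """
    n_f, n_m = len(founder_alleles), len(obs)

    def emit(f, m):
        # Probability of the observed allele if the segment descends from founder f.
        return 1.0 - err if founder_alleles[f][m] == obs[m] else err

    def trans(i, j, r):
        # No switch with probability 1 - r; after a switch any founder is equally likely.
        return (1.0 - r if i == j else 0.0) + r / n_f

    def normalise(row):
        total = sum(row)
        return [x / total for x in row]

    # Forward pass: evidence from markers to the left (rescaled at each step).
    fwd = [normalise([emit(f, 0) / n_f for f in range(n_f)])]
    for m in range(1, n_m):
        prev = fwd[-1]
        fwd.append(normalise([
            emit(j, m) * sum(prev[i] * trans(i, j, recomb[m - 1]) for i in range(n_f))
            for j in range(n_f)
        ]))

    # Backward pass: evidence from markers to the right.
    bwd = [[1.0] * n_f for _ in range(n_m)]
    for m in range(n_m - 2, -1, -1):
        bwd[m] = normalise([
            sum(trans(i, j, recomb[m]) * emit(j, m + 1) * bwd[m + 1][j] for j in range(n_f))
            for i in range(n_f)
        ])

    # Combine both directions into per-marker posterior probabilities.
    return [normalise([fwd[m][f] * bwd[m][f] for f in range(n_f)]) for m in range(n_m)]


founders = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]  # three toy founders
posterior = founder_posteriors([1, 1, 1, 1], founders, recomb=[0.01, 0.01, 0.01])
```

Each row of `posterior` sums to one and can serve as the set of predictors in a regression of phenotype on founder probabilities at that marker. In this toy case the second founder receives the highest probability at every marker because it is the only one that matches all observed alleles.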
HS populations were initially created as a source for selection studies and have only recently been used for QTL mapping. Initial breeding designs of HS populations were not balanced (i.e., the founder genomes were not represented equally in the resulting population), leading to certain statistical challenges as well as potential loss of alleles in advanced generations. Despite these challenges, HS mice were first used for genetic mapping in 1999 when Flint and colleagues demonstrated the ability to narrow a previously identified QTL (29) to only 0.8 cM (81). Since that time HS mice have been used to fine-map QTL for hundreds of traits to an average confidence interval of only 2.7 Mb (88). In separate studies, the HS mouse has also been used to fine-map traits such as fear (82), ethanol-induced locomotor activity (20) and arthritis (45).
In addition to the HS mice, there is also an HS rat colony. The HS rat colony (N:NIH-HS) was first established by the National Institutes of Health in 1984 to serve as a source of genetically segregating animals for both experimental and selection studies (36). Similar to the HS mouse, the HS rat has been used to map hundreds of traits to an average of 4.5 Mb, with several causal genes being identified (6). In addition to this large study, the N:NIH-HS rat colony has also been used to fine-map traits involved in fear (44) and diabetes (76, 77). Other studies show that the HS rat will be a promising model for mapping kidney-related traits (78), bone fragility (3), drug abuse behavior (71), as well as behavioral and physiological responses to stress (21, 55, 56) and to ethanol (8, 33, 79). Because of the rich history of the rat in behavioral studies, the HS rat will likely prove a useful model for genetic dissection of behaviors that are not easily modeled in the mouse (see Ref. 63). The utility of the HS rat will be enhanced by the recent availability of gene knockouts and other genetic manipulations in the rat (47). Full genome sequence is available for founder strains of the HS mouse (48) and HS rat (6) and will be invaluable for identifying causative genes and variants within fine-mapped QTL. Two HS mouse colonies [the Boulder HS (59) and the Northport HS (20)] and two N:NIH-HS rat colonies (one at the Medical College of Wisconsin in the U.S. and the other at the Autonomous University of Barcelona in Spain) currently exist.
Despite the clear successes of using HS mice and rats for genetic fine-mapping, one of the disadvantages of the HS strategy is that, as with an F2 intercross, each animal is genetically and phenotypically distinct. Because of this, each time a new study is started, not only does a new group of animals need to be phenotyped, but all animals also need to be fully genotyped. The highly recombinant nature of these populations means that relatively large numbers of animals are needed for sufficient statistical power (see Refs. 60, 61, 89) and that high-density genotyping platforms are required (94). Although 1,000–4,000 SNPs are sufficient in an AI (14, 65), 7,000 SNPs or more are needed for analysis in the HS (88). Because of the need for both large numbers and high-density genotyping, these studies have the potential to be both time-consuming and expensive. It is therefore beneficial to gather as much phenotype information as possible from the same group of animals, so that genotyping only needs to be done once, and this information can be used to map multiple traits (e.g., Refs. 6, 80, 88). Because of practical concerns, however, this is not always feasible. Another disadvantage of the HS strategy is that, because each animal is genetically and phenotypically distinct, it is not possible to conduct follow-up functional studies in a specific animal population once a candidate region has been identified (as is possible by using, for example, congenic strains). A further disadvantage is that these populations have been created through a single funnel (i.e., combining founder genomes only once), leading to loss of certain alleles and an unbalanced representation of the founder genomes.
COLLABORATIVE CROSS AND DIVERSITY OUTBRED
In an effort to develop a lasting resource in which to model human disease, the mouse community began the process of developing what is now called the Collaborative Cross (CC), a panel of recombinant inbred lines created from eight inbred founder strains (18). Recombinant inbreds (RI) are created by combining genomes from two or more inbred strains followed by brother-sister mating of the progeny to create inbred strains with differing combinations of the parent genomes. RI strains are advantageous because once they are genotyped, this information can be used for all future mapping studies. Prior to the CC, most RI strains were created using only two founder strains. Because of this, loci generally map to relatively large regions, similar to an F2 intercross, although there are some RI strains that have been bred for several generations prior to inbreeding, leading to increased mapping resolution (see Ref. 68). Combining eight founder strains, as found in the CC, also leads to high resolution mapping. Because the CC have only been through 22 generations prior to becoming inbred, however, this resource does not tend to achieve as fine-resolution mapping as what can be accomplished in the HS or DO.
The CC was developed in an effort to not only map complex traits to relatively fine resolution but also to have a resource of inbred strains to support systems level genetic studies (85). The broad idea of this undertaking was that it would be a shared resource in which investigators from many labs would be able to study their phenotype of interest while being able to utilize the full genetic information, as well as transcriptome, proteome, and metabolome information from each strain (16). The CC is made up of five inbred strains and three wild-derived strains, such that it is the most genetically diverse mouse resource available to date. The CC was established by breeding together the eight strains and then inbreeding the G2:F1 generation. The CC have been created from multiple funnels, such that each line has relatively equal contributions from the eight founder strains (18). To demonstrate utility of this resource, the partially inbred pre-CC have been used to map multiple traits ranging from body weight (5) to influenza pathogenesis (27), among others (e.g., Refs. 9, 23, 49, 83). CC lines are being developed in three different locations: U.S. (University of North Carolina), Israel (Tel Aviv), and Australia (Perth) (18). In 2012, 42 fully inbred lines were available, and this number is expected to significantly increase in the next couple of years (93). Genetic information for each inbred line is fully available at: http://csbio.unc.edu/CCstatus/index.py. Although far fewer lines have been developed than originally proposed (see Ref. 16), this resource will be invaluable for follow-up (validation and functional studies) of QTL identified in the DO (see below).
The DO is an outbred panel created from eight inbred strains, similar to the HS. The major difference is that the DO has been created from the same eight founder strains used to create the CC (80). The DO offers several advantages over the HS, including an extraordinarily high degree of genetic and phenotypic diversity as a result of the added wild-derived strains. This resource is also being maintained with 175 breeding pairs (80), as opposed to the 40–50 breeding pairs generally used to maintain the HS (94). In addition to the increased diversity that this resource provides, there is the additional advantage of being able to tap into the CC for both validation and further functional studies once a QTL has been identified. To demonstrate the utility of this resource, 150 DO mice from the G4 and G5 generations were used to fine-map cholesterol levels to a 2 Mb interval containing only 11 genes (80). By using complete genome sequence from the founder strains combined with known founder effects at the QTL, a single candidate gene was identified. The DO have also recently been used to fine-map several behavioral traits to only 1–3 Mb, with several potential candidate genes identified (53). The DO are currently maintained at The Jackson Laboratory using a randomized breeding strategy to maintain heterozygosity and control genetic drift, with the expectation that mapping resolution will improve with each generation of breeding (17). In addition to the DO, a second outbred population, the HS_CC, has also been created from the CC founders and has been used to look at striatal gene networks (41).
COMMERCIAL OUTBRED
Outbred lines sold by commercial breeders are often maintained using thousands of lines, offering the potential to map to a much finer resolution than the HS or DO (94). Studies in outbred CD1 mice were the first to show that outbred mice exhibit low linkage disequilibrium and high heterozygosity, indicating they would be useful for genome-wide genetic mapping studies (4). Since that time, successful fine-mapping in CO has been conducted for anxiety (97), metabolic traits (99), and gene expression levels (35). To demonstrate the general utility of CO mice for genetic mapping purposes, Yalcin and colleagues (96) assessed genetic architecture in 66 available outbred mouse colonies. They found that many of the strains exhibit low linkage disequilibrium with some strains exhibiting haplotype block sizes <100 Kb, potentially enabling gene-level resolution of mapping studies. In addition, they found that the underlying genetic structure in CO mice is similar to that found in the classical inbred mouse strains, making it possible to impute full genetic sequence of the CO strains from the available sequence of classical inbred strains. As proof of principle, Yalcin et al. (96) conducted a genome-wide mapping analysis in three different outbred strains. Analysis was conducted by haplotype reconstruction, as previously described for the HS and DO populations, in addition to single marker analysis. They were able to fine-map three different traits and identified H2-Ea as the causative gene (as confirmed by complementation analysis) within a QTL identified for CD4+/CD8+ ratio in lymphocytes (96). Although this strategy holds significant promise for gene-level resolution of mapping studies, choice of strain will significantly affect whether or not a QTL is identified and at what resolution. In general, choosing a strain based on low linkage disequilibrium coupled with a high mean allele frequency is likely to provide good mapping resolution for most traits (94). 
As genome-wide information becomes available for more outbred strains, this information can be used to aid in choosing an outbred strain that harbors specific alleles of interest.
IDENTIFICATION OF CANDIDATE GENES AND VARIANTS UNDERLYING QTL
Outbred populations offer the ability to fine-map QTL to only a few megabases or less. As mentioned previously, successful mapping is dependent on sufficiently dense genotyping of the population. In the studies described above, various platforms have been used including the Mouse Universal Genotyping Array (MUGA), which contains 7,851 SNPs (see Ref. 80). In rat, a high-density Affymetrix single nucleotide polymorphism (SNP) genotyping array (RATDIV) containing 800K SNPs was used (6). For full utility of these outbred resources, genotyping capabilities need to continue to develop. The MegaMUGA, containing 78K SNPs, has recently been developed in mice and will prove useful for analysis of the DO population. Future studies may also implement whole genome light sequencing (see Ref. 67) or genotype by sequencing methods (see Ref. 28).
While fine-mapping to only a few Mb is a large advantage over F2 intercross or backcross strategies, mapping using outbred models frequently does not lead to single gene resolution and additional strategies are needed to identify candidate genes. In the following paragraphs, we discuss the use of founder sequence and expression QTL (eQTL) mapping for this purpose (see Fig. 2). These methods are not unique to outbred models and have been used previously to narrow candidate genes within QTL mapped using F2 intercross or backcross strategies. There are, however, several advantages when applying them to outbred crosses, including 1) the QTL itself is often only a few Mb and so contains far fewer genes, and 2) multiple alleles within the outbred models can be leveraged to identify candidate variants.
To date, complete genomes have been sequenced in the founder strains of the HS and DO mice (48) as well as the HS rat (6). Relative to the respective reference genomes, >4 million SNPs per strain have been identified in the mouse (48) and >2 million SNPs per strain have been identified in the rat (6), in addition to structural variants and indels. Because a portion of the genome (~15% in mouse and ~12% in rat) could not be mapped back to the reference strains, the number of variants identified is likely much larger than reported (6, 48). The available sequence information can be used in several ways to identify candidate genes and/or variants within a fine-mapped QTL. First, full genome sequence of the founders allows one to conduct a haplotype analysis to identify regions in which founder sequence matches founder allele effects at a particular QTL, thereby narrowing the possible causative region within the QTL. This strategy has been used in the DO mice to identify potential causative SNPs within a 2 Mb QTL for plasma cholesterol (80), as well as in the CC to identify candidate genes for hematological parameters (49).
By coupling founder sequence with relatively dense genotyping of the outbred population, it is possible to impute HS or DO genotypes at all possible SNPs within a QTL. This can then be followed by a merge analysis to narrow the potential causative variants within the QTL (95). In brief, a merge analysis uses probabilistically inferred descent to impute genotypes at unobserved loci and then surveys those multiple imputed SNPs for their association with the phenotype. Using this method, one compares two statistical models: the haplotype model and the allelic model. In the haplotype model, the underlying ancestral probabilities at each SNP are used to run the analysis. This haplotype model is compared with one in which only the alleles for that SNP are used (allelic model). In the allelic model, the founder strain alleles are “merged” into two groups for each diallelic SNP: those containing allele “a” at a locus of interest and those containing allele “b” at this locus (95). Potentially causative variants are those in which the allelic model provides a better fit, that is, explaining the same amount but with far fewer parameters, than the haplotype model (see Refs. 45, 95). This method has proven useful in narrowing the number of causative variants within QTL mapped in HS mice (48) and rats (6). Although powerful, it is important to recognize that this strategy does not always lead to a single causative variant, and additional follow-up is often required. Once identified, potential variants can be prioritized by choosing those that fall within gene coding regions or are within regions of high conservation. Additionally, any gene that falls in close proximity to multiple candidate variants could be considered a high-priority candidate gene. Similarly, any gene that lies far away from any potential candidate variant can be ruled out as a potential candidate (6).
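The comparison between the haplotype model and the allelic model can be illustrated with a small Python sketch. This is a deliberately simplified stand-in for the published merge analysis (95), under several assumptions: each animal is assigned a single founder label (a haploid, hard-call simplification of the ancestral probabilities), both models are fitted as group means, and the Bayesian information criterion (BIC) is used as a convenient way to reward equal fit with fewer parameters. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def rss_by_group(y, groups):
    """Residual sum of squares and parameter count when each group gets its own mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, g in zip(y, groups):
        sums[g] += value
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    rss = sum((value - means[g]) ** 2 for value, g in zip(y, groups))
    return rss, len(means)


def bic(rss, n, k):
    # Lower is better; k * log(n) penalises models with more parameters.
    return n * math.log(rss / n) + k * math.log(n)


def merge_comparison(y, founder_of, snp_allele):
    """Compare the haplotype model (one mean per founder) with the allelic model
    (founders merged into two groups by their allele at one SNP)."""
    n = len(y)
    rss_h, k_h = rss_by_group(y, founder_of)
    merged = [snp_allele[f] for f in founder_of]
    rss_a, k_a = rss_by_group(y, merged)
    return {"bic_haplotype": bic(rss_h, n, k_h), "bic_allelic": bic(rss_a, n, k_a)}


# Toy data: eight founders, four animals each; founders A, B, C carry the
# (hypothetical) causal allele, which raises the phenotype by 2 units.
founders = "ABCDEFGH"
noise = [-0.1, 0.1, -0.05, 0.05]
causal_snp = {f: int(f in "ABC") for f in founders}
other_snp = {f: int(f in "AE") for f in founders}  # pattern unrelated to the effect
founder_of = [f for f in founders for _ in noise]
y = [10.0 + 2.0 * causal_snp[f] + e for f in founders for e in noise]

result_causal = merge_comparison(y, founder_of, causal_snp)
result_other = merge_comparison(y, founder_of, other_snp)
```

For the SNP whose strain distribution pattern matches the founder effects, the allelic model fits as well as the haplotype model with two parameters instead of eight, so its BIC is lower. For the unrelated SNP, merging founders destroys the fit and the haplotype model is preferred, so that SNP would not be retained as a candidate.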
Another strategy that is often used to identify candidate genes is eQTL analysis, in which RNA transcript levels are used as the phenotypic trait and mapped to the genome. In such analysis, choice of tissue is an important consideration, as gene expression levels may be altered only in specific tissues (69, 90). By employing eQTL analysis, one gains an understanding of how alterations in the genome affect transcript expression levels, providing further insight into how both DNA and RNA regulate disease. Through eQTL analysis both local (cis-acting) and distant (trans-acting) QTL are identified. It has been shown that identification of cis-eQTL that reside within previously identified physiological QTL can serve as a means for narrowing candidate genes that reside within the region (e.g., Refs. 2, 40, 74). Employing eQTL analysis in highly recombinant populations has the potential to significantly shorten the timeline to identify potentially causative genes, particularly because there are already far fewer genes within the physiological QTL (see Ref. 5). Using outbred models also offers the advantage of being able to map with confidence to within only a few Mb of the transcript itself, as shown in HS (39) and CO mice (5, 35). This can be particularly advantageous when mapping trans-eQTL. Once identified, any correlation between the expression trait and the phenotypic trait can be determined, followed by testing for causality (e.g., Ref. 73). Although not specifically applied to outbred models, these strategies have had several successes at identifying causal genes in humans and animal models (e.g., Refs. 12, 98) and will likely be a promising avenue of research in outbred models in future studies.
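The use of cis-eQTL to narrow candidate genes within a physiological QTL can be sketched in a few lines of Python. The example below is only an illustration: the 5 Mb window used to call an eQTL cis-acting, the record layout, and the gene names are assumptions chosen for this sketch, and studies differ in the window they use.

```python
def classify_eqtl(gene_chr, gene_mb, peak_chr, peak_mb, window_mb=5.0):
    """Label an eQTL 'cis' if its peak lies within window_mb of the gene, else 'trans'."""
    if gene_chr == peak_chr and abs(gene_mb - peak_mb) <= window_mb:
        return "cis"
    return "trans"


def cis_candidates(eqtls, qtl_chr, qtl_start_mb, qtl_end_mb, window_mb=5.0):
    """Genes located inside the physiological QTL interval that also have a cis-eQTL."""
    hits = []
    for e in eqtls:
        in_qtl = e["gene_chr"] == qtl_chr and qtl_start_mb <= e["gene_mb"] <= qtl_end_mb
        kind = classify_eqtl(e["gene_chr"], e["gene_mb"], e["peak_chr"], e["peak_mb"], window_mb)
        if in_qtl and kind == "cis":
            hits.append(e["gene"])
    return hits


# Hypothetical eQTL results (gene names and positions are made up).
eqtls = [
    {"gene": "GeneA", "gene_chr": "1", "gene_mb": 101.2, "peak_chr": "1", "peak_mb": 100.8},
    {"gene": "GeneB", "gene_chr": "1", "gene_mb": 102.0, "peak_chr": "7", "peak_mb": 45.0},
    {"gene": "GeneC", "gene_chr": "1", "gene_mb": 150.0, "peak_chr": "1", "peak_mb": 150.5},
]
candidates = cis_candidates(eqtls, qtl_chr="1", qtl_start_mb=100.0, qtl_end_mb=103.0)
```

Here only `GeneA` is retained: `GeneB` lies in the QTL but is regulated in trans, and `GeneC` has a cis-eQTL but lies outside the physiological QTL.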
PROVING CAUSATION OF CANDIDATE GENES
The strategies outlined above are often used in tandem to identify potentially causative players within a QTL. Even with the advantages of using outbred models, once a candidate gene is identified, follow-up studies are needed to confirm or disprove the role of that candidate in the trait. One of the most popular methods used is to study a knockout model. Such methods have been available in the mouse since 1990 (84) and have recently become available in the rat (34). Although popular because of the relative ease of constructing a knockout, it is important to recognize that showing a change in phenotype in a knockout model neither proves nor disproves a causative role of this gene at the QTL (30), particularly because there is no way to create a knockout on the same background in which the QTL was identified. New gene targeting approaches that allow for changes in single base pairs are now being used and offer a more realistic approach than a full gene knockout (see Ref. 1). Other methods such as quantitative complementation can also be used to test a causal role of the gene or variant. Quantitative complementation was first used in Drosophila melanogaster (54) and involves creating four sets of F1s from the following strains: an inbred strain with the susceptibility allele at the QTL, an inbred strain with a protective allele at the QTL, and two pairs of co-isogenic strains (genetic background is the same everywhere except at a single locus) that are either wild type or mutated at the suspected gene (a nice review is found in Ref. 31). The trait is tested in these four F1 strains and analyzed for an interaction (see Refs. 96, 97). Because quantitative complementation relies on obtaining co-isogenic strains, most investigators rely on testing a knockout strain for their gene of interest, at least as an initial step in determining whether or not to follow up a particular candidate. 
Without the use of more realistic vector-based approaches or a quantitative complementation test, however, it is important for investigators to assess all available information, including expression, sequence, results from the knockout, as well as possible in vitro, studies to assess the potential causative role of a particular gene and/or variant.
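The logic of the quantitative complementation test described above can be illustrated with a short Python sketch. It computes only the interaction contrast among the four F1 groups; in practice the contrast would be evaluated for significance, for example as the interaction term of a two-way analysis of variance. The group labels and phenotype values are hypothetical.

```python
from statistics import mean


def complementation_interaction(f1_means):
    """Interaction contrast for a 2 x 2 quantitative complementation design.

    f1_means maps (qtl_allele, gene_allele) to the mean phenotype of that F1 group,
    with qtl_allele in {'susceptible', 'protective'} and gene_allele in
    {'wild_type', 'mutant'}. A contrast near zero means the mutation has the same
    effect on both QTL alleles, i.e., there is no evidence of failure to complement.
    """
    effect_on_susceptible = f1_means[("susceptible", "mutant")] - f1_means[("susceptible", "wild_type")]
    effect_on_protective = f1_means[("protective", "mutant")] - f1_means[("protective", "wild_type")]
    return effect_on_susceptible - effect_on_protective


# Hypothetical phenotype values for the four F1 groups.
phenotypes = {
    ("susceptible", "wild_type"): [10, 12],
    ("susceptible", "mutant"): [15, 17],
    ("protective", "wild_type"): [9, 11],
    ("protective", "mutant"): [10, 12],
}
f1_means = {group: mean(values) for group, values in phenotypes.items()}
interaction = complementation_interaction(f1_means)
```

In this toy example the mutation changes the phenotype by 5 units on the susceptible QTL allele but by only 1 unit on the protective allele, giving an interaction contrast of 4.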
STATISTICAL ANALYSIS: CONCERNS AND CAUTIONS
Although genetic mapping in outbred models offers promise for capturing at least some of the missing heritability of complex traits, analysis of these populations is not simple, and care needs to be taken to ensure the analysis is done correctly, thereby avoiding identification of false QTL. In human studies, one of the most reliable methods is to replicate the QTL in additional studies. Because allele frequencies may vary between populations, however, a QTL may not segregate in all rodent populations and therefore may not be detected in all studies. This can occur in outbred colonies created from different founding inbred strains or in outbreds created from the same founders but maintained for many years in separate locations. Thus, care needs to be taken when comparing QTL studies across populations. In the absence of replication studies, the statistical strategies discussed below can be used to provide confidence in identified QTL.
One particularly important statistical issue is the need to address the complex family relationships of outbred models. This can be done using mixed modeling approaches such as EMMA (46), as applied in AI (14), HS (76, 77), and DO (80), or by using resample model averaging (87). QTLRel, an R package recently developed for this purpose, makes it relatively easy to account for family relationships in highly recombinant animal populations, particularly when full pedigree information is known (13). Resample model averaging approaches make use of genome-wide genotypes to determine genetic relatedness directly and may prove advantageous under certain circumstances, particularly when pedigree information is unknown (87).
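To make the idea of estimating relatedness directly from genome-wide genotypes concrete, the sketch below builds a simple marker-based relationship matrix from allele counts coded 0, 1, or 2. This is a generic allele-frequency-centered estimator chosen for illustration; it is not the EMMA, QTLRel, or resample model averaging implementation, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship(genotypes):
    """Marker-based relationship matrix from 0/1/2 allele counts.

    genotypes: list of individuals, each a list of allele counts per marker.
    Returns a square list-of-lists K where K[i][j] is the estimated
    relatedness between individuals i and j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample allele frequency at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value, 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scale by the summed expected genotype variance, 2p(1 - p).
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(ci, cj)) / denom for cj in centered]
        for ci in centered
    ]


# Toy genotypes (assumed): animals 0 and 1 are identical, animal 2 differs.
K = genomic_relationship([[0, 2, 1], [0, 2, 1], [2, 0, 1]])
```

In a mixed model, a matrix of this kind would typically specify the covariance of a random polygenic effect, so that related animals are not treated as independent observations.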
In addition to the importance of taking into account family relationships, there are several other statistical concepts that investigators should consider when analyzing outbred populations. One consideration is how to determine significance thresholds. Cheng and Palmer (15) recently compared four different methods used in an AI population. They found that as long as an appropriate statistical model (i.e., one that takes into account the complex family relationships) is used, all methods worked relatively well, with gene-dropping decreasing false QTL even when family is not taken into account. Another consideration is how best to determine the confidence interval of the QTL. Many studies use the 1.5 LOD drop method (e.g., Refs. 53, 72, 77). An alternative approach is to use nonparametric bootstrapping (37), in which the QTL is re-estimated under alternative datasets based on the original, with each alternative dataset created by resampling the individuals with replacement (92). Although this method has been shown to be overly conservative (57), it does provide a complementary estimate of how sensitive the localization of the top QTL peak is to resampling, thus providing insight into whether more than one locus may underlie the QTL (see Ref. 77). Accurate determination of diplotype substitution effects is also an on-going statistical challenge in outbred populations. Several papers have examined only the founder allele effects within the QTL (53, 80). This has been useful in conducting haplotype analysis and narrowing the region of the QTL. However, within the HS or DO populations, there are in effect 36 possible diplotype combinations (8 homozygous and 28 heterozygous pairings of the eight founder haplotypes), and founder effects account for only eight of these. We (77) have recently published methods that account for all 36 possible diplotype effects, and work in this area is on-going (see Ref. 22).
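The two confidence interval approaches mentioned above can be sketched in a few lines of Python. The first function returns the markers flanking the region in which the LOD score stays within 1.5 units of the peak; the second re-runs a genome scan on datasets resampled with replacement and collects the peak positions. All function names are hypothetical, `scan` is an assumed user-supplied function that returns one LOD score per marker, and the interval is bounded at genotyped markers only, so this is an illustration rather than the procedure of any particular mapping package.

```python
import random
from math import ceil, floor


def lod_drop_interval(positions, lods, drop=1.5):
    """Positions bounding the markers whose LOD is within `drop` of the peak."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


def bootstrap_peak_positions(records, positions, scan, n_boot=200, seed=0):
    """Sorted peak positions from scans of resampled datasets.

    records: one entry per animal (any structure that `scan` understands).
    scan: assumed callable mapping a list of records to per-marker LOD scores.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_boot):
        # Resample animals with replacement, keeping the sample size fixed.
        sample = [rng.choice(records) for _ in records]
        lods = scan(sample)
        peak = max(range(len(lods)), key=lods.__getitem__)
        peaks.append(positions[peak])
    return sorted(peaks)


def percentile_interval(sorted_peaks, coverage=0.95):
    """Empirical interval covering roughly `coverage` of the bootstrap peaks."""
    tail = (1.0 - coverage) / 2.0
    last = len(sorted_peaks) - 1
    return sorted_peaks[floor(tail * last)], sorted_peaks[ceil((1.0 - tail) * last)]


# Toy LOD profile (assumed values) over five marker positions in Mb.
interval = lod_drop_interval([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 2.0, 4.0, 3.0, 1.0])
```

A bootstrap distribution of peak positions with more than one mode is the kind of pattern that would hint at more than one locus underlying the QTL.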
A final statistical concern is that of statistical power. Although previous power calculations have been run in multi-founder populations and suggest that 1,000–1,500 animals provide sufficient power for mapping QTL explaining 5% of the variance (60, 61, 89), these simulations do not account for the confounding effects of relatedness (e.g., Refs. 14, 87) or marker ascertainment (e.g., Ref. 86), and the resulting power estimates are therefore likely overstated. A previous study in the DO used as few as 150 mice; however, that study provided sufficient power to map only 11 of the 113 traits that were measured (80). Studies in HS populations have used >1,000 animals, successfully mapping most traits analyzed (6, 88) and demonstrating the increased power of these studies. To have more accurate power estimates for future studies in these populations, power calculations will need to account for both family structure and polygenes.
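For orientation, the sketch below shows a naive analytical power approximation of the kind the text cautions against: it treats animals as unrelated and tests a single additive effect with one degree of freedom, using a noncentrality parameter of N x R^2 / (1 - R^2). The function name and the significance level passed in are assumptions; eight-founder haplotype models, relatedness, and marker ascertainment are deliberately ignored, so the values it returns should be read as optimistic upper bounds rather than as estimates for any real population.

```python
from math import sqrt
from statistics import NormalDist


def naive_qtl_power(n_animals, var_explained, alpha):
    """Approximate power of a 1-df test in unrelated animals.

    n_animals: sample size
    var_explained: fraction of trait variance explained by the QTL (0 to 1)
    alpha: per-test significance level (the caller chooses any
           genome-wide correction; none is applied here)
    """
    # Noncentrality of the test statistic under the alternative.
    ncp = n_animals * var_explained / (1.0 - var_explained)
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = sqrt(ncp)
    # Two-sided rejection probability for a normal statistic with mean `shift`.
    return (1.0 - normal.cdf(z_crit - shift)) + normal.cdf(-z_crit - shift)


# Compare two sample sizes for a QTL explaining 5% of the variance
# at an assumed stringent per-test alpha.
power_small = naive_qtl_power(150, 0.05, 1e-5)
power_large = naive_qtl_power(1000, 0.05, 1e-5)
```

Even this optimistic approximation shows a steep dependence of power on sample size, which is consistent with the contrast drawn above between small DO studies and larger HS studies.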
CONCLUSION
This review discusses currently available outbred models for QTL mapping. Such models provide several advantages over traditional F2 intercross or backcross strategies, including increased mapping resolution and greater genetic and phenotypic variability. Upon fine-resolution mapping, haplotype analysis or a statistical merge analysis can be used to narrow the region further, and expression QTL mapping can be used to rapidly identify candidate genes. There have been several successes using these strategies. The most recent is identification of 35 causal genes for 31 traits in HS rats (6). Other successes include using AI mice to identify Csnk1e for methamphetamine sensitivity (11) and Cdh11 for regulating femoral morphology (25). HS mice have been used to identify Rgs2 for anxiety (97), and CO mice have been used to identify H2-Ea for CD4+/CD8+ ratio (96). In addition to these examples, in just the past few years, these populations have led to narrowing of candidate genes and identification of potentially causative variants in hundreds of QTL, and we expect that many of these genes will be confirmed in the next few years. Although several statistical challenges still exist (including determining significance thresholds and confidence intervals, understanding diplotype effects, and estimating statistical power), we believe that QTL mapping in outbred populations will provide the ability to identify at least some of the missing heritability for complex traits in humans.
AUTHOR CONTRIBUTIONS
Author contributions: L.C.S.W. drafted manuscript; L.C.S.W. edited and revised manuscript; L.C.S.W. approved final version of manuscript.
ACKNOWLEDGMENTS
I thank Dr. William Valdar for helpful comments on a previous version of this review and for assistance in responding to reviewer comments. I also thank Katie Holl for creating Fig. 1 and Dr. Jozef Lazar for assistance in responding to a reviewer comment. Finally, I thank Dr. Jonathan Flint for introducing me to outbred animals for QTL fine-mapping.
Articles from Physiological Genomics are provided here courtesy of American Physiological Society
Full text links
Read article at publisher's site: https://doi.org/10.1152/physiolgenomics.00127.2013
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc4073892
Funding
Funders who supported this work.
NIDDK NIH HHS (2)
Grant ID: R01 DK106386
Grant ID: R01 DK088975