Open AccessArticle

Genome-Wide Analysis and Genomic Prediction of Chilling Tolerance of Maize During Germination Stage Using Genotyping-by-Sequencing SNPs

Shiliang Cao

^*,

Tao Yu

Gengbin Yang

Wenyue Li

Xuena Ma

and

Jianguo Zhang

Maize Research Institute of Heilongjiang Academy of Agricultural Sciences, Harbin 150086, China

Author to whom correspondence should be addressed.

Agriculture 2024, 14(11), 2048; https://doi.org/10.3390/agriculture14112048

Submission received: 8 October 2024 / Revised: 12 November 2024 / Accepted: 12 November 2024 / Published: 14 November 2024

(This article belongs to the Section Crop Genetics, Genomics and Breeding)

Download

Browse Figures

Figure 1
Violin plots of RGR for AM population. "> Figure 2
Quantile–quantile plot of GWAS result of RGRs under four models. "> Figure 3
Manhattan plot of GWAS of RGRs. The dots mean −log(p) of each markers, the solid line stands for the threshold of significant level and the value is 6.74. "> Figure 4
Comparison of the prediction accuracy of different models for RGR: the red triangle represents the mean of the prediction accuracy, while the black solid line indicates the median of the prediction accuracy, the dot of circle type represents outliers. "> Figure 5
Influence of training population size on the prediction accuracy of the natural population (the red triangle symbol is the mean prediction accuracy, and the black long horizontal solid line is the median prediction accuracy, the dot of circle type represents outliers). "> Figure 6
Influence of number of significant markers and random markers on prediction accuracy to natural population. "> Figure 7
Influence of MAF on prediction accuracy of natural population. (The red triangle symbol is the mean prediction accuracy, and the black long horizontal solid line is the median prediction accuracy, the dot of circle type represents outliers). "> Figure 8
Influence of MAF 0.40–0.50 marker and random marker on prediction accuracy of natural population. The black solid dot represents outliers. ">

Versions Notes

Abstract

Chilling injury during the germination stage (CIGS) of maize significantly hinders production, particularly in middle- and high-latitude regions, leading to slow germination, seed decay, and increased susceptibility to pathogens. This study dissects the genetic architecture of CIGS resistance expressed in terms of the relative germination rate (RGR) in maize through association mapping using genotyping-by-sequencing (GBS) single-nucleotide polymorphisms (SNPs). A natural panel of 287 maize inbred lines was evaluated across multiple environments. The results revealed a broad-sense heritability of 0.68 for chilling tolerance, with 12 significant QTLs identified on chromosomes 1, 3, 5, 6, and 10. A genomic prediction analysis demonstrated that the rr-BLUP model outperformed other models in accuracy, achieving a moderate prediction accuracy of 0.44. This study highlights the potential of genomic selection (GS) to enhance chilling tolerance in maize, emphasizing the importance of training population size, marker density, and significant markers on prediction accuracy. These findings provide valuable insights for breeding programs aimed at improving chilling tolerance in maize.

Keywords:

maize; chilling tolerance; germination; genomic prediction; SNP; association mapping

1. Introduction

Maize is a vital staple and feed crop globally, with chilling sensitivity posing a significant challenge to production, particularly in high-latitude regions. The Northeast region of China experiences cold damage approximately every 3–4 years, with the frequency of severe cold injuries increasing as maize planting areas expand northward and eastward [1]. In the years marked by significant cold damage, maize yields can decline by more than 15% [2]. Chilling injuries during the germination period result in slow germination and emergence, contributing to issues such as seed rot and mold [3]. This phenomenon is a critical factor affecting both the stability of maize yields and the overall quality of the crop in the Northeast, particularly in Heilongjiang Province [4]. As maize cultivation expands northward, the demand for cold-tolerant varieties increases. Chilling damage during germination leads to slow emergence and seed rot, adversely affecting yield stability and quality. Identifying and breeding for chilling tolerance is crucial for enhancing maize resilience in cold-prone areas.

Different scientific research institutions and groups have varying standards for identifying cold tolerance in maize. Depending on the experimental conditions, there are two primary methods for assessing maize cold tolerance during the germination period: field assessment and indoor assessment. In field assessments, early sowing is employed to create low-temperature conditions, evaluating cold tolerance based on the emergence rate and emergence index [5]. For instance, a method of sowing 10 days earlier than the normal period was used to conduct low-temperature tolerance tests on 228 maize inbred lines in Harbin, Keshan, and Heihe in Heilongjiang Province, China. Using emergence rate and emergence index as indicators, six inbred lines (IBB14, LH163, B144, MBPM, CO372, and PHM49) that consistently exhibited cold tolerance across all three locations were identified ultimately [6]. Indoor assessments utilize devices such as artificial growth cabinets to control light, temperature, and moisture conditions. Chilling tolerance is evaluated by comparing the relative values of maize germination time and the germination rate between low and normal temperatures [7]. Since maize is particularly susceptible to cold damage at temperatures of 10 °C or lower, a low temperature range of 10 °C to 12 °C is commonly used for indoor cold tolerance assessments [3].

There is no doubt that using varieties with chilling tolerance is one of the most cost-effective and environmentally friendly approaches to reduce losses from chilling damage. Dissecting the genetic architecture of chilling tolerance through molecular markers will enable breeders to enhance their breeding efficiency by facilitating the introgression of chilling tolerance genes into chilling-susceptible germplasm using marker-assisted selection (MAS) or genomic selection (GS) strategies. Most studies indicate that chilling tolerance in maize during the germination stage is a quantitative trait controlled by multiple genes. In a study on the inheritance of cold tolerance in maize, the germination ability of three warm-loving inbred lines and three cold-tolerant inbred lines, along with their F₁, F₂, and backcross populations were examined. Significant effects of epistasis, as well as additive and dominant genes, on germination ability at low temperatures were found [8]. Another study on the cold tolerance of maize during the field germination period indicates that it is primarily controlled by additive effect [9]. Fraceheboud conducted a quantitative trait locus (QTL) analysis on the traits related to the function of photosynthetic organs under low temperatures and found that a total of eight genomic regions significantly contributed to the expression of the target traits. Ultimately, it was concluded that the key gene(s) for the development of functional chloroplasts in maize under low temperatures are located on chromosome 3, near the centromere [10]. Strigens et al. (2013) [11] employed a genome-wide association analysis to identify 19 SNP molecular markers related to maize seedling growth and chlorophyll fluorescence parameters, which explained 5.7% to 52.5% of the phenotypic genetic variation during the maize seedling growth stage. All these studies indicate that the genetics of cold tolerance during maize germination are quite complex and require a substantial amount of further research.

Genomic selection (GS) represents a revolutionary approach in modern breeding practices, harnessing the power of high-density markers that span the entire genome. This innovative methodology enables the early prediction and selection of desirable traits, thereby shortening generation intervals, enhancing the accuracy of genomic estimated breeding values (GEBVs), and accelerating genetic progress [12]. GS is particularly adept at predicting complex traits characterized by low heritability and challenging measurement criteria, effectively exemplifying the integration of genomic technology into breeding strategies [13]. For instance, Gowda et al. (2018) explored genome-wide selection for resistance to lethal diseases in tropical maize, reporting prediction accuracies of 0.56 and 0.36 for two distinct populations, which underscores the promising potential for both intra- and inter-population predictions [14]. Similarly, Liu [15] et al. (2019) conducted a genome-wide prediction study on the ear rot of CIMMYT tropical inbred maize and found that the genetic relationship between the modeling population and the prediction population significantly influences prediction accuracy, emphasizing the critical role of genetic relatedness in genomic selection outcomes.

Before the application of maize genomic selection (GS), it is essential to optimize the factors that influence the prediction accuracy of GS. The size and composition of the training population not only impacts the prediction accuracy but also affects the associated costs. Generally, prediction accuracy tends to increase with a larger sample size of the training population [16]. Within a certain range, genomic prediction accuracy also improves with an increase in the number of markers [16]. However, this increase is not linear; after reaching a certain threshold, the gains in accuracy may plateau or become negligible [17,18]. This phenomenon may be attributed to the varying effects of markers across different chromosomes. Marker quality issues, such as genotyping errors, a high rate of missing markers, and low marker polymorphism (often indicated by minor allele frequency), are significant contributors to the low prediction accuracy of GS. Furthermore, although GS does not require prior knowledge of the relationship between each marker and the trait, research has shown that markers with various correlationships to traits exhibit different predictive capabilities. Therefore, the random selection of markers covering the chromosome versus the targeted selection of markers associated with specific traits or those with differing effects can lead to substantial variations in predictive performance. This highlights the importance of marker selection as a critical area of focus in GS research. A higher density of genotyping does not necessarily enhance accuracy, and in some cases, subsets of markers can perform better than the complete dataset [19].

In the current study, a natural population with 287 inbred lines was used in association mapping in conjunction with GS using genome-wide GBS SNPs. The main objectives of this study were to (i) elucidate the genetic architecture of chilling tolerance in maize with SNPs, (ii) identify major QTL to chilling using the association mapping approach, (iii) explore the potential of GS for improving chilling tolerance in maize, and (iv) investigate the effects of training population size and marker density on genomic prediction accuracy in maize association panel.

2. Materials and Methods

2.1. Plant Materials, Field Test, and Collecting of Phenotypic Data

An association mapping (AM) population consisting of 287 maize inbred lines was used to genotype and identify the chilling tolerance in the field. Both of the maize inbred lines were from an early maize breeding program in our institute. Before genotyping and phenotyping, these materials were rigorously self-crossed and purified to obtain their seeds with high purity and vigor in 2017.

A field test was used to identify the chilling tolerance of the maize inbred lines in Harbin China in 2018, 2019, and 2020. In order to simulate the chilling stress, the sowing date was adjusted to the day when the soil temperature at a depth of 5 cm depth exceeded 5 °C in succession. The normal sowing time was used as a control. The germination rate (GR) was measured after four weeks. The relative germination rate (RGR) was calculated simply as the ratios of the mean germination rate under low-temperature stress conditions and control conditions, and the ratio was used as indicators for chilling tolerance.

MEATA-R software (v6.0) [20] was used to analyze the multi-location trials using a mixed linear model to estimate the best linear unbiased prediction (BLUP) value of the genotypes and the broad-sense heritability of the RGR of the AM population based on the entry mean within the trials. The mixed linear model was applied as follows:

Y_{i j k} = μ + g_{i} + e_{j} + g e_{i j} + r_{k} e_{j} + ε_{i j k}

where Y_ijk is the target trait (RGR); µ is the overall mean; and g_i, e_j, and g_eij are the effects of the i-th genotype, the j-th environment, and the i-th genotype via j-th environment interaction, respectively. r_kej is the effect of the k-th replication within the j-th environment. ε_ijk is the residual effect of the i-th genotype, the j-th environment, and k-th replication. The genotype is considered as the fixed effect, whereas all the other terms are declared as the random effects.

The broad-sense heritability based on the entry means within the trials was estimated as follows:

h^{2} = σ_{g}^{2} / (σ_{g}^{2} + σ_{g e}^{2} / e + σ_{e}^{2} / e r),

where

σ_{g}^{2}

σ_{ge}^{2}

, and

σ_{e}^{2}

stands for genotypic genotype environment interaction and error variance components, respectively, and e and r stand for the environments and of the replicates within each environment, respectively. Descriptive statistics of the phenotype were conducted with a psych package (R Development Core Team, 2013) and phenotypic data distribution of the RGR for three years, and combination analysis was shown in a violin plot with the violin plot package in the R software (R 4.0.0) (R Development Core Team, 2013).

2.2. Genotypic Data and GWAS

A commonly used GBS protocol was applied in the present study, which was described in the previous studies [21]. The restriction enzyme ApeK1 was used to digest the genomic DNA and then genotyping-by-sequencing libraries were constructed in the 96-plex. GBS was carried out on the Illumina HiSeq2500 platform at the Beijing Genomics Institute (BGI, Shenzen, China). After the sequencing and quality control of the raw data, the clean reads were anchored to a Maize B73 Ref Gen_v4 reference genome with BWA software (v0.7.17-r1188). Single-nucleotide polymorphism calling was carried out using the TASSEL GBS Pipeline, utilizing the GBS 2.7 TOPM ((tags on physical map) file obtained from Panzea (www.panzea.org (accessed on 1 March 2019)) to anchor the reads to the reference genome Maize B73 RefGen_v4 [22,23]. Imputation was conducted using the FILLIN method in TASSEL 5.0 with the parameters for running FILLIN set to their default values, which are described in detail by Swarts et al. (2014) [23]. For each inbred line, 584,935 raw SNPs were called after genotyping. A criterion of MAF (minor allele frequency) greater than 0.05 and missing data rates below 20% was used to filter the raw SNP using the filter function of the TASSEL V5.0 software (Bradbury et al., 2007) [24]. Totally, 276,239 high quality SNPs with an average MAF (minor allele frequency) of 0.26 were obtained and used for genome-wide association analysis (Figures S1 and S2).

A linkage disequilibrium analysis was conducted in TASSEL V5.0 [24] and the average LD decay distance estimated was 7.06 kb (r² = 0.2) (Figure S3) across the 10 maize chromosomes. Four models, namely the General Linear Model (GLM), the mixed linear model (MLM), the fixed and random model circulating probability unification (FarmCPU), and the Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK), were applied for association mapping analysis in the GAPIT (Genome Association and Prediction Integrated Tool-R) package (Version 3), where a kinship matrix along with three principle components was incorporated to avoid spurious associations. The kinship matrix and PCA were generated automatically by the package, GAPIT. The p-value of each SNP was calculated and a Bonferroni formula −log (p/n) was used to adjust the significant threshold value, and n was the number of markers used in AM, p was the significant level and was set at 0.05, and the threshold value was equal to 6.74 with the formula −log (0.05/276,239).

2.3. Genomic Prediction

The phenotype and genotype data for AM were used for the genomic prediction of chilling tolerance in maize to analyze the influence of the statistical model, the training population size, the marker density, and the significant marker on prediction accuracy. The average value of the correlations between the phenotype and the genomic estimated breeding values was defined as the genomic prediction accuracy (rMG). The method of five-fold cross-validation was used with 100 repetitions [25], in which 80% of the samples were randomly selected for predicting the remaining 20% of the samples. The prediction accuracy was the average accuracy of 100 repetitions of the sampling prediction.

2.3.1. The Impact of Statistical Models on Prediction Accuracy

RR-BLUP, G-BLUP, BayesA, BayesB, BayesC, and the random forest model in machine learning were used to conduct genomic prediction studies on the population, and the prediction accuracies of each model were compared. The same number of molecular markers were used for prediction by different models, and the modeling was carried out using five-fold cross-validation. The RR-BLUP model was implemented using the “rrBLUP” package [26], while the G-BLUP, BayesA, BayesB, and BayesC models were implemented using the “BGLR” software package [27]. The random forest model was implemented using the “randomForest” R language package [28]. A five-fold cross-validation scheme with 100 replications was used to generate the training and validation sets to assess the prediction accuracy.

2.3.2. The Effect of Training Group Size on Prediction Accuracy

The training population size ranged from 10% to 90% of the total sample with increments of 10%. For example, 10% of the random samples were used to predict the remaining 90%, 20% were used to predict the remaining 80%, and so on.

2.3.3. The Effect of Marker Density and Quality on Prediction Accuracy

Different marker density gradients were set according to the number of remaining markers after filtering. The impact of minimum allele frequency (MAF) on prediction accuracy was controlled by setting different MAF ranges, including 0.05–0.10, 0.10–0.20, 0.20–0.30, 0.30–0.40, and 0.40–0.50. The models were built based on the minimum number of markers obtained in each quality control range, and the effect of the same number of molecular markers under different MAF conditions on prediction accuracy was compared.

2.3.4. The Effect of Significant Markers on Prediction Accuracy

Using R programming language, significant markers identified through different statistical models in the association analysis of the genotype data were selected for whole-genome prediction. The difference in prediction accuracy between the same number of significant markers and randomly selected markers was compared.

3. Results

3.1. Analysis of Phenotype

Descriptive statistics were conducted with the best linear unbiased prediction (BLUP) values of RGR for 2019, 2020, and 2021 and the combination of them. The lowest and highest RGRs in the field were 26.97% and 81.02% for three years, respectively, both of which occurred in 2020. The range of RGR in 2020 was also the largest among the three years and the value was 54.05%. The lowest standard deviation and range both occurred in 2018, and the values were 5.91% and 27.93%, respectively. There were large differences in the range among the different years and it showed that the RGR of the population was affected by the environments heavily. The skewness was slightly leftward, and the kurtosis was decreased. After calculating with Meta-R for the blup of the RGR, the combined heritability increased to 0.68. The variation in each year of the natural population and the combined variation were shown in the violin plot (Figure 1), as well as the heritability results, which are presented in Table 1.

3.2. Genome Wide Association Mapping

Four different models were employed to conduct a genome-wide association analysis on the BLUP of the relative germination rate (RGR). QQ plots (Figure 2) and Manhattan plots (Figure 3) were generated based on the GWAS results. As shown in Figure 2, the majority of the points in the Generalized Linear Model (GLM) analysis were situated above the diagonal line, indicating a high rate of false positives. In contrast, the mixed linear model (MLM) displayed some points below the diagonal. For the FarmCPU and BLINK models, most of the points aligned closely with the diagonal line, suggesting that these models were more appropriate and reliable for conducting association analyses on the RGR of AM populations.

The Manhattan plot of the GWAS for the relative germination rate (RGR) is presented in Figure 3. In the Generalized Linear Model (GLM), nine molecular markers surpassed the significance threshold of 6.74. Among these, three markers were located on chromosome 1, two on chromosome 3, two on chromosome 5, and one each on chromosomes 6 and 10. In contrast, the mixed linear model (MLM) did not identify any markers exceeding the significance threshold, indicating that the MLM might be more stringent for assessing RGR during seed germination in this study, potentially leading to false negatives. The FarmCPU model identified four significant markers, which were located on chromosomes 1, 3, 6, and 10, respectively. Lastly, the BLINK model revealed three significant markers: one on chromosome 3 and two on chromosome 6.

The molecular markers that exceeded the significance threshold are summarized in Table 2. A total of 16 significant molecular markers were identified across the three models. Notably, the marker CORbnaG.193035 surpassed the threshold in all the models except the mixed linear model (MLM), where it exhibited the highest significance, although it did not exceed the threshold for the MLM. Furthermore, CORbnaG.193035 demonstrated a relatively high phenotypic variation explained (PVE) across the three models: 2.46% in the Generalized Linear Model (GLM), in second place; 7.07% in the FarmCPU model, where it ranked first; and 8.26% in the BLINK model, also ranking first. This marker was located at 144,220,149 bp on chromosome 3, suggesting the potential presence of genes associated with cold tolerance during maize germination. A gene named Zm00001d041935, which was associated with the transcription factor MYB77, was found via a Bioinformatics study and MYB77 was found to be involved in promoting high potassium ion absorption. The minor allele frequency (MAF) of the significant molecular markers ranged from 0.20 (for CORbnaG.32511) to 0.42 (for both CORbnaG.11778 and CORbnaG.45953), with a mean of 0.33. This indicated that the relative abundance of DNA marker polymorphisms might be a crucial factor in identifying the molecular markers associated with specific traits.

3.3. Genomic Prediction of RGR

3.3.1. Influence of Statistical Models to Prediction Accuracy

A genomic prediction of the relative germination rate (RGR) was performed using several models, including rr-BLUP, G-BLUP, BayesA, BayesB, BayesC, and the random forest model (RF). The prediction accuracies of these models were illustrated in a box plot (Figure 4). After conducting 100 repetitions, the average prediction accuracy for rr-BLUP, G-BLUP, BayesA, BayesB, BayesC, and RF were found to be 0.44, 0.40, 0.39, 0.34, 0.32, and 0.34, respectively. The median prediction accuracies (indicated by red triangles in the figure) for each method were 0.43, 0.41, 0.40, 0.35, 0.35, and 0.34, respectively. Notably, the BayesA, BayesB, BayesC, and RF models did not exhibit any outliers but demonstrated relatively lower prediction accuracies, with the BayesC model even yielding negative values. Overall, despite the presence of the four outliers, the rr-BLUP model achieved both the highest mean and median prediction accuracies, indicating superior prediction performance compared to the other models. Consequently, the rr-BLUP model was selected for subsequent studies focused on optimizing population size and marker density.

3.3.2. Influence of Training Set to Prediction Accuracy

The prediction accuracy increased gradually with the enlargement of the training size (Figure 5). As the proportion of the training set for the different models rose from 10% to 90%, the mean prediction accuracy (indicated by the red triangles in Figure 5 and Figure 6) were 0.19, 0.27, 0.32, 0.36, 0.39, 0.41, 0.43, 0.44, and 0.46, respectively. As illustrated in Figure 5, the prediction accuracy stabilized relatively when the training set reached 50%.

3.3.3. Influence of Marker Density and Significant Markers

According to Figure 6, it could be seen that the prediction accuracy of the AM population had almost entered a plateau when the number of random markers reached 3000. To compare the effects of the significant and random markers on the genomic prediction of RGR, we analyzed an equal number of significant markers from different GWAS models alongside random markers (Figure 6). While increasing the marker density generally improved the prediction accuracy for the random markers, this improvement gradually slowed down as the number of markers increased; however, if all the markers were used for prediction, the prediction accuracy of the random markers would be lower than that of any of the significant markers obtained from the GWAS models with the same number of markers. The trend of the prediction accuracy of the significant markers in relation to marker density varied across the four GWAS models and the random markers. When the number of the significant markers was less than 50, FarmCPU and BLINK were ranked among the top two. Taking the case of 50 significant markers as an example, the prediction accuracy of the four models, in descending order, were as follows: FarmCPU (0.75) > BLINK (0.74) > MLM (0.71) > GLM (0.61). When the number of significant markers exceeded 100, the prediction accuracy of the MLM consistently ranked highest among the four models and the random markers, until all the markers were used for prediction. The highest prediction accuracy of the FarmCPU model occurred when the number of the markers was 300, after which the prediction accuracy gradually declined. The prediction accuracy trends of the BLINK model and the GLMs were quite similar; however, the accuracy of the GLM consistently remained below that of the BLINK model.

3.3.4. Influence of Markers Minor Allele Frequency to Prediction Accuracy

According to the filtering based on minor allele frequency (MAF), the corresponding numbers of the markers were as follows: 19,227 for the MAF range of 0.05–0.10, 89,116 for 0.10–0.20, 67,359 for 0.20–0.30, 52,961 for 0.30–0.40, and 47,576 for 0.40–0.50. To assess the impact of the minor allele frequency on the prediction accuracy, an equal number of markers were selected from each frequency range for whole-genome prediction research. Consequently, a total of 19,227 markers were chosen for each frequency range. The prediction accuracies at the same molecular marker density for the five MAF ranges were 0.38, 0.42, 0.43, 0.43, and 0.44, respectively (Figure 7). The markers with a MAF range of 0.40–0.5 exhibited the highest predictive accuracy.

As shown in Figure 7, the same number of random markers (19,277) were selected to compare the difference between the markers with various MAFs. To compare the difference in prediction accuracy among the different number markers within the MAF 0.40–0.50 range, a range of 5 to 3000 random markers was selected and the result was shown in Figure 8. As shown in Figure 8, as the number of markers increased from 5 to 3000, the prediction accuracy of the markers within the MAF range of 0.40–0.50 remained higher than that of the randomly selected markers with the same number. From Figure 8, it can also be seen that the prediction accuracy of the 100 markers with the MAF values between 0.4 and 0.5 was almost equivalent to that of the 3000 random markers. This indicates that selecting markers with a larger minimum allele frequency can increase the prediction accuracy to some extent.

4. Discussion

Weather-related risks have become a crucial and increasingly significant factor influencing crop failures and fluctuations in farmer income [29]. In high-latitude cool regions around the world, chilling injury significantly impedes maize production, leading to slow germination, increased seed decay, and heightened susceptibility to pathogens [30]. Identifying the chilling tolerance of maize germplasm during the germination stage is fundamental to genetic research. In this study, the chilling tolerance was assessed over three consecutive years; however, there were considerable variations in the mean values and heritability of chilling tolerance across the different years. The variations in the mean values might have been sourced from the fluctuation of the weather, especially in temperature. After applying best linear unbiased prediction (BLUP) to analyze the chilling tolerance data from the three years, the heritability improved to 0.68, and the variation in BLUP values was greater than that observed in single-year analyses. This indicated that employing appropriate statistical analysis methods might enhance the effectiveness of data analysis in experiments, thereby improving the overall quality of the research. Thus, it is evident that the choice of statistical analysis methods is as crucial as the experimental design itself.

Chilling stress can influence various physiological and biochemical parameters; therefore, enhancing maize chilling tolerance through genetic modification requires further research to identify the genetic loci responsible for chilling tolerance [3]. Many studies have shown that chilling tolerance during the germination stage of maize is a complex quantitative trait. Diverse conclusions about the genetic loci of the trait have been drawn by different researchers using various materials and methods. Hu (2018) [31] conducted a genome-wide association analysis and identified eight loci associated with cold tolerance on chromosome 2. Among these, three loci were linked to the relative number of days for achieving a germination rate of 50% (RDT50); three were associated with the relative germination index (RGI), and two were related to the relative germination rate (RGR). Zhang et al. (2020) [32] conducted a genome-wide association analysis based on a 56 K chip and identified a total of 30 SNP markers associated with chilling tolerance during corn germination under both controlled and field conditions. These markers were distributed across all ten chromosomes of the maize and markers with the highest significant level found on chromosome 2. Our study utilized four different GWAS models, with all the models except the MLM identifying significant markers, resulting in a total of 16 markers. Among these 16 significant markers, one was detected by all three of the models, and two were identified by two models each, leading to a total of 12 unique significant markers. Among the 12 significant markers, 4 were located on chromosome 1, 2 on chromosome 3, 2 on chromosome 5, 3 on chromosome 6, and 1 on chromosome 10. The marker CORbnaG.193035 was detected by all three of the models and exhibited the highest level of significance and phenotypic variation explained (PVE), indicating its potential importance for future applications. We discovered a nearby gene, Zm00001d041935, which was associated with the transcription factor MYB77. Bioinformatics studies suggest that MYB77 was involved in promoting high potassium ion absorption. This finding provided a promising direction for our future investigations.

Genomic selection has become a practical approach in plant breeding programs due to the decreasing costs of genotyping [12]. Before implementing genomic selection in breeding, it is necessary to study and optimize the various factors that influence prediction accuracy. Despite many new statistical or machine learning models having been invented, rrBLUP is still considered a relatively robust method for genomic selection. Jiang et al. compared the predictive performance of six machine learning models and the rr-BLUP model on an F1 population with a mixed genetic background of different hybrid vigor groups. The rr-BLUP model outperformed all the machine learning models, demonstrating that it is an ideal model in terms of accuracy and efficiency. Our analysis revealed that the rr-BLUP model outperformed other genomic prediction models, achieving a prediction accuracy of 0.44 (Figure 4). This finding aligns with previous research indicating the effectiveness of rr-BLUP in predicting complex traits with low heritability.

Besides statistical analysis methods, this study emphasized the importance of optimizing factors such as training population size, marker density, and marker quality to enhance prediction accuracy [33]. As the size of the training population increased, the prediction accuracy improved and eventually stabilized at around 30–50% (Figure 5). This suggested that larger training sets could provide more robust estimates of genetic effects, thereby improving the reliability of genomic predictions. On the other hand, using 30–50% of the samples as a training population can achieve relatively good predictive accuracy, thus demonstrating the feasibility of genome-wide selection for chilling tolerance during the germination stage in maize, without incurring too many additional costs.

The decrease in genotyping costs is one of the important factors driving the application of genome-wide selection methods in crop breeding. However, the number of samples requiring genotyping in actual breeding processes is almost enormous. Thus, reducing genotyping costs is a crucial consideration for the application of genome-wide selection. Finding ways to reduce the number of markers without compromising prediction accuracy has been a focal point of research in genomic prediction. Marker selection based on the effects of the markers and their associations with phenotypes is also a method to improve prediction accuracy [34,35]. Our results showed that while increasing marker density generally enhanced prediction accuracy, the accuracy became relatively stable when using 3000 to 5000 random markers for prediction. Significant markers consistently outperformed random markers, and the prediction accuracy of the significant markers obtained from different GWAS models showed varying trends with the increase in the number of markers. This finding highlights the importance of selecting high-quality markers that are strongly associated with the trait of interest. After the significant markers have been identified, it is natural for people to explore the predictive effects of these markers for genomic selection. In recent years, Jeong et al. (2020) [36] developed a tool called GMStool to conduct GWAS-based marker selection, aiming to reduce the cost of genotyping. In our study, the analysis of minor allele frequency (MAF) further indicated that markers with higher MAF contributed to enhanced prediction accuracy, suggesting that the careful selection of markers based on their minor allele frequency can optimize genomic selection strategies. According to the comparison of the prediction accuracy of the random markers with high MAF markers, it suggested that 100 markers with the MAF 0.40–0.50 could achieve the same prediction accuracy as 3000 random markers.

5. Conclusions

In conclusion, this study provides valuable insights into the genetic basis of chilling tolerance in maize and underscores the potential of genomic selection as a powerful tool for improving chilling tolerance in breeding programs. By employing association mapping with genotyping-by-sequencing (GBS) SNPs, we successfully identified 12 significant quantitative trait loci (QTLs) on chromosomes 1, 3, 5, 6, and 10, contributing to our understanding of the genetic architecture underlying chilling tolerance of maize. By integrating molecular markers and optimizing prediction models, we can enhance the efficiency of breeding efforts aimed at developing maize varieties that are resilient to chilling stress, thereby improving maize production in cold-prone regions. Future research should focus on validating the identified QTLs and exploring the functional mechanisms underlying chilling tolerance to facilitate the development of targeted breeding strategies.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/agriculture14112048/s1; Figure S1: The marker number of the GWAS panel in a different range of MAF; Figure S2: The LD decay analysis of the GWAS panel; Figure S3: The distribution of SNPs on ten chromosomes of the GWAS panel.

Author Contributions

S.C.: conceptualization, data curation, formal analysis, writing—original draft, and writing—review and editing. T.Y.: data curation, formal analysis, and writing—original draft. G.Y.: investigation, validation, and writing—original draft. W.L.: investigation, validation, and writing—original draft. X.M.: investigation, validation, and writing—original draft. J.Z.: project administration and resource allocation. All authors have read and agreed to the published version of the manuscript.

Funding

The authors declare that financial support was received for the research, authorship, and/or publication of this article. This study was financially supported by the research-operating expenses of the provincial research institutes in Heilongjiang Province (No. CZKYF2023-1-A003), the National Science Foundation of China (No. 31771814), the Heilongjiang Province Agricultural Science and Technology Innovation Leap Project (No. CX23TS02 and No. CX23ZD05), and the Heilongjiang Province Seed Industry Innovation and Development Project (No.2023).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Shi, F.; Pei, Z.; Wang, S.; Gao, Y.; Lu, B.; Liu, J.; Dong, X. Study of Temporal Change Characteristics of Agrometeorological Chilling Injury and Hail and Tornadoe Disasters for 36 Years in Heilongjiang Province. Heilongjiang Agric. Sci. 2019, 6, 36–39. [Google Scholar]
Ma, S.; Xi, Z.; Wang, Q. Risk evaluation of cold damage to corn in Northeast China. J. Nat. Disasters 2003, 3, 138–141. [Google Scholar]
Ma, Y.; Tan, R.; Zhao, J. Chilling Tolerance in Maize: Insights into Advances—Toward Physio-Biochemical Responses’ and QTL/Genes’ Identification. Plants 2022, 11, 2082. [Google Scholar] [CrossRef] [PubMed]
Cao, S.; Yu, T.; Hu, G.; Wang, C.; Cao, J. Research progress on the identification methods of cold tolerance during maize germination period. China Seed Ind. 2018, 8, 29–33. [Google Scholar] [CrossRef]
Brandolini, A.; Landi, P.; Monfredini, G.; Tano, F. Variation among Andean races of maize for cold tolerance during heterotrophic and early autotrophic growth. Euphytica 2000, 111, 33–41. [Google Scholar] [CrossRef]
Zhang, J.; Wang, Z.; Cao, J.; Shi, G.; Guo, X.L.; Cai, Q.; Yin, Y.; Yu, T. Evaluation and Utilization of Low Temperature Tolerance of Maize Inbred Lines from Different Sources. J. Maize Sci. 2019, 6, 7–12. [Google Scholar]
Bhosale, S.; Rymen, B.; Beemster, G.; Melchinger, A.; Reif, C. Chilling tolerance of Central European maize lines and their factorial crosses. Ann. Bot. 2007, 6, 1315–1321. [Google Scholar] [CrossRef]
Mcconnell, R.L.; Gardner, C. Inheritance of Several Cold Tolerance Traits in Corn. Crop Sci. 1979, 19, 847–852. [Google Scholar] [CrossRef]
Frascaroli, E.; Landi, P. Cold tolerance in field conditions, its inheritance, agronomic performance and genetic structure of maize lines divergently selected for germination at low temperature. Euphytica Int. J. Plant Breed. 2016, 209, 771–788. [Google Scholar] [CrossRef]
Fracheboud, Y.; Ribaut, J.; Vargas, M.; Messmer, R.; Stamp, P. Identification of quantitative trait loci for cold-tolerance of photosynthesis in maize (Zea mays L.). J. Exp. Bot. 2002, 376, 1967–1977. [Google Scholar] [CrossRef]
Strigens, A.; Freitag, N.; Gilbert, X.; Grieder, C.; Riedelsheimer, C.; Schrag, T.; Messmer, R.; Melchinger, A. Association mapping for chilling tolerance in elite flint and dent maize inbred lines evaluated in growth chamber and field experiments. Plant Cell Environ. 2013, 10, 1871–1887. [Google Scholar] [CrossRef] [PubMed]
Xiang, Y.; Xia, C.; Li, L.; Wei, R.; Rong, T.; Liu, H.; Lan, H. Genomic prediction of yield-related traits and genome-based establishment of heterotic pattern in maize hybrid breeding of Southwest China. Front. Plant Sci. 2024, 15, 1441555. [Google Scholar] [CrossRef] [PubMed]
Wientjes, Y.C.J.; Bijma, P.; Calus, M.P.L.; Zwaan, B.J.; Vitezica, Z.G.; van den Heuvel, J. The long-term effects of genomic selection: 1. Response to selection, additive genetic variance, and genetic architecture. Genet. Sel. Evol. 2022, 54, 19. [Google Scholar] [CrossRef] [PubMed]
Gowda, M.; Das, B.; Makumbi, D.; Babu, R.; Semagn, K.; Mahuku, G.; Olsen, M.; Bright, J.M.; Beyene, Y.; Prasanna, B. Genome-wide association and genomic prediction of resistance to maize lethal necrosis disease in tropical maize germplasm. Theor. Appl. Genet. 2015, 128, 1957–1968. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.; Hu, G.; Zhang, A.; Loladze, A.; Hu, Y.; Wang, H.; Qu, J.; Zhang, X.; Olsen, M.; Vicente, F.; et al. Genome-wide association study and genomic prediction of Fusarium ear rot resistance in tropical maize germplasm. Crop J. 2021, 9, 325–341. [Google Scholar] [CrossRef]
Cao, S.; Loladze, A.; Yuan, Y.; Wu, Y.; Zhang, A.; Chen, J.; Huestis, G.; Cao, J.; Chaikam, V.; Olsen, M.; et al. Genome-Wide Analysis of Tar Spot Complex Resistance in Maize Using Genotyping-by-Sequencing SNPs and Whole-Genome Prediction. Plant Genome 2017, 10, 1–14. [Google Scholar] [CrossRef] [PubMed]
Zhang, A.; Wang, H.; Beyene, Y.; Semagn, K.; Liu, Y.; Cao, S.; Cui, Z.; Ruan, Y.; Burgueño, J.; San, V.F.; et al. Effect of Trait Heritability, Training Population Size and Marker Density on Genomic Prediction Accuracy Estimation in 22 bi-parental Tropical Maize Populations. Front. Plant Sci. 2017, 8, 1916. [Google Scholar] [CrossRef]
Cao, S.; Song, J.; Yuan, Y.; Zhang, A.; Ren, J.; Liu, Y.; Qu, J.; Hu, G.; Zhang, J.; Wang, C.; et al. Genomic Prediction of Resistance to Tar Spot Complex of Maize in Multiple Populations Using Genotyping-by-Sequencing SNPs. Front. Plant Sci. 2021, 12, 672525. [Google Scholar] [CrossRef]
e Sousa, M.B.; Galli, G.; Lyra, D.H.; Granato, I.S.C.; Matias, F.I.; Alves, F.C.; Fritsche-Neto, R. Increasing accuracy and reducing costs of genomic prediction by marker selection. Euphytica 2019, 215, 18. [Google Scholar] [CrossRef]
Alvarado, G.; Rodríguez, M.F.R.; Pacheco, A.; Burgueño, J.; Crossa, J.; Vargas, M.; Pérez-Rodríguez, P.; Lopez-Cruz, M.A. Meta-r: A software to analyze data from multi-environment plant breeding trials. Crop J. 2020, 8, 754–756. [Google Scholar] [CrossRef]
Elshire, R.J.; Glaubitz, J.C.; Sun, Q.; Poland, J.A.; Kawamoto, K.; Buckler, E.S.; Mitchell, S.E. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 2011, 6, e19379. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Glaubitz, J.C.; Casstevens, T.M.; Lu, F.; Harriman, J.; Elshire, R.J.; Sun, Q.; Buckler, E.S. TASSEL-GBS: A high capacity genotyping by sequencing analysis pipeline. PLoS ONE 2014, 92, e90346. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Swarts, K.; Li, H.; Navarro, J.A.R.; An, D.; Romay, M.C.; Hearne, S.; Acharyae, C.; Glaubitze, J.C.; Mitchelle, S.; Elshireg, R.J.; et al. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome 2014, 7, 175–177. [Google Scholar] [CrossRef]
Bradbury, P.J.; Zhang, Z.; Kroon, D.E.; Casstevens, T.M.; Ramdoss, Y.; Buckler, E.S. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 2007, 23, 2633–2635. [Google Scholar] [CrossRef] [PubMed]
Zhao, Y.; Gowda, M.; Liu, W.; Würschum, T.; Maurer, H.P.; Longin, F.H.; Ranc, N.; Reif, J.C. Accuracy of genomic selection in European maize elite breeding populations. Theor. Appl. Genet. 2012, 124, 769–776. [Google Scholar] [CrossRef] [PubMed]
Endelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 2011, 4, 250–255. [Google Scholar] [CrossRef]
Pérez, P.; de los Campos, G. Genome-wide regression and prediction with the BGLR statistical package. Genetics 2014, 198, 483–495. [Google Scholar] [CrossRef]
Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
Li, Z.; Zhang, Z.; Zhang, J.; Luo, Y.; Zhang, L. A new framework to quantify maize production risk from chilling injury in northeast china. Clim. Risk Manag. 2021, 3, 100299. [Google Scholar] [CrossRef]
Jiang, L.; Gong, L.; Jiang, L.; Li, X.; Cheng, M.; Zhang, X. Chilling injury monitoring and intensity identification of dryland maize in Heilongjiang. J. Sci. Food Agric. 2023, 103, 4573–4583. [Google Scholar] [CrossRef]
Hu, H. Genome-Wide Association Study and Genomic Selection on Chilling During Germination and Seedling Stage in Maize (Zea mays L.). Ph.D. Dissertation, Northeast Agricultural University, Harbin, China, 2018; p. 6. [Google Scholar]
Zhang, H.; Zhang, J.; Xu, Q.; Wang, D.; Di, H.; Huang, J.; Yang, X.; Wang, Z.; Zhang, L.; Dong, L.; et al. Identification of candidate tolerance genes to low-temperature during maize germination by GWAS and RNA-seq approaches. BMC Plant Biol. 2020, 20, 333. [Google Scholar] [CrossRef] [PubMed]
Jiang, S.; Cheng, Q.; Yan, J.; Fu, R.; Wang, X. Genome optimization for improvement of maize breeding. Theor. Appl. Genet. 2020, 133, 1491–1502. [Google Scholar] [CrossRef] [PubMed]
Resende, M.; Munoz, P.; Resende, M.; Garrick, D.J.; Fernando, R.L.; Davis, J.M.; Jokela, E.J.; Martin, T.A.; Peter, G.F.; Kirst, M. Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.). Genetics 2012, 190, 1503–1510. [Google Scholar] [CrossRef] [PubMed]
Hoffstetter, A.; Cabrera, A.; Huang, M.; Sneller, C. Optimizing Training Population Data and Validation of Genomic Selection for Economic Traits in Soft Winter Wheat. G3 2016, 6, 2919–2928. [Google Scholar] [CrossRef]
Jeong, S.; Kim, J.Y.; Kim, N. GMStool: GWAS-based marker selection tool for genomic prediction from genomic data. Sci. Rep. 2020, 10, 19653. [Google Scholar] [CrossRef]

Figure 1. Violin plots of RGR for AM population.

Figure 2. Quantile–quantile plot of GWAS result of RGRs under four models.

Figure 3. Manhattan plot of GWAS of RGRs. The dots mean −log(p) of each markers, the solid line stands for the threshold of significant level and the value is 6.74.

Figure 4. Comparison of the prediction accuracy of different models for RGR: the red triangle represents the mean of the prediction accuracy, while the black solid line indicates the median of the prediction accuracy, the dot of circle type represents outliers.

Figure 5. Influence of training population size on the prediction accuracy of the natural population (the red triangle symbol is the mean prediction accuracy, and the black long horizontal solid line is the median prediction accuracy, the dot of circle type represents outliers).

Figure 6. Influence of number of significant markers and random markers on prediction accuracy to natural population.

Figure 7. Influence of MAF on prediction accuracy of natural population. (The red triangle symbol is the mean prediction accuracy, and the black long horizontal solid line is the median prediction accuracy, the dot of circle type represents outliers).

Figure 8. Influence of MAF 0.40–0.50 marker and random marker on prediction accuracy of natural population. The black solid dot represents outliers.

Table 1. Descriptive statistics of RGR for three years and combination analysis.

Statistical Parameter	2018	2019	2020	Combination Analysis
sample number	287	287	287	287
Mean (%)	49.54	51.18	59.89	53.54
SD	5.91	7.54	11.33	16.34
Median (%)	50.71	51.97	59.95	54.84
Min (%)	32.74	28.89	26.97	6.03
Max (%)	60.66	65.62	81.02	84.41
Range (%)	27.93	36.73	54.05	78.38
Skew	−0.58	−0.52	−0.31	−0.46
Kurtosis	0.07	0.07	−0.40	−0.11
SE	0.35	0.45	0.67	0.96
Heritability	0.34	0.44	0.55	0.68

Table 2. The information of the significant SNPs identified by GWAS.

Model	SNP	Chr.	Position	p Value	MAF *	Effect	PVE * (%)
GLM	CORbnaG.10957	1	27819106	8.92 × 10⁻⁸	0.33	7.60	0.49
GLM	CORbnaG.11778	1	29840462	1.72 × 10⁻⁷	0.42	6.15	0.77
GLM	CORbnaG.32511	1	105975282	1.72 × 10⁻⁷	0.20	6.87	0.46
GLM	CORbnaG.189504	3	129160674	1.72 × 10⁻⁷	0.23	6.17	0.09
GLM	CORbnaG.193035	3	144220149	1.41 × 10⁻⁷	0.41	5.54	2.46
GLM	CORbnaG.312335	5	78150895	1.56 × 10⁻⁷	0.30	5.72	0.69
GLM	CORbnaG.312336	5	78150917	1.56 × 10⁻⁷	0.30	−5.72	0.51
GLM	CORbnaG.375482	6	105660873	1.46 × 10⁻⁷	0.22	7.90	2.78
GLM	CORbnaG.577939	10	138708366	1.32 × 10⁻⁷	0.33	5.87	0.89
FarmCPU	CORbnaG.45953	1	164876718	1.47 × 10⁻⁷	0.42	−3.09	1.26
FarmCPU	CORbnaG.193035	3	144220149	2.54 × 10⁻¹⁶	0.41	5.39	7.07
FarmCPU	CORbnaG.392909	6	163324992	2.89 × 10⁻⁹	0.33	−3.97	2.29
FarmCPU	CORbnaG.577939	10	138708366	1.13 × 10⁻⁹	0.33	4.49	7.13
BLINK	CORbnaG.193035	3	144220149	2.49 × 10⁻⁸	0.41	4.41	8.26
BLINK	CORbnaG.392908	6	163324961	1.69 × 10⁻⁷	0.33	4.58	0.47
BLINK	CORbnaG.392909	6	163324992	1.69 × 10⁻⁷	0.33	−4.58	0.02

* MAF stands for minor allele fequecy; PVE stands for phenotypic variation explained.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, S.; Yu, T.; Yang, G.; Li, W.; Ma, X.; Zhang, J. Genome-Wide Analysis and Genomic Prediction of Chilling Tolerance of Maize During Germination Stage Using Genotyping-by-Sequencing SNPs. Agriculture 2024, 14, 2048. https://doi.org/10.3390/agriculture14112048

AMA Style

Cao S, Yu T, Yang G, Li W, Ma X, Zhang J. Genome-Wide Analysis and Genomic Prediction of Chilling Tolerance of Maize During Germination Stage Using Genotyping-by-Sequencing SNPs. Agriculture. 2024; 14(11):2048. https://doi.org/10.3390/agriculture14112048

Chicago/Turabian Style

Cao, Shiliang, Tao Yu, Gengbin Yang, Wenyue Li, Xuena Ma, and Jianguo Zhang. 2024. "Genome-Wide Analysis and Genomic Prediction of Chilling Tolerance of Maize During Germination Stage Using Genotyping-by-Sequencing SNPs" Agriculture 14, no. 11: 2048. https://doi.org/10.3390/agriculture14112048

APA Style

Cao, S., Yu, T., Yang, G., Li, W., Ma, X., & Zhang, J. (2024). Genome-Wide Analysis and Genomic Prediction of Chilling Tolerance of Maize During Germination Stage Using Genotyping-by-Sequencing SNPs. Agriculture, 14(11), 2048. https://doi.org/10.3390/agriculture14112048

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu