Modeling of genetic gain for single traits from marker-assisted seedling selection in clonally propagated crops

Seedling selection identifies superior seedlings as candidate cultivars based on predicted genetic potential for traits of interest. Traditionally, genetic potential is determined by phenotypic evaluation. With the availability of DNA tests for some agronomically important traits, breeders have the opportunity to include DNA information in their seedling selection operations, an approach known as marker-assisted seedling selection. A major challenge in deploying marker-assisted seedling selection in clonally propagated crops is a lack of knowledge of the genetic gain achievable from alternative strategies. Existing models, which are based on additive effects and were developed for seed-propagated crops, are not directly relevant for seedling selection of clonally propagated crops, as clonal propagation captures all genetic effects, not just additive ones. This study modeled genetic gain from traditional and various marker-based seedling selection strategies on a single-trait basis through analytical derivation and stochastic simulation, based on a generalized seedling selection scheme for clonally propagated crops. Various trait-test scenarios with a range of broad-sense heritability and proportion of genotypic variance explained by DNA markers were simulated for two populations with different segregation patterns. Both derived and simulated results indicated that marker-based strategies tended to achieve higher genetic gain than phenotypic seedling selection for a trait where the proportion of genotypic variance explained by marker information was greater than the broad-sense heritability. Results from this study provide guidance in optimizing genetic gain from seedling selection for single traits where DNA tests providing marker information are available.
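The comparison summarized above can be illustrated with a small stochastic simulation. The sketch below is not the simulation used in this study; it assumes a total genotypic variance of 1, a single marker locus segregating 1:2:1 without dominance, and normally distributed residual genotypic and environmental effects. The function name `simulate_gain` and all parameter names are hypothetical.

```python
import random
import statistics


def simulate_gain(H, P, prop_selected, n=20000, seed=1):
    """Return (phenotype-only gain, marker-only gain) for one simulated family.

    Assumptions (illustrative, not taken from the study): genotypic variance
    is 1, one marker locus segregates 1:2:1 with no dominance, and residual
    genotypic effects and environmental errors are normally distributed.
    H is broad-sense heritability; P is the proportion of genotypic variance
    explained by the marker.
    """
    rng = random.Random(seed)
    a = (2.0 * P) ** 0.5             # values -a, 0, +a at 1:2:1 have variance P
    sd_resid = (1.0 - P) ** 0.5      # genotypic variance not captured by the marker
    sd_env = ((1.0 - H) / H) ** 0.5  # chosen so that Vg / (Vg + Ve) = H

    seedlings = []
    for _ in range(n):
        marker = rng.choice((-a, 0.0, 0.0, a))
        genotypic = marker + rng.gauss(0.0, sd_resid)
        phenotype = genotypic + rng.gauss(0.0, sd_env)
        # The last element is a random key used only to break marker ties.
        seedlings.append((marker, genotypic, phenotype, rng.random()))

    k = max(1, round(prop_selected * n))
    pop_mean = statistics.fmean(s[1] for s in seedlings)

    # Phenotype-only selection: keep the k seedlings with the highest phenotype.
    by_pheno = sorted(seedlings, key=lambda s: s[2], reverse=True)[:k]
    # Marker-only selection: rank by marker value, ties broken at random.
    by_marker = sorted(seedlings, key=lambda s: (s[0], s[3]), reverse=True)[:k]

    gain_pheno = statistics.fmean(s[1] for s in by_pheno) - pop_mean
    gain_marker = statistics.fmean(s[1] for s in by_marker) - pop_mean
    return gain_pheno, gain_marker


# Two contrasting trait-test scenarios at 25% of seedlings selected.
for H, P in ((0.3, 0.8), (0.8, 0.2)):
    pheno, marker = simulate_gain(H, P, prop_selected=0.25)
    print(f"H={H}, P={P}: phenotype-only={pheno:.3f}, marker-only={marker:.3f}")
```

Under these assumptions, the marker-only gain should exceed the phenotype-only gain in the scenario where P is larger than H, and fall below it in the reverse scenario, in line with the trend reported above.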


Supplementary materials
$$\mu = \sum_{i=1}^{n} m_i \times f_i \qquad \text{(S1)}$$

where $\mu$ was the mean phenotypic or genotypic value of the seedling population, $n$ was the number of marker genotypes segregating in the seedling population, $m_i$ was the marker genotypic value of the ith marker genotype, and $f_i$ was the frequency of the ith marker genotype.

$$\sigma_M^2 = \sum_{i=1}^{n} f_i \times (m_i - \mu)^2$$

where $\sigma_M^2$ was the variance explained by marker loci, $\mu$ was the mean phenotypic or genotypic value of the seedling population, $n$ was the number of marker genotypes segregating in the seedling population, $m_i$ was the marker genotypic value of the ith marker genotype, and $f_i$ was the frequency of the ith marker genotype.

$\Delta G_1$: genetic gain in the first stage of two-stage seedling selection
$\Delta G_2$: genetic gain in the second stage of two-stage seedling selection

Fig. S2 (a) Comparison between derived and simulated genetic gains from phenotype-only seedling selection for the population with three segregating genotypes without dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.
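As a worked illustration of Equation S1 and of the variance explained by marker loci, the sketch below computes both quantities for a hypothetical 1:2:1 family. It assumes that the marker variance is the frequency-weighted squared deviation of the marker genotypic values from the population mean; the function name and the example values (a = 1) are illustrative and not taken from the study.

```python
def marker_mean_and_variance(values, freqs):
    """Mean (Equation S1) and variance explained by marker genotypes.

    values: marker genotypic value of each marker genotype
    freqs:  frequency of each marker genotype (must sum to 1)
    The variance formula is an assumption: sum of f_i * (m_i - mean)^2.
    """
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    mean = sum(m * f for m, f in zip(values, freqs))
    variance = sum(f * (m - mean) ** 2 for m, f in zip(values, freqs))
    return mean, variance


# Hypothetical 1:2:1 segregation with a = 1.
freqs = [0.25, 0.5, 0.25]
no_dominance = marker_mean_and_variance([-1.0, 0.0, 1.0], freqs)       # d = 0
complete_dominance = marker_mean_and_variance([-1.0, 1.0, 1.0], freqs)  # d = a

print(no_dominance)        # (0.0, 0.5)
print(complete_dominance)  # (0.5, 0.75)
```

With the same allele effect, complete dominance shifts the population mean upward and changes the variance attributable to the marker, which is one reason the dominance scenarios are reported in separate figures.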

Fig. S2 (b) Comparison between derived and simulated genetic gains from phenotype-only seedling selection for the population with three segregating genotypes with complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S2 (c) Comparison between derived and simulated genetic gains from phenotype-only seedling selection for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S3 (a) Comparison between derived and simulated optimal genetic gains from marker-only seedling selection for the population with three segregating genotypes without dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S3 (b) Comparison between derived and simulated optimal genetic gains from marker-only seedling selection for the population with three segregating genotypes with partial dominance (d3 = a3/2). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S3 (c) Comparison between derived and simulated optimal genetic gains from marker-only seedling selection for the population with three segregating genotypes with complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S3 (d) Comparison between derived and simulated optimal genetic gains from marker-only seedling selection for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S4 (a) Comparison between derived and simulated genetic gains from two-stage seedling selection for the population with three segregating genotypes without dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S4 (b) Comparison between derived and simulated genetic gains from two-stage seedling selection for the population with three segregating genotypes with partial dominance (d3 = a3/2). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S4 (c) Comparison between derived and simulated genetic gains from two-stage seedling selection for the population with three segregating genotypes with complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S4 (d) Comparison between derived and simulated genetic gains from two-stage seedling selection for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation.

Fig. S5 (a) Comparison between derived and simulated genetic gains from index seedling selection for the population with three segregating genotypes without dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Black numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation. Pink numbers indicate the ratios between the weight coefficients of the phenotypic score and the marker score in each trait-test scenario.

Fig. S5 (b) Comparison between derived and simulated genetic gains from index seedling selection for the population with three segregating genotypes with partial dominance (d3 = a3/2). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Black numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation. Pink numbers indicate the ratios between the weight coefficients of the phenotypic score and the marker score in each trait-test scenario.

Fig. S5 (c) Comparison between derived and simulated genetic gains from index seedling selection for the population with three segregating genotypes with complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Black numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation. Pink numbers indicate the ratios between the weight coefficients of the phenotypic score and the marker score in each trait-test scenario.

Fig. S5 (d) Comparison between derived and simulated genetic gains from index seedling selection for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight. Black numbers in the right corner of each plot are correlation coefficients between mean genetic gains estimated by derivation and by simulation. Pink numbers indicate the ratios between the weight coefficients of the phenotypic score and the marker score in each trait-test scenario.

Fig. S6 (a) Simulated genetic gain from alternative seedling selection strategies for the population with three segregating genotypes and no dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight.

Fig. S6 (b) Simulated genetic gain from alternative seedling selection strategies for the population with three segregating genotypes and complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight.

Fig. S6 (c) Simulated genetic gain from alternative seedling selection strategies for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected at the end of seedling selection, ranging from 0.05 to 0.95. The Y axis indicates genetic gain from seedling selection in units of the simulated genotypic values. Error bars for each data point indicate the 95% confidence interval (Equation 11); they are barely visible because the confidence intervals are extremely tight.

Fig. S7 (a) Simulated genetic gain from two-stage seedling selection for the population with three segregating genotypes and no dominance (d3 = 0). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected in the first stage, and the Y axis indicates simulated genetic gain from two-stage seedling selection based on SPM.

Fig. S7 (b) Simulated genetic gain from two-stage seedling selection for the population with three segregating genotypes and complete dominance (d3 = a3). Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected in the first stage, and the Y axis indicates simulated genetic gain from two-stage seedling selection based on SPM.

Fig. S7 (c) Simulated genetic gain from two-stage seedling selection for the population with nine segregating genotypes. Each plot represents a selection scenario with a given broad-sense heritability (H) of the trait and predictiveness (P) of the DNA test. In each plot, the X axis indicates the proportion of seedlings selected in the first stage, and the Y axis indicates simulated genetic gain from two-stage seedling selection based on SPM.
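The dependence of two-stage gain on the proportion selected in the first stage can be explored with a sketch like the one below. It is not the simulation behind Fig. S7. It assumes that the first stage ranks seedlings on the marker and the second stage ranks the survivors on phenotype, with a genotypic variance of 1, one marker locus segregating 1:2:1 without dominance, and normal residual and environmental effects. The function name `two_stage_gain` and its parameters are hypothetical.

```python
import random
import statistics


def two_stage_gain(H, P, prop_stage1, prop_final, n=20000, seed=1):
    """Genetic gain after marker selection (stage 1) then phenotypic selection (stage 2).

    Assumed setup (illustrative only): genotypic variance 1, a 1:2:1 marker
    locus with no dominance, normal residual and environmental effects.
    prop_stage1 and prop_final are proportions of the ORIGINAL population.
    """
    if not 0.0 < prop_final <= prop_stage1 <= 1.0:
        raise ValueError("need 0 < prop_final <= prop_stage1 <= 1")
    rng = random.Random(seed)
    a = (2.0 * P) ** 0.5             # marker values -a, 0, +a have variance P
    sd_resid = (1.0 - P) ** 0.5      # genotypic variance not captured by the marker
    sd_env = ((1.0 - H) / H) ** 0.5  # chosen so that Vg / (Vg + Ve) = H

    seedlings = []
    for _ in range(n):
        marker = rng.choice((-a, 0.0, 0.0, a))
        genotypic = marker + rng.gauss(0.0, sd_resid)
        phenotype = genotypic + rng.gauss(0.0, sd_env)
        seedlings.append((marker, genotypic, phenotype, rng.random()))

    pop_mean = statistics.fmean(s[1] for s in seedlings)
    k1 = max(1, round(prop_stage1 * n))
    k2 = max(1, round(prop_final * n))

    # Stage 1: rank on marker value, breaking ties with the random key.
    stage1 = sorted(seedlings, key=lambda s: (s[0], s[3]), reverse=True)[:k1]
    # Stage 2: rank the survivors on phenotype.
    stage2 = sorted(stage1, key=lambda s: s[2], reverse=True)[:k2]
    return statistics.fmean(s[1] for s in stage2) - pop_mean


# Gain at a fixed final proportion (10%) for several first-stage proportions.
for p1 in (0.1, 0.25, 0.5, 0.75, 1.0):
    gain = two_stage_gain(H=0.3, P=0.8, prop_stage1=p1, prop_final=0.1)
    print(f"first-stage proportion {p1:.2f}: gain {gain:.3f}")
```

In this setup, a first-stage proportion of 1.0 reduces to phenotype-only selection, and a first-stage proportion equal to the final proportion reduces to marker-only selection, so the loop spans the two single-stage extremes.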