Genetic Contribution of Synthetic Hexaploid Wheat to CIMMYT’s Spring Bread Wheat Breeding Germplasm

Synthetic hexaploid (SH) wheat (AABBD’D’) is developed by artificially generating a fertile hybrid between tetraploid durum wheat (Triticum turgidum, AABB) and diploid wild goat grass (Aegilops tauschii, D’D’). Over three decades, the International Maize and Wheat Improvement Center (CIMMYT) has developed and utilized SH wheat to bridge gene transfer from Ae. tauschii and durum wheat to hexaploid bread wheat. This is a unique example of the successful large-scale use of wild relatives in mainstream breeding worldwide. Our study aimed to determine the genetic contribution of SH wheat to CIMMYT’s global spring bread wheat breeding program. We estimated the theoretical and empirical contribution of D’ to synthetic derivative lines from ancestral pedigree and marker information on over 1,600 advanced lines and their parents. The average marker-estimated D’ contribution was 17.5%, with differences among genome segments suggesting that differential selection pressure was applied. The pedigree-based contribution was correlated with the marker-based estimates but did not capture chromosome segment specific variation. Results from international yield trials showed that 20% of the lines were synthetic derived, with an average D’ contribution of 15.6%. Our results underline the importance of SH wheat in maintaining and enhancing genetic diversity and genetic gain over the years and are important for the development of a more targeted introgression strategy. The study provides a retrospective view of the development and utilization of SH wheat in the CIMMYT Global Wheat Program.


Results
Marker-estimated contribution vs. the theoretical contribution of the D' genome in the synthetic derivative lines. After applying the various data filters (see Methods), a total of 2,669 D genome specific markers (975 PAV and 1,694 SNPs) were recovered for further analysis (Supplementary Figs S1, S2). D genome specific markers with a differential allele frequency between the synthetic hexaploid wheat (shw) and bread wheat (bw) populations were significantly more frequent than A and B genome specific markers with a differential frequency between the durum wheat (dw) and bw populations (Supplementary Fig. S2). The threshold for the population allele frequency difference was set to 0.3, which corresponds to the 80th percentile of the distribution. Therefore, only the D genome specific markers were used to estimate the D' contribution from Ae. tauschii to the synthetic derivative lines. The number of differential DArTseq® markers on the A and B genomes was considered too low to precisely estimate the contribution of the A and B genomes of durum wheat to the synthetic derivative lines.
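As an illustration only, the marker selection described above can be sketched in Python (the calculations in the study itself were performed in R). The sketch assumes that the differential of a marker is the absolute difference in the frequency of the same allele between two populations. The function names, marker identifiers and toy frequencies are hypothetical and are not taken from the study data.

```python
from typing import Dict, List, Tuple


def differential(p_a: float, p_b: float) -> float:
    """Absolute difference in the frequency of one allele between two populations (assumed form)."""
    return abs(p_a - p_b)


def select_differential_markers(
    freq_shw: Dict[str, float],
    freq_bw: Dict[str, float],
    threshold: float = 0.30,
) -> Tuple[List[str], float]:
    """Return the markers whose differential exceeds the threshold, together
    with the fraction of markers whose differential is at or below it."""
    shared = sorted(set(freq_shw) & set(freq_bw))
    if not shared:
        raise ValueError("no markers are shared between the two populations")
    deltas = {m: differential(freq_shw[m], freq_bw[m]) for m in shared}
    kept = [m for m in shared if deltas[m] > threshold]
    frac_at_or_below = sum(1 for d in deltas.values() if d <= threshold) / len(deltas)
    return kept, frac_at_or_below


# Toy allele frequencies, invented for illustration only.
shw = {"m1": 0.90, "m2": 0.55, "m3": 0.10, "m4": 0.50, "m5": 0.95}
bw = {"m1": 0.20, "m2": 0.50, "m3": 0.15, "m4": 0.45, "m5": 0.90}

kept, frac_at_or_below = select_differential_markers(shw, bw)
print(kept, frac_at_or_below)
```

In this toy set four of the five markers fall at or below the 0.30 threshold, which mirrors the situation in which the threshold sits at the 80th percentile of the differential distribution.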
The average theoretical contribution of the D' genome to the 253 synthetic derivatives in our study was 25.3% (Fig. 1). In contrast, the marker-estimated contribution of the D' genome was 17.4% (Fig. 2). Thus, the marker-based estimate was 7.9 percentage points lower than the theoretical estimate. However, there was a high correlation between the two estimates (r = 0.88**) (Supplementary Fig. S3). The majority of the synthetic derivative lines showed a marker-estimated contribution lower than expected. The synthetic derivative lines of generations 1 and 2 had, on average, a lower D' contribution (φ) than expected (ϕ = 12.5 to 25.0%). On the other hand, synthetic derivative lines of subsequent introgression cycles maintained a somewhat higher D' contribution than expected (Figs 1, 2, Supplementary Fig. S3). The results indicate relatively rapid selection against the undesirable genome segments of Ae. tauschii during the first introgression cycles, but a residual D' contribution due to linkage drag in later introgression cycles.
In a second step, we examined how the contribution of the D' genome to the synthetic derivative lines is distributed across the genome (Fig. 3). The average contribution of D' varied among chromosomes, with chromosome 2D contributing the least and chromosome 7D the most (Fig. 3). For the majority of chromosome regions, the proportion of the D' versus the D genome contribution was less than 18%. However, some chromosome regions were clearly more likely to have D' alleles than D alleles, with a proportion equal to or greater than 50% (Fig. 3). Supplementary Table S2 lists the genes retrieved from RefSeqV1.1 66 in regions with a higher likelihood of retaining D' genome alleles.
Average genetic contribution to international nursery lines and released cultivars. We also calculated the theoretical contribution of D' in two international yield trials and in released cultivars with SH wheat in their pedigree. The average contribution of the D' genome in synthetic derivative lines was generally higher in the Semi-Arid Wheat Yield Trial (SAWYT) than in the Elite Spring Wheat Yield Trial (ESWYT) (Fig. 4). In both trials, an increasing number of synthetic derivatives was retained in recent years, with yearly fluctuations. However, the contribution from the D' genome decreased in recent years, indicating that subsequent crosses reduced the D' genome contribution, retaining only the favorable D' regions in the genome. Across both international yield trials, 25 SH wheats were successfully utilized in developing competitive synthetic derivatives (Table 1). The SH lines 'ALTAR 84/AEGILOPS SQUARROSA (TAUS)' and 'CROC_1/AEGILOPS SQUARROSA (224)' were the most frequently used SH wheats. Synthetic derived lines such as SOKOLL (PASTOR/3/ALTAR 84/AEGILOPS SQUARROSA (TAUS)//OPATA) and VOROBEY (CROC_1/AEGILOPS SQUARROSA (224)//OPATA/3/PASTOR) were subsequently used as parents.
Known released cultivars with a synthetic background retained an average D' genome contribution of 17.48% (Table 2). Notably, cultivars with a D' contribution as high as 50% (synthetic derivative lines of generation 1) have been released, although the majority of cultivars had a D' contribution of less than 12.5% (Supplementary Table S2).

Discussion
Our study provides a unique perspective on the genetic contribution of Ae. tauschii to the CIMMYT spring bread wheat breeding program via SH wheat, which has been of interest for many years 23. Studies have shown that the D genome diversity of SH wheat is considerably higher than that of bread wheat 25,27. Although direct hybridization of bread wheat with T. turgidum and Ae. tauschii is possible, it is generally difficult to generate and maintain stable genomes 67. As a result, the development of SH wheat is considered a better alternative. In addition, SH wheat can serve as a bridge to introduce alleles from tetraploid durum wheat and emmer wheat into hexaploid bread wheat 33. In contrast to the initial years, when few elite durum wheat genotypes were used to develop SH wheat, in more recent years additional lines from the tetraploid Triticum dicoccoides and Triticum dicoccum 20,23,68 pools were introgressed.
The potential of wild relatives of wheat, including Ae. tauschii, to improve disease resistance in bread wheat is well documented. For example, the leaf rust resistance gene Lr21 was introgressed into the wheat cultivar Thatcher from Ae. tauschii accession TA1599 via SH wheat 69. Similarly, the gene Yr28 and other genes for resistance to stripe rust have been transferred from Ae. tauschii 70. More recently, two Ug99 stem rust resistance genes (SrTA10187 and SrTA10171) were transferred from Ae. tauschii into the hard-white winter wheat line KS05HW14 71. In several instances, synthetic derivative lines have also been reported to contribute positively to yield and abiotic stress tolerance 19,72-74.
Besides Ae. tauschii, other wild wheat relative species can be introgressed into wheat without the development of synthetics. However, the transfer of the alien segments is usually practically and methodologically more challenging and creates a situation where no homologous pairing occurs. As the D' genome is the progenitor of the D genome, problems of homologous pairing have not been reported. Due to its allo-hexaploid nature, bread wheat in general has a high buffering capacity for alien transfers.
The A and B genomes are more closely related to each other than the A and D genomes or the B and D genomes 5,6. This makes it more difficult to find SNP markers specific to the A or B genome. The A and B genomes of durum and bread wheat have common ancestors, and natural hybridization between the genomes occurred 8,000 years ago. Both genomes have gone through natural and artificial selection over the years and have thus likely evolved in a similar manner. Moreover, the dwarfing gene Rht1 in modern durum wheat was transferred to bread wheat, further allowing the mixing of the two species. Our study clearly showed that the differences between the A or B genomes of durum and bread wheat were smaller than those between the D' and D genomes.
The theoretical contribution of the D' genome was a good predictor of the marker-estimated D' genome contribution. However, synthetic derivatives within each theoretical contribution class (breeding cycles 1 to 7) differed significantly in the marker-estimated contribution and in the genomic regions deriving from the D' genome. Thus, the theoretical estimates provide some indication of the potential contribution of D', but they are not an accurate reflection because of the natural and artificial selection, linkage drag and Mendelian sampling that occur during the breeding process. Adaptive introgression is a very common phenomenon, which leads to unbalanced contributions in different parts of the genome, particularly in regions vital for plant survival and adaptation 75. SH wheat shows a number of undesirable traits such as a shattering rachis, glume retention, glume hardiness and tall plant height 33,67. Therefore, during the process of developing synthetic derivatives, large progeny populations (exceeding 2,000 to 3,000 plants in size) were grown to select for agronomically desirable traits, particularly threshability, plant height and disease resistance 33. Natural adaptation and adaptive introgression apply, as some progenies do not survive due to chlorosis or necrosis. In addition, the fitness advantage of wild or domesticated alleles may be favored in the process of natural selection 76-78.
The large progeny populations are used by breeders for stringent selection. The probability of retaining D' is higher in regions surrounding QTL/genes of interest when certain plant types and target traits are selected, and this effect can last several generations of backcrossing 21. Our results suggest that a larger share of the residual D' contribution in later breeding cycles was possibly due to linkage drag. We also demonstrated that there is no clear biased preference among alleles in the majority of genome regions. However, some bias in contribution is evident, as both natural and artificial selection are applied during the process of developing the synthetic derivatives, possibly favoring genes related to survival and to typical bread wheat plant types. The search for the underlying genes in the preferentially selected D' genome segments did not provide any deeper insight, as the function of most genes was unknown.
CIMMYT established its Wide Crosses Program in 1986. Since their development, synthetic derivative lines have been frequently used in mainstream breeding, and the superior progenies were selected for inclusion and distribution through international yield trials and nurseries; a few were released as cultivars. In the past 32 years, CIMMYT has generated more than 1,401 spring type SH wheats and made over two thousand crosses between the most promising SH wheats and elite bread wheat lines. The generation of winter type SH wheat started in 2008. Approximately 50 targeted SH wheats are currently developed by CIMMYT annually by crossing current elite durum wheat cultivars with Ae. tauschii accessions, while extending the genetic diversity of the accessions used on the basis of combined genotypic and phenotypic information.
The number of SH wheat in both yield trials (ESWYT and SAWYT) increased during the last 19 years; however, there was a declining trend in the D' genome contribution on an individual basis. Synthetic derivatives of breeding cycle 1, with a D' genome contribution of 50%, formed part of the yield trials disseminated from 1997 to 2000, whereas in more recent years the average D' genome contribution was less than 15%. Although 1,524 SH wheats have been developed to date, derivatives from only 25 SH wheats reached the yield trials. Over one thousand of the initially developed SH wheats were characterized for agronomic and other economically important traits, and fewer than 10% were found to have potential for use in breeding programs; this set is available as Elite Synthetics from the CIMMYT germplasm bank. Similarly, thousands of synthetic derivatives have been characterized over the years, and some have now been released as varieties. The most popular synthetic derivatives, e.g. SOKOLL and VOROBEY, were used multiple times as parents in mainstream breeding because of their superior performance under drought stress and their resistance to Septoria tritici blight. The SH wheat ALTAR_84/AEGILOPS SQUARROSA(TAUS) was the most frequent line in both ESWYT and SAWYT. Among the durum wheat donors, ALTAR_84 dominated, followed by CROC_1 (=LAHN) and CHEN. ALTAR_84 was bred during the 1980s based on a new ideotype concept, with a balanced increase in all yield components, and was extensively grown in Mexico. The higher percentage of SH wheat in the SAWYT is due to its potential to introgress new stress tolerance genes 79. A more recent outcome of the utilization of SH wheat has been the development and release of the high grain Zn biofortified varieties 'Zn Shakti' from a durum wheat based SH wheat, and 'WB-02', 'PBWZn-1', and 'Nohely F2018' from T. dicoccum based SH wheat.
The value of introgression from SH wheat is also demonstrated by the increasing number of varieties released in various countries. It is difficult to estimate the use of synthetic derivative lines as parents across breeding programs because (a) pedigree records of cultivars released in different countries are unavailable and (b) the available pedigree records are incomplete or inaccurate. Based on the available pedigree records, some cultivars have been released with a theoretical D' genome contribution as high as 50% (e.g. KT2009 in Pakistan). This means that the buffering capacity of the A and B genomes can mask undesirable effects of wild alleles.
The development of SH wheat led to increased D genome diversity via the introgression of D' at CIMMYT. However, for SH wheat to be increasingly used in mainstream breeding, a targeted introgression of high value traits combined with a proper selection strategy through limited crossing is considered important to capture additional useful genes/diversity from SH wheat. The breeding strategy needs to minimize the unfavorable contribution of the D' genome due to linkage drag while maintaining the desirable alleles. This raises the question of how much D' genome contribution is acceptable or favorable in synthetic derivatives, or what average percentage of the D' genome contribution must be discarded through selection. The theoretical contribution of the D' genome in the most recent CIMMYT international yield trials remained below 12.5%, which means that at least three rounds of crosses with elite bread wheat lines are required if backcrossing and stringent selection for multiple traits are not applied. As we have seen differences between the theoretical and marker-estimated contribution of D' at the individual level and across the genome, accelerating the process through an appropriate breeding approach can only be achieved by eliminating the unnecessary genome segments of D' while retaining desirable genes/QTLs. Combining genomic profiles with the phenotypic assessment of SH wheat could help to identify the desirable genes/QTL, which could be traced during the selection process via genomic-assisted breeding approaches.

Conclusion
The results show that the genetic contribution of the D' genome of Ae. tauschii can be estimated using high-density genome profiles. The marker-estimated contribution of D' in this study underlines the importance of SH wheat in maintaining diversity and genetic gain over years. However, targeted breeding efforts are required to effectively use the D' diversity for mining desirable alleles. Molecular markers, if tagged to desirable alleles, will have the potential to accelerate the selection process to maintain the desirable variation while culling the undesirable variation. Overall, the development and utilization of SH wheat at large scale at CIMMYT is a model for maintaining diversity in bread wheat breeding germplasm required for combating future challenges in crop improvement and is a successful example of the utilization of genetic diversity from wild relatives.

Methods
The SH wheat included in this study were developed by the CIMMYT Wide Crosses Program by crossing durum or emmer wheat (Triticum dicoccum, AABB) with Ae. tauschii (D'D'), followed by artificial doubling of the chromosomes of the resulting haploid F1 plants (Fig. 5, Supplementary Fig. S5). The SH wheat was then crossed with bread wheat lines via 1 or 2 backcrosses or 3-way (top) crosses to generate advanced lines through selection conducted in segregating generations and among fixed lines; each such line is known as a synthetic derivative line (Supplementary Fig. S5). An SH wheat crossed "x" times with bread wheat is termed a synthetic derivative line of breeding cycle "x". For example, a synthetic derivative line of breeding cycle 1 is defined as an SH wheat crossed once with a bread wheat.

Marker-based contribution estimation.
The 359 genotypes used in this study included three Ae. tauschii lines, 30 durum wheat lines, eight SH wheat, 253 synthetic derivative lines, and 63 bread wheat lines (Supplementary Table S3). The synthetic derivative lines were between 1 and 7 cycles of derivation. They were selected because of their relevance in the CIMMYT Spring Bread Wheat Improvement Program, with at least one sister line distributed through CIMMYT international yield trials and at least 3 to 4 sister lines conserved in the CIMMYT Wheat Germplasm Bank.
All entries were genotyped with the DArTseq® technology at the Genetic Analysis Service for Agriculture (SAGA) laboratory at CIMMYT in Mexico. A complexity reduction method including two enzymes (PstI and HpaII) was used to create a genome representation of the set of samples. A PstI-RE site specific adapter was tagged with 96 different barcodes, enabling multiplexing of a 96-well microtiter plate with equimolar amounts of amplification products within a single lane on an Illumina HiSeq2500 instrument (Illumina Inc., San Diego, CA). The successfully amplified fragments were sequenced up to 77 bases, generating approximately 500,000 unique reads per sample. A proprietary analytical pipeline developed by DArT P/L was used to generate allele calls for SNP and SilicoDArT markers. Then, a set of filtering parameters was applied to select high quality markers for this specific study. A total of 76,543 single nucleotide polymorphisms (SNPs) and 365,663 SilicoDArT markers were revealed. SilicoDArT markers are a secondary marker type provided by DArTseq®, which are scored for the presence or absence of a single locus.
The genotypic data were then filtered based on the following consecutive steps:

The primary populations in our study included the Ae. tauschii - SH wheat (shw), bread wheat (bw), and durum wheat (dw) lines, among which gene flow rarely occurs, whereas the secondary population consisted of the synthetic derivative lines (sd), which were derived by introgression among shw, bw and dw. We assumed a SNP marker with two alleles (1 and 2) at locus $l$ with allele frequencies $P_{1shw}$, $P_{2shw}$ in the Ae. tauschii - SH wheat population, $P_{1bw}$, $P_{2bw}$ in the bread wheat population, and $P_{1sd}$, $P_{2sd}$ in the synthetic derivative line population. Then the population specific differential ($\delta_D$) can be represented as:

For calculating population specificity at A and B, let the allele frequencies be $P_{1dw}$, $P_{2dw}$ for the durum wheat population; the population specific differential ($\delta_{A\,or\,B}$) can be represented as:

For calculation of the D' contribution, the minimum population specific differential ($\delta_{D(l)}$) was set to greater than 0.30, as 80% of the markers had a differential of less than 0.30 82. Using the allele frequencies, the least-square estimation of the introgression proportion (φ), as a measure of the genetic contribution of D' at each locus $l$ 83,84, can be expressed as

Theoretical contribution estimation.
The theoretical contribution of D' to synthetic derivative lines was calculated using the available pedigree information, where pedigrees were extended at multiple levels and were traced back to the first cross between the SH wheat and bread wheat using the International Wheat Information System (IWIS) version 2 (https://www.cimmyt.org/funder_partner/international-wheat-improvement-network-iwin/). IWIS, curated by CIMMYT, holds a collection of pedigree and phenotypic data recorded from 1976 to date.
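For the marker-based estimate described earlier in this section, the following Python sketch illustrates one common form of a least-squares introgression estimate. It is an assumption, not the expression used in the study: it supposes a two-parent linear admixture model in which the allele frequency in the synthetic derivatives is $\varphi P_{shw} + (1 - \varphi) P_{bw}$. The function names and the toy frequencies are hypothetical.

```python
from typing import Sequence


def locus_phi(p_sd: float, p_shw: float, p_bw: float) -> float:
    """Introgression proportion at a single locus under the assumed linear
    admixture model: p_sd = phi * p_shw + (1 - phi) * p_bw."""
    denom = p_shw - p_bw
    if denom == 0:
        raise ValueError("locus is not informative: p_shw equals p_bw")
    return (p_sd - p_bw) / denom


def pooled_phi(
    p_sd: Sequence[float],
    p_shw: Sequence[float],
    p_bw: Sequence[float],
) -> float:
    """Least-squares estimate of phi pooled over loci: minimizes the sum over
    loci of ((p_sd - p_bw) - phi * (p_shw - p_bw)) ** 2."""
    if not (len(p_sd) == len(p_shw) == len(p_bw)):
        raise ValueError("all frequency vectors must have the same length")
    num = sum((s - b) * (w - b) for s, w, b in zip(p_sd, p_shw, p_bw))
    den = sum((w - b) ** 2 for w, b in zip(p_shw, p_bw))
    if den == 0:
        raise ValueError("no informative loci")
    return num / den


# Toy frequencies of allele 1 at three loci, invented for illustration only.
toy_shw = [1.0, 0.0, 1.0]
toy_bw = [0.0, 1.0, 0.5]
toy_sd = [0.25, 0.75, 0.5]

per_locus = [locus_phi(s, w, b) for s, w, b in zip(toy_sd, toy_shw, toy_bw)]
overall = pooled_phi(toy_sd, toy_shw, toy_bw)
print(per_locus, overall)
```

The pooled estimate weights each locus by the squared frequency difference between the two parental populations, so loci with a small differential contribute little. This is one motivation for restricting the analysis to markers above a minimum differential.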
Using a recursive approach on the pedigree information, we calculated the theoretical contribution of D'. The D' contribution to individual i, $\phi_I$, can be estimated from the D' contribution to the first parent (female), abbreviated as $\phi_{P1}$, and to the second parent (male), $\phi_{P2}$, with the formula:

$$\phi_I = \frac{\phi_{P1} + \phi_{P2}}{2}$$

As SH wheat has the entire D' genome derived from Ae. tauschii, its $\phi_I$ is set to 100. For bread wheat with no SH wheat in its pedigree, $\phi_I$ is set to 0. The different accessions of Ae. tauschii were assumed to contribute equally. The theoretical and marker-estimated contributions of D' were compared across all included synthetic derivative lines.
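The recursive pedigree calculation can be sketched as follows. This is a minimal illustration that assumes each parent passes on half of its own theoretical D' contribution, with SH wheat fixed at 100 and bread wheat without SH ancestry fixed at 0. The line names, the dictionary layout and the function name are hypothetical and do not reflect the IWIS data model.

```python
from typing import Dict, Optional, Tuple

# child line -> (first parent, second parent)
Pedigree = Dict[str, Tuple[str, str]]


def d_prime_contribution(
    line: str,
    pedigree: Pedigree,
    founders: Dict[str, float],
    cache: Optional[Dict[str, float]] = None,
) -> float:
    """Theoretical D' contribution (in percent) of a line, computed recursively
    as the mean of the contributions of its two parents (assumed rule)."""
    if cache is None:
        cache = {}
    if line in cache:
        return cache[line]
    if line in founders:
        value = founders[line]
    elif line in pedigree:
        p1, p2 = pedigree[line]
        value = (
            d_prime_contribution(p1, pedigree, founders, cache)
            + d_prime_contribution(p2, pedigree, founders, cache)
        ) / 2.0
    else:
        raise KeyError(f"no pedigree or founder record for {line!r}")
    cache[line] = value
    return value


# Hypothetical founders: one SH wheat (100) and two bread wheats (0).
founders = {"SH_1": 100.0, "BW_A": 0.0, "BW_B": 0.0}

# Hypothetical pedigree with three successive crosses to bread wheat.
pedigree = {
    "SD_cycle1": ("SH_1", "BW_A"),
    "SD_cycle2": ("BW_B", "SD_cycle1"),
    "SD_cycle3": ("SD_cycle2", "BW_A"),
}

contributions = {
    name: d_prime_contribution(name, pedigree, founders) for name in pedigree
}
print(contributions)
```

Under these assumptions the contribution halves with every cross to a bread wheat parent without SH ancestry, which corresponds to the 50% expected for breeding cycle 1 and the 12.5% reached after three such crosses. The sketch assumes an acyclic pedigree and does not check for loops in the records.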
Average genetic contribution to international nurseries and released cultivars. To estimate the impact of SH wheat in the CIMMYT spring bread wheat breeding program, the average theoretical contribution of D' was estimated in two international yield trials, which were disseminated annually to 100-400 national partners worldwide as part of the CIMMYT International Wheat Improvement Network (IWIN). The entries of the Elite Spring Wheat Yield Trial (ESWYT), targeted to irrigated wheat growing areas, and the Semi-Arid Wheat Yield Trial (SAWYT), targeted to rain-fed areas, from 19 years (1997 to 2016) were used (Table 2). The entries included in these two yield trials are considered to be the best CIMMYT breeding lines for testing in their target environments and for uptake as cultivars in the developing world. The average contribution of D' in synthetic derivative lines released as cultivars worldwide between 2003 and 2017 was also analyzed (Table 2, Supplementary Table S2). All calculations and plotting, unless otherwise specified above, were performed in R 87.