Genome-wide mapping and allelic fingerprinting provide insights into the genetics of resistance to wheat stripe rust in India, Kenya and Mexico

Stripe or yellow rust (YR) caused by Puccinia striiformis Westend. f. sp. tritici Erikss. is a persistent biotic-stress threatening global wheat production. To broaden our understanding of the shared genetic basis of YR resistance across multi-site and multi-year evaluations, we performed a large genome-wide association study using 43,706 YR observations on 23,346 wheat lines from the International Maize and Wheat Improvement Center evaluated between 2013 and 2019 at sites in India, Kenya and Mexico, against predominant races prevalent in the countries. We identified 114 repeatable markers tagging 20 quantitative trait loci (QTL) associated with YR on ten chromosomes including 1D, 2A, 2B, 2D, 3A, 4A, 4D, 5A, 5B and 6B, among which four QTL, QYr.cim-2DL.2, QYr.cim-2AS.1, QYr.cim-2BS.2 and QYr.cim-2BS.3 were significant in more than ten datasets. Furthermore, we report YR-associated allelic fingerprints for the largest panel of wheat breeding lines (52,067 lines) till date, creating substantial opportunities for YR favorable allele enrichment using molecular markers. Overall, the markers and fingerprints reported in this study provide excellent insights into the genetic architecture of YR resistance in different geographical regions, time-periods and wheat germplasm and are a huge resource to the global wheat breeding community for accelerating YR resistance breeding efforts.

Scientific RepoRtS | (2020) 10:10908 | https://doi.org/10.1038/s41598-020-67874-x www.nature.com/scientificreports/ overcome by new Pst races [17][18][19][20][21] . Meanwhile, APR is expressed at the later growth stages, is usually durable and not race-specific 16,22 . In addition, resistance to YR can also be classified based on the testing conditions into: greenhouse and field consistent resistance that is usually detected in the greenhouse in seedlings and is also effective in the field (henceforth referred to as seedling resistance, SR) and field resistance (FR) that can be detected at the adult-plant stage in the field. Among the 83 catalogued YR resistance genes 23,24 9,15,[25][26][27] . In addition, several temporary YR resistance genes and quantitative trait loci (QTL) have been identified and reviewed 9,28 .
A key component in developing YR resistant wheat varieties involves identifying genes/QTL and closely linked diagnostic markers that can facilitate accurate selection for resistance. While linkage mapping has been very useful for Yr gene identification, genome-wide association studies (GWAS) that rely on the linkage disequilibrium (LD) between markers and the underlying causal polymorphisms are powerful for identifying markertrait associationsl [29][30][31] . In contrast to linkage mapping studies, GWAS involve no population development time as they can be performed on existing diversity panels and also provide better resolution by harnessing ancestral recombination events that have occurred at the population level in a diversity panel 29,32 . While several GWAS for YR have been reported [33][34][35][36][37][38][39] , our understanding of the shared genetic basis of YR resistance across multi-site and multi-year evaluations is inadequate and several marker-trait associations identified in GWAS are not repeatable, limiting their use in breeding programs. Breeding for YR was initiated at the International Maize and Wheat Improvement Center (CIMMYT) in the early 1970's, and involves crossing parents with slow rusting APR genes, selecting segregating early-generations under high YR pressure 40 in Toluca (Mexico), and screening advanced generations in Karnal (India), Ludhiana (India), Njoro (Kenya), Celaya (Mexico), El Batan (Mexico) and Toluca (Mexico). Between 2013 and 2019, 23,346 wheat breeding lines have been evaluated for YR by CIMMYT and 43,706 YR observations have been recorded at sites in India, Kenya and Mexico. Hence, the key objective of this study was to leverage these datasets and perform GWAS for dissecting the shared genetic basis of YR FR against the races predominant in these three countries.
Another critical component in developing YR resistant wheat varieties involves identifying the best parental combinations with a high number of favorable alleles (FAs) for YR associated genes/QTL. So, we have generated the allelic fingerprints of a huge panel of 52,067 CIMMYT wheat breeding lines comprising all the advanced lines developed during 2013-2019 and key parents in the crossing blocks for repeatable markers significantly associated with YR resistance. Furthermore, we used the allelic fingerprint profiles to gain insights into the following questions: (i) What is the proportion of YR FAs in the CIMMYT germplasm? (ii) Which parents are the most likely sources of resistance and what are the lines with high frequencies of FAs that can be selected as potential parents? (iii) What is the effect of having different numbers of QTL with FAs on the disease severity?

Results
Phenotyping data and population structure analysis. Phenotypic distributions of YR FR and SR are shown in Fig. S1a-S1e and phenotyping data is available in Table S2. Statistical analysis of YR FR data (Table S3) indicated that the highest YR mean percentage severity was in Karnal panel 1819 (22.5 ± 23.5%), followed by Njoro main season (MS) 2 panel 1718 and Ludhiana panels 1415, 1819 and 1718 all of which had mean severities greater than 15%. Considering SR, the highest mean of the infection scores was observed in panel 1516 (6.5 ± 1.4) and the least in panel 1819 (3.8 ± 1.1). In Njoro, panels 1314, 1415 and 1516 had mean field severities less than 3.4% and an increasing severity was observed in the recent panels 1718 and 1819. Population structure analysis in the eight panels using the first two principal components indicated weak to moderate population structure, with the cumulative percentage variation explained by the first two principal components ranging between 11.5 and 23.6%. (Fig. 1).
Genome-wide association mapping for stripe rust resistance. We identified 466 markers significantly associated with YR resistance at a p value threshold of 0.001 across all the datasets. The p values, additive effects and the R 2 of the markers are given in Table S4. After Bonferroni correction for multiple testing, we identified 146 markers that were significantly associated with YR resistance (Figs. 2 and 3, Table S5).

Markers associated with field resistance in India (Karnal and Ludhiana).
In the Karnal panel 1819, the most significant markers were 2D_634645061, 2D_635168473 and 2B_158358725 that had high additive effects (AEs) of 10.3%, 10.3% and 11.8% on the severity, respectively. In the Ludhiana panel 1314, the most significant marker was 2B_178771120 with an AE of 5.4%. In the Ludhiana panel 1415, 4A_738980353 and 2B_155155706 were the most significant markers with AEs of 7.7%, and 9.2%, respectively. In the Ludhiana panel 1617, the most significant markers on chromosomes 2B and 2D were 2B_184494478 and 2D_634645061 that had AEs of 2.5% and 3.8%, respectively. In the Ludhiana panel 1718, the most significant markers on chromosomes 2B and 2D were 2B_157266347, 2D_634645061 and 2D_635168473 that had AEs of 9.1%, 7.4% and 7.4%, respectively. In the Ludhiana panel 1819, the most significant markers were 2D_634645061, 2D_635168473 and 2B_181609374 that had AEs of 7%, 7% and 5%, respectively. Considering all the datasets in India, the markers that were significant in the highest number of datasets (four out of six) included 2B_155155706, 2D_634645061 and 2D_635168473.
Markers associated with field resistance in Njoro, Kenya. In the Njoro MS1 panel 1415, the most significant markers were 2B_785905982 and 2A_15449240 that had AEs of 2.4% and 1.5%, respectively. In the Njoro MS1 panel 1516, the most significant marker was 7B_223178621 with a low AE of 0.5%. In the Njoro   MS1 panel 1718, the most significant markers on chromosomes 2D and 2B were 2D_634645061, 2D_635168473 and 2B_157266347 that had AEs of 4.3%, 4.3% and 3.8%, respectively. In the Njoro MS1 panel 1819, the most significant markers on chromosomes 2D and 2B were 2D_575686517 and 2B_152503636 that had AEs of 5.2% and 4.9%, respectively. In the Njoro MS2 panel 1314, the most significant marker was 2A_14418760 with an AE of 1.4%. In the Njoro MS2 panel 1718, the most significant marker was 2D_575686517 with a high AE of 13.1%. In the Njoro off season (OS) 1 panel 1819, the most significant markers were 2B_158358725 and 2D_575686517 that had AEs of 3.9% and 1.9%, respectively. In the Njoro OS2 panel 1819, the most significant markers were 2B_158359468 and 2D_634645061 that had AEs of 2.1% and 3.1%, respectively. In the Njoro OS1 panel 1718A, the most significant marker was 2D_575686517 with an AE of 4.1%. In the Njoro OS1 panel 1819A, the most significant marker was 2D_635168473 with an AE of 2.6%. Overall, across all the FR evaluations in Njoro, markers 2D_634645061 and 2D_635168473 were significant in seven datasets.
Markers associated with field resistance in Mexico (Celaya, El Batan and Toluca). In the Celaya panel 1415, the most significant marker was 2A_18495181 with an AE of 6.3%. In the El Batan panel 1516, the most significant marker with a known chromosomal position was 2A_2367215 with an AE of 1.5%. In the Toluca panels 1314 and 1617, the most significant markers were 2A_24002740 and 2A_17830617 with AEs of 1.7% and 1.1%, respectively. In the Toluca panel 1819, the most significant marker with a known chromosomal position was 2A_14978553 with an AE of 2.2%. In the Toluca combined panel, the most significant marker was 2A_18495181 with an AE of 2%. Across all the FR evaluations in Mexico, the markers significant in all the six datasets were on chromosome 2A.
Markers associated with seedling resistance. In the seedling panel 1415, the most significant marker was 5A_705565647 with an AE of 0.6 on the infection score. In the seedling panel 1516, the most significant markers on chromosomes 2B and 5A were 2B_765461540 and 5A_705565647 that had AEs of 0.9 and 0.6, respectively. In the seedling panel 1617, the most significant markers on chromosomes 2B and 5A were 2B_765454910 and 5A_708317841 that had AEs of 0.6 and 0.5, respectively. In the seedling panel 1718, the most significant marker was 2B_759735490 with an AE of 0.6. In the seedling panel 1819, the most significant marker was 2B_765461540 with an AE of 0.5. In the seedling combined panel, the most significant markers on chromosomes 2B and 5A were 2B_765461540 and 5A_708317841 with AEs of 1.7 and 0.5, respectively. The markers significant in the highest number of seedling panels were on chromosomes 2B (five out of six datasets) and 5A (four out of six datasets).
A reference map with stripe rust resistance associated markers. To identify repeatable markers associated across multiple datasets, we filtered the markers that were significant in two or more datasets at a p value threshold of 0.001, resulting in 114 markers. The repeatable markers with known chromosomal positions www.nature.com/scientificreports/ were anchored to the RefSeq v1.0 and a reference map for YR resistance was generated (Fig. 4). The markers significant in the highest number (12) of datasets were 2D_634645061 and 2D_635168473, followed by markers 2B_157266347, 2B_155142247, 2B_155155706 and several markers on chromosome 2A that were significant in ten to eleven datasets.
Designation of linkage disequilibrium based quantitative trait loci associated with stripe rust. The LD between the 114 repeatable markers in each chromosome was obtained and used to delineate LD-based QTL (Table S6, Fig. S2a-S2h). A subset of these markers that included the top two significant markers in each dataset and chromosome that were repeatable are shown in Table 1, along with their genetic positions in the POPSEQ map 41 and the designated QTL. On chromosome 1DS, QYr.cim-1DS.1 was associated with Njoro OS panels 1718A and 1819A. On chromosome 2AS, QYr.cim-2AS.1 spanning 8.9 cM had 23 markers that were associated with FR in Njoro MS panels 1314 and 1415, Celaya, El Batan, Toluca all panels, and SR in panels 1415, 1718 and the combined panel. We also observed that the two unaligned markers were in high LD with the markers in QYr.cim-2AS.1 and indicate the same QTL. On chromosome 2BS, QYr.cim-2BS.1 was associated with YR FR in the Ludhiana panels 1617 and 1819. In addition, a large region on chromosome 2BS between 152186055 and 184634323 bps spanning 32.4 Mb and 5 cM had markers significantly associated with several datasets. Despite being a narrow genetic interval, several markers in this region had very low correlations with each other, suggesting the possibility of more than one QTL. Hence, we used a stringent marker correlation coefficient (R 2 > 0.8) and D′ value (D′ > 0.9) to designate QTL in this region, and also classified only the markers that were highly significant in several datasets into QTL. Among these, QYr.cim-2BS. 2 (15). We also observed that 8.6% of the lines had FAs at 10-15 QTL, 77.5% of the lines had FAs at 5-9 QTL and 13.8% of the lines had FAs at 0-4 QTL. Considering only the four consistent QTL, 58 lines had FAs at all the four QTL, including 17 lines with the parent Quaiu*2/Kinde, and five lines with the parent Chen/Ae. sq//2*Opata/3/Tilhi/4/Attila/2*Pastor/5/Heilo/7/ Kiskadee#1/5/Kauz*2/Mnv//Kauz/3/Milan/4/Bav92/6/Whear//2*Parula/2*Pastor. We also observed that about 3,550 lines (6.8%) had FA at three major QTL, 12,028 lines (23.1%) had FAs at two major QTL, 30,420 lines (58.4%) had FAs at one major QTL and 6,011 lines (11.5%) had no FAs at the major QTL. Furthermore, to understand the effect of having different numbers of QTL with FAs on the disease severity of lines, we regressed YR field severities in the India and Kenya panels against the number of QTL with FAs (only 17 FR QTL that were significant in more than one FR dataset were considered). Our results clearly indicated that the number of QTL in a line had highly significant relationships (at a p value threshold of 0.001) with the YR severities in all the datasets (Fig. 7). However, we also observed lines that had several QTL with FAs but had high disease severities and some lines that had few QTL with FAs and low severities, that can be attributed to not considering the effect size of the QTLs, imprecise phenotypes, insufficient marker coverage, rare QTL not detected in the study etc.

Discussion
We performed comprehensive GWAS that provide excellent insights into the genetic architecture of YR field and seedling resistance, across multiple sites and years of evaluation in India, Kenya and Mexico. The 114 YR-associated repeatable GBS markers identified in this study have been anchored to the RefSeq v1.0, which will enable    46,47 . The gene Yr17 is a race-specific resistance gene that was introgressed into the French wheat cultivar 'VPM-1' and widely used in the CIMMYT breeding program through lines derived from Mutus (Pedigree: Milan/S87230/4/Bow/Nac//Vee/3/Bjy/Coc) and Kachu (Pedigree: Kauz//Altar 84/Aos/3/ Milan/Kauz/4/Vee/Koel), both of which have Milan in the pedigree 33,34 . However, Yr17 was only effective against the races in Mexico and in the earlier Njoro panels (1314 and 1415). It was ineffective against the Pst races in Njoro since 2015-2016 and in India, while it has also been previously reported to be ineffective in Europe and other regions 21,48 . On chromosome 2BS, QYr.cim-2BS.1 was 3.2 Mbs away from Qyr.wpg-2B.1 linked to marker IWA7370 38 . The region between 152186055 and 184634323 bps where several markers with different FA frequencies were located is rich in YR genes 49,50  On chromosome 2DL, QYr.cim-2DL.1 was associated with FR in the recent Njoro panels 1718 and 1819. The previously reported QTL for YR on chromosome 2DL included, (i) QPst.jic-2D from the UK wheat cultivar Guardian flanked by markers Xgwm539 and Xgwm349 54 that were at 513098578 and 629648566 bps respectively; (ii) a QTL from the Japanese cultivar Fukuho-komugi flanked by marker Xgwm349 55 ; (iii) Qyr.wpg-2D.2 flanked by marker IWA5211 at 638147548 bps 38 and (iv) QPst.jic-2D from German winter wheat cultivar Alcedo flanked by marker gwm320 that was at 644277239 bps 56 . Among these, QYr.cim-2DL.1 was in the interval of QPst.jic-2D from Guardian but might be indicating a novel QTL, as Guardian has not been used extensively in the CIMMYT breeding program, while 30% of the fingerprinted lines had FAs for this QTL. Another QTL, QYr.cim-2DL.2 on chromosome 2DL that was associated with the highest number of datasets in India and Njoro in most of the recent evaluations involving panels 1718 and 1819 was 2.5 Mb away from the marker wpt-667054 that was linked to the gene Yr54 57 . This gene was mapped from the CIMMYT spring wheat line QUAIU and the presence of the QYr.cim-2DL.2 FAs in lines with Quaiu #1, Blouk #1 and Babax/Lr42//Babax all of which had Babax/Lr42//Babax in the pedigree affirms that this QTL refers to the Yr54 gene that confers moderate resistance when present alone. While the effectiveness of this gene against races MEX96.11 and MEX08.13 in Mexico has been reported 57 , this study establishes its effectiveness to races in India (Karnal and Ludhiana) and Kenya (Njoro).
On chromosome 4AL, QYr.cim-4AL.1 significant in three Ludhiana panels was in the same position as Qcim.4A.5.3 reported to be associated with YR FR in Ludhiana 33 . The known YR resistance genes on chromosome 4AL include Yr51 from wheat landrace, AUS27858, linked to marker gwm160 58 that was 19.7 Mbs away from QYr.cim-4A.1. In addition, chromosome 4AL also had the Yr60 gene from line Almop (Avocet*3// Lalbmono1*4/Pavon), whose closest marker wmc313 (0.6 cM distal to Yr60) 59 63 , and has been reported to be largely ineffective when present alone 21,64 , it was significantly associated with YR resistance in Njoro in the two large panels used in this study and not in the smaller panels where the FA frequencies were less than 5%. On chromosome 5AL, QYr. cim-5AL.1 was in the position of the Yr34/Yr48 gene flanked by marker wPt-2873 65 69 and might be indicating Yr78 or another gene in that region. It should also be highlighted that none of the durable pleiotropic APR genes (Yr18, Yr29, Yr30 and Yr46) were detected in this study. While Yr46 is absent in the germplasm we used, a diagnostic GBS marker for Yr30 was not available. However, we used two markers on chromosome 1BL (1B_670207768 and 1B_670159907) that were significantly associated with YR in a biparental population (Apav/#1//Kenya Fahari/2*Kachu) 33 and probably linked to the Yr29 gene and observed FAs for the markers at very high frequencies in panels 1314-1819 (0.99 to 1), indicating that the gene is almost fixed in the CIMMYT germplasm. But this needs to be considered cautiously as the markers have not been validated in other populations and their LD with the Yr29 gene is not known. Finally, we have also reported the YR associated allelic fingerprints for the largest panel of wheat breeding lines till date, that can be exploited by breeders to select parents, design strategic crosses for YR and eliminate lines with low to no FAs. The YR severities that could not be explained by the markers identified in this study, could be partly attributed to the fact that we have only used the 64% of markers that aligned to the RefSeq v1.0. While an impressive number of fingerprinted lines had FAs at ten and more YR QTL substantiating the multiple QTL resistance possessed by CIMMYT lines vs monogenic resistance, it also indicates plentiful opportunities for FA enrichment using molecular markers. The study is also of great significance for India where YR is a serious problem in the North Western plains zone comprising the states of Punjab and Haryana that have a major share in India's wheat buffer stock. Overall, we hope that the results presented in this study will help broaden the understanding of the genetic architecture of YR resistance in different geographical regions, time-periods and wheat breeding lines and strengthen global YR resistance breeding efforts.

Methods
Populations. We used 23,346 advanced wheat breeding lines from CIMMYT that comprised eight different panels as follows: (i) Panels 1314, 1415, 1516, 1617 and 1718 with 1,092 lines each (ii) Panel 1819 with 1,228 lines (iii) Panel 1718A with 8,593 lines and (iv) Panel 1819A with 9,217 lines (Table S1). Panels 1718A and 1819A comprised the breeding lines from the first-year or stage 1 of yield testing that were developed using the selectedbulk breeding scheme as described in Juliana et al. 33 . Panels 1314, 1415, 1516, 1617, 1718 and 1819 included breeding lines from stage 2 of yield testing that were selected from stage 1 for high grain yield, good agronomic type, rust resistance and acceptable end-use quality.
Evaluation of field resistance to YR. Field resistance to YR was scored as the percentage of infection Scientific RepoRtS | (2020) 10:10908 | https://doi.org/10.1038/s41598-020-67874-x www.nature.com/scientificreports/ Quality control of the phenotypic data. Quality control of the phenotypic data was done by removing outliers that were 'K' spreads from the center ('K' was assumed as 4) using the Huber's robust fit outliers method 78 in the JMP statistical software (https ://www.jmp.com).
Genotyping data and population structure analysis. The genotyping-by-sequencing (GBS) method 79 was used to obtain genome-wide molecular markers and marker polymorphisms were called using the TAS-SEL (Trait Analysis by aSSociation Evolution and Linkage) version 5 GBS pipeline 80 . Marker discovery was done using a minor allele frequency of 0.01 and the resulting 6,075,743 GBS tags were aligned to the reference genome (RefSeq v1.0) assembly of the bread wheat variety Chinese Spring 81 with an overall alignment rate of 64%. The tags were further filtered as described in Juliana et al. 33 and 78,662 single-nucleotide polymorphisms were obtained. In each of the panels, the markers were filtered for: (i) Missing data less than 60%, (ii) Minor allele frequency greater than 0.05 and (iii) Heterozygosity less than 10% and the number of filtered markers in each panel is given in Table S1. In addition, we also filtered the lines in each panel for less than 50% missing data. Marker missing data was imputed using the LD-kNNi genotype imputation method 82 in TASSEL version 5 83 .
Population structure in all the panels was assessed using principal component analysis 84 .
Genome-wide association mapping and reference map with stripe rust associated markers. We performed genome-wide association studies for YR FR and SR in 28 datasets using the mixed linear model 85 in TASSEL version 5. The mixed linear model was fitted using population structure as a fixed effect, accounted for by the first two principal components and kinship as a random effect, obtained by the centered identity-by-state method 86 . The optimum level of compression and the 'population parameters previously determined' 87 were used for running the mixed linear model. Correction for multiple testing was done using the Bonferroni method where an α level of 0.20 was used for declaring significant markers. The p values, additive effects and percentage variation explained by each marker were obtained and Manhattan plots were plotted using the CMplot 'R' package 88 . For constructing a reference map with the YR associated markers, we obtained all the markers that were significant at the 0.001 level across all the datasets. We then filtered only the repeatable markers that were significant in two or more datasets and aligned them to the RefSeq v1.0. Visualization of the reference map was done using Phenogram (https ://visua lizat ion.ritch ielab .org/pheno grams /plot).

Designation of linkage disequilibrium based quantitative trait loci. Linkage disequilibrium meas-
ures between the significant markers in each chromosome like the standardized disequilibrium coefficient (D′) 89 and the correlation between alleles at the two marker loci (r 2 ) were calculated in TASSEL, in addition to the p value for the existence of LD using the two-sided Fisher's Exact test. For LD estimation, the set of repeatable significant markers associated with YR and a large panel of 52,067 wheat lines (described below) were used. Any pair of markers with a D' value greater than 0.80 and the p value for the existence of LD equal to 0 were designated into a LD-based QTL.
Allelic fingerprinting and favorable allele frequencies. Allelic fingerprinting for YR resistance associated repeatable markers was done for 52,067 wheat breeding lines comprising 50,250 lines from the stage 1 of yield testing developed during 2013-2019, 1,385 lines from the international bread wheat screening nurseries developed during 2012-2014 and 432 lines from CIMMYT's bread wheat crossing blocks. The allelic effects estimated from the mixed linear model were used in identifying the FA, defined as the allele that had a decreasing effect on YR severities or scores, the non-favorable (YR increasing) alleles and heterozygous alleles for each of the repeatable markers. In addition, the lines were also fingerprinted for the designated QTL using the most consistent marker alleles at those QTL. Heatmaps with the fingerprinted markers were obtained using the 'R' package 'pheatmap' 90 . Furthermore, we fitted a linear regression model for the YR field severities in the India and Kenya panels using the number of QTL with FAs, considering only the 19 FR QTL that were significant in more than one FR dataset.

Data availability
The stripe rust phenotyping data of 23,346 lines evaluated in different panels and environments is available in Supplementary Table 2.