Meta-QTLs and candidate genes for stripe rust resistance in wheat

In bread wheat, meta-QTL analysis was conducted using 353 QTLs that were available from earlier studies. When projected onto a dense consensus map comprising 76,753 markers, only 184 QTLs with the required information, could be utilized leading to identification of 61 MQTLs spread over 18 of the 21 chromosomes (barring 5D, 6D and 7D). The range for mean R2 (PVE %) was 1.9% to 48.1%, and that of CI was 0.02 to 11.47 cM; these CIs also carried 37 Yr genes. Using these MQTLs, 385 candidate genes (CGs) were also identified. Out of these CGs, 241 encoded known R proteins and 120 showed differential expression due to stripe rust infection at the seedling stage; the remaining 24 CGs were common in the sense that they encoded R proteins as well as showed differential expression. The proteins encoded by CGs carried the following widely known domains: NBS-LRR domain, WRKY domains, ankyrin repeat domains, sugar transport domains, etc. Thirteen breeders’ MQTLs (PVE > 20%) including four pairs of closely linked MQTLs are recommended for use in wheat molecular breeding, for future studies to understand the molecular mechanism of stripe rust resistance and for gene cloning.

Meta-QTL (MQTL) analysis involving known QTLs for resistance against stripe rust in wheat has never been conducted, although it has been conducted for a number of other traits. These other traits earlier used for MQTL analysis included the following: tolerance to abiotic stresses such as drought and heat 33,34 , resistance to a number of diseases including leaf rust 35,36 , tan spot 37 , Fusarium head blight 38 and powdery mildew 39 . The objective of the present study was to conduct MQTL analysis for resistance against stripe rust in wheat, followed by identification of candidate genes (CGs) for the MQTLs thus identified. A more specific objective was to select a few breeders' MQTLs and CGs to serve as an important resource to supplement wheat breeding through marker assisted selection (MAS) and for future basic studies to understand the mechanism of stripe rust resistance in bread wheat.

Results
QTLs and their distribution on wheat chromosomes. Stripe rust resistance can be either all stage resistance (ASR), which is also described as seedling resistance (SR), or adult plant resistance (APR), which also includes the so-called high temperature adult plant (HTAP) resistance. As a result, in the published literature, the following four different terms have been used for QTLs for stripe rust resistance: SR, ASR, APR and HTAP. However, for the purpose of the present study, we will consider only two classes, ASR (including SR) and APR (including HTAP).
In the present study, using WheatQTLdb and the published literature, a total of 86 studies were available; only 75 studies (73 studies on common wheat + two studies on durum wheat) were found to contain all the required information related to QTLs and were used for the MQTL analysis (Table 1; for details see Supplementary Table S1). The mapping populations in these studies consisted of RILs, DH or F 3 populations (Table1; Supplementary Table S1). A total of 353 QTLs were available in these 75 studies. The information on the flanking markers, phenotypic variation explained (PVE%) for each QTL along with their confidence intervals (CIs) are presented in Supplementary Table S1. Following are some details about these 353 QTLs: (i) These QTLs were distributed on all the 21 wheat chromosomes, with the total number of QTLs per chromosome ranging from 3 on chromosome 5D to 52 on chromosome 2B (Fig. 1a,b). (ii) The distribution of QTLs among three subgenomes also widely differed, with 100 (28.3%) QTLs on A sub-genome, 206 QTLs (58.4%) on B sub-genome and 47 QTLs (13.3%) on D sub-genome (Fig. 1b). (iii) The number of QTLs for ten individual traits relevant to stripe rust ranged from 2 QTLs for leaf area infected (LAI) to 243 for disease severity (DS) (Fig. 1c). (iv) LOD score for individual QTLs ranged from 2 to 62 with 47% of QTLs showing a LOD score of 2-7 (Fig. 1d). (v) The proportion of phenotypic variance explained (PVE%) by individual QTLs ranged from 1 to 88% (average = 6%) and followed the characteristic L-shaped distribution, with most (67%) QTLs showing a PVE < 30% and only a small fraction representing major genes (PVE > 30%) (Fig. 1e). (vi) A set of 302 QTLs (including 37 QTLs for HTAP resistance) were available for APR and only 25 QTLs were available for ASR (including 8 QTLs described as QTLs for SR). The information on the type of resistance for the remaining 26 QTLs was not available. The dense consensus map had 76,753   markers, including 3,526 DArT, 65,459 SNP, 3,975 SSR, and the remaining 3,793 markers representing a group including AFLP, STS, TRAP, etc., which have been used only rarely for QTL interval mapping. The total length of the consensus map was 5774 cM; the size of the 21 individual linkage groups ranged from 98 cM (4D) to 462 cM (2B) (Fig. 2a). The marker densities ranged from 5 markers per cM for 6D to 28 markers per cM for 1A (Fig. 2a). Average marker densities for three sub-genomes also differed, with lowest marker density for D sub-genome (11.85 markers per cM) followed by B sub-genome (14.28 markers per cM) and A sub-genome (15.42 markers per cM) (Fig. 2b). Only 214 QTLs from a total of 353 QTLs could be projected onto the consensus map and were used for MQTL analysis.       Table S4). Twenty-four (24) CGs were common among the above two categories of genes (Table 3, Supplementary Tables S3 and S4).      Table S4). A representative heat map of 29 important DECGs is shown in Fig. 4b; these CGs included the following: (i) 14 CGs encoding R domain containing proteins, (ii) 9 CGs known to be involved in disease resistance signaling pathways and (iii) 6 CGs having very high expression (FC ≥ 5 or ≤ − 5).

Discussion
A new era of quantitative disease resistance involving use of DNA-based molecular markers for identification of QTLs for disease resistance started during early 1990s. The era started with the publication of the classical papers on interval mapping by Lander and Botstein 53,54 . Among plant systems (including wheat), interval mapping has been used to study the genetic control of all kinds of traits and resistance to stripe rust in wheat is no exception 55 . Thus, during the last 25 years, more than 70 reports involving mapping of > 350 QTLs (also described as quantitative resistance loci i.e., QRLs) for stripe rust appeared. A number of these QTLs also overlapped Yr genes that were already mapped.
It may be recalled that in the present study 61 MQTLs involving 184 QTLs were identified. This indicated roughly three-times reduction in the number of genomic regions controlling stripe rust resistance in wheat genome. Earlier, while conducting meta-QTL analysis in wheat for fusarium head blight, roughly five-fold reduction was involved 56 and similar analysis showed nearly four-fold reduction in genomic regions controlling leaf rust resistance in wheat 35 . The absence of MQTLs on 5D, 6D, 7D agrees with earlier reports on QTL analysis [57][58][59][60][61] . Absence of QTLs on few chromosomes was nothing unusual, since in two earlier meta-QTL studies, no MQTLs for leaf rust resistance, were available on five chromosomes including 1D, 3D, 5A, 5D and 6D 35 , and no MQTLs for fusarium head blight were available on six chromosomes including 1D, 3D, 5D, 6D, 7B and 7D 56 . What is unusual is the absence of MQTLs on 5D and 6D in all the three studies (including the present study). A possible explanation for the limited QTLs located on the D sub-genome across various studies could be the low level of polymorphism associated with the D sub-genome. In the present study also, only 47 (13.3%) of the 353 QTLs used for metaQTL analysis belonged to the D sub-genome, suggesting that fewer QTLs are generally available on the D sub-genome.
Sixty-one MQTLs for stripe rust resistance in wheat (including four MQTLs derived from the QTLs belonging to durum wheat) is a fairly large number indicating occurrence of QTLs in close proximity, which agrees with a large number of Yr genes for stripe rust resistance reported in wheat genome. Occurrence of such a large number of genes/MQTLs should provide resistance against a large number of ever-evolving races of stripe rust, www.nature.com/scientificreports/ distributed in different wheat growing regions of the world 5 . Also, almost all the MQTLs (except MQTL36-4A) showed resistance against more than one pathotypes indicating that these MQTLs exhibit race non-specific resistance and may carry novel genes which may sometimes be involved in providing resistance against broad spectrum of pathotypes. Earlier, for resistance against fusarium head blight also, 65 MQTLs were identified. However, the number of MQTLs identified in this study far exceeds the number of MQTLs identified for leaf rust resistance (35) 35 . Perhaps this was due to relatively fewer QTL studies (19) available for meta-analysis in case of leaf rust resistance, so that in a recent report, the number of MQTLs for leaf rust resistance has risen to 75 MQTLs. In this recent study MQTLs are described as genetic map positions or gmQTLs (based on genetic map), since these were named on the basis of genetic map positions that were later converted into sequence mapped meta-QTLs or smQTL 36 . This new terminology for naming meta-QTLs as gmQTL and smQTL may be more widely used in future.
Most of the MQTLs identified in the present study controlled more than one traits (Table 2; Supplementary  Table S2). This probably indicated either a tight linkage of genes for different traits, or occurrence of pleiotropic genes. This may also be attributed to a bias due to the use of related traits for identification of stripe rust resistance as also reported earlier in case of MQTL analysis for leaf rust resistance in wheat 35,36 .
It is also interesting to note that the positions of 24 MQTLs overlapped those for Yr genes; some of these Yr genes are already cloned and characterized (e.g., allelic Yr genes Yr5 and Yrsp overlap MQTL17-2B and MQTL21-2B and Yr15 overlap MQTL6-1B). The above three cloned Yr genes are known to encode proteins for NBS-LRR and therefore represent R genes 21,62 . These genomic regions carrying Yr genes and MQTL may be involved in controlling both the qualitative resistance (generally controlled by R genes) and quantitative resistance (controlled by QRLs) making them relatively more important.
We know that a large number of Yr genes have already been deployed for stripe rust resistance, but there is only one report available, where QTLs (QYr.nafu-2BL and QYr.nafu-3BS) were utilized for transfer of stripe rust resistance in wheat cultivars 63 . Some of the Yr genes which are still effective in India include Yr5, Yr10, Yr15, Yrsp, Yr47, Yr57 and Yr63 64,65 . Three above cloned Yr genes (Yr5, Yr15, Yrsp) and four other Yr genes (Yr10, Yr53, Yr61, Yr65, Yr69) are also known to be effective worldwide 23,66 . Therefore, the MQTLs and the associated Yr genes may be utilized for developing a package to be used for improvement of stripe rust resistance.
It may also be recalled that there are 8 meta-QTLs, which occur in four pairs of MQTLs occurring in close proximity. These four pairs of MQTLs and five other MQTLs (8 + 5 = 13) listed in Table 4 may prove useful for breeding; therefore, we like to describe these MQTLs as breeders' MQTLs. For selecting these breeders' MQTLs, we utilized a number of criteria including the following two criteria suggested in an earlier study 38 . (i) The low CI and high average PVE of the MQTLs and (ii) the number of QTLs carried by individual MQTL. Additional criteria were also used in the present study for prioritizing and selecting breeders' MQTLs. For instance, the relationship between MQTLs and the pathotypes occurring in specific wheat growing regions may be an important criterion. While doing this, we also have to keep in mind that virulence can also be quantitative in nature as mentioned earlier. MQTLs showing resistance against more than one pathotypes may also be an important criterion for achieving broad spectrum resistance. Such important MQTLs showing resistance against multiple pathogen races were also identified in a recent study on MQTL analysis conducted for tan spot resistance in wheat 37 .
Further, almost all (60 out of 61) MQTLs contain original QTLs that are responsible for APR (or HTAP) and a solitary MQTL was based on ASR (including SR). This exclusive presence of QTLs for APR is perhaps due to QTL studies mostly conducted on APR.
The 385 CGs identified during the present study were subjected to a detailed study and were shown to encode a variety of proteins; at least some of these proteins are known to be involved in disease resistance ( Table 3). Out of these 385 CGs, 265 CGs belonged to important classes of R genes which followed six out of the nine different mechanisms earlier proposed by Kourelis and van der Hoorn 67 on the basis of the known protein products encoded by 314 cloned R genes in different crops. The six mechanisms followed by the R genes identified in the present study include direct and indirect perception of pathogen-derived molecules on the cell surface by receptor-like proteins and receptor like kinases, direct and indirect intracellular detection of pathogen-derived molecules by NLRs (NBS-LRR), detection through integrated domains and host reprogramming-mediated loss of susceptibility.
The differential expression of 144 CGs (for details, see Supplementary Table S4) at the seedling stage agrees with earlier reports 26,52,68 . These genes are largely involved in important processes like protein phosphorylation, photosynthesis, protein ubiquitination, transmembrane transport, oxidation-reduction processes, etc. which are relevant to disease resistance. For instance in an earlier study, a reduction in photosynthesis was shown to enhance stripe rust resistance due to the interaction of Yr36, encoding for wheat kinase STARTI (WKSI) with Psbo (a member of photosystem II) 69 without having any adverse effect on yield. Similarly, in another study, several genes encoding PR (pathogenesis-related) proteins, involved in a number of defense responses were shown to get induced in response to stripe rust infection in a number of wheat lines carrying different genes for ASR (YrTr1, Yr76, Yrsp, Yrexp2) and HTAP (Yr5, Yr59, Yr62 and Yr7b) 70 . A number of downstream genes, apparently similar to the CGs identified in the present study and involved in processes mentioned above were also identified in a transcriptome study conducted using a pair of introgression lines, which differed for Yr5 52 .
CGs underlying the MQTLs in wheat were also identified earlier for several traits including drought tolerance 34 , tan spot resistance 37 and fusarium head blight resistance 56 . However, the criteria used by us was novel and not used in any of these earlier reports. For instance, in most of the earlier reports, the complete physical interval flanking the MQTL region was considered for identification of CGs. However, in the present study, we used 2 Mb region flanking the exact physical position of the MQTL based on the MQTL peak position available from the BioMercator software.
Fifty-nine (59) CGs out of the total 385 CGs belonged to MQTLs described as breeder's MQTLs, and are, therefore, considered to be more important. Out of these 59 CGs, 32 CGs also showed differential expression www.nature.com/scientificreports/ and encoded important R proteins including S/TPK, SLC transporter, mitogen-activated protein (MAP) kinase, UDP-glucosyltransferases, S1/P1 nuclease, etc. The known roles of some of the important CGs (shown in Fig. 4) during disease resistance can be summarised as follows: (i) STPK-V, a member of Pm1 gene was reported to confer powdery mildew resistance in wheat 71 . (ii) NBS-LRR domain containing genes are the protein products of the cloned Yr genes like Yr10, Yr5, etc. as mentioned earlier 21,62 . (iii) TaMAPK4, a type of MAPK gene is reported to act as a positive regulator of stripe rust resistance in wheat 72 . (iv) The above transporters may also possibly represent Yr genes similar to Yr46 which was shown to encode a hexose transporter 19 . (v) UDP-glucosyltransferases were earlier reported to show differential expression due to stripe rust infection in wheat genotypes indicating their role in Yr39 mediated stripe rust resistance 73 . Some other important CGs like those encoding for WRKY domains, ankyrin repeat and F-box domain containing genes were also identified in different MQTLs, although the expression data was not available for these genes. WRKY and ankyrin repeat domain containing genes were recently found to encode for proteins of cloned YrU gene 22 . Similarly, a F-box domain containing gene was identified as a CG underlying YrR39 that has been used for improvement of stripe rust resistance 74 .
In summary, the present study allowed us to identify 5 MQTLs and 4 pairs of closely linked MQTLs that may be used by breeders for developing high yielding stripe rust resistant wheat cultivars ( Table 4). Eight of these 13 genomic regions overlapped the known Yr genes. Some of the important CGs like those encoding for NBS-LRR proteins, WRKY proteins, transporters, UDP-glucosyltransferases and MAP kinases may either correspond for important known Yr genes or involved in downstream signalling processes during wheat-Puccinia striiformis interaction. Therefore, these CGs may be used for molecular breeding after validation. These MQTLs may be used for MAS in wheat breeding after necessary validation and may also be used for fine-mapping leading to cloning. Some of these genes may, however, function through their effect on other genes that are directly involved in stripe rust resistance. For instance, overexpression of TaWRKY62 provided seedling resistance to stripe rust by activating a variety of genes including PR protein genes, salicylic/jasmonic acid responsive genes and ROS associated genes 75 . In another study, induced overexpression of TaLHY (a type of pf MYB TF) in leaf blade and sheath reduced the damage caused by stripe rust 76 . This knowledge may prove useful for validating some CGs identified in the present study.
Gene editing and induced mutations are two other approaches, which are still unexplored in case of stripe rust resistance except a single study, where the function of Yr15 gene (encoding wheat tandem kinase 1 or WKS1) was validated throught induced mutations 20 . In this study, using a resistant line, EMS induced mutations in Yr15 were successfully obtained for susceptibility. The resulting susceptible lines carried mutant allele leading to changes in three amino acids (Gly54, Ala149 and Ala460) causing disruption in gene function, thus validating the role of WSK1 in resistance. A similar strategy may be used for at least three cloned genes (Yr5/Yrsp, Yr7 and Yr10) which overlap the marker interval carrying two important MQTLs and the associated CGs (Table 4; Fig. 4a,b). Techniques involving gene editing or base editing may also be used for the above cloned Yr genes and the CGs after identification of causal SNPs involved in providing stripe rust resistance through CGAM approach. CRISPR/Cas9 has already been used for editing three genes (TaABCC6, TaNFXL1, and TansLTP9) for resistance against fusarium head blight 77 , thus demonstrating the possibility of using gene/base editing. The present study thus provided resources that may be used in future for wheat breeding and for basic studies involving stripe rust resistance.

Materials and methods
Search for QTLs and input file preparation. The information on QTLs for stripe rust resistance reported during the past 20 years (2000 to 2020) was obtained from WheatQTLdb 28 (Supplementary Table S1) and other published literature. The detailed information from each study was collected on the following aspects: (i) types of mapping populations and their parents, (ii) size of mapping population, (iii) pathotype(s) used for phenotyping, (iv) methods of QTL mapping, (v) position of QTLs and markers flanking the QTLs, (vi) logarithm of odds (LOD) value, and (vii) R 2 values of the QTLs. Only QTLs with complete information required for meta-analysis were retained for final analysis. The two input data text files used for meta-QTL analysis included the genetic map file and QTL information file from each study following the instructions provided in the BioMercator v3/ v4 manual 40 .

Construction of consensus genetic map.
A consensus genetic map involving SSR, DArT and SNP markers was constructed using LPmerge software 41 . For this purpose, five individual genetic linkage maps each from different studies were used as reference maps [42][43][44][45][46] . Markers flanking individual QTL regions were also included for construction of a consensus genetic map.
Projection of QTLs on consensus map. The original QTLs were projected onto the consensus map using the QTL projection tools (QTL Proj) available in BioMercator v4.2. The QTLs which could not be projected onto the consensus map were excluded. For the projection of QTLs, a scaling rule between the marker interval of the original QTL and the corresponding interval on the consensus map was used 47 . The new confidence interval (CI) for MQTLs on the consensus linkage group was computed using Gaussian distribution 47 .
Meta-analysis of QTLs and identification of MQTLs. MQTL analysis was conducted using BioMercator v4.2 40,48 . Following two different approaches were used based on the number of QTLs on each individual chromosome for conducting MQTL analysis: (i) If the number of QTLs on an individual chromosome was ≤ 10, the approach suggested by Goffinet and Gerber 49 was used; and (ii) if the number of initial QTLs on an individual chromosome was > 10, the approach suggested by Veyrieras et al. 47  www.nature.com/scientificreports/ estimates the relative amount of data lost by different statistical models). Quality of the model depended on the assumption that lesser the information loss, higher will be the quality of that model 50 . Whenever, more than one MQTLs were located in the same physical interval (genetic/confidence interval < 15 cM), these were treated as closely linked MQTLs.
Yr genes associated with MQTL. The sequences of markers associated with Yr genes were blasted against the wheat reference genome version 49 (Ensembl_Release 49) available in the Ensembl Plants database. This allowed identification of physical coordinates of the associated markers. These physical intervals containing the Yr genes were compared with the physical coordinates of the MQTL regions to identify association of Yr genes with individual MQTLs.
Identification of CGs underlying the MQTLs. The CGs underlying the MQTLs were identified in the Gene ontology (GO) and in silico expression analysis of CGs. GO analysis was conducted using Biomart tool available in Ensemble Plants. In silico expression of CGs was conducted using expression data available in expVIP database 51 . The available expression data used in the present study belonged to two different studies. In the first study, Zhang et al. 26 inoculated seven-day old seedlings of wheat variety N9134 using race CYR31 for yellow rust and race E09 for powdery mildew. The inoculated leaves were harvested at 0, 1, 2, and 3 dpi and expression data were collected for both yellow rust and powdery mildew (using 0 dpi as control). In the second study, Dobon et al. 52 used two different genotypes, namely Vuka (susceptible) and Avocet-Yr5 (resistant), which were inoculated with Pst isolate 87/66 at three leaf stage. Leaf samples were collected at 0, 1, 2, 3, 5, 7, 9, and 11 days post-inoculation (dpi) in susceptible genotype Vuka, but for only five days (at 0, 1, 2, 3, and 5 dpi) for the resistant line Avocet-Yr5. In this second study, expression data was collected for genes in both the host and the pathogen. The data on expression in expVIP was available as log 2 transformed tpm (transcripts per million) values. Only genes (CGs) showing FC ≥ 2 (upregulation, twofold or more) or FC ≤ − 2 (downregulation, twofold or more) were accepted as showing differential expression in the form of fold changes estimated by comparing tpm values under stress vs. control. The results of such differentially expressed CGs were depicted in the form of heatmaps generated using online tool Morpheus that is available at https:// softw are. broad insti tute. org/ morph eus/.