GWAS revealed a novel resistance locus on chromosome 4D for the quarantine disease Karnal bunt in diverse wheat pre-breeding germplasm

This study was initiated to identify genomic regions conferring resistance to Karnal Bunt (KB) disease in wheat through a genome-wide association study (GWAS) on a set of 179 pre-breeding lines (PBLs). A GWAS of 6,382 high-quality DArTseq SNPs revealed 15 significant SNPs (P-value <10−3) on chromosomes 2D, 3B, 4D and 7B that were associated with KB resistance in individual years. In particular, two SNPs (chromosome 4D) had the maximum R2 values: SNP 1114200 | F | 0–63:T > C at 1.571 cM and R2 of 12.49% and SNP 1103052 | F | 0–61:C > A at 1.574 cM and R2 of 9.02%. These two SNPs displayed strong linkage disequilibrium (LD). An in silico analysis of SNPs on chromosome 4D identified two candidate gene hits, TraesCS4D02G352200 (TaNox8; an NADPH oxidase) and TraesCS4D02G350300 (a rhomboid-like protein belonging to family S54), with SNPs 1103052 | F | 0–61:C > A and 1101835 | F | 0–5:C > A, respectively, both of which function in biotic stress tolerance. The epistatic interaction analysis revealed significant interactions among 4D and 7B loci. A pedigree analysis of confirmed resistant PBLs revealed that Aegilops species is one of the parents and contributed the D genome in these resistant PBLs. These identified lines can be crossed with any elite cultivar across the globe to incorporate novel KB resistance identified on 4B.

Wheat (Triticum aestivum L.) is consumed worldwide as a key component of the human diet. The global wheat production for 2018-19 was 730.9 million tons, which is approximately 29 million tons lower than production in the preceding year (http://www.fao.org/worldfoodsituation/csdb/en/). With the global population estimated to be 9 billion in 2050, wheat demand is expected to increase by 60%. Increases in the annual wheat yield must grow from the current level of less than 1% to at least 1.6% to meet the demand (https://www.wheatinitiative.org/ about-us/wheat-research-food-security).
Rusts, smuts, bunts, mildews and foliar blights are major biotic factors that significantly affect wheat production and productivity. Karnal bunt (KB), which is caused by Neovossia indica, is one of five bunt and smut diseases of wheat and a serious concern to the grain industry due to quarantine regulations that restrict the international movement and trade of affected stocks. KB directly attacks the economic part, i.e., seed of the crop ( Supplementary Fig. S1). After the attack, the fungus forms black teliospores in sori and deteriorates the grain quality by generating an unpleasant, and foul smell attributed to trimethylamine produced by the fungus 1 . The infection of approximately 4% of kernels is sufficient for the grain to become unfit for human consumption due to a low palatability 2 . Internationally, approximately 70 countries have placed an embargo on the import of Karnal bunt-infected wheat 3

Results and Discussion
KB is a quarantine disease (zero percent tolerance) that is indirectly responsible for economic losses in more than 40 countries by affecting the free import-export of wheat grains. Additionally, infected grains are not fit for human and animal consumption 17 . Thus, the identification and/or development of KB resistance genotypes(s) that minimize the probability of establishment or decrease the inoculum spread rate into newer areas are urgently needed 17 . The development of resistant cultivars through breeding programs is an appropriate strategy to achieve zero tolerance to the disease.
Evaluation of pre-breeding germplasm for KB resistance. The sporadic incidence of KB in natural hotspots and the role of the environment in disease development make studies of KB resistance more challenging and time-consuming. Screening of wheat germplasm against the pathogen has been attempted by various researchers. Until 1979, screening experiments were conducted at KB hotspot locations. Subsequently, researchers developed an effective screening system, and since then, a large number of wheat germplasm and allied species have been screened under artificial epiphytotic conditions 18 . The severity of KB incidence in the germplasm of 179 pre-breedling lines from different environments ranged from 1.43-58.34%, whereas infection with KB-SUS (check) ranged from 53.38 to 98.84% (mean = 86.08 ± 12.49; Fig. 1). The mean severity was 26.09 ± 13.76%, indicating a significant difference in KB resistance among the studied genotypes at the genetic level. A summary of the results from the detailed phenotypic analysis is presented in Table 1 and Supplementary Table S1. The frequency distribution of the disease in the population lines is presented in Fig. 2 and indicated a trend towards a normal distribution. The severity of infection ranged from 0 to 69.03% in E-1 and from 0-56.92% in E-2. The mean value for infection with the disease in E-1 (28.26%) was greater than in E-2 (23.91%). The variation in disease resistance in the currently analyzed germplasm suggested that KB resistance was reinforced in different genetic backgrounds of the most appropriate high yielding varieties/genotypes. Therefore, PBLs with KB resistance can be used to create immunity for the disease 19 . This magnitude of genetic variation in infection indicated that a GWAS would likely reveal the underlying QTLs. During KB screening, breeders select genotypes that exhibit an infection rate less than 5% and exploit them in wheat improvement program. In current study with more stringency of selection, seven PBLs exhibited an infection severity of less than 4.0% at both E-1 and E-2 (Table 2 and Fig. 1). Variable levels of disease incidence for KB have been reported in different germplasm specimens by various researchers; for example, 1 to 48% and 2 to 95% disease infection rates were recorded in spring and durum wheat, respectively 20 . A previous study reported a percentage of disease incidence ranging from 0.2-63.1% among 150 wheat genotypes 21 . A significant correlation (0.81**) between KB infection and the year indicated that the disease pressure was substantial in both years and phenotypic data were suitable for the GWAS analysis (Supplementary Table S1).
The analysis of variance suggested that both the genotype and year exerted a significant effect on KB (Supplementary Table S2). Similar results were recorded during QTL mapping for KB resistance in two different studies 2,8 . The heritability of trait was high in both years (91% and 87% in E-1 and E-2, respectively) (Supplementary Table S3), suggesting that KB resistance can be mapped through QTL mapping and GWAS 2,15 . The phenotyping protocols established at CENEB, Cd. Obregon and reported in a previous study 22 that limit environmental or unexplained variations account for the high heritability observed in the present and previous studies. The disease is highly influenced by environmental conditions, but in current investigation the environmental effect was not significant and high heritability of the disease was observed. The high level of genetic variation and high heritability recorded in the current study will be very useful for selection to genetically improve KB resistance in wheat, and highly resistant genotypes may be shared with partners to exploit them in KB-resistant breeding programs.
Distribution frequency of KB disease in PBLs. The distribution frequency of disease infection was continuous in both environments, suggesting that KB resistance in wheat displays a quantitative inheritance pattern ( Supplementary Fig. S2). The continuum of resistance against KB in the germplasm of wheat is well-established in previous studies 8, 23 and our study corroborates these findings. Moreover, the distribution of KB infections was skewed towards low disease severity, indicating that some major effect-related genes or genes involved in epistatic effects controlled KB resistance in the population investigated.
Distribution of SNP markers, Genetic Diversity and Population Structure. A total 6,382 of high-quality SNPs were obtained from the 179 PBLs and were subjected to genetic diversity, population structure and GWAS analyses. The analysis of the genotype data revealed differences in the distribution of polymorphic DArT markers among the A, B, and D genomes ( Supplementary Fig. S3  in wheat contains comparatively fewer polymorphic markers than the A and B genomes due to low recombination attributed to its evolutionary and domestication history. Consistent with previous findings, the B genome contains a greater number of polymorphisms than the A and D genomes, which is attributed to a greater number of effective recombination events in the B genome [24][25][26] . The average number of SNPs per chromosome was 304, and the value ranged from 147 (4B) to 500 (7D) ( Supplementary Fig. S4).
The ratio of the number of SNPs in the B and A genomes was 1.17, 1.15 in the B and D genomes, and 0.98 in the A and D genomes. Based on these ratios, the number of SNPs in the D genome was approximately equal to the number of SNPs in the A genome and only slightly less than the number of SNPs in the B genome. A similar ratio was also recorded between the A and D genomes 27 . Many researchers have reported a high value for the ratios between D and A or B genomes, but this finding was not corroborated in the current study, indicating the high variability of D genome in PBLs. As described in a previous study, unique genetic variations and wild relative genes can be used to broaden the genetic base of wheat, extend the genetic gain and weaken the genetic bottlenecks associated with the D genome 28 .
The mean genetic diversity in current sets of germplasm was 0.32 (range 0.035-0.71), which is higher than previous studies examining SNP markers 29,30 . The high value of genetic diversity may be attributed to the exploitation of diverse exotic parental lines to generate PBLs. The mean genetic diversity value was close to the value reported for the Wheat Association Mapping Initiative (WAMI) population (0.31), which comprises many synthetically derived lines. The PIC (0.28; range 0.01-0.59) of the current study was comparatively higher than values (0.21, 0.16) reported in previous studies 31,32 .
The population structure can distort the MTA by revealing false positive associations in a GWAS. Therefore, information on the population structure is a prerequisite 33 . The largest delta K was observed at K = 6, suggesting the presence of six sub-populations ( Supplementary Fig. S5). The maximum number of genotypes were grouped in sub-population 3 (83 PBLs) followed by sub-population 5 (44 PBLs). Sub-population 4 included two PBLs, while sub-populations 1, 2 and 6 contained 10, 13 and 27 PBLs, respectively.
Two elite lines (Baj #1 and Super 152) in the parentage dominated sub-population 1, as 80% of the population lines had these parents. Seventy percent of the lines in subpopulation 2 shared three elite parents, Wiblli, Kukuna and Tacupeto F2001, while the remaining lines shared two parents, Baj #1 and Super152. Similarly, sub-populations 3, 4, 5 and 6 were dominated by parents Baj#1, Super152/Villa Juarez F2009, Kachu and Kiritati, respectively. Based on these results, sub-populations largely corresponded to the elite parents in each cross of exotic/elite1/elite2. Linkage disequilibrium analysis and genome-wide association study of KB resistance. The extent of LD decay in this panel at cut off r 2 = 0.1 has been reported to 2.5 cM for whole genome 34 , which suggests higher genetic diversity of the investigated panel as compared to the wheat panels used in previous studies 31,[35][36][37] . This panel has been drawn from a large set of pre-breeding lines (PBLs) developed by a three way crossing scheme (exotic/elite1//elite2) among exotics and elites 38 . Each pre-breeding line acquired approximately 25% of the exotic genome and 75% of the elite genome at an early stage thus allowing recombination between exotic and elite genomes. Further, LD decay in the three genomes was at 2.5, 5.0, and 2.5 cM for A, B, and D genomes, respectively. These results indicate a faster LD decay in the D genome, almost comparable to A genome, than reported in previous studies. This might be due to the use of synthetics for developing PBLs, driving more recombination in the D genome.
Previous studies reported the presence of KB resistance loci scattered on many chromosomes in all three genomes of wheat 15,16 . Hence, cultivar pyramiding/stacking of favourable alleles from multiple loci is a prerequisite to develop KB resistance 22 .
The results of the current study validated the consensus region on chromosome 3B reported in previous studies 2,15 (Fig. 4). Likewise, QTLs for KB resistance on chromosomes 5B and 2D 8,15 were also reported previously in RIL populations and a GWAS panel, respectively. A marker-trait association was also detected on chromosome 7B, which might be the same locus that was previously reported 23  The donor of the D genome, i.e., Aegilops tauschii, in hexaploid wheat has been a source of useful genes, for example, rust resistance genes 23 . In the current study, two KB resistant loci were located on the D genome (chromosomes 2D and 4D) (Fig. 4). The wider genetic variation derived from synthetics and the diverse genetic make-up of PBLs obtained from populations used in previous studies may be the probable explanation for the identification of novel KB resistance loci on the D genome.
Chromosome 3B appears to be a hot spot for various disease resistance genes, as it is the location of QTLs for KB resistance 24 and resistance to tan spot 39 , yellow rust 39 , leaf rust 24 and crown rot 26 . Genomic regions inherited as multi-disease resistance loci are not uncommon in wheat 39 . Therefore, researchers should focus on chromosome 3B for future introgression breeding to create cultivars with simultaneous resistance to multiple diseases. Likewise, the current GWAS panel may be screened for pathogens causing devastating diseases, such as a blast to identify major effect QTLs and/or to reveal genomic hot spot(s) for multi-disease resistance.
During the detection of highly tolerant PBLs based on phenotypic data, a set of 7 lines were confirmed to be resistant (Table 2). An analysis of the pedigree of these 6 PBLs revealed that Aegilops species are one parent used during the creation of synthetic wheat and contributed the D genome to these resistant PBLs. The analysis of these same lines for the SNP 1114200 marker (chromosome 4D) indicated that 6 of 7 PBLs contained the 'T' allele (favorable allele contributing to resistance). The role of the D genome in KB resistance suggested that the D genome must be further enriched with additional polymorphic SNPs to exploit more QTLs for various diseases viewed as upcoming threats to wheat cultivation.
During the study, SNPs on chromosome 4D were not associated with the Rht-D1 dwarfing locus, as the frequency of the dwarf allele was 0.1% (2 of 179 PBLs exhibited wild allele, i.e., G), indicating that the Rht-D1 gene was fixed in the current KB panel of PBLs and the KB resistance locus on chromosome 4D is novel (Supplementary Table S5).
Significance of chromosome 4D. Seven SNP markers on chromosome 4D were associated with KB infection in the joint analysis (Table 3). Noticeably, all but one SNP were present in strong LD ( Supplementary  Fig. S7). We divided the panel into two groups. Group 1 (carrying all favourable alleles of 7 markers) contained 109 PBLs and group 2 (carrying unfavourable alleles for any of the 7 markers) contained 70 PBLs. The mean KB% was 12.16% higher than in group 1. This comparison further confirms that the allelic profile of group 1 PBLs is www.nature.com/scientificreports www.nature.com/scientificreports/ associated with resistance to KB (Supplementary Fig. S8). However, an in-depth analysis of a set of diverse exotic and elite accessions should improve our understanding of this genomic region on chromosome 4D.
The identified markers were subjected to BLAST search in Ensembl Plants server (https://plants.ensembl. org/Triticum_aestivum/Info/Index) to obtain additional insights into the genes on chromosome 4D that are involved in KB resistance. Of the 8 associated SNPs located on chromosome 4D, three showed direct hits with candidate genes and three were located in the vicinity (2 Mb) (Supplementary Table S6). Of the three direct hits, two hits with markers 1103052 | F | 0-61:C > A (TraesCS4D02G352200) and 1101835 | F | 0-5:C > A (TraesCS4D02G350300) were interesting, belonging to RBOHB and Rhomboid-like protein gene families 40,41 . The gene TraesCS4D02G352200 showed the closest relationship to the AET4Gv20828800 gene in Aegilops tauschii ( Supplementary Fig. S9). One of the molecular functions of TraesCS4D02G352200 is NAD(P)H oxidase activity. As key producers of reactive oxygen species, NADPH oxidases (NOXs), which are also known as respiratory burst oxidase homologs (RBOHs), play crucial roles in various biological processes in plants. Two TaNOX genes have been reported to confer resistance to brown rust infection 40 . To date, 15 TaNOXs have been reported to be co-expressed with different sets of other genes that participate in several critical intracellular processes such as cell wall biosynthesis, defence response, and signal transduction, suggesting vital but diverse roles for these genes in regulating plant growth and stress responses in wheat 43 . Two of these genes (TaNOX8 and TaNOX9) were mapped to chromosome 4DL. TaNOX8 is specifically present on the membrane and is co-expressed with TaNOX13AL on chromosome 5AL. Thus, we speculate that the marker 1103052 | F | 0-61:C > A might represent the TaNOX8 gene. The other candidate gene, TraesCS4D02G350300, is a Rhomboid-like protein, which belongs to a family of serine proteases (family S54) that cleave substrates within transmembrane domains 44 . Rhomboid protease family member S54 has proven to be active during fungal-plant interactions 45 .
Furthermore, these two markers, 1103052 | F | 0-61:C > A and 1101835 | F | 0-5:C > A, on chromosome 4D were significantly involved in epistatic interactions with other loci on chromosome 4D and with the associated locus on chromosome 7B (1072740 | F | 0-54:G > C) (Fig. 5), contributing an additional 8% to the variation. Therefore, KB resistance is controlled by minor additive and epistatic loci. Previous studies using bi-parental populations and a GWAS panel have reported additive QTLs for KB resistance 2,15,16 , but information on epistatic QTL is lacking. This report is the first to identify additive and epistatic QTL for KB resistance in a GWAS panel. A similar level of interaction was also reported for APR loci for stem rust in wheat 46 .

Materials and methods
Plant material. The plant material used in the current investigation comprised 179 pre-breeding lines (PBLs). A detailed description of the development of these PBLs has been reported in Singh et al. 38 . Briefly, a set of 179 outstanding PBLs were selected from a larger set of 984 TC 1 F 5:6 based on their tolerance to biotic and abiotic stresses, namely, heat, drought, yellow rust (Puccinia striiformis f. sp. tritici), and powdery mildew (Blumeria graminis f. sp. tritici (Bgt)), character and yield performance in Mexico. The pedigree information for these lines is presented in Supplementary Table S7. KB-SUS was used as a susceptible line and grown around all of the experimental blocks.
Phenotyping PBLs for Karnal bunt disease through artificial inoculation methods. The phenotypic screening was conducted at Ciudad Obregon, Mexico over two years (2016-17 was referred to as E-1 and 2017-18 was referred to as E-2) using a randomized complete block design (RCBD). For the preparation of the inoculation, one-year-old teliospores were scraped off infected grains with a dissecting needle and incubated in a water-Tween 20 solution for 24 h; then, the suspension was filtered through a 60 µM nylon sieve and centrifuged at 3000 rpm. After discarding the supernatant, 0.5% sodium hypochlorite was applied as the active ingredient for 2 min to disinfect teliospores and the suspension was centrifuged again. Teliospores were then rinsed twice with sterile distilled water, followed by centrifugation. Teliospores were resuspended in sterile distilled water in the centrifuge tube. One milliliter of the teliospore suspension was spread on Petri dishes containing 2% water-agar (AA) and incubated at 18-22 °C in the dark. After 6 to 9 days, teliospore germination was evaluated using a compound microscope at 10x magnification. Pieces of AA with germinated teliospores were removed and placed upside down on the lid of Petri dishes containing potato-dextrose-agar (PDA). After 10 to 14 days, 2 to 3 mL of sterile distilled water was added to the plates and the colonies were scraped gently using a sterile spatula. Hyphae www.nature.com/scientificreports www.nature.com/scientificreports/ and sporidia were inoculated onto other plates with PDA using a sterile syringe, and the plates were incubated at 18-22 °C in the dark for approximately nine days. After the incubation, pieces of PDA with the different fungal propagules were transferred and placed upside down on the lids of sterile glass Petri dishes to induce the production of allantoid secondary sporidia 47,48 . Three milliliters of sterile distilled water were added to the bottom of the plates.
Water was collected from the plates every 24 h; secondary allantoid sporidia were collected and counted using a hemocytometer. Then, the concentration was adjusted to 10,000 sporidia per mL. Five heads of each experimental line at the boot stage were inoculated by injecting 1 mL of the allantoid sporidial suspension during the boot stage. Stems of the inoculated heads were identified with a piece of red plastic. An automatic mist irrigation system was used to water the plants during the period of inoculation (January-March) for 20 min five times per day, and the area was covered with nets to prevent bird damage. Grains were harvested manually, and the healthy and infected grains were counted by a visual inspection to calculate the percentage of infection (infected grains). Standard agronomic practices were used during the crop season. The mean disease incidence (%) was estimated from five inoculated spikes per plot. The phenotyping for KB screening was based on ratings reported as percentages to identify tolerant and susceptible genotypes 2 .  Population Structure and Linkage Disequilibrium (LD) Analysis. The structure was analyzed using K-values ranging from 1 to 8 for the entire population with 6,382 SNPs markers with STRUCTURE software 51 . Five independent analyses were performed for each K-value, with 100,000 set as the length of the burn-in time and number of Markov Chain Monte Carlo replications. The correct estimation of K was provided by an ad hoc statistic deltaK 52 that was calculated using the program STRUCTURE HARVESTER 53 . The linkage disequilibrium analysis was conducted on this panel using GAPIT version 2.0 34 . Birefly, a squared correlation coefficient r 2 was estimated for all pairwise comparisons. Pattern of LD decay was then visualized by plotting pair-wise r 2 values against the genetic distance for A, B, and D genomes separately and for whole genome. For the LOESS estimation of LD decay, genetic distance was estimated as the point where the LOESS curve first crosses the baseline r 2 of 0.1.

Association analysis.
Phenotypic data from two years were analyzed separately and combined for the association analysis. TASSEL software v5.01 54 was used to calculate associations between the markers and KB score with the mixed linear model (MLM) 55 . MLM takes advantage of the population structure (Q) and kinship (K) matrix as covariates during a GWAS to avoid spurious associations 56 . Both of these matrices were also generated using TASSEL 5.01. To correct for multiple testing, the step up procedure of Benjamini and Hochberg 29 which controls the false discovery rate was used with a cut-off value of 0.05. ANOVA analysisand estimates ofcorrelations and heritability were conducted in SAS 9.4. The KB scoing data of both environments separately and mean over the environments (Supplementary Table S9) was used for the association mapping.
Epistatic interactions. Two-and three-locus interactions among associated loci were studied using an in-house script executed in R, as described previously 31,42,50 . A significant threshold of P < 0.001 represented significant marker-marker interactions.
In silico analysis. In silico analysis of the significant loci was conducted in Ensembl Plants sequence database (https://plants.ensembl.org/Triticum_aestivum/Info/Index) by using BLASTN algorithm with functional annotations from IWGSC v1.1 RefSeq annotation. Precisely, to find the candidate genes, the physical starting point of the marker along with chromosome name was put in Ensembl Plants and around 300 bp were added before and after the SNP. For some markers, SNPs fell within gene sequences and hence were classified as direct gene hits in Figure 5. Epistatic interactions among loci associated with KB resistance in wheat. The magnitude of the marker effect (F value) is represented in shades of blue (dark blue indicates a stronger interaction). The magnitude of the epistatic interaction is presented with colors ranging from yellow (stronger interaction) to red (weaker interaction).

conclusions
In the present study, an association mapping panel consisting of 179 PBLs genotyped with 6382 high-quality DArTseq SNP markers was utilized to obtain an understanding of the genetics of Karnal bunt resistance in wheat ( Supplementary Fig. S10). Fifteen significant SNPs were identified on chromosomes 2D, 3B, 4D and 7B in a joint analysis of data collected from 2 years. This report is the first to describe QTLs/genes associated with KB resistance on chromosome 4D. The SNPs associated with KB resistance might be exploited through marker-assisted selection to facilitate the breeding of resistant wheat. Eventually, PBLs carrying more favourable alleles coupled with superior agronomic performance could be utilized as excellent parental materials to produce improved wheat lines that resist this stress. These results provide new information for further research aiming to improve KB resistance in common wheat. Moreover, the 7 best identified resistant lines have become part of the breeding programs of many Asian countries (India and Pakistan) where KB is a major problem.