Identification of resistance loci against new pathotypes of Plasmodiophora brassicae in Brassica napus based on genome-wide association mapping

Genetic resistance is a successful strategy for management of clubroot (Plasmodiophora brassicae) of brassica crops, but resistance can break down quickly. Identification of novel sources of resistance is especially important when new pathotypes arise. In the current study, the reaction of 177 accessions of Brassica napus to four new, virulent pathotypes of P. brassicae was assessed. Each accession was genotyped using genotyping by sequencing to identify and map novel sources of clubroot resistance using mixed linear model (MLM) analysis. The majority of accessions were highly susceptible (70–100 DSI), but a few accessions exhibited strong resistance (0–20 DSI) to pathotypes 5X (21 accessions), 3A (8), 2B (7), and 3D (15), based on the Canadian Clubroot Differential system. In total, 301,753 SNPs were mapped to 19 chromosomes. Population structure analysis indicated that the 177 accessions belong to seven major populations. SNPs were associated with resistance to each pathotype using MLM. In total, 13 important SNP loci were identified, with 9 SNPs mapped to the A-genome and 4 to the C-genome. The SNPs were associated with resistance to pathotypes 5X (2 SNPs), 3A (4), 2B (5) and 3D (6). A Blast search of 1.6 Mb upstream and downstream from each SNP identified 13 disease-resistance genes or domains. The distance between a SNP locus and the nearest resistance gene ranged from 0.04 to 0.74 Mb. The resistant lines and SNP markers identified in this study can be used to breed for resistance to the most prevalent new pathotypes of P. brassicae in Canada.

Variant analysis and annotation. There were slightly more SNPs in the C-genome (160,174 SNPs) than the A-genome (141,579 SNPs). Chromosome A03 had the highest number of SNPs within the A-genome, while C03 contained the highest number of SNPs in the C-genome ( Table 2). The mean density was 2.12 SNP/Kb across the 19 chromosomes. In general, SNP density was higher in the C-genome (2.55 SNPs/Kb) than the A-genome (1.70 SNPs/Kb). C07 had the highest number of SNPs per Kb (2.88) and A10 had the lowest (1.43) ( Table 2). The vast majority of SNPs were bi-allelic (90%), and only 10% were multi-allelic (Supplemental Fig. S2). There was a positive correlation (r = 0.80) between chromosome length and the number of SNPs, but only a weak correlation (r = 0.30) between the number of SNPs and the number of SNPs per Kb. www.nature.com/scientificreports/ The SNPs were annotated using SnpEff software. About 37% of SNPs were annotated within coding regions, 22% within introns, 31% within promoter regions, 0.3% within splice sites, and 9.7% mapped to other genetic regions (Supplemental Fig. S3a). A more detailed SNP annotation was performed using the Variant Effect Predictor (Supplemental Fig. S3b). For SNPs within coding regions, 17% were non-synonymous, 18% were upstreamgene variants, 9% were downstream-gene variants, 23% were synonymous variants, 14% were intron variants, 15% intergenic variants, and 4% were located in the splice site regions and 5′ and 3′ UTRs (Supplemental Fig. S3b). Overall, more SNPs were annotated to the A-genome than to the C-genome (Supplemental Fig. S3c).
Genetic diversity and population structure. For genetic diversity analysis, the SNP markers were filtered based on a minor allele frequency (MAF) of 0.05 and minimum sample count of 80%, which resulted in 104,184 high quality SNPs. The mean MAF was the same for the A-and C-genomes (MAF = 0.14). Chromosome C01 had the highest MAF (0.16), followed by C03 and A07 (0.15), and lowest MAF (0.12) was in chromosomes A09 and C09 ( Table 2). The mean marker heterozygosity (H e ) was 0.06 and the mean accession heterozygosity was 0.14. The average polymorphic information content (PIC) was the same for the A-and C-genomes (0.26). PIC was highest in chromosome C01 (0.27) and lowest (0.24) in A09 ( Table 2). The ratio of transitions (changes from A <-> G and C <-> T) to transversions (changes from A <-> C, A <-> T, G <-> C or G <-> T) was 3: 2.
Linkage disequilibrium analysis. Linkage disequilibrium (LD) in the association panel was calculated using Pearson's r 2 statistic on pairwise combinations of SNPs present across the 19 chromosomes of B. napus (Supplemental Fig. S5). The average LD (r 2 ) across the genome was 0.15. The mean LD was 0.10 in the A-genome and 0.19 in the C-genome. LD values ranged from 0.01 in A09 to 0.19 in C01 (Table 2). Across the genome, LD decayed very rapidly (r 2 = 0.23) within 1.6 Mb (Supplemental Fig. S5).  www.nature.com/scientificreports/ Genome-wide association analysis. Genome-wide association analysis for clubroot severity was conducted using a general linear model (GLM) for naïve and P + Q analysis, and a mixed linear model (MLM) for P + K and P + Q + K. Naïve refers to genotypes and phenotypes only, Q is structure, and K is kinship. The quantile-quantile (Q-Q) plots from all models revealed that, save for significant SNPs, the distribution of observed -log 10 (p) was closest to the expected distribution in the P + Q + K compared to other models, so associations were identified using this model. A significance level of P ≤ 0.05/N (N: number of SNPs) based on the Bonferroni correction and a less stringent suggestive significance level at P ≤ 0.5/N were selected. Association analysis detected 13 SNPs associated with resistance to the four P. brassicae pathotypes distributed on chromosomes A01, A03, A04, A05, C03, C07 and C09, with two SNPs (one on A05 and one on C07) associated with resistance to 5X, four SNPs to 3A (three on A03 and one on C03), three SNPs to 2B (on A01) and six SNPs (four on A01, one on C03 and one on C09) associated with resistance to 3D (Table 3, Fig. 3).
Four significant SNPs (A01_19406654, A01_19406667, A01_19406668 and A01_19406672) on chromosome A01 were associated with resistance to pathotypes 2B and 3D. These four SNPs were in the same B. napus gene, BnaA01g27830D, which encoded glutamate decarboxylase 5. For pathotypes 3A, two significant SNPs (A03_50605605 and A03_50605321) on chromosome A03 were detected in the same gene, BnaA03g05600D, and www.nature.com/scientificreports/ SNP C03_327139017 on chromosome C03 was detected in gene BnaC03g07170D. Both genes on chromosomes A03 and C03 encode galactose oxidase/kelch repeat superfamily protein (Table 3).

Candidate resistance genes.
A Blast search of 1.6 Mb upstream and downstream regions (distance based on LD) from significant SNPs was performed to identify any nucleotide-binding-site/leucine-rich-repeat (NBS-LRR) resistance protein encoding genes. For pathotype 5X, a coiled coil / nucleotide binding site / leucine-rich repeat (CC-NBS-LRR or CNL) family was identified at 0.53 Mb from the significant SNP, A05_101850460, on chromosome A05. For the SNP on C07, no NBS-LRR gene was found (Table 4). For pathotype 2B, no NBS-LRR gene was found in LD vicinity of the SNP on A04, while for SNPs on A01, a putative disease resistance protein, At3g15700, was detected at 0.16 Mb from the SNPs (Table 4). For pathotype 3A, no disease resistance genes were found adjacent to SNP A03_50588774 on A03, but a gene encoding an LRR domain was found at 0.08 Mb from the SNP. For SNP C03_327139017 on C03, an avirulence-induced gene (AIG1) family protein was detected at 0.04 Mb from the SNP. For pathotype 3D, a toll-interleukin-1 receptor/nucleotide binding site/leucine-rich repeat (TIR-NBS-LRR or TNL) gene was found at 0.07 Mb from the SNP, C03_356260031, on C03 (Table 4).

Discussion
In the current study, GWAS was used to identify and map new sources of resistance to four recently identified pathotypes (5X, 3A, 2B and 3D) of P. brassicae in 177 accessions of B. napus from around the world. The majority of the accessions were highly susceptible to all four pathotypes (80-100 DSI), but ~ 10% showed strong www.nature.com/scientificreports/  www.nature.com/scientificreports/ resistance (0-20 DSI). This indicated that sources of strong resistance to clubroot were much less common in B. napus than in B. rapa 24,25 . In total, 13 SNPs were associated with resistance, of which nine SNPs were on the A-genome, and four on the C-genome. Although the A-genome (from B. rapa) appeared to carry more QTL for clubroot resistance than the C-genome, this result indicated that the C-genome (from B. oleracea) could also be a potential source for clubroot resistance. Interestingly, four SNPs on A01 were common for pathotypes 2B and 3D, which may indicate that there is a common QTL for resistance to the two pathotypes. Moreover, SNPs located on A03 and C03 were associated with resistance to pathotype 3A. A Blast search determined that these SNPs were located very close to or within the same gene on both chromosomes. Association of SNPs with a single gene on separate chromosomes indicated the occurrence of inter-genome gene duplication, which is a common phenomenon in B. napus [40][41][42] . QTL for clubroot resistance have been identified previously in B. napus, and several have been mapped to chromosomes A01, A02, A03, A08, C02, C03, C06, C07, and C09 34,35 . In those previous studies, the disease reaction to only a single strain of P. brassicae was assessed, and a SNP array or associative transcriptome was used rather than GBS, which might explain the difference in the number and identity of QTL detected. In a recent GWAS study, 45 QTLs were identified against 13 strains on chromosomes A01 to A10, and C02, C03 and C05 43 . In the current study, two pathotypes (3A and 2B) included in a recent study 43 were analysed using a larger and potentially more diverse germplasm collection. The current study, however, identified different QTL from those in the previous study 43 . One possible explanation for the difference between the studies could be the analysis of DSI. A rank-based inverse normal transformation was used to make the DSI values nearly fit the normal distribution required for parametric model-based association analysis in the current study, while highly skewed DSIs were used in the previous study 43 . In addition, different sources of B. napus collections in the current study from the previous report 43 could be an important factor contributing to the discrepancy. We conclude that all 13 QTL identified in the current study are likely to be novel because they were located at different physical locations on the chromosomes from the QTL identified previously 43 .
Extensive efforts have been made to map the genes for resistance to P. brassicae in Brassica species via biparental mapping approaches. In our previous studies, eight genes / QTL loci for resistance to pathotype 3H or 5X or both were mapped into chromosomes A02 (Rcr8), A03 (Rcr1, Rcr2, Rcr4 and Rcr5) and A08 (Rcr3, Rcr9 and Rcr9 wa ), and one gene into chromosome C07 (Rcr7) 13,14,[22][23][24][25] . In the current study, 13 SNPs were identified from seven chromosomes (A01, A03, A04, A05, C03, C07 and C09) for resistance to four new pathotypes (2B, 3A, 3D and 5X) of P. brassica from Canada. None of the previously identified genes/QTL for resistance to Canadian pathotypes reside in chromosomes A01, A04, A05, C03 or C09. Although three SNPs were identified in chromosome A03 where Rcr1, Rcr2, Rcr4 and Rcr5 were located and one SNP in chromosome C07 where Rcr7 was located, these four SNPs were located in different regions from the identified genes. Three closely linked SNP A03_50588774, A03_50605321 and A03_50605605 associated with resistance to 3A were identified in the 2.5 Mb region of chromosome A03, while Rcr1, Rcr2, Rcr4 and Rcr5 spanned in the region of 23-25 Mb of chromosome A03 in the B. napus reference genome. Furthermore, SNP C07_531619426 associated with resistance to 5X was www.nature.com/scientificreports/ located at 17.9 Mb of chromosome C07. However, Rcr7 was identified in B. oleraces, in a region equivalent to the 25 Mb of chromosome C07 in the B. napus genome. The majority of disease resistance genes identified in plants have been classified as TNL or CNL proteins, with ~ 70% of NBS-LRR genes in the Brassicaceae family being TNLs [44][45][46][47] . In the current study, five putative resistance genes were identified within LD distance of the SNPs associated with resistance to the four pathotypes. The resistance genes belonged to the TNL family (one gene) and the CNL family (one gene), as well as other types of resistance genes. The gene BnaA03g065730, which mapped close to SNP A03_50588774, contained the LRR protein domain that is an essential part of NBS-LRR proteins 48 . Moreover, the gene BnaC03g07250D that was located at 0.04 Mb from SNP C03_327139017 encoded an avirulence-induced gene (AIG1) family protein that, according to the gene-for-gene hypothesis by Flor 49 , should belong to a disease resistance protein-encoding gene family.
Population structure can have a significant impact on GWAS. Structure analysis using the entire SNP panel from GBS indicated that the core collection was comprised of seven sub-populations. However, a previous study using a SNP panel obtained from the Brassica 60 K Illumina Infinium SNP array identified only two major subpopulation 50 . Similarly, the difference in LD between studies at the chromosomal level as well as at the genome level appeared to be associated with the density of molecular markers. In the current study, LD extended further in the C-genome relative to the A-genome. LD decayed very rapidly (r 2 = 0.23) within 1.6 Mb, while the range of previous reported LD decay values was affected by the diversity within the germplasm collection that was examined [51][52][53] .
In summary, the current study identified several accessions of B. napus with high levels of resistance to one or more or the four important new pathotypes of P. brassicae examined. Genome-wide association mapping analysis detected and mapped 13 SNP loci associated with resistance to the four pathotypes. This information will be used in subsequent genetic analysis of bi-parental populations to verify the SNPs and fine map the functional genes responsible for resistance to each pathotype, and also used for marker-assisted breeding of resistance to clubroot in canola.

Materials and methods
Plant and pathogen materials. A collection of 671 B. napus accessions were obtained from three gene banks; the Plant Genetic Resources of Canada, the Centre for Genetic Resources of the Netherlands, and Agricultural Research Service of the United of America. Self-pollination was performed for each line under greenhouse conditions to reduce the level of heterozygosity. The accessions were evaluated for resistance to a field collection of pathotype 5X (strain LG02) that had been characterized by Dr. S.E. Strelkov using the Canadian Clubroot Differential 6 system. Also, selected lines were tested for resistance to field collections of pathotypes 3A (strain F.P. [3][4][5][6][7][8][9][10][11][12][13][14], 2B (F.P. 183-14) and 3D (F.P. 1-14) of P. brassicae (also provided by Dr. Strelkov) and then genotyped using a GBS platform.
Seedlings for GBS analysis were grown to the 3-4 leaf stage in a growth chamber. A small (100 mg) sample of leaf tissue was collected from each accession, immediately frozen in liquid nitrogen, lyophilized in a freeze dryer for ~ 48 h and ground to a fine powder using a tissue lyser (Qiagen, Newtown City, USA).
Resting spores of pathotypes 5X, 3A, 2B, and 3D of P. brassicae were increased on susceptible canola and stored as frozen clubbed roots at − 20 °C until needed. Resting spores were extracted from the frozen clubs as described by Strelkov et al. 54 , and adjusted to a concentration of 1 × 10 7 resting spores mL −1 . Spores of each pathotype were applied separately to the host entries.
Evaluation of clubroot reaction. The experiment was conducted in a growth chamber under the controlled environment following the method similar to that described by Suwabe et al. 16 . Seed of each host genotype was pre-germinated on moistened filter paper in a Petri dish. One-week-old seedlings of each host line × pathotype combination with 12 plants each were inoculated by dipping the entire root system in the resting spore suspension for 10 s. The inoculated seedlings were then immediately planted in 6 cm × 6 cm × 6 cm plastic pots filled with Sunshine LA4 potting mixture, with one seedling per pot. The pots were thoroughly watered and transferred to a greenhouse at 21 °C ± 2 °C with a 16 h photoperiod. The potting mixture was kept saturated with tap water at pH 6.5 for the first week after inoculation and then watered and fertilized as required.
Six weeks after inoculation, the seedlings were gently removed from the potting mix, the roots of each plant were washed with tap water, and each root was rated for clubroot symptom severity on a 0 to 3 scale, where: 0 = no clubs, 1 = a few small clubs on less than one-third of the roots, 2 = moderate clubs (small to medium-sized clubs on 1/3 to 2/3 of the roots), and 3 = severe clubs (medium to large-sized clubs on > 2/3 of the roots). A disease severity index (DSI) was then calculated using the formula of Horiuchi and Horias, modified by Strelkov et al. 54 .
where n is the number of plants in a class; N is the total number of plants in an experimental unit; and 0, 1, 2 and 3 are the symptom severity classes.
Assessment of the disease reaction of each accession with a resistant response (DSI < 20) in the initial test was repeated two more times. Each of these repetitions provided a similar result. www.nature.com/scientificreports/ Sequence analysis and SNP discovery. The accession sequences were analyzed using genotyping by sequencing (GBS) in four major steps; DNA sample preparation, library construction, library sequencing and SNP calling. For sample preparation, DNA extraction was performed using the DNeasy 96 plant kit as per the manufacturer's instruction (Qiagen). To reduce the genome complexity, DNA was digested with ApeKI, a methylation-sensitive restriction enzyme. For library construction, the fragments produced by digestion were directly ligated to enzyme-specific adapters followed by PCR amplification. For sequencing, the samples were divided into two pools of 96 samples each, and assessed in two runs of Illumina HiSeq 2500 (Illumina Inc., USA). DNA alignment was generated with BWA software version 0.7.8-r455. The GBS-TASSEL pipeline 55 was used for SNP calling, and VCF and HapMap genotype files were generated. Initial SNP filtration was performed with the following settings: minor allele frequency (MAF) > 0.01 and missing data per site < 90%. Accessions with too much missing data were removed. Depth, missingness and heterozygosity were calculated using VCFtools V.0.1.12 56 59 , and variant locations were characterised as coding, intron, splice site, promoter and intergenic regions, intergenic, upstream, downstream, and synonymous.
Genetic diversity and population structure. Population-based genetic diversity, including allele frequencies, minor allele frequency (MAF), and average heterozygosity, were computed using TASSEL 5.2.18 software 60 . Polymorphic information content (PIC) values were calculated for SNP markers using the formula (PIC = 1 − (maf 2 + (1-maf) 2 ))-(2maf 2 (1-maf) 2 )), where maf = is the minor allele frequency. The ratio of transitions to transversions was calculated using the Kimura 2-parameter model in MEGA7 61 . Structure analysis of the accessions was conducted using fastSTRU CTU RE v2.2 62 . The admixture model and correlated allele frequency were applied with a burn-in period of 50,000 iterations and 100,000 replications of Markov Chain Monte Carlo (MCMC).
Linkage disequilibrium (LD) analysis. LD decay across the B. napus genome was measured and a correlation matrix of r 2 values was computed between all pairs of polymorphic SNPs with MAF ≥ 5% using the GAPIT V3 package 63 .
Genome-wide association analysis. Clubroot severity data (disease severity index, DSI) were transformed using rank-based inverse normal transformation using the rntransform function in the GenABEL R Library 64 . Associations were analyzed for all SNP markers with MAF ≥ 5%, and evenly distributed (1 SNP per 100 Kb) using the following models: general linear model (GLM) for naïve and P + Q, and mixed linear models (MLM) for P + K and P + Q + K, where naïve refers to genotypes and phenotypes only, Q is structure and K is kinship. A kinship matrix of the accessions was calculated and principle components analysis was used to account for population structure and accession relatedness. The association analysis was done using TASSEL v.5 60 .
Candidate resistance genes. Using Blast2Go software 65 , the sequence regions neighboring and within LD of the significant SNPs were searched for candidate genes encoding disease resistance proteins that were potentially responsible for the resistance to each pathotype of P. brassicae. www.nature.com/scientificreports/