Two QTLs controlling Clubroot resistance identified from Bulked Segregant Sequencing in Pakchoi (Brassica campestris ssp. chinensis Makino)

Clubroot, caused by Plasmodiophora Brassicae, is a serious soil-borne disease in worldwide. In recent years, progression of clubroot is rapid and serious in Shanghai, China. In this study, The inheritance of clubroot resistance (CR) were determined in pakchoi using F2 segregation population that were developed by crossing highly resistant line ‘CR38’ and susceptible line ‘CS22’. Two novel QTLs, qBrCR38-1 and qBrCR38-2, was identified by BSA-seq (Bulked Segregant Sequencing) resistant to P. brassicae physiological race 7. Two significant peak qBrCR38-1 and qBrCR38-2 were observed by three statistical methods between interval of 19.7–20.6 Mb in chromosome A07 and 20.0–20.6 Mb in chromosome A08, respectively. In addition, Polymorphic SNPs identified within target regions were converted to kompetitive allele-specific PCR (KASP) assays. In target regions of qBrCR38-1 and qBrCR38-2, there were twenty SNP sites identified, eleven KASP markers of which are significantly associated to CR (P < 0.05). Seven candidate genes were identified and found to be involved in disease resistance (TIR-NBS-LRR proteins), defense responses of bacterium and fungi and biotic/abiotic stress response in the target regions harboring the two QTLs. Two novel QTLs and candidate genes identified from the present study provide insights into the genetic mechanism of CR in B.rapa, and the associated SNPs can be effectively used for marker-assisted breeding.

Bulked segregant analysis by sequencing (BSA-seq), is an effective technique used to identify quantitative trait loci (QTLs) 19,20 . It is a genomics tool used for genetic mapping which takes advantage of bulked-segregant analysis and high-throughput genotyping using next-generation sequencing (NGS) 21,22 . BSA-Seq has been applied to mapping agronomically important loci in Arabidopsis, rice and wheat [23][24][25] . Candidate genes for disease resistance have been identified successfully by this approach, such as a broad-spectrum resistance gene Pi65 (t) in rice 26 , downy mildew and powdery mildew resistance QTLs in cucumber 27,28 . BSA-seq technologies, have proven successful for rapidly establishing the association of agronomic traits with molecular markers and had a major impact on crop breeding 29 . Meanwile, different statistical methods for poolsd QTL mapping have been proposed, including MutMap 23 , G' value 30 and ED 31 , that would have noise reduction and highlighting effect of QTLs. Moreover, it requires an efficient platform for applying molecular markers to marker-assisted selection (MAS) in breeding. Kompetitive Allele Specific PCR (KASP) is one of the high-throughput SNP genotyping technologies is a cost-effective, low genotyping error rates and flexible system which is widely used for genetic mapping, trait-specific markers development, germplasm characterization (genetic diversity, genetic relationship, and population structure), and quality control (QC) analysis (genetic identity, genetic purity, and parentage verification) [32][33][34] .
The Pakchoi (Brassica campestris ssp. chinensis Makino), also called non-heading Chinese cabbage, is one of the most important Brassica vegetable crop in China and East Asia 35 . Most Pakchoi cultivars are highly susceptible to the P. brassicae. In this study, BSA-seq was applied to determine the inheritance of clubroot resistance (CR) in Pakchoi inbred line 'CR38' and F 2 population. Then, the underlying QTLs were mapped by three statistical methods and examined sequence variation in the target region to identify the most probable candidate gene associated with CR.

Results
Clubroot Resistance evaluations in CR38 × CS22. 'CR38' were highly resistant (DSI = 3.33 in 2016, DSI = 4.76 in 2017) to the 7 physiology race of P. brassicae in contrast to the 'CS22' which was more susceptible (DSI = 100 in 2016, DSI = 94.44 in 2017). 'CR38' and 'CS22' (Fig. 1a), were crossed to develop segregating populations for QTL analysis of CR. Clubroot symptoms of 294 F 2 individuals were identified in autumn 2017. F 2 population exhibited a continuous frequency distribution with a range of 0-3 and DSI of 34.24, suggesting polygenic control of CR in this population (Table 1).
Whole genome sequencing analysis. BSA-seq were used to identify the loci controlling CR in our F 2 population, from which 22 highly resistant and 22 highly susceptible individuals were selected and two DNA pool were created. Illumina high-throughput sequencing generated 51,292,360 and 59,493,747 short reads (150 bp in length) from the CR-pool and the CS-pool, with a coverage of 98.11% (39-fold genome coverage) and 98.46% (45-fold genome coverage), respectively. Over 97% of the reads in both pools were mapped to the Chiifu-401 reference genome. The Q20% was 96.12% and 95.91% of the CR -pool and CS-pool. The effective sequencing depths for CR38 and CS22 were 20-fold and 25-fold genome coverage, respectively, which guaranteed accuracy of the subsequent analysis. The results from sequencing are presented in Table 2. www.nature.com/scientificreports www.nature.com/scientificreports/ Mapping of Clubroot resistance. Sequence data were trimmed and filtered prior to analysis. Compared to the reference genome, 1,489,940 SNPs were identified between the CR38 and CS22 parents. Based on the uniquely mapped reads, association analysis between the two bulks was performed on 1,079,828 SNPs. To identify genomic regions associated with CR, three methods were used to mapping the QTLs. Firstly, Scanning each SNP in the whole genome, the Ratio of SNP-ratioin CR-pool and CS-pool was calculated 36 (Fig. 2A). Two significant peak qBrCR38-1 and qBrCR38-2 was observed between 19.7-20.6 Mb of chromosome A07 and 20.0-20.6 Mb of chromosome A08 (Fig. 2a). Next, Allelic segregation between CR and CS pools by Euclidean Distance (ED) 31 were measured. ED 4 was raising ED to the fourth power to decrease noise. Local polynomial regression methods (LOESS fit) of ED 4 calculated shown in Fig. 2b. The identified peaks that have an ED above a threshold and a high allele frequency in the CR-pool. G value at each SNP were calculated, and G is a smooth version of G 30 . The statistics result demonstrated that there were two QTLs at the same peaks position about 20.1 Mb in chromosome A07 and 20.2 Mb in chromosome A08 (Fig. 2b). We found 5441 and 1887 SNPs between the two parents in the target regions of chromosome A07 and chromosome A08.
Corresponding sequences in flanking 50 bp region were used to develop allele-specific KASP primers which were genotype of the parental lines and F 2 population. There were twenty SNP sites identified at physical positon 18.4 Mb -20.8 Mb and 19.4 Mb -21.8 Mb around regions of qBrCR38-1 and qBrCR38-2 (Table 3). qBrCR38-1 was mapped in the interval of 0.35 cM flanked by Br_K_07106 and Br_K_07107. Br_K_080103 and Br_K_080107 are linkage to qBrCR38-2 within the genetic distance of 0.33 cM. To evaluate the effect of qBrCR38-1 and qBrCR38-2, F 2 population were classified into two groups (AA and aa) based on the genotype of the most tightly linked markers. Lines carrying the CR38 alleles (AA) are significantly (P < 0.05) clubroot resistant than those carrying the CS22 alleles (aa) in eleven SNP markers (Table 3). Close-up view of QTLs and KASP markers link to the target regions was shown in Fig. 3.

Candidate gene identification.
A total of 188 and 116 annotated genes encoded sequences encompassed the target regions of qBrCR38-1 and qBrCR38-2, respectively. The functional annotation of genes in the target region indicates their potential involvement in fungal disease resistance assessed for further analysis. In chromosome A07, candidate gene BraA07002412, homologous to AT1G80460, encodes protein NHO1 (nonhost resistant gene 1) similar to glycerol kinase and performs a rate-limiting step in glycerol metabolism. In addition, candidate gene BraA08002455 in chromosome A08, homologous to AT1G65850, encodes Toll-Interleukin-1 receptor/nucleotide binding site/leucine-rich repeat (TIR-NBS-LRR class) disease resistance protein. Moreover, six genes with F-box domain (BraA07002249, and BraA07002494 in A07, BraA08002451 and BraA08002452in A08) and a WRKY transcription factor (BraA08002471) were considered as the functional genes associated with CR (Table 4).

Discussion
The traditional fine mapping method is time-consuming and is rather laborious to require the identification of genome-wide polymorphic DNA markers for linkage analysis. Furthermore, it needs to use the genetic mapping and physical mapping to locate the target gene in the specific location of chromosomes. With the rapid development of sequencing technology and genomics, using BSA-seq of extreme mixed pools, a large number of traits can be located quickly and effectively. Compared to traditional QTL mapping, BAS-seq has the advantages of simple operation and short test period, and does not require specific near-isogenic lines to be constructed. Through high-throughput sequencing, thousands of SNPs can be captured at one time, and this variation information can be used to identify trait-related genes or QTLs as well as to develop molecular markers. The accuracy of BSA-seq results is very important to identify the resistance loci of CR, and there are many influencing factors. With improvement of reference genome in Chinese cabbage using NGS and Pacbio SMRT sequencing, it is possible www.nature.com/scientificreports www.nature.com/scientificreports/ to map genes for different traits in Chinese cabbage crops effectively and get more complete gene sequences. In order to highlight the significance of the positioning interval for better noise reduction, Three appropriate statistical approach were used to observed same two significant peak of qBrCR38-1 and qBrCR38-2. SNP-ratio and ED4 increasing CR sensiticity near the high allele frequency locus in the CR-pool. G' is expected to decrease much more rapidly around the causal site, implying narrower intervals of support around QTLs 30 . BSA-Seq combination with KASP analysis is a powerful approach for fine mapping of causal genes. KASP is a cost-effective single-step genotyping technology, cheaper than SSRs and more flexible than genotyping by sequencing (GBS) or array-based genotyping. In our research, KASP markers linkaged CR were obtained in the target region of QTLs. Alignments of the physical map corresponds to the genetic map around target regions were showed good collinearity (Fig. 4). The inconsistencies such as Br_K_080103 might be caused by chromosomal rearrangement in Pakchoi. Eleven SNP sites were significantly associated to CR, but nine SNP sites have week effective to CR in chromosome A08. Thus it is speculated that qBrCR38-1 is a major effective locus. In general, our research provide an important way to molecular marker assisted selection for CR on backcross breeding and gene pyramiding in B.rapa.

Figure 2.
Genome-wide scan for clubroot resistance QTLs using BSA-seq. Two significant QTL of qBrCR38-1 and qBrCR38-2 detected in chromosomes A07 and A08 using three statistical approach. (a) Ratio (CR-pool/CSpool) of the SNP-ratios (Resistant alleles/sensitive alleles) is presented. The significance peak is indicated by the horizontal dotted line. (b) Distribution of G'value and Loess fit curve calculated using ED 4 data on chromosome A07 and A08 (2019) Table 3. KASP marker in target region. *P < 0.05, **P < 0.01.    17,18,39 . In the present study, the resistant parent "CR38" has high resistance to P. brassicae of physiological race 7 which is widespread in Shanghai, China. There were no clubroot resistant genes, even no resistant cultivars. Therefore, the development of cultivars and the mapping genes resistant to physiological race 7 in Pakchoi will provide a basis for disease resistance breeding of Chinese cabbage.

A. thaliana homologs Annotations
Previous researches have applied several strategies to identify and map genes and QTL involved in CR in B.rapa. Crr1 and Rcr9 16 were both located in chromosomes A08, while no genes or QTLs were found in chromosomes A07. In the present study, clubroot resistance in 'CR38' was identified as quantitative inheritance and was regulated by multiple genes. In addition, qBrCR38-1 and qBrCR38-2 in chromosomes A07 and A08 associated with CR in Pakchoi were detected by BSA-seq. We blasted the sequence from linkage marker (BSA7) of Crr1 and Rcr9 target region to B.rapa reference genome V2.5. Crr1 and Rcr9 were located in 14.27 Mb and 13.5-14.2 Mb respectively, thus showing that these two gene were not in the same chromosomal position as qBrCR38-2 (Fig. 3).
Annotation information and homologous sequence alignment of candidate genes revealed new disease-resistant genes involved in CR and further studies are required. NOH1 was reported to be related to both general and specific resistance against bacterium and fungi 40 . TIR-NBS-LRR resistant protein plays an important role in regulating the activity of plant disease resistance proteins 41 . Recent studies suggested that F-box proteins may up-regulate defense-related gene expression in rice 42 , and play a role in cell death and defense responses activated during pathogen recognition in tobacco and tomato 43 . WRKY transcription factors family is known to be involved in biotic/abiotic stress response 44 . Overall, Genomic intervals of QTL loci still need to be narrowed down to determine candidate genes. Candidate genes identified from the present study provide insights into the genetic mechanism of clubroot resistant in B.rapa.
In conclusion, BSA-seq Combine with KASP analysis was carried out to conduct the mapping of CR in Pakchoi. As a result, two novel QTLs regions on chromosome A07 and A08 respectively were detected, and seven genes in target chromosome region associated with the disease resistance were considered as candidates for CR. These genes need to be further studied, and the associated SNPs should be used for marker-assisted breeding of CR in B.rapa

Materials and Methods
Plant materials and phenotypic data collection. Two Pakchoi inbred line, CR38 (Clubroot resistance) and CS22 (Clubroot susceptible), were used as parents to generate 292 F 2 individuals for studies on inheritance and genetic mapping. CR38 is cold tolerant type and highly resistant to the 7th physiology race of Plasmodiophora Brassicae by using the inoculation method of Williams (1966) 45 . The pathogen was propagated on Pakchoi, and the clubs in infected roots were stored at −20 °C until required. All CR test plants were sown in a pot containing 5 × 10 6 spores per gram of dry soil, and cultivation was carried out in a growth chamber at 25/20 °C (day/night) with a photo-period of 14 h. At 6 weeks after sowing, the root symptoms of each plant were evaluated. A disease severity index (DSI) of two parents was calculated as according to Suwabe (2003) 8 . Each test included control-resistant and susceptible cultivars. All plant materials examined in this study were obtained from Shanghai academy of agricultural science. www.nature.com/scientificreports www.nature.com/scientificreports/ Library construction and Whole genome sequencing. Total genomic DNA from the two parental lines, CR38 and CS22, was isolated from young leaves according to the CTAB method 46 . Equal amounts of genomic DNA were sampled from individuals with leaves (22 DNA samples for each extreme trait) and bulked to generate the Clubroot Resistance pool (CR-pool) and Clubroot Sensitive pool (CS-pool). Pair-end sequencing libraries with a read length of 150 bp and insert sizes of approximately 350 bp were subjected to whole-genome re-sequencing with Illumina HiSeq 2500. Short reads obtained from both parents and two DNA-bulks were aligned against the B. rapa Chiifu-401 reference genome sequence v2.5 (http://brassicadb.org/brad/) to obtain the consensus sequence using BWA software 47 . Reads of CR-pool and CS-pool were separately aligned to CR38 and CS22 consensus sequence reads to call SNPs with GATK tools software 48 . Heterozygote alleles in two parents were filtered out during the process. Raw read data are archived at the NCBI Sequence Read Archive (SRA) under Accession PRJNA497673.
QTL mapping and target region annotation. Assuming that "A" is a SNP identified between CR38 and CS22, the CR38 and CS22 genotypes at this site are "AA" and "aa", respectively. For the CR-pool, nRA and nRa are the numbers of reads containing "A" and "a", Respectively. Three methods were used to mapping QTLs. Firstly, SNP-ratio (Resistant alleles/sensitive alleles) of CR-pool and CS-pool were calculated. Then, CR-pool SNP-ratio was divided by CS-pool SNP-ratio and plotted across the genomic regions that showed ratio peaks, which indicate the possible existence of a QTL 36 = Ratio (nRA/nRa)/(nSA/nSa) Read depth for each allele at segregating allelic SNPs in 500 kb sliding windows was summed using a 100 kb step increment.
The ED for each SNP was calculated according to a previous report 31 . ED is the sum of 100 ED-SNP values within a window of 100 consecutive SNPs. ED 4 was then calculated by raising ED to the fourth power. Loess fit curves for SNP allele frequency ED 4 . G' , the G value averaged across neighboring SNPs, was calculated as described by Magwene et al. (2011) 30 . There methods for mapping common target regions were considered as significant loci.
A list of genes within the target regions was generated using the B.rapa genome annotation data (http://brassicadb.org/brad/), and these genes were used to query the GO annotation on the website of gene ontology to identify putative resistance related genes for further analysis.

KASP Marker development and genotyping in target region. Polymorphic SNPs identified within
the target regions for clubroot resistance were converted into KASP markers (Supplementary Table S1). Each SNP site were parsed to retrieve the flanking sequences 50 bp for KASP markers in the target region. The criteria for selection were that the flanking sequences (a) did not contain any other SNP and InDels, (b) had no more than four consecutive repeats, (c) was close to the ratio peaks on the physical location and (d) Primer sequences were relatively unique in the genome.
For each SNP, two allele-specific forward primers and one common reverse primer were designed using the Primer3 software. Target region SNPs were converted into KASP primers and were used to test the entire F2 population. Reactions were performed in 384 well plates, with a final reaction volume of 3 μl, which contained 1.48 μl of KASP 2X reaction mix, 50 ng of template DNA, 0.17 μM Hex forward primer, 0.17 μM FAM forward primer and 0.42 μM universal reverse primer. The following cycling conditions were used: 15 min at 94 °C followed by 10 touchdown cycles of 20 s at 95 °C and 60 s at 65 °C (dropping 0.8 °C per cycle); after the final annealing temperature of 56 °C was achieved, there were 26 cycles of 20 s at 94 °C and 60 s at 57 °C. Thermocycling and fluorescence readings were performed on Hydrocycler and PHERAstar of LGC SNPline platform. Genotyping data viewed as a cluster plot by SNPviewer software supported from LGC Genomics (http://www.lgcgenomics.com). The significance of the correlation coefficients between genotype and phenotype was determined with t-tests. Linkage groups were performed with JoinMap 4.1 (Van Ooijenand Voorrips, 2001).