Molecular dissection of resistance gene cluster and candidate gene identification of Pl17 and Pl19 in sunflower by whole-genome resequencing

Sunflower (Helianthus annuus L.) production is challenged by various biotic and abiotic stresses. Among them, downy mildew (DM) is a severe biotic stress that reduces sunflower yield and quality in many sunflower-growing regions worldwide. Resistance to DM in sunflower is commonly conferred by single dominant genes. Pl17 and Pl19 are two broad-spectrum DM resistance genes that were previously mapped to a gene cluster spanning a 3.2-Mb region at the upper end of sunflower chromosome 4. Using a whole-genome resequencing approach combined with a reference sequence-based chromosome walking strategy and high-density mapping populations, we narrowed Pl17 down to a 15-kb region flanked by SNP markers C4_5711524 and SPB0001. A prospective candidate gene for Pl17, HanXRQChr04g0095641, was identified; it encodes a typical TNL resistance protein. Pl19 was delimited to a 35-kb region flanked by SNP markers C4_6676629 and C4_6711381, approximately 1 Mb from Pl17. The only gene within the delineated Pl19 locus in the reference genome, HanXRQChr04g0095951, was predicted to encode an RNA methyltransferase family protein. Evaluation of 96 diverse sunflower lines identified six and eight SNP markers diagnostic for Pl17 and Pl19, respectively, providing useful tools for marker-assisted selection in sunflower breeding programs.

[...] changes in the pathogen populations due to coevolution between the pathogen and the sunflower host [5,6]. Some Pl genes that have been widely used to combat DM infection in sunflower, such as Pl6 and Pl7, have already become ineffective against new races of P. halstedii [7-9]. A recent survey tested twelve known DM R genes (Pl1, Pl2, Pl5, Pl6, Pl13, Pl15-Pl18, Pl21, Pl33, and PlArg) against a total of 185 P. halstedii isolates collected from sunflower production regions in North Dakota, South Dakota, and Nebraska in the United States, and found that only PlArg, Pl15, Pl17, Pl18, and Pl33 remained effective [10].
Intensive breeding efforts have narrowed the genetic variability of cultivated sunflower, creating a constant need to identify and deploy new agronomically important genes. The genus Helianthus comprises 53 wild sunflower species, which are invaluable reservoirs of agronomically desirable genes [11-13]. The oil maintainer line HA 458 (PI 655009) is resistant to all North American P. halstedii races identified thus far [10,14]. It harbors the DM R gene Pl17, which originated from the wild H. annuus L. accession PI 468435. Another DM R gene, Pl19, was identified from the wild H. annuus L. accession PI 435414 [15]. Both Pl17 and Pl19 were previously mapped to a similar position on sunflower chromosome 4, corresponding to linkage group 4 [16,17]. Recently, four additional novel DM R genes, Pl27-Pl29 and Pl33, were identified near Pl17 and Pl19 on chromosome 4 [18,19]; when their flanking markers were positioned on the chromosome 4 pseudomolecule of the HA412-HO genome sequence, Pl17, Pl19, and Pl33 fell within an interval spanning approximately 3.2 Mb [16,17,19].
The three DM R genes Pl17, Pl19, and Pl33 are highly effective against the most predominant and virulent races of P. halstedii and have not been widely used in commercial sunflower production. Because these R genes confer similar broad-spectrum resistance and map to similar positions, individuals carrying a particular gene cannot be distinguished by phenotyping alone. Diagnostic molecular markers would provide a rapid and accurate selection tool for sunflower breeding programs, and rapidly advancing sequencing technologies combined with single nucleotide polymorphism (SNP) genotyping systems now make such markers feasible to develop.
The two publicly available, assembled and annotated sunflower reference genomes, HA412-HO and XRQ, are powerful resources for studying the genetic basis of agronomically important traits, for developing markers from sequence information, and for dissecting trait-governing genes genetically and molecularly [20]. Whole-genome resequencing can efficiently identify SNPs, insertions and deletions (InDels), structural variations (SVs), and copy number variations (CNVs) in a massively parallel manner. The high density of SNPs in the genomes of all organisms makes them a powerful genetic tool for population genetics studies and marker-trait association analyses. However, current PCR-based approaches for genotyping individual SNPs of special interest are still limited in accuracy, throughput, simplicity, and operational cost. Our laboratory has developed a PCR-based method for genotyping individual SNPs that can be adapted to multiple platforms and throughput levels [16,21,22]. In the current study, we report the use of reference sequence-based chromosome walking toward the target genes Pl17 and Pl19 to identify candidate genes and to develop user-friendly SNP markers diagnostic for each gene.
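To make the small-variant classes above concrete, here is a minimal Python sketch that sorts VCF-style REF/ALT allele pairs into SNPs, InDels, and equal-length multi-base substitutions (MNPs). The helper names `classify_variant` and `summarize` are hypothetical and the rules are deliberately simplified; this is not the pipeline used in this study, and SVs and CNVs, which require read-depth or split-read evidence, are out of scope.

```python
from collections import Counter
from typing import Iterable, Tuple

def classify_variant(ref: str, alt: str) -> str:
    """Classify one VCF-style REF/ALT pair (simplified, hypothetical helper)."""
    if not ref or not alt:
        raise ValueError("REF and ALT must be non-empty")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"      # single-base substitution
    if len(ref) != len(alt):
        return "InDel"    # a length change implies an insertion or deletion
    return "MNP"          # equal-length multi-base substitution

def summarize(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variant classes across (REF, ALT) pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)

if __name__ == "__main__":
    demo = [("A", "G"), ("AT", "A"), ("C", "CTT"), ("AC", "GT")]  # toy records
    print(summarize(demo))
```

Real VCF records also carry multi-allelic ALT fields and symbolic alleles, which a production classifier would have to handle separately.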

Results
Saturation and fine mapping of Pl17. Two strategies were adopted for marker development. First, the chromosome 4 genome sequence was extracted from the HA412-HO reference assembly (3,621,089 to 6,852,749 bp) and the XRQ assembly (5,662,479 to 5,707,598 bp), which covers the Pl17 and Pl19 loci reported in previous studies [16,17]. A total of 101 primer pairs, including 40 STSs and 61 SSRs, were screened for polymorphisms between the parents HA 458 (Pl17) and HA 234. Polymorphic markers were then used to genotype 186 F2 individuals from the cross HA 234 × HA 458. Five markers mapped around the Pl17 locus (Table 1), reducing the Pl17 interval from 2.9 cM between SFW04052 and ORS963 to 1.3 cM between SUN232 and ORS963 (Fig. 1a,b).
To further narrow down the Pl17 region, a total of 80 SNPs were identified in the targeted interval from HA458-WGS1 (27 SNPs) and HA458-WGS2 (53 SNPs). Ten contigs from HA458-WGS1 that fell in the Pl17 interval were used as queries to align against the HA412-HO and [...] 12 showed polymorphisms between HA 234 and HA 458 and were used to genotype the F2 population. Linkage analysis indicated that ten SNP markers co-segregated with Pl17 and one (SPB0007) was proximal to Pl17 at a genetic distance of 0.3 cM (Fig. 1b). Fine mapping of Pl17 was then performed to resolve the SNP marker cluster co-segregating with Pl17 and to increase map resolution. The two previously reported Pl17 flanking markers, SNP marker SFW04052 and SSR marker ORS963, which span a 2.9 cM interval (Fig. 1a), were used to genotype 3,008 F3 individuals from the selected F3 families heterozygous for Pl17. One hundred and three recombinants were identified and advanced to the next generation. Because the SSR marker SUN232 identified by saturation mapping was closer to Pl17 than SFW04052, it was used to re-screen the 103 recombinants (Fig. 1b). Twenty-two of them had recombination events in the 1.3 cM target interval flanked by SUN232 (0.5 cM) and ORS963 (0.8 cM), and their progeny were inoculated with P. halstedii race 734 for resistance testing.

Table 1 (excerpt). Marker | Type | Motif | Repeats | Source | Forward primer | Reverse primer | Size (bp) | Gene
SUN232 | SSR | ga | 16 | HA412-HO | TGTTTGAAAGGGAGACCACA | GGCGAGTTTATTTTGGGTGA | 234 | Pl17 and Pl19
SUN252 | STS | -- | -- | HA412-HO | AACGACATGCACATGGAAAA | AGAAAGCCTGCCAAACAAAA | 233 | Pl17
SUN254 | STS | -- | -- | HA412-HO | GGACCATATGGGGTTTTCCT | TTCGGGCATATTTCAAGTCC | 179 | Pl17 and Pl19
SUN287 | STS | -- | -- | HA412-HO | TGTGATTGAAAAACCGGTCA | TACGGGTCAAACGGGTAAAA | 225 | Pl19
SUN367 | SSR | ag | 12 | HA412-HO | ATGGATGCCTTGCTCATCCC | CACTCCCATGCCCCTTACAG | 251 | Pl19
SUN375 | SSR | ag | 8 | HA412-HO | AATGATGAGGATGGCCGCAG | GATCAACTCGAAACCGGCAC | ... | ...
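The flanking-marker screen used to find recombinants can be sketched as follows. The sketch assumes both flanking markers are scored codominantly and coded 'A' (homozygous for the HA 234 allele), 'B' (homozygous for the HA 458 allele), or 'H' (heterozygous), with '-' for missing data. Because parental chromosomes carry A...A or B...B, an individual whose two flanking genotypes differ must carry at least one crossover in the interval. The codes, function names, and individual IDs are hypothetical, and rare double recombinants that appear as H at both flanks are not detectable by this rule.

```python
from typing import Dict, List, Tuple

VALID_CODES = {"A", "B", "H"}

def is_recombinant(left: str, right: str) -> bool:
    """True if genotypes at two flanking markers imply a crossover between them."""
    for code in (left, right):
        if code not in VALID_CODES:
            raise ValueError(f"unexpected genotype code: {code!r}")
    # Non-recombinant individuals carry the same parental class at both flanks.
    return left != right

def screen_recombinants(population: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return IDs of individuals recombinant between the flanking markers.

    Individuals with missing data ('-') at either flank are skipped.
    """
    hits = []
    for ind_id, (left, right) in population.items():
        if "-" in (left, right):
            continue
        if is_recombinant(left, right):
            hits.append(ind_id)
    return hits

if __name__ == "__main__":
    # Toy genotypes at the left/right flanking markers (hypothetical IDs).
    toy = {
        "F3-001": ("A", "A"),
        "F3-002": ("A", "H"),
        "F3-003": ("B", "B"),
        "F3-004": ("H", "B"),
        "F3-005": ("H", "H"),
        "F3-006": ("-", "B"),
    }
    print(screen_recombinants(toy))  # ['F3-002', 'F3-004']
```

Re-screening the 103 recombinants with the closer marker SUN232 amounts to running the same check with a different marker pair.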
The 12 polymorphic SNP markers that had been mapped to the Pl17 interval between SUN232 and ORS963 using the 186 F2 individuals were further used to genotype the 22 recombinants identified from 3,008 F3 individuals. As a result, Pl17 was placed in a 0.0665 cM interval at the upper end of chromosome 4, flanked by markers C4_5711524 (0.0332 cM) and SPB0001 (0.0333 cM) (Fig. 1c). The physical order of most markers agreed with their genetic positions, although five SPB SNPs showed an order reversed relative to their genetic positions in both the HA412-HO and XRQ assemblies (Table 2). The flanking markers C4_5711524 and SPB0001 delimited Pl17 to a 15-kb interval in the XRQ genome assembly.

Saturation and fine mapping of Pl19. [...] [19]. The 101 SSR and STS markers previously used in the Pl17 saturation mapping were also used to genotype the two parents of the Pl19 population, CONFSCLB1 and PI 435414. In addition, 56 SSRs were identified from a 296.4-kb XRQ sequence (6,238,999 to 6,535,440 bp) on chromosome 4. Of the 157 SSR and STS markers tested, 11 were polymorphic between the parents and were used to genotype the BC1F2 population (Table 1). Linkage analysis showed that all of these SSR markers mapped distal to Pl19 (Fig. 2a,b).
Based on the physical position of the newly mapped SSR marker SUN461, located at 7,383,392-7,383,599 bp in HA412-HO and 6,413,020-6,413,227 bp in XRQ, 104 SNPs were selected from a 308.4-kb region (7,690,106-7,998,497 bp) of the HA412-HO sequence and 168 SNPs from a 398.7-kb region (6,400,728-6,799,385 bp) of the XRQ sequence. Of the 272 SNP markers tested on CONFSCLB1 and HA-DM5, 66 were polymorphic and were used to genotype the 139 BC1F2 individuals derived from the cross CONFSCLB1 × PI 435414 (Pl19). A total of 35 SNPs mapped around Pl19, four designed from the HA412-HO assembly and 31 from the XRQ assembly. A total of 37 co-segregating markers, including eight SSR and 29 SNP markers, mapped 0.7 cM distal to Pl19 (Fig. 2b).
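Selecting candidate SNPs from a physical window, as done for the HA412-HO and XRQ regions above, reduces to an interval filter on reference coordinates. The sketch below assumes 1-based, inclusive coordinates and uses a hypothetical chromosome label `chr4` with toy SNP records; the `window_kb` helper reproduces the window sizes quoted in the text.

```python
from typing import Iterable, List, NamedTuple

class Snp(NamedTuple):
    chrom: str
    pos: int   # 1-based position on the reference assembly
    ref: str
    alt: str

def snps_in_window(snps: Iterable[Snp], chrom: str, start: int, end: int) -> List[Snp]:
    """Return SNPs on `chrom` with start <= pos <= end, sorted by position."""
    if start > end:
        raise ValueError("start must not exceed end")
    return sorted((s for s in snps if s.chrom == chrom and start <= s.pos <= end),
                  key=lambda s: s.pos)

def window_kb(start: int, end: int) -> float:
    """Length of a 1-based inclusive window in kilobases."""
    return (end - start + 1) / 1000

if __name__ == "__main__":
    print(round(window_kb(7_690_106, 7_998_497), 1))  # 308.4 (HA412-HO window)
    print(round(window_kb(6_400_728, 6_799_385), 1))  # 398.7 (XRQ window)
```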
To further fine map Pl19, the SSR marker SUN391 and the SNP marker SFW02206 were used as flanking markers to screen 2,256 BC1F3 individuals selected from BC1F3 families heterozygous for Pl19. A total of 77 BC1F3 individuals with recombination events near Pl19 were identified and advanced to the next generation. Of these 77 recombinants, 23 had recombination events in the proximity of the Pl19 region, and their families (30 seedlings per family) were tested with P. halstedii race 734. Of the 35 mapped SNP markers, 15 were selected to genotype the 77 recombinants and increase map resolution. Pl19 was placed in a 0.2216 cM interval flanked by SNP markers C4_6676629 (0.0443 cM) and C4_6711381 (0.1773 cM) (Fig. 2c). This genetic region corresponds to a 35-kb segment of the XRQ assembly (Table 3).
www.nature.com/scientificreports/
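The fine-mapping distances reported for Pl17 and Pl19 are small enough to check by hand. Under our own assumption that the recombination fraction is estimated as recombinants divided by individuals screened, the reported values are numerically consistent with one recombinant on each side of Pl17 among 3,008 F3 individuals, and with one and four recombinants on either side of Pl19 among 2,256 BC1F3 individuals. These counts are a back-calculation, not numbers stated in the text. At such small recombination fractions, Kosambi's mapping function (used for the linkage analysis, see Methods) is practically identical to r itself.

```python
import math

def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))

def distance_cm(recombinants: int, individuals: int) -> float:
    """Back-calculated distance, assuming r = recombinants / individuals screened."""
    return kosambi_cm(recombinants / individuals)

if __name__ == "__main__":
    pl17 = [distance_cm(1, 3008), distance_cm(1, 3008)]
    pl19 = [distance_cm(1, 2256), distance_cm(4, 2256)]
    print([round(d, 4) for d in pl17], round(sum(pl17), 4))  # [0.0332, 0.0332] 0.0665
    print([round(d, 4) for d in pl19], round(sum(pl19), 4))  # [0.0443, 0.1773] 0.2216
```

The text reports 0.0332 and 0.0333 cM for the two Pl17 flanks; the last-digit difference presumably reflects rounding in the mapping software.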

Collinearity of SNPs between the two reference genome assemblies.
In the present study, two reference genomes, HA412-HO and XRQ, were used for SNP marker development. Most SNPs from either HA412-HO or XRQ had a collinear order in both genome assemblies (Tables 2 and 3). However, of the 104 SNPs selected from a 308.4-kb region of HA412-HO for Pl19, only four were mapped to the Pl19 region. A search for these SNP positions in the XRQ genome assembly revealed that 30 SNPs residing in a 57.9-kb segment (7,690,106-7,748,049 bp) of HA412-HO aligned to a 7.2-Mb segment (147,656,203-154,868,683 bp) of XRQ, outside the Pl19 region (Supplementary Table S3). The remaining 74 SNPs, in a 236.2-kb region (7,762,321-7,998,497 bp), aligned to a corresponding 518.1-kb region (6,458,182-6,976,291 bp) of the XRQ assembly.
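The collinearity comparison can be expressed as two checks on SNP alignments between assemblies: whether each SNP's aligned position falls inside the expected target window, and whether pairs of SNPs keep their relative order. The sketch below is a hypothetical illustration with toy coordinates and names (not the Supplementary Table S3 data); it assumes each SNP has already been aligned to the second assembly and reuses the 518.1-kb XRQ window from the text as the target.

```python
from itertools import combinations
from typing import List, Tuple

# (marker name, position in assembly A, chromosome in assembly B, position in B)
Hit = Tuple[str, int, str, int]

def split_by_target(hits: List[Hit], chrom: str, start: int, end: int) -> Tuple[List[Hit], List[Hit]]:
    """Partition SNP alignments into those inside and outside a target window of B."""
    inside: List[Hit] = []
    outside: List[Hit] = []
    for hit in hits:
        _, _, chrom_b, pos_b = hit
        bucket = inside if (chrom_b == chrom and start <= pos_b <= end) else outside
        bucket.append(hit)
    return inside, outside

def discordant_pairs(hits: List[Hit]) -> int:
    """Number of SNP pairs whose relative order differs between assemblies A and B."""
    return sum(1 for (_, a1, _, b1), (_, a2, _, b2) in combinations(hits, 2)
               if (a1 - a2) * (b1 - b2) < 0)

if __name__ == "__main__":
    toy = [  # hypothetical SNPs, not Supplementary Table S3
        ("snp1", 7_700_000, "chr4", 150_000_000),
        ("snp2", 7_800_000, "chr4", 6_500_000),
        ("snp3", 7_900_000, "chr4", 6_600_000),
        ("snp4", 7_950_000, "chr4", 6_550_000),
    ]
    inside, outside = split_by_target(toy, "chr4", 6_458_182, 6_976_291)
    print(len(inside), len(outside), discordant_pairs(inside))  # 3 1 1
```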
Identification of candidate genes for Pl17 and Pl19. Most SNP markers mapped around the Pl17 locus were physically located between 5,676,065 and 5,711,324 bp on chromosome 4 of the XRQ assembly (Table 2). The genetic positions of these markers were generally consistent with their physical positions, although a few conflicts were observed. A 104-kb genomic sequence of XRQ chromosome 4 from 5,670,000 to 5,780,000 bp, encompassing the newly identified SNP markers from the XRQ sequence, was analyzed (https://www.heliagene.org/HanXRQ-SUNRISE/). Four putative genes were found in this region (Table 4). One defense-associated gene, HanXRQChr04g0095641, located at nucleotide positions 5,672,715 to 5,705,044 bp with a length of 32.329 kb, had the typical TNL structure of resistance genes, encoding full-length Toll/interleukin-1 receptor (TIR), nucleotide-binding site (NBS), and leucine-rich repeat (LRR) domains. Moreover, all 12 polymorphic SNP markers identified in the fine mapping were located in this 32.329-kb region, supporting its candidacy for Pl17 (Fig. 1d).
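Locating candidate genes within a delimited interval amounts to an interval-overlap query against the gene annotation. In this sketch, the coordinates of HanXRQChr04g0095641 and the Pl17 interval (5,696,076-5,711,324 bp, as given in the Discussion) are taken from the text, while the second gene and the second marker position are hypothetical placeholders; coordinates are assumed to be 1-based and inclusive.

```python
from typing import Iterable, List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int

def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def genes_in_interval(genes: Iterable[Gene], start: int, end: int) -> List[str]:
    """Names of annotated genes overlapping a delimited gene interval."""
    return [g.name for g in genes if overlaps(g.start, g.end, start, end)]

def positions_in_gene(positions: Iterable[int], gene: Gene) -> List[int]:
    """Marker positions falling inside the gene model."""
    return [p for p in positions if gene.start <= p <= gene.end]

if __name__ == "__main__":
    candidate = Gene("HanXRQChr04g0095641", 5_672_715, 5_705_044)
    decoy = Gene("hypothetical_gene", 5_740_000, 5_760_000)   # not from Table 4
    print(genes_in_interval([candidate, decoy], 5_696_076, 5_711_324))
    print(positions_in_gene([5_696_413, 5_750_000], candidate))  # [5696413]
```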
Pl19 was located between markers C4_6676629 and C4_6711381, and the good collinearity between the genetic and physical positions of markers in this region placed Pl19 in the interval from 6,676,629 to 6,711,381 bp on chromosome 4 of the XRQ assembly (Table 3). A 120-kb genomic sequence of XRQ chromosome 4 from 6,640,000 to 6,760,000 bp, covering the newly identified SNP markers for Pl19, was analyzed (Table 3). Three putative genes were found, and only one, HanXRQChr04g0095951, fell within the 6,676,629-6,711,381 bp interval; it was predicted to encode a probable RNA methyltransferase family protein (Table 4, Fig. 2d).

Development of diagnostic markers for Pl17 and Pl19. [...] (Supplementary Table S4) to determine their specificity in the sunflower population and to assess their potential for marker-assisted selection of Pl17 [...] the selected sunflower lines (Fig. 3). HA 458 (the Pl17 donor line) and the sunflower lines introgressed with Pl17, including HA-DM3, HA-BSR2 to HA-BSR4, and HA-BSR6 to HA-BSR8, showed unique Pl17 SNP marker alleles that distinguished them from the other sunflower lines (Fig. 3). The SNP marker C4_5696413 also amplified a fragment of similar size to the Pl17 allele in HA 291 (lane 3 in Fig. 3a). Sunflower line HOLS 1 showed a heterozygous pattern for all six diagnostic SNP markers for Pl17 (lane 95 in Fig. 3).
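The diagnostic-marker test across the evaluation panel can be sketched as a simple consistency check: a marker is diagnostic if every line known to carry the gene shows the donor allele and no other line does. The allele coding ('R' for donor allele, 'O' for other allele, 'H' for heterozygous) and the function name are our assumptions. The toy calls only echo observations described above (HA 291 sharing a Pl17-like fragment at C4_5696413, HOLS 1 appearing heterozygous) and are not the full panel data.

```python
from typing import Dict, List, Set, Tuple

def evaluate_marker(calls: Dict[str, str], carriers: Set[str]) -> Tuple[bool, List[str], List[str]]:
    """Check whether one SNP marker is diagnostic across an evaluation panel.

    calls: line -> 'R' (donor allele), 'O' (other allele) or 'H' (heterozygous).
    carriers: lines known to carry the target gene.
    Returns (is_diagnostic, false_positive_lines, heterozygous_lines).
    """
    missing_carriers = [line for line in carriers if calls.get(line) != "R"]
    false_positives = sorted(line for line, c in calls.items()
                             if c == "R" and line not in carriers)
    heterozygous = sorted(line for line, c in calls.items() if c == "H")
    diagnostic = not missing_carriers and not false_positives
    return diagnostic, false_positives, heterozygous

if __name__ == "__main__":
    # Illustrative calls for a C4_5696413-like marker (not the full panel data).
    calls = {"HA 458": "R", "HA-DM3": "R", "HA 291": "R", "HA 234": "O", "HOLS 1": "H"}
    print(evaluate_marker(calls, carriers={"HA 458", "HA-DM3"}))
    # (False, ['HA 291'], ['HOLS 1'])
```

Heterozygous lines are reported separately rather than counted as failures, because a heterozygous call may reflect residual heterogeneity in the line rather than a non-diagnostic marker.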
Of the 17 SNP markers tested in the evaluation panel, eight (C4_6401756, C4_6407910, C4_6647557, C4_6656705, C4_6666835, C4_6675662, C4_6676629, and S4_7964876) could differentiate Pl19 from the other reported Pl genes in the selected sunflower lines. HA-DM5 was the only line carrying Pl19 in the 96-line evaluation panel, and it showed a unique PCR pattern of Pl19 marker alleles compared with the remaining 95 lines (Fig. 4). These markers, unique to Pl17 and Pl19, are valuable tools for selecting the two genes in sunflower breeding.

Discussion
[...] (Supplementary Table S1) [23-28]. Genes within a cluster can be distinguished through traditional allelic analysis, polymorphic marker analysis, differences in resistance specificity toward different pathotypes, and the presence or absence of host reactions to pathogen effectors. Our previous studies indicated that Pl17 and Pl19 are different but closely linked genes on sunflower chromosome 4 (allelism data not shown). The common markers ORS963 and NSA_003564 lie downstream of Pl17 but upstream of Pl19 [16,17]. Both genes were previously delimited to a 3.2-Mb interval on chromosome 4 of the HA412-HO assembly, at a time when the XRQ reference was not yet available. Using a sequence-based chromosome walking strategy in this study, we refined Pl17 to a 15-kb interval at 5,696,076-5,711,324 bp on chromosome 4 of the XRQ assembly, and Pl19 to a 35-kb interval at 6,676,629-6,711,381 bp, approximately 1 Mb from Pl17. A recently reported DM R gene, Pl33, is located in a 1.56-Mb interval (4,208,180-5,766,419 bp) on chromosome 4 of the XRQ assembly [19]. Marker analysis of the three gene donors suggested that Pl33 is different from Pl17 and Pl19 (Figs 3 and 4). The sunflower genome is approximately 3.6 Gb in size, and more than 80% of it consists of highly repetitive sequences.
The assembly of a large, complex genome with a high proportion of repetitive sequences remains challenging, although longer reads, higher genome coverage, and more sophisticated bioinformatic tools can reduce this difficulty and yield more accurate assemblies. The HA412-HO whole-genome sequence was assembled from Illumina reads (100 bp) and Roche 454 reads (400-1,000 bp), whereas the XRQ whole-genome sequence was assembled from PacBio sequencing data with an average read length of 10.3 kb [20]. A high-quality genome assembly is crucial for reference sequence-based chromosome walking to anchor the region harboring the target gene. In the present study, a comparison of mapped SNP positions between the two assemblies showed that most SNPs occupied consistent positions in both reference genomes. However, when the positions of the 104 SNPs derived from HA412-HO for Pl19 were searched in the XRQ genome assembly, 30 SNPs located in a 57.9-kb segment between 7,690,106 and 7,748,049 bp aligned to a 7.2-Mb segment between 147,656,203 and 154,868,683 bp in XRQ (Supplementary Table S3), and none of them mapped to the Pl19 region. Such discrepancies complicate the use of a reference genome for chromosome walking.
The two sunflower reference sequences provide complementary opportunities for SNP discovery. In the current study, SNPs from the XRQ genome yielded a much higher proportion of mapped polymorphic markers than those from HA412-HO. Of 120 SNPs from HA412-HO tested for Pl17 (16 SNPs) and Pl19 (104 SNPs) fine mapping, only four (3.3%) were mapped. In contrast, of 232 SNPs from XRQ tested for Pl17 (64 SNPs) and Pl19 (168 SNPs), 43 (18.5%) were mapped. Because it was assembled from long PacBio reads, the XRQ genome sequence may serve as the first choice for sequence-based chromosome walking aimed at fine mapping and gene cloning in the sunflower community, while the HA412-HO genome provides a useful comparison and a secondary source of SNP markers.
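The mapping success rates quoted above follow from simple proportions. The short Python check below recomputes them from the counts given in the text; it introduces no new data, and the helper name `percent_mapped` is ours.

```python
def percent_mapped(mapped: int, tested: int) -> float:
    """Share of tested SNPs that ended up on the genetic map, in percent."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100 * mapped / tested

# Counts from the text: SNPs tested for Pl17 + Pl19, and SNPs mapped.
ha412_rate = percent_mapped(4, 16 + 104)
xrq_rate = percent_mapped(43, 64 + 168)

print(f"HA412-HO: {ha412_rate:.1f}%")       # HA412-HO: 3.3%
print(f"XRQ: {xrq_rate:.1f}%")              # XRQ: 18.5%
print(f"ratio: {xrq_rate / ha412_rate:.1f}x")  # ratio: 5.6x
```

In other words, an XRQ-derived SNP was roughly five to six times as likely to end up on the map in this study.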
In the past five years, 18 new DM R genes (Pl17-Pl20 and Pl22-Pl35) have been identified and mapped, bringing the total number of DM R genes in sunflower to 36 [1,16-19,29-32]. Despite this great progress, none of the DM R genes has been cloned in sunflower to date. R genes cloned from other crops indicate that most R genes encode proteins with nucleotide-binding and leucine-rich repeat domains (NLRs) [33] [...]. The candidate gene identified from the XRQ reference genome for Pl17, HanXRQChr04g0095641, belongs to this class. A preliminary time-course expression analysis in cotyledons and roots of the susceptible and resistant parents showed the expected kinetics, suggesting that this gene may be Pl17 (data not shown). A large population of HA 458 seeds was mutagenized with ethyl methanesulfonate (EMS) and advanced to the M2 generation. DM testing of the M2 population is currently underway to screen for mutants with susceptible phenotypes. The sequence of the candidate gene HanXRQChr04g0095641 will then be compared between the wild type and the mutants. These studies will lay the foundation for cloning Pl17. The 35-kb region of the XRQ reference genome harboring Pl19 contains only one annotated gene, HanXRQChr04g0095951, predicted to encode a probable RNA methyltransferase family protein. RNA methylation has been reported to play roles in human diseases; however, genes with similar annotations have not yet been implicated in plant disease resistance [35,36]. Genomic regions harboring plant disease resistance genes are often complex and exhibit structural variation between resistant and susceptible genotypes [37]. Thus, it is possible that the Pl19 gene is absent from the available sunflower reference assemblies. Alternatively, the single gene identified at the Pl19 locus in the XRQ assembly may indicate a novel resistance mechanism.
Similarly, although a more conventional prospective candidate gene was identified for Pl17, the gene conferring resistance may also be absent from the reference assembly. Future work on the cloning of Pl17 and Pl19 will be required to distinguish between these possibilities and to elucidate the genetic basis of the broad-spectrum disease resistance.
Downy mildew remains the major disease threat to sunflower production because of the pathogen's strong capacity to develop new virulence and its worldwide distribution. Two prerequisites are essential for using host resistance in breeding programs: a source of resistance and diagnostic markers. Both Pl17 and Pl19 confer broad-spectrum resistance to all known isolates of P. halstedii [10,17,18]. Because SNP markers are biallelic, they reveal fewer polymorphisms in breeding populations, especially when the marker is not closely linked to the target gene. In the current study, we combined whole-genome resequencing with reference sequence-based chromosome walking to narrow down the gene intervals and to develop diagnostic SNP markers for Pl17 and Pl19. The six diagnostic SNP markers for Pl17 span a physical distance of 15 kb in the XRQ genome within the candidate gene HanXRQChr04g0095641. The two diagnostic SNP markers closest to Pl19, C4_6675662 and C4_6676629, lie in the 35-kb Pl19 interval within the candidate gene HanXRQChr04g0095951. The high-density maps and diagnostic SNP markers developed in this study provide useful tools to accelerate the transfer of Pl17 and Pl19 into elite sunflower lines and to facilitate pyramiding of these genes with other broadly effective Pl genes for durable DM control [38].
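The point about biallelic SNPs can be quantified with the polymorphism information content (PIC) formula of Botstein and colleagues, a standard marker-informativeness measure that is not computed in this study. In the minimal sketch below, a biallelic marker can never exceed a PIC of 0.375, whereas a marker with four equally frequent alleles (a multiallelic case chosen only for illustration) reaches about 0.70.

```python
from itertools import combinations
from typing import Sequence

def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content for one locus (Botstein et al. formula)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Subtract cases where both parents share the same heterozygous genotype.
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction

if __name__ == "__main__":
    print(pic([0.5, 0.5]))   # 0.375, the maximum for any biallelic SNP
    print(pic([0.25] * 4))   # 0.703125 for four equally frequent alleles
```

This ceiling is one reason a SNP marker needs to sit very close to the target gene before it can be diagnostic across diverse germplasm.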

Methods
Mapping populations and evaluation panel. The initial F2 mapping population for Pl17, consisting of 186 individuals, was derived from a cross between HA 234 and HA 458. HA 458 (PI 655009) is an oilseed maintainer line that is resistant to all North American P. halstedii races identified thus far. HA 234 is an oilseed maintainer line that is susceptible to DM. The DM resistance gene Pl17 in HA 458 was previously mapped to sunflower chromosome 4 [16]. This F2 population was used for saturation mapping with additional markers in the present study. For fine mapping, recombinants were screened from 3,008 F3 individuals selected from previously characterized F2:3 families heterozygous for Pl17. Each selected heterozygous F3 family is equivalent to a segregating F2 population.
Saturation mapping of the DM R gene Pl19 was performed in a BC1F2 population of 139 individuals developed from the cross between cytoplasmic male sterile (CMS) CONFSCLB1 and PI 435414, which was previously used for the initial mapping of Pl19 [17]. PI 435414, which is resistant to DM, is a wild H. annuus accession collected from Paris, Texas, U.S. in 1978. CONFSCLB1 is a confection sunflower maintainer line that is susceptible to DM. For fine mapping, recombinants were screened from 2,256 BC1F3 individuals selected from previously characterized BC1F2:3 families heterozygous for Pl19. In our follow-up breeding program, Pl19 was successfully introgressed from wild PI 435414 into confection sunflower, resulting in the germplasm HA-DM5 (PI 687025), which was used for whole-genome resequencing to fine map Pl19.
The evaluation panel consisted of 96 sunflower inbred lines of diverse origin, including 24 and 17 lines carrying different DM and rust R genes, respectively (Supplementary Table S4). This panel was used to identify DNA markers diagnostic for Pl17 and Pl19 for use in marker-assisted selection.

SSR and STS marker identification. Previous genetic mapping studies placed both Pl17 and Pl19 on sunflower chromosome 4 in an interval between 3,621,089 and 6,852,749 bp [16,17]. This 3.2-Mb genomic sequence covering both loci was extracted from the HA412-HO (https://www.heliagene.org/HA412.v1.1.bronze.20141015/) and XRQ (GenBank accession GCA_002127325.1) reference genomes. The type and distribution of simple sequence repeats (SSRs) were analyzed using the GRAMENE Ssrtool (http://archive.gramene.org/db/markers/ssrtool), and repeats occurring at least five times were used for primer design. Sequence-tagged sites (STSs) were also identified within this 3.2-Mb sequence of the HA412-HO reference. A total of 157 primer pairs, including 40 STSs and 117 SSRs (40 STSs and 55 SSRs from the HA412-HO sequence and 62 SSRs from the XRQ sequence), were designed for amplification.
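The GRAMENE Ssrtool is a web tool, so as a stand-in the sketch below shows the underlying idea with Python regular expressions: scan for perfect tandem repeats of 1-6-bp motifs occurring at least five times, matching the five-repeat threshold stated above. The motif-length range, the rule that skips motifs which are themselves repeats (such as ATAT), and the function names are our assumptions; imperfect and compound repeats are ignored. A repeat is reported at the phase where the scan first matches, so an (AAT)n run may appear as (TAA)n depending on the flanking sequence.

```python
import re
from typing import List, Tuple

def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq: str, min_repeats: int = 5, max_motif: int = 6) -> List[Tuple[int, str, int]]:
    """Return (1-based start, motif, repeat count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-bp motif followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start() + 1, motif, len(m.group(0)) // k))
    return sorted(hits)

if __name__ == "__main__":
    print(find_ssrs("TTT" + "GA" * 16 + "CCC"))  # [(4, 'GA', 16)]
```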
Resequencing and SNP marker identification. Initially, a low-coverage whole-genome sequence of HA 458 (HA458-WGS1) was provided by Dr. Loren Rieseberg of the University of British Columbia, Canada, and aligned with the XRQ reference genome (https://www.heliagene.org/HanXRQ-SUNRISE/) around the Pl17 region to identify SNPs and InDels. Subsequently, HA 458 (HA458-WGS2) and HA-DM5 (a released germplasm carrying Pl19) were sequenced at 40× and 35× depth, respectively, on the Illumina HiSeq platform at CD Genomics Inc. following the company's protocols. Briefly, high-quality DNA samples were fragmented with a Covaris S/E210 for library construction, and qualified libraries for each sample were pooled for sequencing. Raw reads were filtered to remove reads containing adaptors, reads with >1% ambiguous bases, and low-quality reads (more than 50% of bases with a quality score below Q15). A total of 961,980,260 clean reads (98.95%) were obtained from HA 458 sequencing, of which 952,508,154 (99.02%) and 954,743,565 (99.25%) mapped to the HA412-HO and XRQ reference genomes, respectively. All SNPs and InDels were identified from the mapped reads and annotated with ANNOVAR. HA-DM5 sequencing data were generated and processed with the same protocols and platform. SNP markers were named with the prefix C4 or S4 followed by a number giving the physical position of the SNP on chromosome 4 of the corresponding reference assembly; C4 denotes SNPs from the XRQ reference and S4 denotes SNPs from the HA412-HO reference.
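The read-filtering thresholds, the mapping-rate arithmetic, and the marker naming convention described above can be sketched as follows. This assumes Phred quality scores are already decoded to integers and that adapter detection is done upstream and passed in as a flag; it illustrates the stated thresholds and is not the sequencing provider's actual pipeline. The function names are hypothetical.

```python
from typing import Sequence

def passes_filters(seq: str, quals: Sequence[int], has_adapter: bool,
                   max_n_frac: float = 0.01, low_q: int = 15,
                   max_low_q_frac: float = 0.5) -> bool:
    """Return True if a read survives the three filters described in Methods."""
    if not seq or len(seq) != len(quals):
        raise ValueError("sequence and qualities must be non-empty and equal length")
    if has_adapter:                                    # adapter detection assumed upstream
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:   # >1% ambiguous bases
        return False
    low = sum(1 for q in quals if q < low_q)
    return low / len(quals) <= max_low_q_frac          # >50% of bases below Q15 fails

def mapping_rate(mapped: int, clean: int) -> float:
    """Percentage of clean reads that mapped to a reference."""
    return 100 * mapped / clean

def marker_name(assembly: str, position: int) -> str:
    """C4_<pos> for XRQ-derived SNPs, S4_<pos> for HA412-HO-derived SNPs."""
    prefix = {"XRQ": "C4", "HA412-HO": "S4"}[assembly]
    return f"{prefix}_{position}"

if __name__ == "__main__":
    print(round(mapping_rate(952_508_154, 961_980_260), 2))  # 99.02 (HA412-HO)
    print(round(mapping_rate(954_743_565, 961_980_260), 2))  # 99.25 (XRQ)
    print(marker_name("XRQ", 6_676_629))                     # C4_6676629
```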
Genotyping of PCR-based markers and linkage analysis. SSR and STS primers were designed using the Primer3 program (Table 1) [...]. Polymerase chain reaction (PCR) for SSR and STS markers was performed as described by Qi et al. (2011) [41], and SNP PCR was conducted as described by Qi et al. (2016) [32]. PCR products were separated by electrophoresis on 6.5% polyacrylamide gels and visualized using an IR2 4300/4200 DNA analyzer (LI-COR, Lincoln, NE, USA).
Genotyping data for each marker were first tested for goodness of fit to the expected Mendelian segregation ratio (3:1 for dominant and 1:2:1 for codominant markers) using the chi-square (χ2) test. Markers fitting the expected ratios were analyzed for linkage together with the phenotypic data using JoinMap 4.1 [42], with the regression mapping algorithm and Kosambi's mapping function. Linkage thresholds were set at a logarithm of odds (LOD) score ≥ 3.0 and a maximum genetic distance ≤ 50 centimorgans (cM).
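The segregation test described above can be sketched without external libraries by computing the Pearson chi-square statistic and comparing it with tabulated critical values (3.841 for 1 degree of freedom, 5.991 for 2). The 5% significance level and the toy counts for 186 F2 individuals are our assumptions; the text does not state the alpha level used.

```python
from typing import Sequence

# Tabulated chi-square critical values at alpha = 0.05 (assumed significance level).
CRITICAL_05 = {1: 3.841, 2: 5.991}

def chi_square(observed: Sequence[int], ratio: Sequence[int]) -> float:
    """Pearson chi-square statistic for observed counts vs. an expected ratio."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total, parts = sum(observed), sum(ratio)
    expected = [total * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def fits_ratio(observed: Sequence[int], ratio: Sequence[int]) -> bool:
    """True if the counts do not deviate significantly from the expected ratio."""
    df = len(ratio) - 1
    return chi_square(observed, ratio) < CRITICAL_05[df]

if __name__ == "__main__":
    # Hypothetical counts for 186 F2 individuals.
    print(fits_ratio([45, 95, 46], [1, 2, 1]))   # codominant marker: True
    print(fits_ratio([140, 46], [3, 1]))         # dominant marker: True
    print(fits_ratio([80, 60, 46], [1, 2, 1]))   # distorted marker: False
```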
Downy mildew resistance evaluation. Seedlings of the recombinants selected from the fine mapping populations were tested for DM resistance with the P. halstedii race 734 isolate, together with their respective parents (HA 234 and HA 458 for Pl17; CONFSCLB1 and HA-DM5 for Pl19), using the whole-seedling immersion method described by Gulya et al. [43] and Qi et al. (2015) [16]. Race 734 was first identified in North America in 2009 and overcomes the Pl6 and Pl7 genes [8]. A seedling was considered susceptible (S) if sporulation was observed on the cotyledons and true leaves, and resistant (R) if no sporulation was observed. Approximately 30 seedlings from each recombinant were inoculated and evaluated. Recombinants were classified as homozygous resistant if none of the seedlings sporulated, segregating if some seedlings showed sporulation on cotyledons and true leaves, and homozygous susceptible if all seedlings showed sporulation on cotyledons and true leaves; these classes represented the DM resistance genotype of each recombinant.
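The family-level classification rule can be written as two small functions. This sketch assumes each seedling is scored from two booleans (sporulation on cotyledons, sporulation on true leaves). Seedlings with sporulation on only one organ are not covered by the scoring rule in the text, so marking them as unscored here is our assumption, as are the function names and the 30-seedling toy family.

```python
from typing import Iterable

def score_seedling(sporulation_cotyledons: bool, sporulation_true_leaves: bool) -> str:
    """'S' if sporulation on cotyledons and true leaves, 'R' if none, else '?'."""
    if sporulation_cotyledons and sporulation_true_leaves:
        return "S"
    if not sporulation_cotyledons and not sporulation_true_leaves:
        return "R"
    return "?"   # partial sporulation: not defined in the text, left unscored

def classify_family(scores: Iterable[str]) -> str:
    """Infer a recombinant's DM genotype from the scores of its progeny seedlings."""
    scored = [s for s in scores if s in ("R", "S")]
    if not scored:
        raise ValueError("no scorable seedlings")
    if all(s == "R" for s in scored):
        return "homozygous resistant"
    if all(s == "S" for s in scored):
        return "homozygous susceptible"
    return "segregating"

if __name__ == "__main__":
    family = [score_seedling(False, False)] * 22 + [score_seedling(True, True)] * 8
    print(classify_family(family))  # segregating
```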