Map and sequence-based chromosome walking towards cloning of the male fertility restoration gene Rf5 linked to R11 in sunflower

The nuclear fertility restorer gene Rf5 in HA-R9, originating from the wild sunflower species Helianthus annuus, restores male fertility in the widely used PET1 cytoplasmic male sterility (CMS) system of sunflower. Previous mapping placed Rf5 in a 5.8 cM interval on sunflower chromosome 13, 1.6 cM distal to the rust resistance gene R11 in an SSR map. In the present study, publicly available SNP markers were mapped around Rf5 and R11 using 192 F2 individuals, reducing the Rf5 interval from 5.8 to 0.8 cM. Additional SNP markers were developed in the target region of the two genes from whole-genome resequencing of HA-R9, a donor line carrying Rf5 and R11. Fine mapping using 3517 F3 individuals placed Rf5 in a 0.00071 cM interval, where it co-segregated with SNP marker S13_216392091. Similarly, fine mapping using 8795 F3 individuals placed R11 in a 0.00210 cM interval, where it co-segregated with two SNP markers, S13_225290789 and C13_181790141. Sequence analysis identified pentatricopeptide repeat (PPR)-encoding genes as candidates for Rf5. The high-density map and diagnostic SNP markers developed in this study will accelerate the use of Rf5 and R11 in sunflower breeding.


Results
Saturation mapping of the Rf5 and R11 region. The previous SSR map placed Rf5 and R11 in a 7.1 cM region, with R11 1.6 cM proximal to Rf5 22 (Fig. 1a). To saturate the Rf5 and R11 regions, 45 SFW-SNPs most likely to lie near the two loci on sunflower chromosome 13 were selected and converted into PCR-based length polymorphism markers. These 45 SNP markers were screened for polymorphism between the two parents, HA 89 and HA-R9. Nine SFW-SNP markers, SFW01515, SFW01741, SFW02101, SFW03371, SFW04100, SFW04482, SFW04577, SFW05176, and SFW07542, were polymorphic and codominant, and they were genotyped in the F2 population of 192 individuals. Eight SFW-SNP markers mapped to the Rf5 interval between ORS995 and ORS728, reducing the gene interval from 5.8 to 0.8 cM, whereas none mapped to the R11 interval between ORS728 and ORS45 (Fig. 1b). The genetic distance between Rf5 and R11 was comparable to that in the previous SSR map: three SFW-SNP markers, SFW01515, SFW04100, and SFW04577, co-segregated with Rf5 and were 1.3 cM distal to R11 22 (Fig. 1a,b).
Fine mapping of Rf5 and R11 using SNP markers from whole-genome resequencing. Recombinant screens in a large population. To increase map resolution, a large population was screened for recombinants around both Rf5 and R11. The Rf5 flanking markers, SNP SFW03371 and SSR ORS728, were used to screen 3517 F3 individuals selected from previously characterized F2:3 families heterozygous for Rf5. A total of 87 recombinants were identified and grown in the greenhouse for seed. Of these, 24 plants did not develop pollen and were considered sterile, and two fertile plants produced too few seeds and were excluded from fertility testing. The remaining 61 fertile recombinant families were grown in the field (35 seeds per family) to determine whether they were homozygous or heterozygous: 37 were heterozygous fertile and 24 were homozygous fertile. Similarly, the R11 flanking SSR markers ORS728 and ORS45 were used to screen 8795 F3 individuals selected from previously characterized F2:3 families heterozygous for R11 (Fig. 1b). A total of 112 recombinants were identified, and their progeny (20 seedlings per family) were inoculated with P. helianthi race 336 to test for rust resistance. Of the 112 recombinant families tested, 29 were homozygous susceptible, 18 homozygous resistant, and 65 segregating.
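The logic of a flanking-marker recombinant screen can be illustrated with a short sketch. It assumes codominant calls at the two flanking markers coded as "A" (HA 89 allele), "B" (HA-R9 allele), and "H" (heterozygous); this coding, the function names, and the example plants are hypothetical and are not the study's data. In progeny of a plant heterozygous in coupling phase, a plant that received two non-recombinant gametes shows the same genotype class at both flanking markers, so a mismatch flags a crossover in the interval.

```python
from typing import Dict, List, Optional

# Hypothetical codominant genotype codes:
#   "A" = homozygous HA 89 allele, "B" = homozygous HA-R9 allele,
#   "H" = heterozygous; None marks a failed call.
Genotype = Optional[str]


def is_recombinant(left: Genotype, right: Genotype) -> Optional[bool]:
    """Return True if the two flanking-marker calls disagree.

    A plant whose two gametes are both non-recombinant in the interval
    carries the same genotype class at both flanking markers, so any
    mismatch implies at least one crossover between them.
    """
    if left is None or right is None:
        return None  # cannot decide without both calls
    return left != right


def screen(plants: Dict[str, List[Genotype]]) -> List[str]:
    """Return IDs of plants that are recombinant between the two markers."""
    hits = []
    for plant_id, (left, right) in plants.items():
        if is_recombinant(left, right):
            hits.append(plant_id)
    return hits


if __name__ == "__main__":
    # Made-up calls at a pair of flanking markers
    example = {
        "p001": ["A", "A"],
        "p002": ["H", "B"],
        "p003": ["B", "B"],
        "p004": ["A", "H"],
        "p005": ["H", None],
    }
    print(screen(example))  # ['p002', 'p004']
```

A plant that inherits two reciprocal recombinant gametes is called H at both markers and escapes detection; over an interval of a few cM this case is rare.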
New SNP marker development and fine mapping. To further refine the positions of Rf5 and R11, HA-R9 was sequenced at 40× genome coverage to identify additional SNP markers in the target region. Variants, including single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels), were called in the Rf5 target region against the two sunflower reference genome assemblies, spanning a 58.2 kb region (216,334,932-216,393,092 bp) on chromosome 13 in the HA412-HO genome and a 60.6 kb region (175,222,724-175,283,334 bp) in the XRQ genome. A total of 579 variants (536 SNPs and 43 InDels) were identified against HA412-HO and 803 variants (752 SNPs and 51 InDels) against XRQ. Eighty-four WGS-SNPs (31 from HA412-HO and 53 from XRQ) were selected from the two target regions and screened between the parents, HA 89 and HA-R9, and nine were polymorphic. A total of 15 SNP markers (9 WGS-SNPs and 6 SFW-SNPs) were used to genotype 85 Rf5 recombinants identified among the 3517 F3 individuals. Linkage mapping placed Rf5 in a 0.00071 cM interval on chromosome 13, co-segregating with SNP marker S13_216392091 (Fig. 1c). The physical positions of most WGS-SNP markers agreed with their genetic order in the XRQ genome assembly, whereas their physical order in the HA412-HO assembly was reversed relative to the genetic map (Table 1). The flanking markers, S13_216392091 and C13_175253964, delineated Rf5 to 35.6 and 30.6 kb regions in the HA412-HO and XRQ assemblies, respectively (Table 1). In the saturation map, the flanking SSR markers ORS728 and ORS45 delimited R11 to a 3.4 Mb region (223,364,614-226,744,870 bp) in the HA412-HO assembly, with no SNP marker mapped in this interval (Fig. 1b, Table 2).
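Whether marker order on a reference assembly agrees with the genetic map (as for XRQ) or is inverted relative to it (as for HA412-HO) can be quantified with a rank correlation between genetic and physical positions. The sketch below computes Spearman's coefficient with average ranks for ties, since co-segregating markers share a genetic position; it is an illustrative check rather than the analysis performed in this study, and the positions in the example are invented.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    cm = [0.0, 0.03, 0.03, 0.06, 0.09]      # hypothetical genetic positions
    collinear = [1000, 2000, 3000, 4000, 5000]  # hypothetical bp, same order
    inverted = [5000, 4000, 3000, 2000, 1000]   # hypothetical bp, reversed
    print(round(spearman(cm, collinear), 3), round(spearman(cm, inverted), 3))
    # 0.975 -0.975
```

A coefficient near +1 indicates collinearity, and one near -1 indicates that the assembly orders the markers in reverse.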
SNPs/InDels in the R11 target region were identified by aligning HA-R9 reads to the two reference genomes, spanning a 76.4 kb region (225,224,729-225,301,092 bp) in the HA412-HO assembly and a 1.5 Mb region (180,597,345-182,108,040 bp) in the XRQ assembly. In total, 559 SNPs/InDels were identified in the 76.4 kb HA412-HO region and 12,359 in the 1.5 Mb XRQ region. Thirty-four SNPs (15 from HA412-HO and 19 from XRQ) were selected for screening between the parents, HA 89 and HA-R9. Ten polymorphic SNPs were used to genotype the 112 R11 recombinants identified among the 8795 F3 individuals. Fine mapping placed R11 in a 0.00210 cM interval on chromosome 13, where it co-segregated with S13_225290789 and C13_181790141 (Fig. 1c). The flanking SNP markers C13_181790141 and C13_181792157 delineated R11 to an interval of 2416 bp in the XRQ genome and 3197 bp in the HA412-HO genome (Table 2). R11 was approximately 8.9 and 6.5 Mb from Rf5 in the HA412-HO and XRQ genome assemblies, respectively (Tables 1 and 2).
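Because each WGS-SNP name encodes the source assembly (prefix C13 for XRQ, S13 for HA412-HO, as defined in Methods) and the physical position on chromosome 13, distances between markers can be read directly from the names. The sketch below assumes every name follows the pattern prefix letter, chromosome number, underscore, base-pair position; marker-to-marker distances only approximate the gene-to-gene distances reported in Tables 1 and 2.

```python
import re
from typing import NamedTuple

# Prefix convention described in Methods
ASSEMBLY = {"C": "XRQ", "S": "HA412-HO"}
NAME = re.compile(r"^([CS])(\d+)_(\d+)$")


class Marker(NamedTuple):
    assembly: str
    chromosome: int
    position: int  # bp on the reference assembly


def parse_marker(name: str) -> Marker:
    """Split a WGS-SNP name such as 'C13_181790141' into its parts."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"not a WGS-SNP name: {name!r}")
    prefix, chrom, pos = m.groups()
    return Marker(ASSEMBLY[prefix], int(chrom), int(pos))


def distance_bp(a: str, b: str) -> int:
    """Physical distance between two markers on the same assembly."""
    ma, mb = parse_marker(a), parse_marker(b)
    if (ma.assembly, ma.chromosome) != (mb.assembly, mb.chromosome):
        raise ValueError("markers are on different assemblies or chromosomes")
    return abs(ma.position - mb.position)


if __name__ == "__main__":
    # Rf5-side marker and R11 co-segregating marker on each assembly
    for rf5, r11 in [("S13_216392091", "S13_225290789"),
                     ("C13_175253964", "C13_181790141")]:
        d = distance_bp(rf5, r11)
        print(parse_marker(rf5).assembly, d, f"{d / 1e6:.1f} Mb")
    # HA412-HO 8898698 8.9 Mb
    # XRQ 6536177 6.5 Mb
```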
Sequence comparison of HA-R9 with the candidate genes. HA-R9 whole-genome resequencing generated 487,190,276 raw reads. After removal of 797,556 reads containing adapters (0.16%) and 74,547 reads with > 10% undetermined bases (0.02%), 486,318,173 (99.82%) clean paired-end reads were used for assembly and gap repair. SOAPdenovo2 58 constructed 8,547,762 contigs, most of which (6,933,581; 81.12%) were 100 to 500 bp long. Only 20 (0.0002%) and 545,224 (6.38%) contigs were longer than 10 kb and 1 kb, respectively. Additionally, 7,010,438 scaffolds were assembled, most of which (6,032,509; 86.05%) were 100 to 500 bp long, while 11,749 (0.17%) and 565,585 (8.07%) scaffolds were longer than 10 kb and 1 kb, respectively. The short contigs and scaffolds, and their similar total numbers, most likely reflect the short reads (150 bp paired-end reads from 350 bp libraries) produced by the Illumina HiSeq/MiSeq platform and the abundance of repetitive sequences in the sunflower genome. A 58.2 kb genomic sequence between 216,334,932 and 216,393,092 bp on chromosome 13, covering the Rf5 gene, was extracted from the HA412-HO reference genome and used as a query to search the HA-R9 assembled contigs and scaffolds. This search identified 106 contigs and 393 scaffolds, from which 2 contigs (C16577551 and C16613275) and 3 scaffolds (scaffold206293, scaffold545194, and scaffold550505) were selected based on their positions in the target region (Fig. 2). The selected contigs and scaffolds were aligned to the three candidate Rf5 genes, Ha412v1r1_13g048240, Ha412v1r1_13g048260, and HanXRQChr13g0420371, and showed high sequence identity (Table 4).
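The read-filtering and assembly statistics above are simple proportions, and the length classes can be tabulated from a list of contig or scaffold lengths. A minimal sketch follows; the bin edges (100 to 500 bp inclusive, strictly greater than 1 kb and 10 kb, with the > 1 kb class including the > 10 kb sequences) are assumptions about how the classes were defined.

```python
from typing import Dict, Iterable


def pct(part: int, total: int) -> float:
    """Percentage of `part` in `total`."""
    return 100.0 * part / total


def length_summary(lengths: Iterable[int]) -> Dict[str, float]:
    """Share of sequences in the length classes reported for the assembly.

    Assumed bins: 100-500 bp (inclusive), > 1 kb and > 10 kb; the > 1 kb
    class also contains the > 10 kb sequences.
    """
    lengths = list(lengths)
    n = len(lengths)
    short = sum(100 <= x <= 500 for x in lengths)
    over_1kb = sum(x > 1_000 for x in lengths)
    over_10kb = sum(x > 10_000 for x in lengths)
    return {
        "total": n,
        "100-500 bp (%)": pct(short, n),
        "> 1 kb (%)": pct(over_1kb, n),
        "> 10 kb (%)": pct(over_10kb, n),
    }


if __name__ == "__main__":
    raw, adapter, n_rich = 487_190_276, 797_556, 74_547
    clean = raw - adapter - n_rich
    print(clean, round(pct(clean, raw), 2))      # 486318173 99.82
    print(round(pct(6_933_581, 8_547_762), 2))   # 81.12
    print(length_summary([150, 400, 1500, 12000]))
```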
Contig C16613275 shared the highest identity (99%) with candidate gene Ha412v1r1_13g048240, followed by scaffold206293 with Ha412v1r1_13g048260 (97%) and HanXRQChr13g0420371 (97%). Aligned segments between contigs/scaffolds and candidate genes were usually longer than 1 kb. Open reading frames (ORFs) in the selected contigs and scaffolds were predicted with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/), and the longest ORF of each contig/scaffold was further characterized by its number of repeats and deduced amino acids (Table 5). As expected, all of them belong to the PPR superfamily, with different copy numbers of a degenerate 35-amino-acid repeat, supporting their candidacy for the Rf5 gene. The best ORFs from each contig/scaffold were aligned and were highly similar to one another (Fig. 3).
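ORF prediction in this study used NCBI ORFfinder. The idea of keeping the longest ORF per contig or scaffold can be illustrated with a simplified six-frame scan. This sketch is not ORFfinder's algorithm: it reports only ORFs that start with ATG and end with a stop codon, so it would not report partial ORFs such as the incomplete ORF of scaffold206293, and the 75 bp minimum length is an assumed setting for illustration.

```python
from typing import Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str, min_len: int = 75) -> Tuple[str, int, int]:
    """Return (strand, start, length) of the longest ATG...stop ORF.

    `start` is the 0-based offset on the scanned strand (the reverse
    complement for strand "-"); `length` includes the stop codon.
    Returns ("", -1, 0) if no ORF of at least `min_len` bp is found.
    """
    seq = seq.upper()
    best = ("", -1, 0)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len and length > best[2]:
                        best = (strand, start, length)
                    start = None  # look for the next ATG in this frame
    return best


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"  # made-up sequence
    print(longest_orf(toy, min_len=9))  # ('+', 2, 15)
```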
The amino acid sequence of the candidate Rf5 gene, Ha412v1r1_13g048260, was also compared with those of characterized Rf orthologues from petunia, radish, rice, and sorghum to assess sequence similarity along the PPR motifs. Although the multiple sequence alignment showed low overall sequence identity, similarity was relatively higher in the PPR domains (Supplementary Fig. S1). This comparison indicated that the candidate Rf5 gene is phylogenetically distant from the characterized Rf orthologues of other plant species.
A 70,981 bp stretch of genomic sequence (181,774,000-181,844,981 bp) harboring the R11 gene was extracted from XRQ chromosome 13 and used as a query to search the HA-R9 assembled contigs and scaffolds for the R11 gene sequence. After analysis of the numerous contigs and scaffolds aligned to the query sequence, one contig and seven scaffolds were selected based on their positions in the target sequence (Fig. 4, Table 6). The selected contig and scaffolds were aligned to the four candidate R11 genes, Ha412v1r1_13g051750, HanXRQChr13g0422111, HanXRQChr13g0422121, and HanXRQChr13g0422131. Scaffold607601 aligned to both Ha412v1r1_13g051750 and HanXRQChr13g0422111, while scaffold585166 and scaffold396498 showed high sequence identity with HanXRQChr13g0422111 only (Table 6). Three scaffolds shared 92 to 98% sequence identity with HanXRQChr13g0422121, and one contig (C16653159) and one scaffold (scaffold433233) showed 95 to 96% sequence identity with HanXRQChr13g0422131. (Fig. 5). The ten SNP markers closely linked to R11 in the fine map were used to screen six sunflower lines: HA 89, HA-R3, HA-R6, HA-R9, RHA 397, and RHA 464. HA-R3, HA-R6, and RHA 397 carry the rust resistance genes R4, R13a, and R13b, respectively, all mapped to the lower end of chromosome 13, while RHA 464 carries the rust R gene R12, mapped to chromosome 11 48,51,52. Of the 10 SNPs tested, three showed a PCR pattern unique to HA-R9, differing from that of the other five lines, and were subsequently used to test a panel of 96 diverse sunflower lines. Two of them, C13_181790141, which co-segregates with R11, and C13_181792157, which lies 0.00011 cM proximal to R11, were diagnostic markers for R11 (Figs. 1c, 6).
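The criterion for calling a marker diagnostic can be stated as a set comparison: the donor (HA-R9-type) pattern should appear in every line known to carry the gene and in no other line. The sketch below encodes that reading of the criterion; the function names and the allele calls in the example are hypothetical and are not the scores obtained in this study.

```python
from typing import Dict, List, Set


def lines_with_allele(calls: Dict[str, str], allele: str) -> Set[str]:
    """Lines whose PCR pattern matches the given allele."""
    return {line for line, call in calls.items() if call == allele}


def false_positives(calls: Dict[str, str], carriers: Set[str], donor: str) -> List[str]:
    """Non-carrier lines that nevertheless show the donor pattern."""
    return sorted(lines_with_allele(calls, donor) - carriers)


def is_diagnostic(calls: Dict[str, str], carriers: Set[str], donor: str) -> bool:
    """True if the donor pattern appears in all carriers and in no other line."""
    return lines_with_allele(calls, donor) == carriers


if __name__ == "__main__":
    carriers = {"HA-R9"}
    # Hypothetical calls: "B" = HA-R9-type pattern, "A" = any other pattern
    marker_1 = {"HA 89": "A", "HA-R3": "A", "HA-R6": "A",
                "HA-R9": "B", "RHA 397": "A", "RHA 464": "A"}
    marker_2 = {**marker_1, "RHA 397": "B"}
    print(is_diagnostic(marker_1, carriers, "B"))    # True
    print(false_positives(marker_2, carriers, "B"))  # ['RHA 397']
```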

Discussion
Sunflower chromosome 13, particularly its lower end, harbors a cluster of economically important genes. This gene cluster has been further divided into two sub-clusters 51: sub-cluster I, including the rust R genes Radv and R11 and the male fertility restorer genes Rf1, Rf5, and Rf7, and sub-cluster II, including the rust R genes R4, R13a, R13b, and R16 and the four downy mildew R genes Pl5, Pl8, Pl21, and Pl34 8,16,22,47,48,51,55. The two sub-clusters are approximately 23 Mb apart, based on evidence from two linked genes, Rf7 and Pl34, both originating from the wild H. annuus accession PI 413157. The two genes were mapped within a 5.8 cM interval on chromosome 13, one in each sub-cluster 8. The SNP markers NSA_001167, closely linked to Rf7, and SFW08875, closely linked to Pl34, are located at 170,812,277 and 193,131,123 bp, respectively, in the XRQ genome, delimiting the two genes to a physical interval of 22.3 Mb 8. In the current study, sequence-based chromosome walking combined with fine mapping delineated Rf5 and R11 to regions of 30.6 and 2.1 kb in the XRQ genome within sub-cluster I, and diagnostic SNP markers for Rf5 and R11 were developed to facilitate marker-assisted breeding. Sequence alignment indicated that scaffold206293 from the HA-R9 sequence assembly shared 97% sequence identity with two candidate genes, Ha412v1r1_13g048260 and HanXRQChr13g0420371, which encode a PPR protein, the protein class encoded by most cloned Rf genes, providing a starting point for future cloning of Rf5 (Table 4). Its predicted ORF was 1188 bp long and incomplete, so the first step in future work is to retrieve the surrounding sequence to obtain a complete ORF. Clustering of Rf genes is common in plants and has been reported in common bean, rice, and petunia 25,42,59.
The Rf gene cluster on rice chromosome 10, which harbors five active genes (Rf1a, Rf1b, Rf4, Rf5, and Rf98), shows extreme variation in structure and gene content 37. In sunflower, three of the seven reported Rf genes, Rf1, Rf5, and Rf7, map to sub-cluster I at the lower end of chromosome 13 8,16,22. Yue et al. (2010) localized Rf1 3.7 cM proximal to SSR marker ORS511 on LG 13, equivalent to chromosome 13 16. Rf7 was mapped 0.9 cM proximal to ORS511 on chromosome 13 8, and Rf5 shares the SSR marker ORS316 with Rf7 in the target region 22, suggesting that Rf1, Rf7, and Rf5 are closely linked (Fig. 7b,c,d).
The three Rf genes, Rf1, Rf5, and Rf7, originated from different accessions of the wild sunflower species H. annuus collected in Texas (Rf1), Oklahoma (Rf5), and New Mexico (Rf7), respectively 8,22,60. Recently, a genome-wide association study identified 24 SNP markers significantly associated with Rf1, located in the region where Rf5 and Rf7 reside 8. Among 24 SNPs associated with Rf1, only five and seven SNPs retained the   (Table 7). Taken together, we hypothesize that the three genes are ordered within sub-cluster I as follows: Rf7 near the 172 Mb position, Rf1 at 174 Mb, and Rf5 at 175 Mb. Sub-cluster I, with its three Rf genes, also harbors two rust R genes linked to Rf5, Radv and R11, which are positioned distal to the common SSR marker ORS316 at genetic distances of 3.0 and 3.7 cM in the two maps, respectively 22,47 (Fig. 7). Radv originated from the wild sunflower species H. argophyllus and confers a rust resistance specificity different from that of R11 51. Bachlava et al. (2011) reported that the NBS-LRR-encoding resistance gene candidate (RGC) marker RGC260, the marker most closely linked to Radv, mapped 0.2 cM distal to the Radv locus 47 (Fig. 7a). Alignment of the RGC260 reverse primer sequence to the XRQ genome placed RGC260 at 178,056,184 bp in the XRQ assembly. In the present study, fine mapping delimited R11 to an interval between 181,789,941 and 181,792,357 bp in the XRQ genome, approximately 3.7 Mb from RGC260, indicating that Radv and R11 are two closely linked but distinct genes.
Although Rf genes have been cloned in maize, peanut, radish, rice, sorghum, and sugar beet, cloning them in sunflower has been hindered by the large genome size and high proportion of repetitive sequences 23,25,26,32,35,38,39,62. Sunflower is a diploid species with a genome size of approximately 3.6 Gb, of which more than 80% consists of repetitive sequences. The availability of genome sequences for two sunflower inbred lines, HA412-HO and XRQ, has enabled the development of high-density molecular markers and accelerated fine mapping and map- or sequence-based gene cloning 57. Using reference-guided chromosome walking, we identified three and two candidate genes for Rf5 in the HA412-HO and XRQ assemblies, respectively. Two of them from the HA412-HO assembly, Ha412v1r1_13g048240 and Ha412v1r1_13g048260, contained PPR motifs typical of Rf genes, and one from the XRQ assembly, HanXRQChr13g0420371, showed a typical tetratricopeptide-like helical domain and shared 100% sequence identity with Ha412v1r1_13g048260, indicating that the Ha412v1r1_13g048260 sequence is highly conserved in sunflower, at least among the lines studied.
Most cloned Rf genes in plants encode members of a specific clade of the RNA-binding PPR protein family 42,63. Duplicated PPR-containing genes within an Rf locus are common in plants. In petunia, a pair of duplicated PPR-containing genes, Rf-PPR591 and Rf-PPR592, resides at the Rf locus; the two share 93% sequence similarity, are identical in PPR organization, and differ only in the last 12 C-terminal amino acids 25. Further functional characterization confirmed that Rf-PPR592 was able to restore fertility to CMS plants, but not  (Table 3). Both genes encode a PPR protein and share 92% sequence similarity, suggesting that one of the two is likely the Rf5 gene in sunflower.
Because HA412-HO does not carry Rf5, the candidate genes in HA412-HO could be non-restoring rf alleles or pseudogenes that fail to interact with the cytoplasm to restore fertility. It is therefore important to determine the physical location of Rf5 in the HA-R9 genome. The contigs and scaffolds from HA-R9 whole-genome resequencing and assembly were searched using the candidate gene sequences as queries, and contig C16613275 and scaffold206293 shared very high sequence identity (97-99%) with the queries. However, the predicted ORF of scaffold206293 is incomplete, and a large portion of it consists of undetermined nucleotides, which is not uncommon given the repetitive sequences of the sunflower genome and the short-read sequencing system. All of the targeted contigs and scaffolds feature a typical PPR motif with different numbers of degenerate 35-amino-acid repeats (Table 5). Because no stable transformation system is available for sunflower, we were unable to confirm their ability to restore male fertility. As an alternative, we developed an EMS mutant population of HA-R9, and male-sterile plants resulting from mutations at the Rf5 locus in the M1 generation were obtained in M3 families. Targeted sequencing of the Rf5 region in these mutants and in the Rf5 donor HA-R9 using PacBio long-read sequencing is underway for further functional analysis.
HA-R9, which carries Rf5, has been tested for male fertility restoration of eight CMS types: PET1, PET2, MAX1, GIG1, ANN2, ANN3, RIGX, and GIG2 56. Rf5 restored only PET1 CMS, as does Rf1. Rf5 lies approximately 6.5 to 8.9 Mb from R11, depending on the reference assembly, with a recombination frequency of 1.3% between the two genes (Fig. 1b). Therefore, Rf5 and R11 could be used as a linkage block in sunflower breeding programs. Introgression of these two genes into new hybrids is valuable because Rf5 provides a new restorer for PET1 CMS, and R11 confers resistance to all P. helianthi races identified so far in North America. The high-density map and diagnostic SNP markers developed here provide the information and tools needed to accelerate the transfer of Rf5 and R11 into elite sunflower lines.
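How well the Rf5-R11 linkage block holds together under selection can be estimated from the recombination frequency. In a plant heterozygous in coupling phase, each meiosis separates the two loci with probability r, so a donor allele at the linked locus stays with the selected allele through n meioses with probability (1 - r)^n. The sketch below applies this standard result under the assumptions that r is about 0.013 (1.3 cM, treating cM as percent recombination at this short range) and that selection is applied to Rf5 alone; the breeding scenarios are illustrative.

```python
def retained_linkage(r: float, meioses: int) -> float:
    """Probability that a linked donor allele stays with the selected allele.

    Each meiosis in a coupling-phase heterozygote separates the two loci
    with probability r, so the pair survives n meioses with (1 - r) ** n.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if meioses < 0:
        raise ValueError("number of meioses must be non-negative")
    return (1.0 - r) ** meioses


if __name__ == "__main__":
    r = 0.013  # assumed: ~1.3 cM between Rf5 and R11 (Fig. 1b)
    # An F2 plant homozygous for Rf5 received two Rf5-carrying gametes,
    # each of which also carries R11 with probability (1 - r).
    print(f"F2 Rf5/Rf5 plants also R11/R11: {retained_linkage(r, 2):.3f}")
    # Backcrossing to a recurrent parent, selecting for Rf5 only
    for n in (1, 3, 5):
        print(f"R11 retained after {n} backcross(es): {retained_linkage(r, n):.3f}")
```

Under these assumptions, roughly 94% of Rf5 carriers would still carry R11 after five backcrosses selected for Rf5 alone.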

Methods
Plant materials. The F2 population of 192 individuals previously used to map Rf5 and R11 with SSR markers was used for saturation mapping in the current study 22. In the original cross, the inbred line HA 89 was susceptible to rust, and the wild H. annuus accession PI 613748 was the donor of Rf5 and R11. The sunflower germplasm line HA-R9, which is homozygous for both Rf5 and R11 and was released by USDA and North Dakota State University in 2013, was used as the gene donor for whole-genome resequencing 56.
For fine mapping of Rf5 and R11, recombinants were screened from 3517 and 8795 F3 individuals, respectively, selected from previously characterized F2:3 families heterozygous for Rf5 or R11. Each selected heterozygous F3 family was treated as a segregating F2 population for Rf5 or R11. An evaluation panel of 96 inbred sunflower lines of diverse origin was assembled, including 20 lines known to carry different male fertility restoration (Rf) genes and 17 lines known to carry rust R genes (Supplementary Table S1). This panel was used to validate the diagnostic DNA markers linked to Rf5 and R11.
Male fertility evaluation. F2:3 individuals and F2:4 families were visually scored as fertile or sterile based on the presence or absence of pollen. Plants that developed anthers and shed pollen were considered fertile, while those that failed to develop anthers or pollen were considered sterile. Recombinants selected from the segregating F2:3 population were grown in the greenhouse to evaluate male fertility. Of 3517 F2:3 individuals, 86 were identified as recombinant between markers SFW03371 and ORS728; of these recombinants, 23 were male sterile and 63 were fertile. The progeny of 61 fertile recombinants (F2:4 families, 35 seeds each) were grown in the field at Glyndon, MN, in summer 2015 to determine whether each family was homozygous or heterozygous. A family was considered homozygous fertile if all its plants developed anthers and shed pollen, and heterozygous fertile if its plants segregated for male fertility.
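The family classification rule above, and the chance that 35 plants are enough to reveal segregation, can be sketched as follows. The probability calculation assumes that Rf5 acts as a single dominant restorer segregating 3 fertile : 1 sterile in heterozygous families and that every plant is scored; the function names are hypothetical.

```python
from typing import Sequence


def classify_fertility_family(fertile: Sequence[bool]) -> str:
    """Classify an F2:4 family from per-plant fertility scores.

    True = plant developed anthers and shed pollen; False = sterile.
    """
    if not fertile:
        raise ValueError("no plants scored")
    if all(fertile):
        return "homozygous fertile"
    if any(fertile):
        return "heterozygous fertile"  # family segregates for fertility
    return "all sterile"  # not expected for progeny of a fertile plant


def miss_probability(n_plants: int, p_sterile: float = 0.25) -> float:
    """Chance that a segregating family produces no sterile plant at all.

    Assumes a single dominant restorer (3 fertile : 1 sterile).
    """
    return (1.0 - p_sterile) ** n_plants


if __name__ == "__main__":
    print(classify_fertility_family([True] * 35))                 # homozygous fertile
    print(classify_fertility_family([True] * 27 + [False] * 8))   # heterozygous fertile
    print(f"{miss_probability(35):.1e}")                          # 4.2e-05
```

Under this assumption, a heterozygous family of 35 plants would be misclassified as homozygous fertile only about four times in 100,000.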
Rust resistance evaluation. Recombinants selected from the fine mapping population were evaluated for rust resistance in the greenhouse in 2015. Twenty seeds from each selected F2:4 recombinant family, together with the parents HA 89 and HA-R9, were planted in 4 × 9 cell plastic flats filled with Sunshine SB 100B potting mix (SunGro Horticulture, Bellevue, WA, USA). Seedlings received standard greenhouse care until they reached the four-leaf stage. The P. helianthi race 336 isolate was used to test seedlings with the artificial inoculation procedure described previously 64, in which leaves were inoculated with urediniospores of P. helianthi race 336. Rust reactions were scored 12 to 14 days after inoculation for infection type (IT), on the 0 to 4 scale described by Yang et al. (1986) 65, and for the percentage of leaf area covered with pustules (severity), as described by Friskop et al. (2015) 66. Plants with ITs 0, 1, or 2 and pustule coverage of 0 to 0.5% were classified as resistant, and plants with ITs 3 or 4 and pustule coverage > 0.5% were classified as susceptible.
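The scoring rule above maps each seedling's infection type and severity to a resistant or susceptible call, and the calls of the 20 seedlings then classify a family. A minimal sketch follows; combinations the stated criteria do not cover are left unscored, and the probability calculation assumes R11 behaves as a single dominant gene segregating 3 resistant : 1 susceptible in heterozygous families. Function names and example scores are hypothetical.

```python
from typing import Sequence, Tuple


def score_plant(infection_type: int, severity_pct: float) -> str:
    """Score one seedling as "R", "S", or "?" if the rule does not apply."""
    if infection_type in (0, 1, 2) and 0.0 <= severity_pct <= 0.5:
        return "R"
    if infection_type in (3, 4) and severity_pct > 0.5:
        return "S"
    return "?"  # combination not covered by the stated criteria


def classify_rust_family(plants: Sequence[Tuple[int, float]]) -> str:
    """Classify an F2:4 family from (IT, severity %) pairs of its seedlings."""
    calls = [score_plant(it, sev) for it, sev in plants]
    scored = [c for c in calls if c != "?"]
    if not scored:
        return "unscored"
    if all(c == "R" for c in scored):
        return "homozygous resistant"
    if all(c == "S" for c in scored):
        return "homozygous susceptible"
    return "segregating"


def miss_probability(n_seedlings: int, p_susceptible: float = 0.25) -> float:
    """Chance that a segregating family shows no susceptible seedling."""
    return (1.0 - p_susceptible) ** n_seedlings


if __name__ == "__main__":
    family = [(0, 0.0), (1, 0.2), (4, 5.0), (2, 0.5)]  # made-up scores
    print(classify_rust_family(family))   # segregating
    print(f"{miss_probability(20):.4f}")  # 0.0032
```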
Saturation mapping, whole-genome resequencing, and SNP marker identification. For saturation mapping, 45 SNP markers likely to map near the Rf5 and R11 loci on chromosome 13 were selected by comparison with a published genetic map 67 (hereafter referred to as SFW-SNPs; Supplementary Table S2).
HA-R9 was sequenced on the Illumina HiSeq/MiSeq platform at Novogene Corp. following the company's protocols. Briefly, high-quality DNA samples were randomly fragmented to 350 bp with a Covaris cracker for library construction, and qualified libraries were pooled for sequencing according to their effective concentrations and the expected data volume. Raw reads were trimmed and filtered to remove adapters, reads with > 10% undetermined bases, and reads in which more than half of the bases were of low quality (Q score ≤ 5). Clean reads were then mapped separately to the two publicly available sunflower reference genomes, HA412-HO (https://www.heliagene.org/HA412.v1.1.bronze.20141015/) and XRQ (https://www.heliagene.org/HanXRQ-SUNRISE/). All SNPs and InDels were identified from the genome-mapped reads. SNP markers were named with the prefix C13 or S13 followed by a number giving the physical position of the SNP on chromosome 13 of the corresponding reference assembly (hereafter referred to as WGS-SNPs; Supplementary Tables S3a and S3b). The prefix C13 denotes SNPs from the XRQ reference assembly, and S13 denotes SNPs from the HA412-HO reference assembly.
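The three read filters can be expressed as a per-read test. The sketch below is a simplified illustration, not Novogene's pipeline: it assumes Phred+33 quality encoding, detects adapters only as an exact substring (using the common Illumina adapter prefix AGATCGGAAGAGC as an assumed example), and assumes a read pair is kept only when both mates pass.

```python
from typing import List, Tuple

ILLUMINA_ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def passes_filters(seq: str, qual: str, adapter: str = ILLUMINA_ADAPTER,
                   max_n_frac: float = 0.10, low_q: int = 5,
                   max_low_frac: float = 0.5) -> bool:
    """Apply the three read filters described above to a single read."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    if adapter in seq.upper():
        return False  # adapter contamination (exact match only)
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # > 10% undetermined bases
    low = sum(q <= low_q for q in phred_scores(qual))
    if low / len(seq) > max_low_frac:
        return False  # more than half of the bases with Q <= 5
    return True


def keep_pair(read1: Tuple[str, str], read2: Tuple[str, str]) -> bool:
    """Keep a (sequence, quality) pair only if both mates pass."""
    return passes_filters(*read1) and passes_filters(*read2)


if __name__ == "__main__":
    good = ("ACGT" * 10, "I" * 40)             # Q40 throughout
    n_rich = ("N" * 5 + "A" * 35, "I" * 40)    # 12.5% N
    print(keep_pair(good, good), keep_pair(good, n_rich))  # True False
```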
Genotyping of PCR-based markers. SSR markers were genotyped as described previously 49. Polymerase chain reaction (PCR)-based SNP markers were genotyped as described previously 68 and by Long et al. (2017) 69. For each SNP, two tailed forward allele-specific primers (AS-primers F1 and F2) and one common reverse primer were designed (Supplementary Table S4). A universal priming-element-adjustable primer (PEA-primer 5′-ATA GCT GG-Sp9-GCA ACA GGA ACC AGC TAT GAC-3′) with a fluorescent tag at the 5′ terminus was included in each PCR. The PCR protocol for SNP genotyping followed Ma et al. (2017) 70. After amplification, PCR products were loaded on 6.5% polyacrylamide gels and visualized with an IR2 4300/4200 DNA analyzer (LI-COR, Lincoln, NE, USA).
Sequence assembly, alignment and candidate gene identification. The clean paired-end reads from HA-R9 whole-genome resequencing were assembled with SOAPdenovo2, and gaps were repaired 58. Two genomic regions on chromosome 13 of the HA412-HO genome assembly, 216,334,932-216,393,092 bp and 225,224,729-225,301,092 bp, were selected to identify contigs and scaffolds likely to contain the Rf5 and R11 genes, respectively. The sequences of the selected contigs and scaffolds are presented in Supplementary Table S5. A standalone BLASTN program downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) was used to search the reference sequences of these two genomic regions against the assembled HA-R9 contigs and scaffolds with an E-value threshold of 1e-20. Contigs and scaffolds showing sequence similarity were then aligned with the candidate genes identified in the reference assemblies using the BLASTN suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi). ORFs in the selected contigs and scaffolds were identified with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/).
Multiple sequence alignments of the deduced amino acid sequences of the Rf5 candidate gene Ha412v1r1_13g048260, the HA-R9 contigs and scaffolds, and the characterized Rf orthologues from other plant species were performed with MultAlin version 5.4.1 (http://multalin.toulouse.inra.fr/multalin/).
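The local BLASTN search can be reproduced in outline with BLAST+: build a nucleotide database from the assembly (for example `makeblastdb -in hars9_assembly.fa -dbtype nucl -out hars9`) and search it with the reference region (for example `blastn -query rf5_region.fa -db hars9 -evalue 1e-20 -outfmt 6 -out hits.tsv`); the file and database names are hypothetical. The sketch below parses the default 12-column tabular output and lists the contigs and scaffolds with long alignments in order along the query, mirroring the position-based selection described in Results. The 1 kb minimum alignment length is an assumption, and the coordinates in the test data are invented.

```python
import csv
import io
from typing import Dict, List, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    evalue: float


def parse_outfmt6(text: str, max_evalue: float = 1e-20) -> List[Hit]:
    """Parse tabular BLAST hits, keeping those at or below max_evalue."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue <= max_evalue:
            hits.append(Hit(rec["sseqid"], float(rec["pident"]),
                            int(rec["length"]), int(rec["qstart"]),
                            int(rec["qend"]), evalue))
    return hits


def order_along_query(hits: List[Hit], min_length: int = 1000) -> List[str]:
    """Subjects with an alignment >= min_length bp, ordered by query start."""
    first_start: Dict[str, int] = {}
    for h in hits:
        if h.length >= min_length:
            first_start[h.subject] = min(h.qstart,
                                         first_start.get(h.subject, h.qstart))
    return sorted(first_start, key=first_start.__getitem__)
```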