Rose is the world’s most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Roses are outbred and can have various ploidy levels. Our objectives were to develop a high-quality reference genome sequence for the genus Rosa by sequencing a doubled haploid, combining long and short reads, and anchoring to a high-density genetic map, and to study the genome structure and genetic basis of major ornamental traits. We produced a doubled haploid rose line (‘HapOB’) from Rosa chinensis ‘Old Blush’ and generated a rose genome assembly anchored to seven pseudo-chromosomes (512 Mb with N50 of 3.4 Mb and 564 contigs). The length of 512 Mb represents 90.1–96.1% of the estimated haploid genome size of rose. Of the assembly, 95% is contained in only 196 contigs. The anchoring was validated using high-density diploid and tetraploid genetic maps. We delineated hallmark chromosomal features, including the pericentromeric regions, through annotation of transposable element families and positioned centromeric repeats using fluorescent in situ hybridization. The rose genome displays extensive synteny with the Fragaria vesca genome, and we delineated only two major rearrangements. Genetic diversity was analysed using resequencing data of seven diploid and one tetraploid Rosa species selected from various sections of the genus. Combining genetic and genomic approaches, we identified potential genetic regulators of key ornamental traits, including prickle density and the number of flower petals. A rose APETALA2/TOE homologue is proposed to be the major regulator of petal number in rose. This reference sequence is an important resource for studying polyploidization, meiosis and developmental processes, as we demonstrated for flower and prickle development. It will also accelerate breeding through the development of molecular markers linked to traits, the identification of the genes underlying them and the exploitation of synteny across Rosaceae.
Rose is the queen of flowers, holding great symbolic and cultural value. Roses appeared as decoration on 5,000-year-old Asian pottery1, and Romans cultivated roses for their flowers and essential oil2. Today, no ornamental plants have greater economic importance than roses. They are cultivated worldwide and are sold as garden plants, in pots or as cut flowers, the latter accounting for approximately 30% of the market. Roses are also used for scent production and for culinary purposes3.
Despite their genetic complexity and lack of biotechnological resources, rose represents a model for ornamental plant species, allowing the investigation of traits such as bloom seasonality or flower morphology. Furthermore, rose displays a range of unique features as a result of its complex evolutionary and breeding history, including interspecific hybridization events and polyploidization4,5,6. Roses belong to the genus Rosa (Rosoideae, Rosaceae), which contains more than 150 species7 of varying ploidy levels, ranging from 2n = 2× to 10×8,9. Many modern roses are tetraploid and can be genetically classified as ‘segmental’ allopolyploids (a mixture between allopolyploidy and autopolyploidy)10, whereas dog-roses display unequal meiosis to maintain pentaploidy11,12. Rose breeding has a long and generally unresolved history in Europe and Asia, most likely involving several interspecific hybridization events. Importantly, many very-old varieties are still maintained in private and public rose gardens and are a living historical archive of rose breeding and selection13. Large and well-documented herbarium collections, combined with genomic advances, offer excellent opportunities to reconstruct phylogenetic relationships within the species.
Roses have been subject to selection for several traits that are not usually encountered in other crops. In particular, aesthetic criteria have been a principal focus of rose breeding over the past 250 years, next to plant vigour and resistances to biotic and abiotic stresses. Among the aesthetic traits, flower colour and architecture (from 5-petalled ‘simple’ flowers to 100-petalled ‘double’ flowers), floral scent and prickle formation on the stem and leaves have been the main targets of the breeders’ eyes (and noses). Although these traits can be interpreted as signs of the domestication process, they originally evolved through adaptation to natural conditions. The availability of a high-quality reference genome sequence is key to unravelling the genetic basis underlying these evolutionary and developmental processes that accelerate future genetic, genomic, transcriptomic and epigenetic analyses. Recently, a draft reference genome sequence of Rosa multiflora has been published14. Although completeness measures suggest that the assembly is fairly complete in terms of the gene space covered, it is also highly fragmented (83,189 scaffolds, N50 of 90 kb).
Here, we present an annotated high-quality reference genome sequence for the Rosa genus using a haploid rose line derived from an old Chinese Rosa chinensis variety ‘Old Blush’ (Fig. 1a,b). ‘Old Blush’ (syn. Parsons’ Pink China) was brought to Europe and North America in the eighteenth century from China and is one of the most influential genotypes in the history of rose breeding. Among other things, it introduced recurrent flowering into Western germplasm, which is an essential trait for the development of modern rose cultivars15. We validated our pseudo-chromosome scale genome assembly of ‘Old Blush’ using high-density genetic maps of multiple F1 progenies and synteny with Fragaria vesca. We delineated hallmark chromosomal features, such as the pericentromeric regions, through annotation of transposable element families and positioning of centromeric repeats using fluorescent in situ hybridization (FISH). This reference genome also allowed us to analyse the genetic diversity within the Rosa genus following a resequencing of eight wild species. Using genetic (F1 progeny and diversity panel) and genomic approaches, we were able to identify key potential genetic regulators of important ornamental traits, including continuous flowering, flower development, prickle density and self-incompatibility.
Development of a high-quality reference genome sequence
We developed a haploid callus cell line (HapOB) using an anther culture at the mid-to-late uninucleate microspore developmental stage from the diploid heterozygous ‘Old Blush’ variety (Fig. 1c–e). The homozygosity of the HapOB line was verified with ten microsatellite markers distributed over the seven linkage groups (Supplementary Table 1). Flow cytometric analysis showed the HapOB callus to be diploid, suggesting that spontaneous genome doubling occurred during in vitro propagation.
A combination of Illumina short-read sequencing and PacBio long-read sequencing technologies was used to assemble the doubled haploid HapOB genome sequence. PacBio sequencing data (Supplementary Table 2) was assembled with CANU16, yielding 551 contigs (N50 of 3.4 Mb), representing a total length of 512 Mb. Of the obtained sequence, 95% is contained in only 196 contigs. The PacBio-based assembly was error corrected with Illumina paired-end reads: 37,300 single-nucleotide polymorphisms (SNPs) and 307,700 insertions and deletions (indels) were corrected, representing 341.1 kb (Supplementary Table 2). K-mer spectrum analysis (K = 25) suggested a genome size of 532.7 Mb (251.1 Mb of a unique genome sequence and 279.6 Mb of repetitive sequences), whereas flow cytometric analysis estimated a genome size of 1C = 568 ± 9 Mb. Thus, the assembled sequence represents 96.1% or 90.1%, respectively, of the estimated genome size. No major contamination was detected by screening for the predicted prokaryotic genes (Supplementary Table 3). Furthermore, only four contigs had low Illumina read mapping frequency, all of which were found to most likely encode plant proteins.
High-density female and male genetic maps were developed from a cross between R. chinensis ‘Old Blush’ and a hybrid of Rosa wichurana (OW). F1 progeny from this cross (n = 151) were genotyped with the WagRhSNP 68K Axiom array17 (Table 1 and Supplementary Table 4). Thirteen contigs, for which marker order clearly indicated assembly artefacts, were split before anchoring all 564 resulting contigs to the female and male genetic maps using a total of 6,746 SNP markers (Table 1). Of these, 196 contigs were anchored manually onto the seven linkage groups, mostly on both the female and the male genetic maps (174 and 143 contigs, respectively). In total, 466 Mb were therefore anchored onto the genetic maps and assembled into 7 pseudo-chromosomes representing 90% of the assembled contig length (Table 1 and Supplementary Fig. 1a). The remaining 368 contigs (52 Mb) were assigned to chromosome 0 (Chr0). The quality of the assembly of the seven pseudo-chromosomes was assessed using two independent genetic maps: the previously published integrated high-density genetic map (K5) based on 172 tetraploid F1 progeny10, and a newly developed high-density map based on 174 diploid F1 progeny from a cross between cultivar ‘Yesterday’ and R. wichurana (YW; see Supplementary Fig. 1b). The co-linearity between the pseudo-chromosomes and both linkage maps is excellent (Supplementary Fig. 2). In addition, the anchoring of the 386 contigs (52 Mb), currently assigned to Chr0, onto the K5 map and the YW map revealed that 39 contigs (total: 28.4 Mb) and 27 contigs (total: 24.1 Mb), respectively, can potentially be positioned onto the 7 linkage groups (Supplementary Fig. 2). However, because these genetic maps were created using independent genotypes that are not related to R. chinensis ‘Old Blush’, we chose a conservative approach by not incorporating these contigs into the pseudo-chromosome sequence of HapOB.
Positioning centromeres within the genome assembly
The centromeric regions were identified using both bioinformatic and cytogenetic methods. We discovered a highly abundant tandem repeat (0.06% of the genome with more than 2,000 copies per haploid genome) of monomers (159 bp long) that we call OBC226 (‘Old Blush’ centromeric repeat from RepeatExplorer cluster 226; Fig. 2a). PCR confirmed the tandem organization of this repeat (Fig. 2b). FISH analysis unambiguously confirmed the location of the repeat in the centromeric regions of four of the seven chromosomes: Chr2, Chr5, Chr6 and Chr7 (Fig. 2c). Mapping of the OBC226 repeat sequence revealed regions with high coverage on all HapOB pseudo-chromosomes except Chr1, which explains why no clear centromeric region could be detected on this chromosome (Fig. 2d). On Chr3 and Chr4, the copy number of OBC226 was probably too low to be detected by FISH. Furthermore, the core OBC226 centromeric repeats were flanked by other repetitive sequences, and these were unequally distributed along the chromosomes, with a clearly higher density in the core centromeric regions (Fig. 2d). These centromeric regions were also enriched in Ty3/Gypsy transposable elements. Taken together, these results confirm the position of the centromeric regions on the seven pseudo-chromosomes and reveal the high repeat sequence content, and low gene content, of the scaffolds currently assigned to Chr0.
Annotation of the sequence
Based on the mapping of 723,268 transcript sequences (expressed sequence tag/complementary DNA and RNA sequencing (RNA-seq) contigs with a minimum size of 150 bp) onto the HapOB genome assembly, we predicted a total of 44,481 genes covering 21% of the genomic sequence length using Eugene combiner18. These include 39,669 protein-coding genes and 4,812 non-coding genes. Evidence of transcription was found for 87.8% of all predicted genes. At least one InterPro domain signature was detected in 86.5% of the protein-coding genes using InterProScan19, with 68.0% of the genes assigned to 4,051 PFAM gene families20. The quality of the structural annotation was assessed using the BUSCO v2 method based on a benchmark of 1,440 conserved plant genes21, of which 92.5% had complete gene coverage (including 5.3% duplicated ones), 4.1% were fragmented and only 3.4% were missing. This result can be compared to the analysis of the whole-genome assembly, which identified 95% complete genes and 3.6% missing genes. The set of predicted non-coding genes included 186 ribosomal RNA, 751 transfer RNA, 384 small nucleolar RNA, 99 microRNA, 170 small nuclear RNA and 3,222 unclassified genes (annotated as non-coding RNA) with evidence of transcription but no consistent coding sequence.
The number of predicted proteins in Rosa (39,669) is higher than the number of predicted proteins in F. vesca (28,588 predicted proteins22). By BLAST analysis, we identified 6,543 proteins that are rose specific. Among them, 5,867 proteins have no homologue in Arabidopsis thaliana. For these proteins, no functional information is available from closely related species and experimental evidence will be required to explore their role in roses. We also looked at whether the difference in the predicted number of proteins was owing to protein family expansion. Such a scenario was detected for some protein families, including nucleotide-binding site leucine-rich repeat (NBS-LRR) and cytochrome P450 (Supplementary Fig. 3).
The REPET package23 was used to generate a genome-wide annotation of repetitive sequences of the HapOB genome (see Methods for details). Retrotransposons, also called class I elements, represent the largest transposable element genomic fraction (35.1% of the sequenced genome), of which long terminal repeat (LTR) retrotransposons represent 28.3%. Gypsy elements are more frequent than Copia (Supplementary Table 5a). Non-LTR retrotransposons long intersperced nuclear elements (LINEs) and potential short intersperced nuclear elements (SINEs) represent 5.0% of the sequence genome and class II elements (DNA transposons and Helitrons) represent 11.7% (Supplementary Table 5a). The remaining 15% include unclassified repeats (7.3%), chimaeric consensus sequences (1.9%) and potential repeated host genes (5.8%). We also identified Caulimoviridae copies representing 1.25% of the genome. Interestingly, one particularly abundant Gypsy Tat-like family was found in the genome assembly. The total copy coverage represents 3.4% of the genome. Tat-like elements are known to have an open reading frame (ORF) after the polymerase domains, and surprisingly in this case, the ORF corresponds to a class II transposase domain.
In a preliminary comparison between the transposable element annotation in HapOB and the F. vesca v2.0.a1 genome assembly (without manual curation) (Supplementary Table 5b), we found that retrotransposon elements represent the largest transposable element genomic fraction in F. vesca (13.91%), similar to rose. We found approximately twofold more copies for all transposable element families except SINE and unclassified in Rosa than in Fragaria. This indicates that the difference in genome size between Rosa and Fragaria is largely due to an expansion of the transposable element fraction.
Synteny between Rosa and F. vesca
Rosa and Fragaria both belong to the Rosoideae subfamily of the Rosaceae24, having diverged around 50 million years ago25. Previous genetic studies have demonstrated that large macrosyntenic blocks are conserved between Rosa and Fragaria10,26. We compared the HapOB genome to the recently updated F. vesca genome22 to analyse the synteny in detail (Supplementary Fig. 4a). R. chinensis Chrs 1, 4, 5, 6 and 7 display strong synteny with F. vesca Chrs 7, 4, 3, 2 and 5, respectively. Consistent with previous suggestions10, a reciprocal translocation was detected between R. chinensis Chr 2 and 3 and F. vesca Chrs 6 and 1, respectively. Our results clarify the highly conserved synteny between F. vesca and Rosa, revealing only two major translocation events.
Within the Rosaceae family, the synteny is also well conserved between Prunus and Rosa (Supplementary Fig. 4c): Rosa Chr1 corresponds to Prunus Chr2, Rosa Chr4 corresponds to the end of Prunus Chr1, whereas Prunus Chrs 3, 5 and 8 correspond to large parts of Rosa Chrs 2, 6 and 7 respectively. Owing to the allopolyploid origin of Malus, the overall synteny is less clear, even if large blocks of synteny can be detected (Supplementary Fig. 4b).
Genetic diversity with the genus Rosa
The 150 or more existing rose species belong to four subgenera. Excluding the subgenus Rosa, all subgenera contain only one or two species. We resequenced eight Rosa species (Table 2), representing three of the four subgenera (Hulthemia: R. persica, Herperhodos: R. minutifolia and Rosa). Within Rosa, we covered all of the main sections according to the latest phylogenetic analyses27,28 (Table 2) in the form of R. chinensis var. spontanea, R. rugosa, R. laevigata, R. moschata, R. xanthina spontanea and R. gallica. All are diploid species except R. gallica, which is tetraploid (Table 2). SNPs and indels were identified relative to the HapOB reference sequence (Fig. 3).
The nuclear SNP-based phylogenetic tree of the eight species (Fig. 3a) is consistent with previous molecular analyses27,28. The clade, including R. chinensis, R. gallica, R. moschata and R. laevigata, fits with the Synstylae and allies clade previously found in a chloroplastic analysis28. The same is the case for R. persica and R. xanthina, both belonging to the Cinnamomeae and allies clade. However, R. rugosa and R. minutifolia show an uncertain position. In particular, R. minutifolia, which belongs to the Hesperhodos subgenus27,28, was expected to be closer to R. persica and R. xanthina. One of the possible explanations is that the resequenced R. minutifolia individual is actually the product of an interspecific cross, as it shows unexpected morphological characters, such as few prickles on the young flowering shoots and flowers clustering in inflorescences. Methodologically, the use of only homozygous SNPs may have caused bias, especially in R. rugosa, as most of its SNPs were in the heterozygous state (Supplementary Table 6).
The lowest SNP and indel density was found in R. chinensis var. spontanea (9.9 and 1.6 per kb, respectively). ‘Old Blush’ is described as an interspecific cross between R. chinensis var. spontanea and R. x odorata var. gigantea6, which is consistent with the relatively low sequence divergence of R. chinensis var. spontanea compared to the HapOB reference sequence. The highest SNP and indel density was found in R. gallica (21.0 and 4.5 per kb, respectively); this could be the result of the (allo)tetraploidy of this species29, as shown by its high proportion of heterozygous SNPs (74%; Supplementary Table 6).
As expected, the majority (79.2–89.0%) of the SNPs were located in non-coding regions (Supplementary Table 6). Only 3–7% of the SNPs were located in exons, of which half were synonymous, in line with other species (for example, tomato30). The different species displayed varying levels of homozygosity (homozygous SNPs ranging from 79.2% in R. persica to 26.0% in tetraploid R. gallica; Supplementary Table 6). The number of small indels was higher (between 876,648 and 2,430,123) than Malus, with an average of 346,498 indels31, suggesting a higher level of diversity within the Rosa genus.
Analysis of the genetic control of important traits
This new reference sequence is an important tool to help decipher the genetic basis of ornamental traits, such as blooming (including continuous flowering, flower development and the number of petals), prickle density on the stem and self-incompatibility. We studied the genetic determinism in (1) two F1 progenies (151 individuals obtained from the OW progeny and 174 individuals obtained from the YW progeny), and (2) a panel of 96 rose cultivars originating from the nineteenth to the twenty-first century32,33. Our data demonstrate that important loci controlling continuous flowering, double flower morphology, self-incompatibility and prickle density were predominantly localized in a single genomic region of Chr3 (Fig. 4a).
Detection of a new allele controlling continuous flowering in rose
Most species of roses are 'once flowering'. In rose, continuous flowering is controlled by a homologue of the TERMINAL FLOWER 1 (TFL1) family, RoKSN, located on Chr3 (ref. 34). The continuous flowering phenotype is due to the insertion of a Copia retrotransposon element in the RoKSN gene. The continuous flowering rose ‘Old Blush’ was previously proposed to be RoKSNCopia/RoKSNCopia at the RoKSN locus34. This Copia element corresponds to the RLC_denovoHm-B-G10244-Map11 retrotransposon. We identified 34 insertions of this transposable element in the HapOB genome, of which 11 are full length (Supplementary Table 7a). The element is inserted into three genes, all of which are disrupted. The 3′ and 5′ LTRs of the full-length elements are >99% similar (Supplementary Table 7a), suggesting a recent insertion, as previously proposed for the element inserted in RoKSN34.
Here, quantitative trait locus (QTL) analysis in the OW progeny identified the CONTINUOUS FLOWERING locus on Chr3 (Fig. 4a), as expected, but we were unable to detect the RoKSN gene in the annotated HapOB genome. Detailed analysis of RoKSN allele segregation in the OW progeny revealed the existence of a null allele, in which RoKSN is deleted (see Supplementary Table 8 for further details). The diploid ‘Old Blush’ parent of the OB mapping population is therefore hemizygous RoKSNCopia/RoKSNnull, and the RoKSNnull allele is present in the HapOB genome sequence.
Interesting parallels exist between rose and F. vesca because F. vesca also exhibits both the once flowering and the continuous flowering phenotypes. In strawberry, a 2-bp deletion in the TFL1 homologue causes a shift from once flowering to continuous flowering34,35. Synteny analysis revealed four orthologous syntenic blocks in the RoKSN gene region, here called blocks A–D (Supplementary Fig. 5). We detected a pattern of conserved gene content in combination with genome rearrangements between different Rosa species and the published genome sequence of F. vesca36 where the synteny with F. vesca is broken at the FvKSN location. The FvKSN gene is located between the A and B blocks in F. vesca. The A block is inverted in the HapOB genome, and the C and D blocks are inserted between the A and B blocks. In R. multiflora14 and R. laevigata (see Methods for the partial R. laevigata genome sequence assembly), which are both once flowering, the RoKSNWT allele is present and synteny is conserved with F. vesca (Supplementary Fig. 5). Taken together, these data indicate that the RoKSNnull allele is the result of a large rearrangement at the CONTINUOUS FLOWERING locus, leading to the complete deletion of the RoKSN gene. The RoKSNnull allele represents a novel allele responsible for continuous flowering, which has not been previously described.
The number of petals is an important ornamental trait, and roses with higher numbers of petal (‘double flower’) have traditionally been selected. Through a study of mutant lines (sports), the change in petal number was attributed to a homeotic conversion in organ identity, with stamens converted into petals37. The genetic basis of the double flower trait is complex, with a dominant gene (DOUBLE FLOWER) controlling simple versus double flower phenotypes and two QTLs controlling the number of petals on double flowers38. Here, we combined the genome sequence with segregation data of four different F1 progenies to confine the putative location of the DOUBLE FLOWER locus (Supplementary Table 9) to a 293-kb region of Chr3 (between position 33.24 Mb and 33.53 Mb; Fig. 4a). Using a genome-wide association study (GWAS) approach with a panel of 96 cultivated roses, we detected a strong association with simple versus double flowers in the same region (between position 33.08 Mb and 33.94 Mb; Fig. 4b). A second significant peak was located at a distance of 5 Mb, which may correspond to a secondary locus influencing this trait.
The 293-kb region contains 41 annotated genes. Among these, half are expressed during the early stages of floral development (Supplementary Table 10). By excluding genes expressed in later floral stages (with completely open flowers), we retained four candidate genes: an F-box protein (RC3G0245100), a homologue of APETALA2/TOE (RC3G0243000), a Ypt/Rab-GAP domain of the gyp1p superfamily protein (RC3G0245000) and a tetratricopeptide repeat-like superfamily protein (RC3G0243500) (Supplementary Table 10).
Concerning double flowering, ‘Old Blush’ is heterozygous for the DOUBLE FLOWER locus. Sequencing both alleles of the four selected candidate genes in ‘Old Blush’ revealed only minor modifications for RC3G0245100, RC3G0245000 and RC3G0243500 (Supplementary Fig. 6a–c, respectively). However, concerning the APETALA2/TOE gene (RC3G0243000), we detected a 1,426-bp insertion in intron eight (Supplementary Fig. 7a). The insertion showed high similarity to an unclassified transposable element (annotation noCAT_denovoHM-B-R7962-Map20; Supplementary Table 5c). This repeat element is present 62 times in the HapOB genome, of which 20 insertions are full length and 4 are located in gene introns (Supplementary Table 7b). Apart from this insertion and a few SNPs, no other differences were detected between the two alleles. In the OW progeny, all individuals that carry the transposable element insertion allele display the double flower phenotype (see Supplementary Table 11 for further details).
Phylogenetic analysis showed that RC3G0243000 belongs to the APETALA2/TOE clade within the AP2/ERF subfamily39 (Supplementary Fig. 7d). Like all members of the AP2 clades, the protein encoded by RC3G0243000 contains two conserved AP2 domains and a conserved putative miRNA172 binding site (Supplementary Fig. 7b,c). The genomic position, expression analysis, protein sequence data and predicted deleterious effect of the insertion in intron 8 suggest that the APETALA2/TOE gene is a good candidate for the DOUBLE FLOWER locus. APETALA2/TOE has a central role in the establishment of the floral meristem and in the specification of floral organs40,41,42,43. APETALA2 was classified as a class A floral homeotic gene, which specifies sepal identity if expressed alone and petal identity if expressed together with class B genes44. Furthermore, AP2/TOE3 repressed AGAMOUS expression (a class C gene) in the two outer floral whorls in the floral meristem42 (reviewed in ref. 45). In rose, a reduction in the levels of RhAGAMOUS transcripts was proposed to be the basis of the double flower phenotype37. We hypothesise that misregulation of the rose APETALA2/TOE homologue (due to the presence of the transposable element) is responsible for the RhAGAMOUS transcript level reduction, leading to the double flower phenotype.
Interestingly, a GWAS approach for petal number (a quantitative analysis) in a panel of tetraploid and double flower varieties33 revealed that the most significant QTL is also located at the DOUBLE FLOWER locus (Fig. 4c). Several markers in this cluster display significant dose-dependent effects on the number of petals. One of these markers, RhK5_4359_382 (at position 33.55 Mb), was analysed via the Kompetitive allele specific PCR (KASP) technology both in the original association panel of 96 cultivars and in an independent panel of 238 tetraploid varieties and showed the same effect in both populations (Supplementary Fig. 8a,b). Two other markers (RhK5_14942 and RhMCRND_760_1045) were also tested on the 96 cultivars by KASP technology and revealed the same pattern (Supplementary Fig. 8c,d). This demonstrates a dual role of the DOUBLE FLOWER locus in rose: it controls both the double flower phenotype (double versus single flowers) and the number of petals. Given that the petal number QTL was detected in several panels of unrelated rose genotypes, it seems that this locus acts independently of the genetic background.
As described for other Rosaceae species46,47,48, in some diploid roses, self-incompatibility is caused by a gametophytic SI (self-incompatibility) locus. This locus is most likely composed of genes encoding S-RNases and F-box proteins, which represent the female- and male-specific components, respectively. Previous approaches have failed to characterize the Rosa Sl-locus genes owing to the low sequence similarity between S-RNase genes across species and the existence of multiple genes for both S-RNases and F-box proteins. A screen for S-RNase and F-box homologues in the HapOB genome sequence identified a region of 100 kb on Chr3 that contains three genes coding for S-RNAses and four genes for S-locus F-box proteins (Fig. 4a and Supplementary Fig. 9a). This region is syntenic with the SI locus in Prunus persica (Supplementary Fig. 9b). One of the S-RNases (S-RNase36) was expressed in pistils of ‘Old-Blush’ flowers. Of the F-box genes, Fbox38 accumulated in the stamens (Supplementary Fig. 9c,d). Hence, this region fulfils the requirements of a functional S-locus.
This region is consistent with previous data on segregation of the self-incompatibility phenotype in a diploid rose population, in which the self-incompatibility phenotype was analysed by generating a bi-parental progeny and backcrossing individual progeny to both parents49. We generated a marker for an orthologue of the S-RNase gene (SRNase30) expressed in pistils of ‘Old Blush’ that co-segregates with the S-locus at a distance of 4.2 cM. The large number of recombinants might be explained by incomplete expression of self-incompatibility (leaky phenotypes) in some individuals of the progeny, a phenomenon that is also observed in, for example, Solanum populations50.
We investigated the genetic regulation of prickle density in rose. In two F1 progenies, QTLs were detected on Chr3. In the OW and YW progenies, a large region of significant association was detected between position 31.2 Mb and the end of the chromosome on both male and female maps (Fig. 4d,e, respectively). In both populations, two peaks were clearly detected, which probably correspond to two neighbouring QTLs (Fig. 4a,d,e). Through a GWAS approach, we detected a strong association between SNPs and the presence of prickles between positions 31.0 Mb and 32.4 Mb (Supplementary Fig. 10a). In rose, prickles originate as a deformation of glandular trichomes in combination with cells from the cortex51. We have looked for homologues of candidate genes controlling trichome initiation and development identified in A. thaliana52. Screening the QTL region on Chr3 of HapOW for gene family members of these candidate genes revealed several WRKY transcription factors, of which RC3G0244800 (positioned at 33.40 Mb; Fig. 4a) shows strong similarity with AtTTG2 (TESTA TRANSPARENT GLABRA2), which is involved in trichome development in Arabidopsis53 (Supplementary Fig. 10b). We studied the expression of the rose TTG2 homologue (RcTTG2) in three different individuals of the OW progeny with different prickle densities (absence, medium- and high-density prickles on the stem; Supplementary Fig. 10c). The RcTTG2 transcript accumulated at higher levels in stems presenting prickles, suggesting that RcTTG2 is a positive regulator of prickle presence in rose. This TGG2 homologue represents a good candidate for the control of prickles in rose.
We have produced a high-quality reference rose genome sequence that will represent an essential resource for the rose community but also for rose breeders. Using this new reference sequence, we have analysed important structural features of the genome, including the position of the centromeres (Fig. 2) and SNP and indel frequencies (Fig. 3).
Taking advantage of this new high-quality reference sequence, rose is set to become a model species to study ornamental traits. For example, rose was previously used to study scent emission, leading to the discovery of a new pathway for the synthesis of monoterpenes54. Here, using a combination of genomic and genetic approaches (F1 progenies and GWAS diversity panel), we have demonstrated that this new reference sequence can be used to analyse loci controlling ornamental traits, such as continuous flowering, double flower, self-incompatibility and prickle density (Fig. 4). We have identified and characterized candidate genes for these traits. We propose that a rose APETALA2/TOE homologue controls the switch from simple to double flower and, unexpectedly, also the number of petals within double flowers. Further analyses are necessary to validate the function of these genes. The analyses were done in diploid roses but also in tetraploid roses, allowing direct implementation in rose breeding materials, with the development of diagnostic markers as we demonstrated for petal number. For this economically crucial trait, we have developed a genetic marker that permits the prediction of petal number, which we validated on a large panel (Supplementary Fig. 8). This represents a good example of how the development and release of the rose genome sequence can accelerate gains in rose breeding.
Cultivated roses have an allopolyploid background but segregate mainly tetrasomically10,55. Hence, rose is a unique model for polyploidization and chromosome pairing mechanisms, which can now also be investigated at the molecular level. This reference sequence opens the way to genomic and epigenomic approaches to study important traits, providing an essential bridge between this and other plant species.
Development of haploid ‘Old Blush’ callus
Young flower buds of ‘Old Blush’ (Fig. 1c) with microspores at a mid-to-late uninucleate developmental stage (Fig. 1d) were collected in a greenhouse, wrapped in aluminium foil and stored in the dark at 4 °C for 25 days. These were then surface sterilized in 70% ethanol for 30 s and in sodium hypochlorite solution (2.9° active chloride) for 15 min followed by rinsing three times in double-distilled sterilized water.
Anthers were aseptically removed using binoculars and ground in starvation B medium56 with minor modifications (pH 6 and 0.1 M sorbitol) for 2 min using a MSE homogenizer (Measuring & Scientific Equipment) set at 10,000 r.p.m. Anthers were then collected on 50-µm mesh filters, covered with a fine layer of fresh modified starvation B medium and incubated for 24 h at 22 °C in darkness. Anthers were transferred on MS medium containing 30 g l−1 sucrose, 0.5 mg l−1 BAP (6-benzylaminopurine) and 0.1 mg l−1 NAA (naphthaleneacetic acid) in 12-well culture plates. Plates were incubated in darkness at 23 °C/19 °C (16 h/8 h), taking care not to move the boxes or expose them to light for 80 days to induce somatic embryo formation. Somatic embryos were isolated from the anthers and transferred on the same medium in petri dishes with filter paper in 4-week intervals until the production of callus (Fig. 1e). Then, callus was multiplied on the same medium in the dark until enough material for DNA extraction was produced. Homozygosity was verified using ten previously described microsatellite markers57.
Genome sizes and ploidy levels were analysed on a flow cytometer, PASIII (488-nm, 20-mW laser; Partec). The Cystain absolute PI reagent kit (Sysmex) was used for sample preparation. Solanum lycopersicum ‘Stupické polni tyckove rane’ (1,916 Mb/2C) was used as an internal standard.
Genome sequencing and assembly
DNA extraction for PacBio and Illumina sequencing
Callus tissues of the haploid ‘Old Blush’ HapOB line was kept in the dark for 3 days prior to DNA extraction to reduce chloroplast DNA contamination. DNA extraction was performed on 1 g HapOB callus tissue as described previously58. In total, approximately 30 mg genomic DNA was obtained in several batches for the preparation of three independent single-molecule real-time (SMRT) bell libraries. For the first library, genomic DNA was sheared by a Megaruptor (Diagenode) device with 30-kb settings. Sheared DNA was purified and concentrated with AMpureXP beads (Agencourt) and further used for SMRTbell preparation according to the manufacturer’s protocol (Pacific Biosciences; 20-kb template preparation using BluePippin (Sagesscience) size selection system with a 15-kb cut-off). Two additional libraries were made excluding the DNA shearing step, but with an additional initial damage repair. Size-selected and -isolated SMRTbell fractions were purified using AMPureXP beads and finally used for primer and polymerase (P6) binding according to the manufacturer’s binding calculator (Pacific Biosciences). Three library DNA–polymerase complexes were used for Magbead binding and loaded at 0.16, 0.25 and 0.20 nM on-plate concentrations, using 12, 7 and 8 SMRT cells, respectively. Final sequencing was done on a PacBio RS-II platform, with a 345- or 360-min movie time, 1 cell per well protocol and C4 sequencing chemistry. Raw sequence data were imported and further processed on a SMRT Analysis Server v2.3.0.
For Illumina sequencing, approximately 200 ng genomic DNA was sheared in a 55-µl volume using a Covaris E210 device to approximately 500–600 bp. One library with an insert size of 720 bp was made using Illumina TruSeq Nano DNA Library Preparation Kit according to the manufacturer’s guidelines. The final library was quantified by Qubit fluorescence spectrophotometry (Invitrogen) and the library fragment size range was assessed by Bioanalyzer High Sensitivity DNA assay (Agilent). The library was used for clustering as part of two lanes of a paired-end flow cell v4 using a Cbot device and subsequent 2 × 125 paired-end sequencing on a Hiseq2500 system (Illumina). De-multiplexing was carried out using Casava 1.8 software.
Genome assembly, polishing and contamination assessment
All sequence data generated that were derived from 27 SMRT cells containing 19.2 Gb of reads larger than 500 bp were assembled with CANU hierarchical assembler v1.4 (ref. 16) (version release r8046). In general, default settings were used except ‘corMinCoverage’, which was changed from 4 to 3, ‘minOverLapLength’, which was increased from 500 to 1,000, and ‘errorRate’, which was adjusted to 0.015. The assembly was completed on the Dutch National e-Infrastructure with the support of SURF Cooperative using 2,024 CPU hours (Intel Xeon Haswell 2.6 GHz) for the complete CANU process. Illumina paired-end (2 × 125 bp) reads were mapped onto the genome assembly using Burrow-Wheeler aligner maximum exact match (BWA-MEM)59. Pilon60 was then used to error correct the assembly. This procedure was repeated three times iteratively.
For contamination assessment, prokaryotic genes were predicted on the contigs using MetaGeneAnnotator61. The number of genes per nucleotide was computed for every contig. Furthermore, Illumina reads were mapped on the contigs using BWA-MEM62. The number of mapped reads per nucleotide was computed for every contig. Contigs with a low Illumina read mapping frequency were aligned against the GenBank non-redundant protein database using BLASTX.
Development of high-density genetic maps and GWAS analysis
A diploid F1 population of 151 individuals (OW) was obtained by crossing R. chinensis ‘Old Blush’ and a hybrid of R. wichurana obtained from Jardin de Bagatelle (Paris, France). This population was planted at the INRA Experimental Unit Horti (Beaucouzé, France).
A diploid F1 population of 174 individuals (YW) was obtained from a cross between ‘Yesterday’ and R. wichurana (the extended population as used in ref. 63). This population was planted at the ILVO (Melle, Belgium).
The tetraploid K5 cut rose mapping population consisted of 172 individuals obtained from a cross between P540 and P867. It was planted in Wageningen, the Netherlands, and was previously used in various QTL studies64,65.
The association panel comprised 96 cultivars, of which 87 were tetraploid, 8 were triploid and 1 was diploid, selected to reduce the genetic relatedness between genotypes33. Plants were cultivated in a randomized block design, with three blocks comprising one clone of each genotype both in the greenhouse and at an experimental field location at Leibniz Universität Hannover, Germany. For marker validation, an independent population of 238 tetraploid varieties was used that was cultivated in a field plot of the Federal Plant Variety Office in Hannover, Germany. Plants of the association panel and the phenotypic data are described in Supplementary Table 12.
Genetic map construction
The construction of the different genetic maps from F1 progenies (OW, YW and K5), the KASP assay for SNP validation and the development of a sequence characterized amplified region (SCAR) marker for the SI locus are described in Supplementary Methods.
The GWAS analyses for petal numbers and prickle density were performed in TASSEL 3.0 (ref. 66) as described previously33. Trait marker association for petal number was analysed using the mixed linear model (MLM) and 39,831 markers (petal as a quantitative trait with the Q + K model), including a fixed effect as the population structure matrix (Q) and random effect as the kinship matrix (K). Significance thresholds were corrected for multiple testing by the Bonferroni method using the number of contigs (19,083) as a correction factor, resulting in a significance threshold of 1.78 × 10–6. The kinship matrix used in the MLM was calculated for 10,000 SNP markers with the software SPAGeDi 1.5 (Zitat) as described previously33. For the GWAS analysis of prickles and petals with the general linear model (GLM) in TASSEL 3.0 (ref. 66), 63,000 markers were analysed. Petals and prickles were set as qualitative traits (1 and 0 to indicate presence or absence, respectively), and the analysis was performed without any correction for population (Q + K). Significance thresholds in the GLM were corrected by the number of contigs (28,054) to 1.78 × 10–6.
Alignment of the HapOB rose genome with the OW genetic maps
The alignment of the genetic and physical maps was done in two steps. First, the HapOB sequence was aligned to the integrated genetic maps to detect problems of assembly (contigs that are present on two linkage groups). Second, to precisely order and orient the contigs on each linkage group, the alignment was done separately on the male and female maps and manually integrated.
During the first step, 7,822 out of a total of 7,840 SNP markers were positioned by mapping the corresponding 70-bp probes onto the HapOB genome sequence using Blat v.35 (ref.67). Markers with more than one best hit were eliminated. Out of the 7,360 remaining markers, 6,808 passed the mapping quality filter (≥95% match and ≤4% mismatch). Of these, 6,746 markers belonging to the most common linkage group on their respective contigs were conserved and described as 'concordant' markers. Only contigs with more than one of these markers were retained.
During the second step, the mapping and anchoring were done independently on the male and female maps (Table 1). The procedure and conditions were the same as for the first mapping. Only concordant markers were kept (4,875 (87%) and 1,871 (81%) for the female and male map, respectively). We positioned and oriented the different contigs manually (Supplementary Table 13). When a contig spanned several loci, its order and position were clear. However, for some contigs, genetic maps did not resolve the orientation problems. In these situations, we used the synteny between Rosa and F. vesca10. The strategy used to position and orient contigs is described in Supplementary Fig. 11. The position and orientation of the contigs are listed in Supplementary Table 13.
Concerning the K5 integrated genetic map, among the 25,695 SNP markers present, 20,706 SNPs (80.6%) could be positioned on the HapOB genome sequence by BLAST of the SNP-flanking marker sequences (Supplementary Fig. 2a).
Centromere region identification and FISH
Three complementary tools were used to identify centromeric tandem repeats and to estimate their abundance in the R. chinensis ‘Old Blush’ genome: Tandem Repeat Finder (TRF68), TAREAN69 and RepeatExplorer70, each with default settings, and the output was parsed using custom python scripts. All tandem repeats identified by TRF were subjected to all-against-all BLAST to cluster similar repeats and to estimate abundance (the total number of tandem repeat cluster copies) in the genome. Paired reads were quality filtered and trimmed to 120 bp for analysis by RepeatExplorer (0.5 M read pairs) and TAREAN (1.3 M read pairs). RepeatExplorer cluster CL226 had the globular-like shape specific for tandem repeats. The corresponding monomer repeat sequence was identified by analysing the contigs of this cluster with TRF. The identical tandem repeat was also identified by TAREAN and TRF. To determine the location of the CL226 tandem repeat cluster in the genome assembly, 275 M paired-end genomic reads of ‘Old Blush’ were mapped onto the contigs from RepeatExplorer cluster CL226, using Bowtie2 (ref. 71) with parameter -k 1 to select read pairs with high similarity to the CL226 repeat. Selected read pairs were then split into two groups: reads that matched the CL226 repeat sequence itself and reads that matched the flanking genome sequence. Both groups of reads were separately mapped onto the genomic scaffolds using Bowtie with parameters -a 1 and -N 1. The distribution of the two sets of CL226 reads was visualized using the circlize package72 of R Bioconductor73. Mitotic chromosome slides were prepared with the 'SteamDrop' method74 using young root meristems of R. chinensis ‘Old Blush’. Two oligonucleotide probes (5′-TTGCGTTGTTCTAGTGACATTCA-TAMRA-3′; 5′-ACCCTAGAAGCGAGAAGTTTGG-TAMRA-3′) were used for FISH, as previously described75. DRAWID76 was used for chromosome and signal analysis.
Annotation of the rose genome
Gene and transposable element annotations are described in Supplementary Methods.
The plant material originated from ‘Loubert Nursery’ in Rosier-sur-Loire, France (R. persica), from ‘Rose Loubert’ rose garden in Rosier-sur-Loire, France (R. moschata, R. xanthina spontanea and R. gallica) and from ‘Roseraie du Val de Marne’, Haÿ-Les-Roses, France (R. chinensis var. spontanea, R. rugosa, R. laevigata and R. minutifolia alba).
Illumina paired-end shotgun indexed libraries were prepared from 3 μg DNA per accession, using the TruSeq DNA PCR-Free LT Kit (Illumina). Briefly, indexed library preparation was performed with low-sample protocol with a special development to reach an insert size of 1–1.5 kb. DNA fragmentation was performed by AFA (Adaptive Focused Acoustics) technology on the focused ultrasonicator E210 (Covaris). All enzymatic steps and clean up were done according to the manufacturer’s instructions, apart from the fragmentation and sizing steps. Paired-end sequencing using 2 × 150 sequencing-by-synthesis cycles was performed on a HiSeq 2000/2500, Rapid TruSeq V2 chemistry (Illumina) running in rapid mode using on-board cluster generation (according to the manufacturer’s instructions). For some read sets, a low enrichment of libraries with five PCR amplification cycles was performed.
Cutadapt and FASTX toolkit software were used for quality control (Q > 30), and adapter trimming and high-quality reads were considered for further analysis. To identify the SNPs and indels in each species, filtered paired-end reads were mapped against the HapOB reference using BWA with default parameters77. The BWA software produced highly accurate alignment compared to other software. Unmapped and duplicated reads were removed using SAMtools and the Picard package, respectively78. Furthermore, reliable mapped reads were used for base quality score recalibration and indel realignment using the Genome Analysis Toolkit (GATK) software79. We then called variants individually on each sample using the HaplotypeCaller/GATK. The identified SNPs and indels were filtered out on the bases of a minimum read depth of 20 and SNP quality (Q) ≥ 40. The genomic distribution of SNPs and indels was analysed by calculating their frequency over each 200-kb interval on each HapOB chromosome. Circos was used to visualize the distribution of SNPs and indels on each HapOB chromosome. SnpEff and SnpSift80,81 were used to annotate the effects of SNPs and identify the potential functional effects of amino acid substitution on corresponding proteins, respectively.
To infer phylogenetic relationships between Rosa species, homozygous SNPs from each VCF file were merged using GATK CombineVariants and parsed to build a SNP alignment using VCFtools and our own scripts. A maximum likelihood analysis was performed using RAxML v8.1.5 with 100 bootstrap replicates82. As the SNP alignment contains only variable sites, an ascertainment bias correction was applied to the GTRGAMMA model of substitution83. The resulting phylogenetic tree was rooted on R. persica, which was purported to be the most divergent Rosa species84.
To conduct the synteny analysis between the HapOB reference sequence and F. vesca, orthologous genes were identified using reciprocal BLAST with an e-value of 1 × 105 (ref. 85), v = 5 and b = 5. The protein sequences and annotation for F. vesca (v2.0.a1) were downloaded from the GDR database (https://www.rosaceae.org/). The output of the BLAST tool was used in the McSCANX tool to identify syntenic regions between the genomes86. The circos software87 was used to visualize the syntenic regions between two genomes. In addition, an analysis of microsynteny was performed between R. chinensis ‘Old blush’ and F. vesca for Chr3 to see the conserved region near the RoKSN locus using Symap software85.
Good-quality and pre-processed Illumina reads of R. laevigata were used for assembly. Genomic sequence reads were assembled using SPAdes (v3.11.1) with a k-mer value of 63 (ref. 88).
For the OW and YW populations (151 and 174 individuals, respectively), the number of petals per flower was counted using 5 or up to 10 independent flowers, respectively. In roses, single flowers typically have five petals. Flowers with fewer than eight petals were considered as simple flowers, whereas those with eight or more petals were considered as ‘double’ flowers.
For the GWAS panel, the number of petals was counted for three flowers on each of the three clones from greenhouse-grown plants, and the arithmetic means were calculated for each genotype.
In the OW and YW populations, the length of a stem part with four internodes was measured in the middle of a stem (between the fifth and seventh internodes). Prickles were counted on four internodes. The prickle density was expressed as the number of prickles per internode. For each genotype, three stems were measured and counted.
For the GWAS panel, prickle density was calculated as the arithmetic mean of the number of prickles between the third and fourth node of newly developed shoots. For each genotype, three shoots were counted from three replicates in a randomized block design.
For TTG2 expression analysis, three individuals of the OW progeny were selected according to prickle density: OW9068 (no prickles), OW9155 (low density) and OW9106 (high density). The terminal part of young stems was harvested in spring 2016 from field-grown plants (two biological replicates). RNA extraction, cDNA synthesis, qPCR (three technical replicates) and relative quantifications were performed as previously described89. Calibration was done using TCTP and UBC genes. The following primers were used to amplify TTG2 (RcTTG2-1-F: CCTCAAACCCAGGAGCATC and RcTTG2-1-R: CAACAGCTTGATCCCTGAGAG).
Organ-specific expression of candidate self-incompatibility genes were tested using RNA extracted from the stamens and pistils of three flower buds and five open flowers and the terminal leaflets of three young leaves, sampled from an individual of ‘Old Blush’ in August 2017. RNA extraction was carried out according to previously published protocols37. cDNA synthesis and RT–PCR were performed with the PrimeScript RT reagent Kit with genomic DNA Eraser and EmeraldAmp PCR Master Mix (TaKaRa) according to the manufacturer’s protocols. The following primers (5′ to 3′) were used to amplify seven candidate genes and a house-keeping gene: SRNase26 (F1: TGCAGCCAACACATACGATT and R1: GCAAGAAGATCGGCGTAGTC), SRNase30 (F1: TGTTCAACAATGGCCGATAA and R1: TGCACATAAGCGAAGGAGTG), SRNase36 (F1: TGTGGTAACAGCTGCAAAGC and R1: TCAACCACGTTTTTGCCATA), Fbox29 (F2: TGACTATTTTCTATTGCGCTTGAG and R1: CACCACAAAAAGGATAACAAGAC), Fbox31 (F1: TTTGCTATGAAAATGATAACAACAG and R1: AACCCCATGGTTTCATTAAGTA), Fbox38 (F1: GACTACTCTCCTTTGGCCTGAA and R1: CTACAGCTGCAGAATCATTTGAC), Fbox40 (F1: CGTCCAATATCTCTACTCAATGGT and R1: CCTCTTCTTGGTGAGTCTGAAAT) and RoTCTP (F2: AAGAAGCAGTTTGTCACATGG and R2: TCTTAGCACTTGACCTCCTTCA).
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
The R code used for pairwise maximum likelihood recombination and lod score calculations is available through CRAN (https://CRAN.R-project.org/package=polymapR). The R code used to infer phylogenetic relationships is available on request from the corresponding author90. The python scripts used for centromeric region identification are available on request from the corresponding author.
All the genome data have been made available on a genome browser (https://iris.angers.inra.fr/obh/) and in the public GDR database (https://www.rosaceae.org/species/rosa/chinensis/genome_v1.0)91. FASTA files of chromosomes and genes (mRNA, proteins and non-coding RNA) and gff files for gene models and structural features (transposable element) can be downloaded from both the previously mentioned websites. Raw data (PacBio and Illumina reads) are available under the accession number PRJNA445774. RNA-seq data used for genome annotation are available under the following SRA accession numbers: SRP128461 for 91/100-5 leaves infected with blackspot and SRP133785 for R. wichurana and ‘Yesterday’ leaves infected with two powdery mildew pathotypes. Raw data of resequencing of the eight wild Rosa species are available under the SRA accession number SRP143586.
Wang, G. A study on the history of Chinese roses from ancient works and images. Acta Hortic. 751, 347–356 (2007).
Pliny (2013) Pine L'Ancience: Histoire naturelle (Schmit, S., Trans) Bibliothèque de la Pléiade No. 593 (Gallimard, Paris, 2013).
Nybom, H. & Werlemark, G. Realizing the potential of health-promoting rosehips from dogroses (Rosa sect. Caninae). Curr. Bioact. Compd. 13, 3–17 (2017).
Zhang, J. et al. The diploid origins of allopolyploid rose species studied using single nucleotide polymorphism haplotypes flanking a microsatellite repeat. J. Hortic. Sci. Biotechnol. 88, 85–92 (2013).
Ritz, C. M. & Wisseman, V. Microsatellite analyses of artificial and spontaneous dogroses hybrids reveal the hybridogenic origin of Rosa micrantha by the contribution of unreduced gametes. J. Hered. 102, 2117–2127 (2011).
Meng, J., Fougère-Danezan, M., Zhang, L.-B., Li, D.-Z. & Yi, T.-S. Untangling the hybrid origin of the Chinese tea roses: evidence from DNA sequences of single-copy nuclear and chloroplast genes. Plant Syst. Evol. 297, 157–170 (2011).
Wisseman, V. & Ritz, C. M. The genus Rosa (Rosoideae, Rosaceae) revisited: molecular analysis of nrITS-1 and atpB-rbcL intergenic spacer (IGS) versus conventional taxonomy. Bot. J. Linna. Soc. 147, 275–290 (2005).
Jian, H. et al. Decaploidy in Rosa praelucens Byhouwer (Rosaceae) endemic to Zhongdian Plateau, Yunnan, China. Caryologia. 63, 162–167 (2012).
Robert, A. V., Gladis, T. & Brumme, H. DNA amounts of roses (Rosa L.) and their use in attributing ploidy levels. Plant Cell Rep. 28, 61–71 (2009).
Bourke, P. M. et al. Partial preferential chromosome pairing is genotype dependent in tetraploid rose. Plant J. 90, 330–343 (2017).
Herklotz, V. & Ritz, C. M. Multiple and asymmetrical origin of polyploid dog rose hybrids (Rosa L. sect. Caninae (DC.) Ser.) involving unreduced gametes. Ann. Bot. 120, 209–220 (2017).
Ritz, C. M., Köhnen, I., Groth, M., Theissen, G. & Wisseman, V. To be or not to be the odd one out—allele-specific transcription in pentaploid dogroses (Rosa L. sect. Caninae (DC.) Ser). BMC Plant Biol. 11, 37 (2011).
Liorzou, M. et al. Nineteenth century French rose (Rosa sp.) germplasm shows a shift over time from a European to an Asian genetic background. J. Exp. Bot. 67, 4711–4725 (2016).
Nakamura, N. et al. Genome structure of Rosa multiflora, a wild ancestor of cultivated roses. DNA Res. 25, 113–121 (2018).
Wylie, A. P. The history of garden roses. J. R. Hortic. Soc. 79, 555–571 (1954).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Koning-Boucoiran, C. F. et al. Using RNA-seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.). Front. Plant Sci. 6, 249 (2015).
Foissac, S. et al. Genome annotation in plants and fungi: EuGene as a model platform. Curr. Bioinform. 3, 87–97 (2008).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Edger, P. P. et al. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. GigaScience 7, 1–7 (2018).
Flutre, T., Duprat, E., Feuillet, C. & Quesneville, H. Considering transposable element diversification in de novo annotation approaches. PLoS ONE 6, e16526 (2011).
Potter, D. et al. Phylogeny and classification of Rosaceae. Plant Syst. Evol. 266, 5–43 (2007).
Xiang, Y. et al. Evolution of Rosaceae fruit types based on nuclear phylogeny in the context of geological times and genome duplication. Mol. Biol. Evol. 34, 262–281 (2017).
Gar, O. et al. An autotetraploid linkage map of rose (Rosa hybrida) validated using the strawberry (Fragaria vesca) genome sequence. PLoS ONE 6, e20463 (2011).
Bruneau, A., Starr, J. R. & Joly, S. Phylogenetic relationships in the genus Rosa: new evidence from chloroplast DNA sequences and an appraisal of current knowledge. Syst. Bot. 32, 366–378 (2007).
Fougère-Danezan, M., Joly, S., Bruneau, A., Gao, X.-F. & Zhang, L.-B. Phylogeny and biogeography of wild roses with specific attention to polyploids. Ann. Bot. 115, 275–291 (2015).
Fernández-Romero, M. D., Torres, A. M., Millán, T., Cubero, J. I. & Cabrera, A. Physical mapping of ribosomal DNA on several species of the subgenus Rosa. Theor. Appl. Genet. 103, 835–838 (2001).
The 100 Tomato Genome Sequencing Consortium et al. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136–148 (2014).
Duan, N. et al. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat. Commun. 8, 249 (2017).
Nguyen, T. H. N., Schulz, D., Winkelmann, T. & Debener, T. Genetic dissection of adventitious shoot regeneration in roses by employing genome-wide association studies. Plant Cell Rep. 36, 1493–1505 (2017).
Schulz, D. F. et al. Genome-wide association analysis of the anthocyanin and carotenoid contents of rose petals. Front. Plant Sci. 7, 1798 (2016).
Iwata, H. et al. The TFL1 homologue KSN is a regulator of continuous flowering in rose and strawberry. Plant J. 69, 116–125 (2012).
Koskela, E. A. et al. Mutation in TERMINAL FLOWER1 reverses the photoperiodic requirement for flowering in the wild strawberry Fragaria vesca. Plant Physiol. 159, 1043–1054 (2012).
Shulaev, V. et al. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109–116 (2010).
Dubois, A. et al. Tinkering with the C-function: a molecular frame for the selection of double flowers in cultivated roses. PLoS ONE 5, e9288 (2010).
Roman, H. et al. Genetic analysis of the flowering date and number of petals in rose. Tree Genet. Genomes 11, 85 (2015).
Shigyo, M., Hasebe, M. & Ito, M. Molecular evolution of the AP2 subfamily. Gene 366, 256–265 (2006).
Bowman, J. L., Alvarez, J., Weigel, D., Meyerowitz, E. M. & Smyth, D. R. Control of flower development in Arabidopsis thaliana by APETALA1 and interacting genes. Development 119, 721–743 (1993).
Bowman, J. L., Smyth, D. R. & Meyerowitz, E. M. Genes directing flower development in Arabidopsis. Plant Cell 1, 37–52 (1989).
Jung, J.-H., Lee, S., Yun, J., Lee, M. & Park, C.-M. The miR172 target TOE3 represses AGAMOUS expression during Arabidopsis floral patterning. Plant Sci. 215–216, 29–38 (2014).
Zhang, B., Wang, L., Zeng, L., Zhang, C. & Ma, H. Arabidopsis TOE proteins convey a photoperiodic signal to antagonize CONSTANS and regulate flowering time. Genes Dev. 29, 975–987 (2015).
Bowman, J. L., Smyth, D. R. & Meyerowitz, E. M. Genetic interactions among floral homeotic genes of Arabidopsis. Development 112, 1–20 (1991).
Ó'Maoiléidigh, D. S., Graciet, E. & Wellmer, F. Gene networks controlling Arabidopsis thaliana flower development. New Phytol. 201, 16–30 (2014).
Ashkani, J. & Rees, D. J. G. A comprehensive study of molecular evolution at the self-incompatibility locus of Rosaceae. J. Mol. Evol. 82, 128–145 (2016).
Charlesworth, D., Vekemans, X., Castric, V. & Glemin, S. Plant self-incompatibility systems: a molecular evolutionary perspective. New Phytol. 168, 61–69 (2005).
McClure, B., Cruz-García, F. & Romero, C. Compatibility and incompatibility in S-RNase-based systems. Ann. Bot. 108, 647–658 (2011).
Debener, T. et al. Genetic and molecular analysis of key loci involved in self incompatibility and floral scent in roses. Acta Hortic. 870, 183–190 (2010).
Mena-Ali, J. I. & Stephenson, A. G. Segregation analyses of partial self-incompatibility in self and cross progeny of Solanum carolinense reveal a leaky S-allele. Genetics 177, 501–510 (2007).
Kellogg, A. A., Branaman, T. J., Jones, N. M., Little, C. Z. & Swanson, J. D. Morphological studies of developing Rubus prickles suggest that they are modified glandular trichomes. Botany 89, 217–226 (2011).
Pattanaik, S., Patra, B., Singh, S. K. & Yuan, L. An overview of the gene regulatory network controlling trichome development in the model plant, Arabidopsis. Front. Plant Sci. 5, 259 (2014).
Johnson, C. S., Kolevski, B. & Smyth, D. R. TRANSPARENT TESTA GLABRA2, a trichome and seed coat development gene of Arabidopsis, encodes a WRKY transcription factor. Plant Cell 14, 1359–1375 (2002).
Magnard, J.-L. et al. Biosynthesis of monoterpene scent compounds in roses. Science 349, 81–83 (2015).
Koning-Boucoiran, C. F. S. et al. The mode of inheritance in tetraploid cut roses. Theor. Appl. Genet. 125, 591–607 (2012).
Kyo, M. & Harada, H. Control of the developmental pathway of tobacco pollen in vitro. Planta 168, 427–432 (1986).
Hibrand-Saint Oyant, L., Crespel, L., Rajapakse, S., Zhang, L. & Foucher, F. Genetic linkage maps of rose constructed with new microsatellite markers and locating QTL controlling flowering traits. Tree Genet. Genomes 4, 11–23 (2008).
Daccord, N. et al. High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat. Genet. 49, 1099–1106 (2017).
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Noguchi, H., Taniguchi, T. & Itoh, T. MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res. 15, 387–396 (2008).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303, 3997 (2013).
Hosseini Moghaddam, H., Leus, L., De Riek, J., Van Huylenbroeck, J. & Van Bockstaele, E. Construction of a genetic linkage map with SSR, AFLP and morphological markers to locate QTLs controlling pathotype-specific powdery mildew resistance in diploid roses. Euphytica 184, 413–427 (2012).
Gitonga, V. W. et al. Inheritance and QTL analysis of the determinants of flower color in tetraploid cut roses. Mol. Breed. 36, 143 (2016).
Yan, Z., Dolstra, O., Prins, T. W., Stam, P. & Visser, P. B. Assessment of partial resistance to powdery mildew (Podosphaera pannosa) in a tetraploid rose population using a spore-suspension inoculation method. Eur. J. Plant Pathol. 114, 301–308 (2006).
Bradbury, P. J. et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Novák, P. et al. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res. 45, e111 (2017).
Novak, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. Circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).
Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
Kirov, I., Divashuk, M., Van Laere, K., Soloviev, A. & Khrustaleva, L. An easy “SteamDrop” method for high quality plant chromosome preparation. Mol. Cytogenet. 7, 21 (2014).
Kirov, I. V., Van Laere, K., Van Roy, N. & Khrustaleva, L. I. Towards a FISH-based karyotype of Rosa L. (Rosaceae). Comp. Cytogenet. 10, 543–554 (2016).
Kirov, I. V. et al. DRAWID: user-friendly java software for chromosome measurements and idiogram drawing. Comp. Cytogenet. 11, 747–757 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w(1118); iso-2; iso-3. Fly 6, 80–92 (2012).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
Lewis, P. O. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50, 913–925 (2001).
Du Mortier, B. C. Notice sur un Nouveau Genre de Plantes: Hulthemia; Précédée d’un Aperçu sur la Classification des Roses (Casterman, J., 1824).
Lyons, E. et al. Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol. 148, 1772–1781 (2008).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Randoux, M. et al. Gibberellins regulate the transcription of the continuous flowering regulator, RoKSN, a rose TFL1 homologue. J. Exp. Bot. 63, 6543–6554 (2012).
Jung, S. et al. The Genome Database for Rosaceae (GDR): year 10 update. Nucleic Acids Res. 42, D1237–D1244 (2014).
We thank the ImHorPhen team of IRHS and the experimental unit (UE Horti) for their technical assistance in plant management. We thank the PTM ANAN (M. Bahut) of the SFR Quasav and the Gentyane platforms (especially C. Poncet) for the SSR and SNP analyses, respectively. We acknowledge A. Chauveau and I. Le Clainche for libraries preparation and E. Marquand and A. Canaguier for data processing. This work was supported by CEA-IG/CNG, by conducting the DNA quality control and by providing access to the INRA-EPGV group for their Illumina Sequencing Platform. We acknowledge J.-L. Gaignard (from the communication service of the INRA) for his help to fund the project. We thank the GDR team, and particularly P. Zheng, S. Jung and D. Main, for management of the genome sequence at the GDR database. We thank ‘Région Pays de la Loire’ for funding the sequencing of HapOB (Rose Genome Project), the resequencing of eight wild species (Genorose project in the framework of RFI ‘Objectif Végétal’) and for the EPICENTER ConnecTalent grant of the Pays de la Loire (N.D. and E.B.). F.F. and L.H.S.-O. thank the ANR for funding the genetic determinism of flower development (ANR-13-BSV7-0014). K.K. thanks the JSPS for funding the analysis of the S-locus (JSPS KAKENHI no.17H04616). T.D. thanks the German Ministry of Economic Affairs for funding the GWAS analysis (Aif programme ZI) and the Deutsche Forschungsgemeinschaft for the RNA-seq data generation (DFG program GRK1798). The development of the high-density SNP maps was partly funded by TTI Green Genetics and by the TKI Polyploids projects (BO-26.03-002-001 and BO-50-002-022).
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Methods, Supplementary References and Supplementary Figures 1–12
Reporting Summary file
Validation of the homozygosity of HapOB using ten microsatellites located on the 7 linkage groups
Summary of the PacBio and Illumina sequencing of HapOB
Prokaryotic gene prediction on the HapOB contigs
Details of the high-density SNP genetic maps from the OW progeny, obtained from a cross between R. chinensis ‘Old Blush’ (OB, female) and a hybrid of R. wichurana (W, male). Linkage maps for the maternal (A1 to A7) and paternal (B1 to B7) parents are provided, specifying the number of SNP per LG, the size (in cM), the number of unique loci per LG and the density of SNP/cM
Transposable element (TE) annotation in HapOB. A) Abundance of repetitive elements in the HapOB genome. Total number of elements per type and percentage relative to the total number of elements are shown. B) Comparison of TE in HapOB and F. vesca using automatic annotation with the REPET package: no manual curation step was done. C) Consensus library for TE annotation in HapOB genome. Each line represents a TE consensus genome family. TE consensus name: name of the consensus. Length (bp): TE consensus length in bp. Coverage (bp): cumulative genome coverage of this TE family in bp. Coverage (%): cumulative genome coverage of this TE family in percentage (calculated on 518515953 bp). Frags: number of TE fragments before the TEannot "long join procedure". fullLgthFrags: number of complete TE fragments before the TEannot ‘long join procedure’ (a full-length fragment represents only one fragment that covers 95% of the consensus). Copies: number of TE copies after the TEannot ‘long join procedure’ (a copy is a chain of fragments). fullLgthCopies: number of complete TE copies after the TEannot ‘long join procedure’ (a full-length copy is reconstructed by the join of fragmented multiple hits and all these fragments cover 95% of the consensus). meanId: mean identity calculated on the percentages of identity (between each copy with its TE consensus). sdId: standard deviation on the percentages of identities
SNP analysis in resequenced species within the genus Rosa. For each species, the detected SNP are classified according to their effect (HIGH (non synonymous change, splice sites), LOW (synonymous), MODERATE and MODIFIER (intron, intergenic)) or to their position in the genome (coding gene, intergenic)
Analysis of the elements inserted in A) RoKSN and in B) the APETALA2/TOE homologue, RC3G0243000. A) The copia element corresponding to the TE consensus RLC_denovoHm-B-G10244-Map11. This TE is inserted in four coding sequences, for which the gene function is described. For the full-length element, 3' and 5' Long Terminal Repeat (LTR) elements were aligned. The high similarity (around 100%) suggests a recent insertion. B) The inserted element in the APETALA2/TOE homologue corresponds a repeat element (consensus noCAT_denovoHM-B-R7962). The element is mainly inserted in non-coding regions (bar four insertions into introns of coding genes)
Detection of the RoKSNnull allele in R. chinensis 'Old Blush' using two different progenies. We studied the RoKSN segregation in two different progenies: (a) R. chinensis ‘Old Blush’ x R. wichurana hybrid (RoKSNWT/RoKSNcopia, OW progeny) and (b) R. chinensis ‘Old Blush’ x R. moschata (RoKSNWT/RoKSNWT, OM progeny). In both populations we observed unexpected allelic combinations, with individuals exhibiting only the RoKSNWT allele (47 individuals out of 152 in OW and 5 individuals out of 10 in OM). One hypothesis to explain these unexpected results is the existence of a null allele in R. chinensis ‘Old Blush’. In other words, our data suggests ‘Old Blush’ is not RoKSNcopia/RoKSNcopia, but RoKSNcopia/RoKSNnull
Precise location of the DOUBLE FLOWER locus using OW progenies with 151 individuals (OW151, in orange), with 260 individuals (OW260, in yellow), the HW (H190 x hybrid of R. wichurana, in purple), the 94/1 (in grey) and the YW (in blue) F1 progenies
Expression pattern during the floral development of the 41 candidate genes located in the DOUBLE FLOWER locus interval (see Supplementary Table 9). The expression value is expressed as the count number from RNASeq data obtained from Dubois et al. (2012): IFL for Floral Bud and Floral Meristem transition; IMO for Floral Meristem and Early Floral organs (Sepal, petal, stamens and carpel) developments; BFL for closed flower and OFT for open flower. The four most interesting candidate-genes according to their expression patterns are in blue and bold (see details in the text)
Association of the transposable element in AP2/TOE homologue with the double flower phenotype. We have developed a microsatellite marker, present in the AP2 promoter region (2150bp downstream of the transposable element, see Supplementary Figure 7). 'Old Blush' (OB) has two alleles (180 and 166bp) as does R. wichurana, RW (186 and 169bp). By sequencing, we have shown that in OB the 166bp allele carries the transposable element, whereas the 180bp allele does not (Supplementary Figure 7). In RW, no transposable element is present. In the F1 progeny, all 154 simple-flower individuals possess the 180bp allele, whereas all 103 double-flower individuals possess the 166bp allele. As the microsatellite marker and the transposable element are in close proximity (2150bp), we can conclude that the 166bp allele / transposable element haplotype is associated with the double-flower phenotype
Rose cultivars of the association panel with data for prickle density and number of petals
Positioning and ordering of the contigs on the 7 pseudo-molecules. 196 contigs were anchored to the female and male genetic maps (more than one marker). The procedure for the positioning and ordering is given in Materials and Methods, and an example (Linkage group 5) is presented in Supplementary Figure 11. The average genetic position is presented for the male and female maps in cM. The contigs in red are those for which a manual analysis was performed, as described in Materials and Methods
About this article
Cite this article
Hibrand Saint-Oyant, L., Ruttink, T., Hamama, L. et al. A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits. Nature Plants 4, 473–484 (2018). https://doi.org/10.1038/s41477-018-0166-1
The high‐quality genome of diploid strawberry ( Fragaria nilgerrensis ) provides new insights into anthocyanin accumulation
Plant Biotechnology Journal (2020)
Horticulture Research (2020)
Expansion and expression diversity of FAR1/FRS-like genes provides insights into flowering time regulation in roses
Plant Diversity (2020)
Scientia Horticulturae (2020)
Analysis of the Rdr1 gene family in different Rosaceae genomes reveals an origin of an R-gene cluster after the split of Rubeae within the Rosoideae subfamily
PLOS ONE (2020)