Introduction

Gametophytic self-incompatibility (GSI), the most common pre-zygotic genetic mechanism of self-incompatibility, prevents self-fertilization and crosses between genetically related individuals; the genotype of the haploid pollen determines its incompatibility type1. This mechanism is determined by a single locus, the S-locus. In the most frequent eudicot system2, the S-pistil gene codes for a protein with ribonuclease activity, called S-RNase3,4,5, and the S-pollen gene(s) code(s) for F-box protein(s)6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26. In incompatible crosses, the cytotoxic S-RNases degrade pollen tube RNA, which stops pollen tube elongation. Two mechanisms of pollen recognition have been proposed, the self- and non-self-recognition systems6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26. In the self-recognition mechanism, described in the Prunus genus of the Rosaceae family6, 7, 11, 17, 24, 27, a single F-box protein, called SFB, is the S-pollen GSI specificity determinant and interacts with the self S-RNase (reviewed by28). In this system, as expected, the S-RNase and SFB genes show patterns of co-evolution29. In the non-self-recognition mechanism, characterized in Solanaceae, Plantaginaceae, and in the Rosaceae Maleae tribe (Malus/Pyrus/Sorbus), there are multiple F-box proteins, called SLFs in Solanaceae and Plantaginaceae, and SFBBs in Maleae13,14,15,16, 18,19,20,21,22,23,24,25,26, 30,31,32, that interact with all S-RNases but the self-S-RNase15, 16, 18, 20,21,22,23,24,25,26, 30,31,32. Even though the two mechanisms rely on different recognition interactions, reflecting different evolutionary paths, the S-RNase retained cytotoxicity to self-pollen in both28, 30.

Based on phylogenetic inferences, the S-RNase GSI system evolved once in core eudicots33,34,35,36,37, and has been the subject of multiple duplication events during evolution. In Rosaceae, different duplicates have been recruited for GSI function in Prunus and Malus/Pyrus/Sorbus24, 37. Convergent evolution has also been proposed for the S-pollen determinant, since the Prunus S-pollen gene belongs to a different gene lineage than the Malus/Pyrus/Sorbus S-pollen genes. The presence of different recognition systems in Prunus (self-recognition6, 7, 11, 12, 17, 24, 27, 28, 38) and in Malus/Pyrus/Sorbus (non-self-recognition13, 16, 18,19,20, 24, 25) is therefore not surprising. How the two systems evolved is still debated. The observation that Prunus S-RNase and SFB lineage genes are present in Fragaria species (an outgroup) suggests that the ancestral Rosaceae S-locus was of the self-recognition type, and that the Malus S-locus region evolved de novo24. An alternative hypothesis postulates that the divergence between SFB and SFBB genes occurred early in the establishment of eudicots, and that Prunus species started using the SFB gene as an S-pollen factor around the time of the Prunus divergence39. Under this evolutionary hypothesis, the common ancestor of Prunus and Malus/Fragaria would present a non-self-recognition mechanism. In this work, by characterizing the Rosa S-locus region, we clarify the ancestral state of the Rosaceae S-locus.

Rose (Rosa sp., Rosaceae) species are known for their ornamental value and for the production of essential oils used in the perfume and cosmetic industries. There are about 200 species in this genus, but only 10 species (R. canina, R. chinensis ‘Old Blush’ (OB), R. foetida, R. gallica, R. gigantea, R. moschata, R. multiflora, R. phoenicia, R. rugosa, and R. wichuraiana ‘Basye’s Thornless’) have contributed to most of the modern rose cultivars, through processes of hybridization and polyploidization40. Based on the absence of seed set after self-pollination, the presence of a GSI system has been postulated for most Rosa diploid species41,42,43. Analyses of one of the R. chinensis genomes44 revealed three putative S-RNase genes (SRNase26, SRNase30, and SRNase36) located on chromosome 3, proposed as a candidate region for the Rosa S-locus44. In this work we show that this region is not the S-locus. This is the reason why segregation analyses using SRNase30 as a marker place it at a distance of 4.2 cM from the S-locus44. Nevertheless, these segregation analyses showed that the S-locus region is located on chromosome 3, as previously reported45. Based on chromosome location, we also exclude a putative S-pollen gene, RrSLF, cloned from R. rugosa pollen RNA, which in phylogenetic analyses clusters with Prunus SFB46, since the orthologous R. chinensis gene is located on chromosome 2.

Rosa diploid genomes are of relatively small size (560 Mb47, 48), and assembled genomes are publicly available for R. chinensis ‘Old Blush’44, 49 and R. multiflora50 (https://www.ncbi.nlm.nih.gov/sra). Here, we use these datasets to perform phylogenetic analyses to identify the Rosa S-locus genes. Transcriptomic analyses using R. chinensis pistil and ovary, stamen, leaf, stem, and root tissue (available in the SRA database) support the identification of the Rosa S-locus genes. To show that the Rosa S-RNase gene shows evidence of positive selection, we used low coverage Rosa genomes (R. moschata, R. laevigata, R. rugosa, R. persica, R. xanthina, R. minutifolia, R. odorata, R. arvensis, and R. majalis) available at the SRA database to identify further S-RNase alleles. Furthermore, we use R. arvensis genotype–phenotype association experiments to confirm that the identified gene is the S-RNase. We also identify F-box genes in the vicinity of the S-RNase, to determine which GSI system is present in Rosa. Expression, phylogenetic analyses, polymorphism levels, and evidence for selection favoring diversification of F-box genes within an S-haplotype suggest multiple F-box genes as the S-pollen component. Rosaceae GSI evolution is discussed in the context of these findings.

Results

R. chinensis and R. multiflora Rosaceae S-RNase lineage genes

S-RNases code for basic proteins (isoelectric point (IP) above 8)5, 24, 33, 34, 37, that present two conserved amino acid patterns (pattern 1 and 236), are expressed in pistils (where the rejection of self-pollen occurs during the growth of pollen tubes), stigma, and flowering buds, and present signs of diversifying selection5, 24, 33, 34, 36, 37. Of the 80 S-RNase like sequences identified in the R. chinensis (55) and R. multiflora (25) genomes (Table 1; Supplementary Table S1), 41 code for putative proteins with an IP above 7 (Table 1; Supplementary Table S2). Phylogenetic analyses of these sequences, and 34 reference sequences from24 (Supplementary File S1), revealed that the sequences labeled Rchinensis3_16, Rchinensis2_32_2, and Rchinensis1_1-Rchinensis2_12 do not belong to the Rosaceae S-RNase lineage (Fig. 1). None of the Rosa S-RNase lineage sequences cluster with Malus/Pyrus S-RNases. Two groups of sequences, Rmultiflora_4, Rmultiflora_8, and Rchinensis1_3-Rchinensis2_27 (gene 1), and Rchinensis2_16-Rchinensis3_17-Rchinensis4_40 (gene 2), cluster with Prunus S-RNases (Fig. 1), suggesting that one of them may represent the Rosa S-RNase gene. In R. chinensis these sequences are located on chromosome 3, where the S-locus has been identified44, 45. These sequences code for putative proteins with an IP above 8, have two putative introns, and have amino acid pattern 2 conserved, all features typical of S-RNases. Gene 1 sequences also present high levels of synonymous polymorphism (27.9%), as do the S-RNases36. Although in this group of sequences pattern 1 shows a violation at the first pattern position (Y instead of [FST]), an identical pattern is observed in Solanaceae for the S-RNase Nicotiana tomentosiformis XP_018625910_1 (this sequence clusters with other Solanaceae S-RNases in phylogenetic analyses; data not shown). Because gene 2 is represented by one sequence, we cannot address levels of diversity. Nevertheless, the presence of identical sequences in the two R. chinensis genomes analyzed suggests low levels of diversity. The putative SRNase26, SRNase30, and SRNase36 present an IP below 8; the first two sequences cluster with Prunus and Malus S-lineage 1 genes, and SRNase36 clusters with the Malus S-lineage 2 gene. In Prunus and Malus GSI, these sequences are not involved in specificity determination.

Table 1 Summary of the S-RNase like sequences identified in the R. chinensis, and R. multiflora genomes.
Figure 1

Bayesian phylogenetic tree, showing the relationship of the R. chinensis and R. multiflora S-RNase like sequences with Fragaria, Prunus, and Malus S-RNase lineage genes. For the R. chinensis sequences the chromosomal (Chr) location is given. The tree was rooted with MDP0000267606A T2- RNase, not involved in GSI24. In bold are the Rosa sequences that cluster with Prunus S-RNases, that could represent the S-RNase gene. Numbers below the branches represent posterior credibility values above 70.

Rchinensis1_3–Rchinensis2_27 is expressed in pistil and ovary tissue, like the S-RNases

The S-RNase gene is highly expressed in pistils, in stigmas and styles of flowers at anthesis, but also shows low expression in entire flower buds. It should be noted that other S-RNase lineage genes can also show a similar pattern of expression, since this is inferred to be the ancestral expression pattern of the S-RNase lineage genes24. Therefore, it is not surprising that SRNase30 and SRNase36 show expression in pistil and ovary (Fig. 2), as previously reported44. SRNase26 shows no expression in the tissues here analyzed, nor in those used in44. Rchinensis1_3-Rchinensis2_27 shows an expression pattern similar to that of SRNase30, but with high levels of expression in pistil and ovary, and low expression in stamen (Fig. 2). Rchinensis2_16-Rchinensis3_17-Rchinensis4_40 is not expressed in the tissues here analyzed. These results are compatible with the Rchinensis1_3-Rchinensis2_27 gene being the S-pistil gene determining GSI specificity.

Figure 2

R. chinensis expression levels (FPKM) for the two Prunus S-RNase lineage genes, as well as the three previous putative S-RNases, in pistil and ovary, stamen, leaf, stem, and root tissues.

Rosa S-RNases show evidence for positively selected amino acid positions

To address whether the Rosa sequences here identified as the S-locus pistil gene are subject to diversifying selection, a feature of the S-RNase gene, we first identified and annotated sequences similar to S-RNases from other self-incompatible Rosa genomes with low coverage (R. moschata, R. laevigata, R. rugosa, R. persica, R. xanthina, R. minutifolia, R. odorata, R. arvensis, and R. majalis; Supplementary Table S1). Phylogenetic analyses using sequences presenting an IP above 7 and covering at least the exon where motif 2 is located (Supplementary Fig. S1A,B; Supplementary File S2), together with the Rosa S-RNase sequences here identified and the reference sequences from24, revealed four additional Rosa S-RNases (Rarvensis_11, Rminutifolia_20, Rmoschata17_14, and Rmoschata17_30; Supplementary Fig. S1A,B). Similar analyses with the sequences covering motif 1 revealed four sequences (Rodorata07_39; Rmoschata08_12; Rminutifolia_7; and Rarvensis_6; Supplementary Fig. S2; Supplementary File S3) that also cluster with Rosa S-RNase sequences. Because the phylogenetic relationship of sequences Rarvensis_11 and Rarvensis_6 is similar, they may represent two exons of the same gene from the same S-haplotype, and were thus treated as such. The same applies to sequences Rminutifolia_20 and Rminutifolia_7, as well as sequences Rmoschata17_14 and Rmoschata17_34. Using these sequences together with Rchinensis1_3-Rchinensis2_27, Rmultiflora_4, and Rmultiflora_8, we identified 21 amino acid sites under positive selection by performing codeML51 analyses (results available at http://bpositive.i3s.up.pt/ in the project named Rosa S-locus genes; BP2018000004). These amino acid sites are, in principle, responsible for GSI specificity24, 35, 52. On the predicted 3D structure, these sites are located mostly around the active site pocket region (Fig. 3), as observed in other Rosaceae and Solanaceae species25, 52.

Figure 3

Positively selected amino acid sites, highlighted in yellow on the predicted 3D structure of the R. chinensis S-RNase.

R. arvensis genotype–phenotype association experiments

For 12 R. arvensis accessions, obtained from a breeding program of siblings53, the S-haplotype was deduced through hand-pollination tests (Supplementary Fig. S3A,B), allowing co-segregation experiments for the S2-allele. An amplification product with the expected size (300 bp; see Material and Methods) was obtained in six individuals: Ose (S1 S2), Url (S4 S5), E200 (Ose x Wid (S3 S6); S2 S3), E404 (E200 x Ose; S1 S2), E435 (E200 x Wid; S2 S6), and E893 (E459 x Url; S1 S5). Sequencing of these amplification products revealed two sequence types, each identical across the individuals carrying it. One was obtained from Ose, E200, E404, and E435, the individuals having the S2-RNase (GenBank acc. numbers MW452856–MW452859). The other was obtained from Url and E893, the individuals having the S5-RNase (GenBank acc. numbers MW452860 and MW452861). These results support co-segregation of the S-RNase sequences with the S-locus genotypes here surveyed. Therefore, the S-RNase gene here identified is in the S-locus region.

Identification of the Rosa S-pollen F-box genes

The R. rugosa S-locus F-box gene (RrSLF; KY446808), expressed in pollen tissue and phylogenetically related to the Prunus SFB gene, was reported as a putative Rosa S-pollen gene46. It should be noted that other F-box genes not involved in S-pollen specificity determination are also expressed in pollen, as well as in other tissues24. The expression of the RrSLF gene in other tissues has not, however, been addressed46. This gene shows 99% identity at the nucleotide level with an R. chinensis sequence (CM009583.1) located on chromosome 2, but the Rosa S-locus is located on chromosome 3 (44, 45, and this work). Furthermore, all S-pollen genes described in Rosaceae, Solanaceae, and Plantaginaceae are intronless, whereas the R. chinensis (PRQ51373) and R. multiflora (Rmu_sc0016061.1; BDJD01015883.1) orthologs of RrSLF have one intron. The orthologous gene, covering the entire coding region, has been identified in all low coverage Rosa genomes except R. majalis. Low levels of divergence (0.036 for synonymous and 0.009 for non-synonymous divergence, respectively, after Jukes and Cantor correction; N = 10) are obtained for this gene. This is in contrast with the Prunus SFB gene, which presents levels of variability above 20%6, 7, 11, 36. Therefore, RrSLF is not the S-pollen gene.

In R. chinensis chromosome 3 there are 30 SFB/SLFL/SFBB like genes (called Fbox − and + according to the 5′ or 3′ position relative to the S-RNase, respectively; Supplementary Table S3; Supplementary File S4). In the two R. multiflora scaffolds where the S-RNase is located, there are five such genes (Supplementary Table S3). Phylogenetic analyses of the R. chinensis and R. multiflora SFB/SLFL/SFBB like genes together with Prunus SFB and SLFL genes, M. domestica S1-SFBBs, Petunia and Nicotiana SLF sequences, and A. thaliana F-box/kelch-repeat (Fig. 4; Supplementary File S4) revealed that the Rchinensis_F-box+13 gene clusters with the Prunus SFB gene. This gene is expressed in all tissues here analyzed (Fig. 5), whereas S-pollen gene(s) are mainly expressed in anthers/pollen only7, 10, 12,13,14,15,16, 21, 23,24,25; it is thus unlikely to be the S-pollen gene. Furthermore, using a 1188 bp gene region obtained from eight Rosa genomes (R. arvensis, R. laevigata, R. moschata, R. xanthina, R. rugosa, R. odorata, R. multiflora, and R. chinensis), low average levels of divergence (0.0321 for synonymous and 0.0128 for non-synonymous divergence, respectively, after Jukes and Cantor correction) are observed. Moreover, this gene is the neighbor F-box gene of the T2-RNase Rchinensis1_8-Rchinensis2_25, which does not cluster with Prunus or Maleae S-RNases (Fig. 1). Therefore, the Rchinensis_F-box+13 gene is not involved in S-pollen GSI specificity determination.

Figure 4

Bayesian phylogenetic tree, showing the relationship of the SFB/SLFL/SFBB like genes (called Fbox − and + according to the 5′ or 3′ position relative to the S-RNase) from R. chinensis chromosome 3, R. multiflora sc0006888, and R. multiflora sc0001861, with M. domestica SFBBs, Prunus SFBs and SLFLs, and Solanaceae SLFs. The tree was rooted with A. thaliana F-box/kelch-repeat, not involved in GSI. Numbers below the branches represent posterior credibility values above 70.

Figure 5

R. chinensis expression levels (FPKM) for the F-box genes located on chromosome 3, in pistil and ovary, stamen, leaf, stem, and root tissues.

The F-box genes in the vicinity of the R. chinensis and R. multiflora S-RNase gene cluster, with high support, with different Prunus SLFL genes (Fig. 4), which are a sister group of Malus SFBBs24. In R. chinensis, 14 of these genes (Rchinensis_F-box-3 up to Rchinensis_F-box+11) show expression in stamen only, compatible with being S-pollen genes (Fig. 5). Furthermore, the R. multiflora orthologs of the R. chinensis genes are not in the same order relative to the S-RNase gene (Fig. 4), showing that this region is highly rearranged, as observed in the Malus/Pyrus/Sorbus S-locus region13, 16, 18,19,20, 24, 25. This is surprising since we identified the Rosa S-RNase as belonging to the Prunus S-lineage, and in Prunus there is a single S-pollen gene. In Prunus the S-pollen gene presents levels of diversity similar to the S-RNase gene11, 29. Therefore, we also determined levels of synonymous and non-synonymous divergence for the F-box genes surrounding the R. chinensis S-RNase (Rchinensis_F-box-1 and Rchinensis_F-box+1, used as queries in a blastn search to identify contigs containing the orthologous genes in the low coverage Rosa genomes). The low levels of synonymous and non-synonymous divergence for Rchinensis_F-box-1 (0.070 and 0.005 respectively), and Rchinensis_F-box+1 (0.112 and 0.02129 respectively) are incompatible with the hypothesis that one of them is determining Rosa S-pollen specificity. Therefore, multiple F-box genes must be involved in Rosa pollen GSI specificity determination, as in Malus/Pyrus/Sorbus and Solanaceae species. Using the 14 R. chinensis F-box sequences in the vicinity of the S-RNase that are expressed in stamen, we find evidence for positively selected amino acid sites, as expected if these genes are involved in S-pollen specificity determination (results available at http://bpositive.i3s.up.pt/ under project Rosa S-locus genes; BP2018000004). On the predicted 3D structure, these amino acid sites are located in the same regions (Fig. 6) as those observed for Petunia S-pollen genes26. Evidence for positive selection is also observed for the five F-box genes of the R. multiflora scaffold sc0006888 (results available at http://bpositive.i3s.up.pt/ under project Rosa S-locus genes; BP2018000004). Therefore, the data suggest that Rosa S-pollen specificity is determined by multiple F-box genes, as in Malus/Pyrus/Sorbus and Solanaceae species.

Figure 6

Positively selected amino acid sites on the predicted 3D structure of the R. chinensis F-box located in the 5′ region of the S-RNase. In yellow are highlighted those sites that appear as positively selected in the two datasets analyzed (14 and 13 R. chinensis F-box genes in the vicinity of the S-RNase that are expressed in stamen), in green those that are located in a region not analyzed when 14 F-box genes are considered, and in blue those that change due to alignment gaps in the two datasets analyzed.

Discussion

In Rosa, there are very important traits of horticultural interest, such as flower development, architecture, senescence, scent biosynthesis and emission, ease of reproduction, and resistance to biotic and abiotic stresses, that have been selected only once during the history of rose selection and incorporated into many rose varieties44, 45, 48. Indeed, only 10 species have contributed to the genetic make-up of most of the modern rose cultivars, and some old and popular cultivars, such as ‘Old Blush’, have dominated the history of rose selection40. This Chinese rose from the Song dynasty (960–1279) conveys several desirable characters such as recessive reblooming habit, recessive lack of prickles (stem) and dominant flower doubleness44, 45, 48, all co-segregating with the S-locus region. The characterization of the S-locus region performed here is thus important for breeding selection and for the control of genetic diversity.

The Rosa S-locus is composed of an S-RNase gene that belongs to the Prunus lineage (Fig. 1). This gene shows all expected features of an S-pistil gene: expression in pistil and ovary (Fig. 2), evidence for diversifying selection (Fig. 3), and co-segregation with the S-locus. In Fragaria, which also belongs to the Rosoideae subfamily (the two genera have been diverging for about 50 million years54), a Prunus lineage S-RNase gene has also been reported as the putative S-pistil gene24. This suggests that the Rosoideae GSI system could be of the self-recognition type, with one S-pollen gene, as in Prunus (24 and references therein). Nevertheless, the Rosa F-box gene that clusters with the Prunus SFB gene (Fig. 4) is not the neighbor of the S-RNase gene, unlike in Prunus, and presents expression (Fig. 5) and polymorphism levels incompatible with being the S-pollen gene. In Rosa there are multiple F-box genes in the vicinity of the S-RNase that show expression compatible with being S-pollen genes (Fig. 5), and evidence for diversifying selection among F-box genes within an S-haplotype (Fig. 6). These are the expected features of S-pollen genes in a non-self-recognition system, as reported in Malus/Sorbus/Pyrus13, 16, 18,19,20, 24, 25 and Solanaceae species15, 22, 23, 26, 30,31,32. Therefore, the Rosa S-locus encompasses a large region, as reported for other species presenting non-self-recognition systems, and this could explain why in rose several traits of horticultural interest are linked to the S-locus. It should be noted that the Rosa S-locus region may be highly rearranged, since we could not align the contig containing the S-RNase gene from R. chinensis with those from R. multiflora, nor the two R. multiflora contigs containing the S-RNase gene with each other. This observation has been previously reported for the two available R. chinensis genomes, which could not be aligned in this region (conceivably carrying two different S-haplotypes; see Fig. 1 of55). Pollen transcriptome analyses of multiple S-haplotypes are, however, needed to determine how many F-box genes in the vicinity of the Rosa S-RNase are involved in pollen GSI specificity, as performed in Malus25 and Petunia21.

Since multiple S-pollen genes determine Rosa S-pollen specificity, as in Maleae species, this suggests that the ancestral Rosaceae GSI system was of the non-self-recognition type. This implies that during evolution the Malus S-RNase lineage, which does not cluster with the Rosa S-RNase, was recruited de novo from a duplicate of the ancestral S-RNase gene. Moreover, it implies that the Prunus S-pollen gene, which does not cluster with Rosa S-pollen genes, evolved de novo from an unrelated F-box gene.

Low levels of variation have been reported in roses, but the S-locus region, being large and under balancing selection, may help retain a substantial fraction of the variability. Indeed, evolution within the genus Rosa occurred via interspecific hybridization, allopolyploidization and genetic reticulation among sympatric species56. In Europe, the recent post-glaciation period expedited the process of speciation, as the northward extension of the biotopes increased. The diploid R. arvensis holds a special place in this process since, although it belongs to the Synstylae group, it has a substantial genomic promiscuity with the polyploid Caninae group56, 57, which may also help maintain diversity levels. The inbred strains that are being established for R. arvensis53 are a starting point to investigate the diversity of the S-locus in European wild roses. Mutants in which self-incompatibility has broken down will also help refine the effect of the S-locus on rose molecular diversity.

Methods

Identification of putative S-RNase like sequences in Rosa genomes

The annotations (CDS) of two R. chinensis genomes (available at the NCBI RefSeq database (www.ncbi.nlm.nih.gov) and the GDR database (https://www.rosaceae.org); Supplementary Table S1) were downloaded, and the sequences showing similarity with reference S-RNases were labeled Rchinensis1 and Rchinensis3, respectively. Moreover, the corresponding genome sequences were downloaded as FASTA files, and the S-RNase sequences here annotated were labeled Rchinensis2 and Rchinensis4, respectively. The R. multiflora genome (NCBI assembly database, Supplementary Table S1) was also downloaded as a FASTA file, since annotations are not available for this species. To find and extract protein encoding segments larger than 100 bp, we used getorf, using the EMBOSS Docker image available at the pegi3s Bioinformatics Docker images project (https://pegi3s.github.io/dockerfiles). Then, we selected the protein encoding segments that show similarity with reference S-RNase sequences, using tblastx (Expect value (e) < 0.05), as implemented in SEDA58, 59, using as queries Prunus S3-RNase (AJ298312), Malus Sh-RNase (AB032247), and Fragaria putative S-RNases (gi561957436, gi561674690 and gi561985884)24. Based on this information, we manually annotated the corresponding genome region to identify the exons of each gene. For each putative gene we obtained the corresponding amino acid sequence to calculate the IP, using ExPASy software60.

We used the same protocol for the genomes here assembled from the short reads of nine Rosa genomes downloaded from the NCBI SRA database (Supplementary Table S1). In this case, we used FastQC to evaluate read quality, Cutadapt to trim reads61, and ABySS 2.062 for the de novo assembly, using the Docker images available at the pegi3s Bioinformatics Docker images project (https://pegi3s.github.io/dockerfiles).
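
The segment-extraction step described above can be illustrated with a minimal Python sketch. It is not the EMBOSS getorf implementation: it assumes that a protein encoding segment is a stop-codon-free stretch in any of the six reading frames under the standard genetic code, and the function names `reverse_complement` and `open_segments` are hypothetical.

```python
# Illustrative six-frame scan for stop-free segments of at least min_len nt.
# Assumption: standard genetic code stop codons; not the getorf algorithm itself.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def open_segments(seq, min_len=100):
    """List (strand, frame, start, end, segment) for stop-free segments.

    Coordinates are 0-based and refer to the scanned strand.
    """
    results = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            seg_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOPS:
                    if pos - seg_start >= min_len:
                        results.append((strand, frame, seg_start, pos, s[seg_start:pos]))
                    seg_start = pos + 3
                pos += 3
            # Segment running to the end of the sequence without a stop codon.
            if pos - seg_start >= min_len:
                results.append((strand, frame, seg_start, pos, s[seg_start:pos]))
    return results
```

Segments returned this way would then be the candidates screened by similarity search, as in the tblastx step of the protocol.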

Identification of the F-box genes located in R. chinensis chromosome 3 and R. multiflora scaffolds where the S-RNase is located

The protocol presented for the S-RNase like sequences was also used to obtain F-box genes located in R. chinensis chromosome 3, and for the two R. multiflora scaffolds where the S-RNase gene has been identified, using as query P. avium SFB3 (AAT72121.1), P. avium SLFL1 (BAG12295.1), and M. domestica SFBB3-beta (BAF47180.1).

Phylogenetic analyses

Phylogenetic analyses of the R. chinensis and R.multiflora S-RNase like sequences and F-box like genes were performed using sequences aligned with MUSCLE alignment algorithm, as implemented in ADOPS63. Only codons with a support value above 2 were used for phylogenetic reconstruction. Bayesian trees were obtained using MrBayes 3.1.264 as implemented in the ADOPS pipeline63. The Generalised Time-Reversible (GTR) model of sequence evolution was implemented in the analyses, allowing for among-site rate variation and a proportion of invariable sites. Third codon positions were allowed to have a gamma distribution shape parameter different from that of first and second codon positions. Two independent runs of 1,000,000 generations with four chains each, were carried out. The average standard deviation of split frequencies was always ~ 0.01 and the potential scale reduction factor for every parameter was ~ 1.00, showing that convergence was achieved. Trees were sampled every 100th generation and the first 5000 samples were discarded (burn-in). The tree was converted to Newick format using the Format Conversion website (http://phylogeny.lirmm.fr/phylo_cgi/data_converter.cgi) and edited using Mega765.
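
The sampling and burn-in settings above can be checked with simple arithmetic. The sketch below assumes that the starting state (generation 0) is also written to the sample file, which is an assumption about the sampler output rather than something stated in the text; `retained_samples` is a hypothetical helper name.

```python
def retained_samples(generations, sample_every, burnin_samples, include_start=True):
    """Return (total samples, samples kept after burn-in) for one run."""
    total = generations // sample_every + (1 if include_start else 0)
    if burnin_samples > total:
        raise ValueError("burn-in exceeds the number of samples")
    return total, total - burnin_samples


# Settings taken from the text: 1,000,000 generations, sampled every 100th
# generation, first 5000 samples discarded.
total, kept = retained_samples(1_000_000, 100, 5000)
print(total, kept)
```

Under these assumptions roughly half of the samples of each run are discarded as burn-in.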

The phylogenetic analyses of the low coverage Rosa genomes (Supplementary Table S1) sequences were performed with Mega765, using ClustalW alignment algorithm, Neighbor-Joining method, bootstrap test with 10,000 replicates, the p-distance method for computing the evolutionary distances, and pairwise deletion since sequences can have different sizes (Supplementary Table S2).
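
As an illustration of the distance measure named above, the following minimal sketch computes a p-distance with pairwise deletion for two aligned sequences. It is not the MEGA7 implementation; the set of characters treated as missing data (`-`, `?`, `N`) and the function name `p_distance` are assumptions made for the example.

```python
MISSING = set("-?N")  # assumption: characters treated as gaps or missing data


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites present in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue  # pairwise deletion: skip the site for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

Pairwise deletion lets partial sequences of different sizes contribute all the sites they share with each other sequence, which is why it suits the fragmentary low coverage data.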

Expression of R. chinensis S-RNase like, and F-box genes located on chromosome 3 in pistil and ovary, stamen, leaf, stem and root transcriptomes

To estimate expression of the Rosa S-RNase like sequences located on chromosome 3, we used RNA-seq data from R. chinensis pistil and ovary, stamen, leaf, stem, and root transcriptomes (Supplementary Table S4). We used FastQC to evaluate read quality, and Cutadapt to trim reads61. FPKM values were estimated using the RSEM method, as implemented in Trinity66, using the R. chinensis RefSeq CDS, and the R. chinensis S-RNase like and F-box sequences located on chromosome 3.
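
For readers unfamiliar with the unit, the sketch below shows the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments). It is a simplified illustration: RSEM estimates fragment counts probabilistically and uses an effective transcript length, which this example does not attempt to reproduce, and the numbers in the usage line are invented.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 1000 bp CDS in a library of
# 10 million mapped fragments.
print(fpkm(500, 1000, 10_000_000))
```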

R. arvensis genotype–phenotype association experiments

Genomic DNA was extracted from leaves of 12 R. arvensis individuals, for which the haplotypes were predicted according to their parents and progeny (Supplementary Fig. S3A,B), using the method of67. PCRs were performed using the genomic DNA and primers RA-F 5′ GGAAGCCARACTGAAGAT 3′ and RA-R 5′ AGCATCACAGTYTCGATCA 3′, designed for conserved regions of the putative Rosa S-RNase sequences here identified. Standard amplification conditions were 35 cycles of denaturation at 94 °C for 30 s, primer annealing at 52 °C for 30 s, and primer extension at 72 °C for 2 min. The amplification products with the expected size (353 bp) for individuals Ose, E200, E404, Url, and E893 were cloned using the TA cloning kit (Invitrogen, Carlsbad, CA). For each individual, the inserts of 16 colonies were cut separately with the DdeI and HinfI restriction enzymes; only one restriction pattern was observed, and thus only three colonies were sequenced. The ABI PRISM BigDye cycle-sequencing kit (Perkin Elmer, Foster City, CA), and specific primers, or the primers for the M13 forward and reverse priming sites of the pCR2.1 vector, were used to prepare the sequencing reactions. Sequencing runs were performed by STABVIDA (Lisboa, Portugal).
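
The two primers contain IUPAC degenerate bases (R = A or G, Y = C or T). The following minimal sketch, which assumes exact matching with no mismatches allowed, shows how such primers can be located on a template to predict the amplicon size in silico. The helper names are hypothetical and the approach is an illustration, not the primer design procedure used in this work.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

RA_F = "GGAAGCCARACTGAAGAT"   # forward primer from the text
RA_R = "AGCATCACAGTYTCGATCA"  # reverse primer from the text


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def amplicon_length(template, forward, reverse):
    """Predicted product size on the given strand, or None if a primer fails to match."""
    template = template.upper()
    fwd_hit = primer_regex(forward).search(template)
    if fwd_hit is None:
        return None
    # The reverse primer anneals to the opposite strand, downstream of the forward primer.
    rev_hit = primer_regex(reverse_complement(reverse)).search(template, fwd_hit.end())
    if rev_hit is None:
        return None
    return rev_hit.end() - fwd_hit.start()
```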

Identification of positively selected amino acid sites, their location on the crystal structure, and polymorphism levels

For the six sequences identified as putative Rosa S-RNases, we inferred positively selected amino acid sites using codeML51, as implemented in ADOPS63, with MUSCLE as the alignment method. Such analyses were also performed for the 14 R. chinensis F-box sequences that are in the vicinity of the S-RNase, that cluster with Prunus SLFL, and that are expressed in stamen. Since the inclusion of the Rchinensis_F-box-3 gene sequence excludes from the analyses a large fraction of the 3′ region, we also performed these analyses after removing this sequence. codeML analyses were also performed for five R. multiflora sc0006888 F-box sequences in the vicinity of the S-RNase. The details of the analyses can be seen at the B+ database (bpositive.i3s.up.pt68; Rosa S-locus genes BP2018000004). Model comparisons were M2a-M1a and M8-M7. We consider as positively selected those amino acid sites that show a probability higher than 90% for both the naive empirical Bayes (NEB) and Bayes empirical Bayes (BEB) methods.
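
Nested codon models such as M1a/M2a and M7/M8 are usually compared with a likelihood ratio test. The sketch below assumes the commonly used chi-square approximation with two degrees of freedom for both comparisons; for two degrees of freedom the upper-tail probability has the closed form exp(-x/2). The log-likelihood values in the usage line are invented for illustration and are not results from this study.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Return (2*delta lnL, p-value) assuming a chi-square with 2 degrees of freedom."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for df = 2 is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for a null (e.g. M1a) and alternative (e.g. M2a) model.
statistic, p_value = lrt_two_df(-1000.0, -990.0)
print(statistic, p_value)
```

Site-level posterior probabilities (NEB and BEB) are interpreted only when the alternative model fits significantly better than the null model.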

To visualize these positions in the 3D structure, for the S-RNase (translation of Rchinensis1_3–Rchinensis2_27; Supplementary File S1) we first identified the signal peptide using the SignalP (http://www.cbs.dtu.dk/services/SignalP/) website tool available at ExPASy60. After removing the signal peptide, the 3D structure was modeled with I-Tasser69, and the model with the highest C-score value was used. The same methodology was used for the putative S-pollen gene (translation of Rchinensis_F-box-1; Supplementary File S4), but in this case after removing the F-box domain (the first 60 amino acid positions). All structural images were produced using PyMOL (The PyMOL Molecular Graphics System, Version 1.7.4 Schrödinger, LLC.).

Levels of polymorphism were obtained with DnaSP70.
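
The divergence values reported in Results are given after Jukes and Cantor correction. As a minimal sketch of that correction, the function below converts an observed proportion of differing sites p into the corrected distance d = -(3/4) ln(1 - 4p/3). It illustrates the formula only; it does not reproduce how DnaSP counts synonymous and non-synonymous sites, and `jukes_cantor` is a hypothetical name.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Example: 10% observed differences.
print(round(jukes_cantor(0.10), 4))
```

The correction grows with p because multiple substitutions at the same site become more likely as sequences diverge.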