A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits

Rose is the world’s most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Roses are outbred and can have various ploidy levels. Our objectives were to develop a high-quality reference genome sequence for the genus Rosa by sequencing a doubled haploid, combining long and short reads, and anchoring to a high-density genetic map, and to study the genome structure and genetic basis of major ornamental traits. We produced a doubled haploid rose line (‘HapOB’) from Rosa chinensis ‘Old Blush’ and generated a rose genome assembly anchored to seven pseudo-chromosomes (512 Mb with N50 of 3.4 Mb and 564 contigs). The length of 512 Mb represents 90.1–96.1% of the estimated haploid genome size of rose. Of the assembly, 95% is contained in only 196 contigs. The anchoring was validated using high-density diploid and tetraploid genetic maps. We delineated hallmark chromosomal features, including the pericentromeric regions, through annotation of transposable element families and positioned centromeric repeats using fluorescent in situ hybridization. The rose genome displays extensive synteny with the Fragaria vesca genome, and we delineated only two major rearrangements. Genetic diversity was analysed using resequencing data of seven diploid and one tetraploid Rosa species selected from various sections of the genus. Combining genetic and genomic approaches, we identified potential genetic regulators of key ornamental traits, including prickle density and the number of flower petals. A rose APETALA2/TOE homologue is proposed to be the major regulator of petal number in rose. This reference sequence is an important resource for studying polyploidization, meiosis and developmental processes, as we demonstrated for flower and prickle development. It will also accelerate breeding through the development of molecular markers linked to traits, the identification of the genes underlying them and the exploitation of synteny across Rosaceae.

* F1 YW progeny GBS markers were generated in an F1-population of 174 plants. 100 ng of genomic DNA was digested with EcoRI and MspI, and ligated to TruSeq compatible GBS adapters with sample specific barcodes in a modified protocol from 6 . GBS-libraries were individually amplified with Taq 2x Mastermix NEB (Bioké), purified with MagNA magnetic beads, and pooled in equimolar amounts before sequencing on an Illumina HiSeq3000 instrument with 2x150bp at the Oklahoma Medical Research Facility (OMRF, USA). Reads were demultiplexed with GBSX 1.1.5 7 allowing one mismatch in the barcodes. Sequence quality was checked with FastQC 8 . After QC of the raw reads, barcodes, restriction site remnants and adapter sequences were removed with Cutadapt 9 and FASTX-Toolkit 0.0.13 10 , and all reads were trimmed to a maximum length of 86bp to compensate for variable barcode lengths. Paired-end reads were overlapped using PEAR 11 , and mapped onto the Old Blush reference genome sequence with default parameters of the BWA-mem algorithm in BWA 0.7.8 12 . Alignments were sorted, indexed, and filtered on mapping quality 20 (q20) with SAMtools 1.2. 13 . Using 10 million quality filtered reads of the two parental lines, we first delineated GBS-stacks on the genome with the following criteria: length range 60-275 bp, average read depth 30-700, equal read depth ratio (0.75 to 1.33-fold) between both parents. SNPs were called with the Unified Genotyper implemented in Genome Analysis Toolkit (GATK) v.3.7 14,15 , and filtered with the following criteria: mean read depth DP10, minor allele count MAC4, SNP quality score q20, Genotype Quality score GQ30, min read depth per SNP per genotype DP10, bi-allelic SNPs. Finally, SNPs were selected from tags containing 1-3 neighbouring SNPs, with <10% missing-values across all 174 genotypes, and with a heterozygous genotype call in one parent and homozygous genotype call in the other parent. 2248 GBS-tags spread across the LGs of the 'Old Blush' reference genome with a total of 3622 SNPs segregating in the F1 were used to create the 'Yesterday' x R. wichurana genetic maps.
GBS data were combined with the AFLP and microsatellite data 16 for linkage mapping (unidirectional 'Yesterday' x R. wichurana,139 F1 plants included). Joinmap 4.1 was applied to cluster the markers into 7 linkage groups; pwd-files with the pair wise recombination frequencies between markers and the corresponding LOD scores were exported. Marker ordering per linkage group was performed in SPSS v.23 using Multidimensional scaling (MDS,17 ). Pairwise recombination frequencies were taken as distance input; LOD scores were used as weight factors. The unidimensional solution was accepted as the best ordering and compared for some linkage groups with map2 and map3 regression maps as generated by Joinmap.
Applying rather strict settings, 3622 GBS SNP markers were selected: 2320 segregating according to the <lmxll> Joinmap scheme; 1302 to the <nnxnp> scheme. Other segregation patterns were not retained. Combined with the previous set of Hosseini et al. (2012) 16 this yielded a set of 4575 markers. A LOD threshold range between 7 to 12 was applied in Joinmap to separate 7 potential linkage groups. A good correspondence with the pseudo-chromosome annotation to 'Old Blush' was observed; all SNP markers located on scaffolds currently assigned to Chr0 of HapOB were assigned to the 7 linkage groups of the YW map (Supplementary Figure 2b). For MDS mapping we compared both the use of LOD and LOD² values as weight parameters. Best correlation with map2 and map3 regression maps (Joinmap) was obtained by using LOD values as weights as recommended 17 . Per pseudochromosome the ordering of the linkage map was compared with the nucleotide position of markers on the physical sequence assembly.
Specific recombination rates (kbp/cM) per chromosome were calculated (Supplementary Figure 1b). For all chromosomes a gradually increasing or decreasing trend was observed; nevertheless, different patterns were observed. Often two clouds of markers could be distinguished, which appeared to be caused by a poor integration of the maternal and paternal maps due to the limited number of allelic bridge markers in certain segments. This problem was resolved by separately calculating male and female maps.

* F1 K5 progeny
Tetraploid linkage maps of the K5 population, consisting of 151 unique individuals after screening and filtering, were generated following the methods described in Bourke et al. (2017) 18 . In brief, the population was genotyped using the WagRhSNP 68K Axiom SNP array, with marker dosage assigned using fitTetra 19 .
Pairwise maximum likelihood recombination frequencies and LOD scores were calculated using polymapR package 20 . The code is available online through CRAN (https://CRAN.R-project.org/package=polymapR). Marker clustering by LOD score identified seven chromosomal linkage groups. Linkage group numbering was assigned through linkage analysis to previously-mapped SSR and AFLP markers 21  Activation for 15 min at 94°C, followed by 10 cycles at 94°C for 20 s and 61°C for 1 min.
(61°C decreasing 0.6°C per cycle to achieve final annealing temperature of 55°C) followed by 26 cycles at 94°C for 20 s and 55°C for 1 min. Reading of KASP genotyping reactions on the qPCR machine was performed in a final cycle at 30°C for 30 s. If fluorescence data did not form satisfactory clusters the conditions for additional cycles were 94°C for 20 s followed by 57°C for 1 min (up to 3 cycles). Genotypic data were analyzed with the StepOne TM Software v2.3 (Applied Biosystems, USA).

Development of a SCAR marker for the SI locus
The marker segregating at the SI locus was derived from a collection of genomic contigs generated by genomic sequencing of the diploid R. multiflora hybrid 88/124-46 via Illumina sequencing and subsequent assembly (data not shown). This collection of 27094 scaffolds was screened with 64 amino acid sequences of annotated S-RNAses from various Prunus species obtained from the NCBI database. One contig (contig no. 2308) with a significant hit to an S-RNAse gene also contained a predicted gene with similarity to an S-locus related F-Box protein. PCR primers (PrimerP9f 5´_CTTGCATTCAAGGTGCAGTC_3´ and Primer P9r 5´_CGGCTCTGGTGAAATAGTCC_3´) for the S-RNAse homolog were designed with the Primer3 software and flank the intron between exon 2 and 3 of the predicted S-RNAse sequence. Amplification conditions were: 94°C for 30s, 30 cycles of 94°C for 30s, 61°C for 30s and 72°C for 2min followed by a final amplification at 72°C for 10min.

* Gene annotation
Previously, we developed a large data set of RNA-seq data from different tissues of 'Old Blush' 22 . In order to increase the mRNA diversity and the gene coverage, we also used RNA-seq libraries from four other genotypes with and without rose pathogens: R.       78E-6). B) Phylogenetic analysis of the Arabidopsis TTG2 clade of the WRKY transcription factor family. The rose WRKY transcription factors located in the prickle density QTL region are shown in green and the closest TTG2 homolog is shown in red. The Arabidopsis WRKY proteins are numbered according to TAIR nomenclature (http://www.arabidopsis.org). C) TTG2 transcript accumulation in three different OW individuals with no prickles, low, and high prickle density. The transcript accumulation level was analysed by qPCR and expressed as a ratio relative to the sample with no prickles (mean ± SD). n = two biologically independent stem samples with two technical replicates each, per prickle density type.