The chromosome-level quality genome provides insights into the evolution of the biosynthesis genes for aroma compounds of Osmanthus fragrans

Yang, Xiulian; Yue, Yuanzheng; Li, Haiyan; Ding, Wenjie; Chen, Gongwei; Shi, Tingting; Chen, Junhao; Park, Min S.; Chen, Fei; Wang, Lianggui

doi:10.1038/s41438-018-0108-0


Article
Open access
Published: 20 November 2018


Horticulture Research volume 5, Article number: 72 (2018)



An Author Correction to this article was published on 19 April 2019

This article has been updated

Abstract

Sweet osmanthus (Osmanthus fragrans) is a very popular ornamental tree species throughout Southeast Asia and the USA, particularly for its extremely fragrant aroma. We constructed a chromosome-level reference genome of O. fragrans to support studies of its evolution, genetic diversity, and the molecular mechanisms of aroma development. A total of over 118 Gb of polished reads was produced from HiSeq (45.1 Gb) and PacBio Sequel (73.35 Gb), giving 100× depth coverage for long reads. Combining Illumina short reads, PacBio long reads, and Hi-C data produced a final chromosome-quality genome of O. fragrans with a genome size of 727 Mb and a heterozygosity of 1.45 %. The genome was annotated using de novo prediction and homology comparison and further refined with transcriptome data. The genome of O. fragrans was predicted to contain 45,542 genes, of which 95.68 % were functionally annotated. Repetitive sequences made up 49.35 % of the genome, with long terminal repeats (LTRs) being the most abundant (28.94 %). Genome evolution analysis provided evidence of a whole-genome duplication 15 million years ago, which contributed to the current content of 45,542 genes. Metabolic analysis revealed that linalool, a monoterpene, is the main aroma compound. Based on the genome and transcriptome, we further demonstrated a direct connection between terpene synthases (TPSs) and the rich aromatic molecules in O. fragrans. We identified three new flower-specific TPS genes whose expression coincided with the production of linalool. Our results suggest that the high number of TPS genes and their flower tissue- and stage-specific expression might drive the strong, unique aroma production of O. fragrans.


Introduction

Sweet osmanthus (Dicotyledons, Lamiales, Oleaceae, Osmanthus) is one of the most popular evergreen ornamental tree species in China due to its unique sweet aroma^1,2. More than 160 cultivars of O. fragrans have been classified based on phenotypes such as leaf shape, flower color, aroma, and the season and frequency of flowering³. The association between phenotypes and genotypes of O. fragrans has been examined through aroma compounds^4,5,6, essential oils^7,8,9,10, and taxonomy using various molecular markers^11,12,13,14,15,16. Transcriptome studies have identified genes that might be responsible for the emission of flower scent in O. fragrans^17,18. Gene expression has also been profiled at different flowering stages of O. fragrans¹⁹. Differential gene expression studies have identified genes in the methylerythritol phosphate (MEP) pathway, as well as in the terpenoid- and carotenoid-synthesis pathways. Transcriptomic studies have allowed researchers to connect the major flower aroma compounds with differentially expressed genes and their encoded proteins. The flower aroma compounds (R)- and (S)-linalool are produced by terpene synthases (TPSs)²⁰. Another key flower aroma compound, β-ionone, is produced through the oxidative cleavage of β-carotene by carotenoid cleavage dioxygenases (CCDs)^21,22,23. Transcriptome studies have shown that TPSs and CCDs are differentially expressed at different flowering stages in O. fragrans^24,25. Additionally, we and others have recently reported a set of transcription factors (TFs) associated with flower color and fragrance emission in O. fragrans^21,26,27. All of these gene expression studies provide valuable insights into how flower blooming and aroma production are interlinked²⁸. However, a genome sequence is needed to reveal the full genetic background of aroma production in sweet osmanthus and the evolution of aroma in the family Oleaceae.

In this study, we generated a reference genome for O. fragrans to provide a solid foundation for future studies of genome structure and evolution in the Oleaceae family. Furthermore, we conducted a detailed analysis of aroma compounds and of tissue- and flowering-stage-specific gene expression to investigate the molecular mechanisms of sweet fragrance development in O. fragrans.

Results

Sequencing summary

We generated 100-fold PacBio single-molecule long reads (a total of 73.4 Gb with a read N50 length of 13.0 kb), 77-fold k-mer depth Illumina paired-end short reads (45.1 Gb), and Hi-C data, which together produced 23 unambiguous chromosome scaffolds for a high-quality assembly. For stepwise assembly, we first performed a PacBio-only assembly, resulting in an assembly size of 733.5 Mb, a contig N50 of 1.59 Mb, and a BUSCO completeness of 96.1 % (Supplemental Table 1). The initial contigs were then polished with PacBio long reads and Illumina short reads. As the final step, Hi-C data were used to scaffold the polished contigs into chromosomes.
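The N50 and fold-depth figures above follow standard definitions. As a minimal sketch (the function names are hypothetical, and using the 733.5 Mb initial assembly as the genome size for the depth calculation is an assumption), N50 is the contig length at which the cumulative length of contigs, sorted longest first, first reaches half of the total, and depth is total sequenced bases divided by genome size:

```python
def n50(lengths: list[int]) -> int:
    """Length at which contigs sorted longest-first first cover >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty length list")


def depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth (fold coverage) = total sequenced bases / genome size."""
    return total_bases / genome_size


# Toy contig set: 10 + 9 + 8 = 27 of 54 bp reaches half the total, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# 73.4 Gb of PacBio reads over a ~733.5 Mb assembly is roughly 100-fold depth.
print(round(depth(73.4e9, 733.5e6)))
```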

Determination of genome size and heterozygosity

The k-mer method²⁹ and KmerFreqAR³⁰ were used to estimate the genome size of O. fragrans from quality-filtered Illumina reads. Genome size was calculated as: Genome size = modified k-mer number / average k-mer depth, where modified k-mer number = total k-mer number − error k-mer number, and the average k-mer depth was taken from the main peak of the k-mer distribution curve (Supplemental Fig. 1). To estimate heterozygosity, Illumina paired-end reads were simulated from Arabidopsis genome data with the pIRS software³¹, and KmerFreqAR³⁰ was used to fit the resulting k-mer distribution curves to that of O. fragrans. When the two k-mer curves were consistent, the heterozygosity of the simulated Arabidopsis data was taken as the estimate for O. fragrans. The final analysis yielded a heterozygosity of ~1.45 % for the O. fragrans genome.
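The genome-size formula can be sketched directly on a k-mer frequency histogram. This is an illustration under stated assumptions, not KmerFreqAR: the histogram maps k-mer depth to the number of distinct k-mers at that depth, k-mers below a hypothetical `error_cutoff` depth are counted as errors, and the "main peak" is simply the most populated depth above the cutoff. In a heterozygous genome such as O. fragrans a second peak at half depth appears, and choosing the correct peak needs more care than this sketch takes.

```python
def estimate_genome_size(histogram: dict[int, int], error_cutoff: int) -> float:
    """Genome size = (total k-mers - error k-mers) / depth of the main k-mer peak.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    error_cutoff is a hypothetical, user-chosen depth below which k-mers are errors.
    """
    total_kmers = sum(d * count for d, count in histogram.items())
    error_kmers = sum(d * count for d, count in histogram.items() if d < error_cutoff)
    # Main peak: the depth holding the most distinct k-mers outside the error region.
    peak_depth = max((d for d in histogram if d >= error_cutoff), key=histogram.__getitem__)
    return (total_kmers - error_kmers) / peak_depth


# Toy histogram: (2800 total - 200 error k-mers) / peak depth 20 = 130.0
print(estimate_genome_size({1: 100, 2: 50, 10: 20, 20: 100, 40: 10}, error_cutoff=5))
```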

Genome assembly and quality assessment

The integrated genome assembly workflow is shown in Supplemental Fig. 2. The full set of PacBio long reads was converted to FASTA format. All subreads were then assembled using Falcon v0.3.0³² with specific parameters (length-cutoff pr = 8 kb; length-cutoff pr = 9 kb). We used Arrow (https://github.com) to polish the draft genome (G1), obtaining the corrected genome (G2). G2 was then polished again with Pilon³³, using next-generation sequencing reads mapped to G2 with BWA, to obtain the twice-corrected genome (G3). Because the O. fragrans genome is highly heterozygous, G3 was larger than the estimated genome size. To obtain a nonredundant genome, heterozygous and redundant sequences were removed from the corrected genome using Redundans³⁴ with the following parameters: heterozygosity = 0.0145 and sequencing depth = 86. The nonredundant genome (G4) was ~741 Mb, with a contig N50 of 1.595 Mb (Table 1). Finally, G4 was assessed with BUSCO v3.0³⁵ using the embryophyta_odb10 database and default parameters.
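The idea behind removing redundant heterozygous contigs can be illustrated with a toy filter. This is explicitly not Redundans' algorithm (which relies on whole-genome alignments with identity and overlap thresholds); it is a hypothetical sketch in which contigs are processed longest first and a contig is dropped when it fits, without gaps and on the same strand, inside an already-kept contig at or above a chosen identity. Real haplotigs also differ by indels and may be reverse-complemented, which a proper aligner handles.

```python
def best_ungapped_identity(query: str, target: str) -> float:
    """Best fraction of identical bases over every ungapped placement of query in target."""
    if not query or len(query) > len(target):
        return 0.0
    best = 0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        best = max(best, sum(a == b for a, b in zip(query, window)))
    return best / len(query)


def remove_redundant(contigs: dict[str, str], min_identity: float = 0.95) -> dict[str, str]:
    """Keep contigs longest-first; drop any that sit inside a kept contig at >= min_identity."""
    kept: dict[str, str] = {}
    for name, seq in sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True):
        if any(best_ungapped_identity(seq, other) >= min_identity for other in kept.values()):
            continue  # treated as a redundant haplotype copy
        kept[name] = seq
    return kept


toy = {"ctg1": "AAAACCCCGGGGTTTTACGT", "ctg2": "CCCCGGGATTTT", "ctg3": "TTTTTTTTTT"}
# ctg2 matches ctg1 at 11/12 identity and is dropped; ctg3 is kept.
print(list(remove_redundant(toy, min_identity=0.9)))
```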

Table 1 Quality assessment statistics of the assembled genome of O. fragrans


Contigs were clustered by hierarchical clustering of the Hi-C data. Read pairs mapping uniquely around DpnII digestion sites were identified, and the number of Hi-C links between contigs, normalized by the number of DpnII sites on each contig of the draft assembly, was used to measure how tightly different contigs are associated. Agglomerative hierarchical clustering with LACHESIS produced chromosome assembly maps consistent with a karyotype of 2n = 46 (Fig. 1). In total, the O. fragrans genome map contained 5327 contigs with a total length of 740,635,307 bp. The Hi-C-anchored contigs had a combined length of 740,404,543 bp, accounting for 99.97 % of the total length of the final assembled genome and indicating the high quality of the Hi-C data (Table 2).
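The clustering step can be sketched as average-linkage agglomerative clustering on site-normalized link counts. This is a simplified illustration in the spirit of the description above, not LACHESIS's implementation: the normalization (links divided by the product of the two contigs' DpnII site counts), the function names, and the toy data are all assumptions. DpnII recognizes GATC, which cannot overlap itself, so a plain substring count gives the number of sites.

```python
from itertools import combinations


def dpnii_sites(seq: str) -> int:
    """Count DpnII sites (GATC); the motif cannot overlap itself, so str.count is exact."""
    return seq.upper().count("GATC")


def cluster_contigs(sites: dict[str, int], links: dict[frozenset, int], k: int) -> list[set[str]]:
    """Average-linkage agglomerative clustering of contigs into k groups."""

    def density(a: str, b: str) -> float:
        # Hi-C read pairs linking a and b, normalised by their DpnII site counts.
        return links.get(frozenset((a, b)), 0) / (sites[a] * sites[b])

    def avg_link(x: set[str], y: set[str]) -> float:
        return sum(density(a, b) for a in x for b in y) / (len(x) * len(y))

    clusters = [{name} for name in sites]
    while len(clusters) > k:
        # Merge the two clusters with the strongest average normalised linkage.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


toy_sites = {"a": 10, "b": 10, "c": 10, "d": 10}
toy_links = {frozenset(("a", "b")): 500, frozenset(("c", "d")): 400, frozenset(("a", "c")): 5}
print(cluster_contigs(toy_sites, toy_links, k=2))
```

In a real chromosome-scale assembly, k would be the expected haploid chromosome number (23 here), and clustering would be followed by ordering and orienting the contigs within each group.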

**Fig. 1: Hi-C map of the O. fragrans genome showing genome-wide all-by-all interactions.**

Table 2 Summary statistics demonstrating the high quality of the Hi-C map of O. fragrans


Annotation of repeat sequences

The genome of O. fragrans contains simple, moderately, and highly repetitive sequences. MISA (MIcroSAtellite identification tool; RRID: SCR_010765) was used to identify simple sequence repeats (SSRs) in the genome of O. fragrans. A total of 409,691 SSRs were identified, including 305,868 mono-, 70,587 di-, 25,544 tri-, 3934 tetra-, 2081 penta-, and 1991 hexa-nucleotide repeats (Supplemental Tables 2–3). Tandem Repeats Finder (TRF, v4.07b)³⁶ identified over 400,000 tandem repeats, accounting for 0.076 % of the O. fragrans genome.
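A MISA-style search for perfect SSRs can be sketched as a left-to-right scan that tries repeat units of 1 to 6 bp at each position. This is a simplified illustration, not MISA itself: the minimum repeat counts below are assumed values modeled on commonly used MISA defaults (which the text does not state), compound SSRs are not reported, and units that are themselves repeats of a shorter unit (such as "ATAT") are skipped so that each SSR is classified by its shortest unit.

```python
from collections import Counter

# Assumed minimum repeat counts per unit length (1-6 bp), modeled on common MISA defaults.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, unit, repeat_count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        for k in range(1, 7):
            unit = seq[i:i + k]
            if len(unit) < k or not set(unit) <= set("ACGT") or not is_primitive(unit):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == unit:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                hits.append((i, unit, reps))
                i += reps * k  # jump past the whole repeat
                break
        else:
            i += 1  # no SSR starts here
    return hits


hits = find_ssrs("GG" + "A" * 12 + "C" + "AT" * 7 + "G")
print(hits)  # one mononucleotide and one dinucleotide repeat
print(Counter(len(unit) for _, unit, _ in hits))  # tally by unit length, as in the SSR table
```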

We used homology-based and de novo approaches to identify transposable elements (TEs). RepeatMasker³⁷ was used to search against the Repbase (v. 22.11)³⁸ and MIPS-REdat libraries³⁹. We then used RepeatMasker v4.0.6 to search a de novo repeat library built with RepeatModeler v1.0.11 (RRID: SCR_015027). Finally, TEs were confirmed by searching the TE protein database with RepeatProteinMask and WU-BLASTX. Repetitive sequences accounted for 49.35 % of the assembled genome of O. fragrans, with LTRs making up 28.49 % (Supplemental Table 4).
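Because hits from RepeatMasker, RepeatProteinMask, and the de novo library overlap, a genome-wide repeat percentage is normally computed on the union of masked intervals rather than on the summed hit lengths. A minimal sketch for a single sequence, assuming half-open [start, end) coordinates; the function name is hypothetical:

```python
def masked_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the sequence covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # overlapping hit: extend the running interval
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Overlapping hits (0-10) and (5-20) count once: (20 + 10) / 100 = 0.3
print(masked_fraction([(0, 10), (5, 20), (30, 40)], 100))
```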

Annotation of noncoding RNA (ncRNA)

We identified rRNA, miRNA, and snRNA genes in the O. fragrans genome by searching the Rfam database (release 13.0)⁴⁰ with BLASTN⁴¹ (E-value ≤ 1e−5). tRNAscan-SE (v1.3.1)⁴² and RNAmmer v1.2⁴³ were used to predict tRNAs and rRNAs, respectively. In total, the O. fragrans genome contains 525 miRNAs, 847 tRNAs, 49 rRNAs, and 2058 snRNAs (Supplemental Table 5).
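The E-value cutoff used throughout this section (≤ 1e−5) is typically applied to BLAST tabular output. A minimal sketch, assuming the default 12-column BLAST+ tabular format (`-outfmt 6`: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); keeping only the highest-scoring hit per query is an illustrative choice, not a step the authors describe:

```python
def best_hits(lines: list[str], max_evalue: float = 1e-5) -> dict[str, tuple[str, float, float]]:
    """For each query, keep the highest-bitscore hit with E-value <= max_evalue.

    Assumes default 12-column BLAST+ tabular output (-outfmt 6), where column 11
    is the E-value and column 12 the bit score.
    """
    best: dict[str, tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # fails the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Queries whose hits all fail the threshold simply do not appear in the result.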

Gene prediction

Protein-coding genes were identified using homology-based and de novo prediction approaches. For homology-based prediction, the published sequences of Arabidopsis thaliana, Olea europaea, Sesamum indicum, Solanum tuberosum, and Vitis vinifera were aligned to the O. fragrans genome. To accurately identify spliced alignments, we used GeneWise v2.2.0⁴⁴ to filter all initially aligned coding sequences. For de novo prediction, NGS and full-length transcriptome data were analyzed with HISAT2 v2.1.0 and PASA pipeline v2.0.2 to predict a complete gene set. We randomly selected 1000 genes to train the model parameters for Augustus v3.3³⁶, GeneID v1.4.4⁴⁵, GlimmerHMM⁴⁶, and SNAP⁴⁷. The final consensus gene set was generated using EVidenceModeler (EVM) v1.1.1⁴⁸, which combined the genes predicted by the de novo and homology searches^49,50. The assembled genome contains 45,542 genes with an average transcript length of 4065 bp, an average CDS length of 1142 bp, and an average of 5 exons per gene (Supplemental Table 6).
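Summary statistics such as average transcript length, CDS length, and exons per gene are usually computed from the final gene models in GFF3 format. A minimal sketch, assuming a GFF3 file in which each exon and CDS row has a single `Parent` (the mRNA `ID`), and assuming "transcript length" means the genomic span of the mRNA feature (introns included); the function name and keys are hypothetical:

```python
from collections import defaultdict


def gene_model_stats(gff_lines: list[str]) -> dict[str, float]:
    """Summarise mRNA span, CDS length and exon count from GFF3 lines.

    Assumes one Parent per exon/CDS row and 1-based inclusive coordinates.
    """
    mrna_span: dict[str, int] = {}
    exon_count: dict[str, int] = defaultdict(int)
    cds_length: dict[str, int] = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        # Column 9 holds key=value attributes separated by semicolons.
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if field)
        length = end - start + 1
        if ftype == "mRNA":
            mrna_span[attrs["ID"]] = length
        elif ftype == "exon":
            exon_count[attrs["Parent"]] += 1
        elif ftype == "CDS":
            cds_length[attrs["Parent"]] += length
    n = len(mrna_span)
    return {
        "transcripts": n,
        "mean_transcript_length": sum(mrna_span.values()) / n,
        "mean_cds_length": sum(cds_length[t] for t in mrna_span) / n,
        "mean_exons": sum(exon_count[t] for t in mrna_span) / n,
    }
```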

The predicted genes were functionally annotated by searching the UniProt (release 2017_10), KEGG (release 84.0), and InterPro (5.21-60.0) databases using Blastall⁴⁴, KAAS⁴⁹, and InterProScan⁵⁰. As a result, we assigned potential functions to 43,573 of the 45,542 protein-coding genes in the O. fragrans genome (95.68 %) (Supplemental Table 7).

Genome evolution

Gene family analysis

Although morphological studies and analyses of a number of genes have placed O. fragrans in the Oleaceae family, no genome-scale phylogenomic analysis of its evolutionary position has yet been carried out. Here, we compared the O. fragrans genome with the genome sequences of 11 other plants (A. thaliana, Fraxinus excelsior, Glycine max, O. europaea, Oryza sativa, Petunia axillaris, Petunia inflata, Prunus mume, Rosa chinensis, Solanum lycopersicum, and V. vinifera). We applied the OrthoMCL (v2.0.9) pipeline⁵¹ (BLASTP E-value ≤ 1e−5) to identify putative orthologous gene families among these genomes. Gene family clustering identified 17,513 gene families comprising 38,808 O. fragrans genes, of which 1086 gene families were unique to O. fragrans. O. europaea and F. excelsior had the largest number of shared gene families among these plants (Fig. 2).
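Counts such as "gene families containing O. fragrans genes" and "families unique to O. fragrans" can be derived from a clustering output with simple set operations. A minimal sketch, assuming an OrthoMCL-style groups file with one family per line written as `group_id: taxon|gene taxon|gene ...`; the taxon prefixes (`ofr`, `oeu`, `fex`) and function names are hypothetical, and "unique" is taken to mean families containing only the focal taxon's genes:

```python
from collections import Counter


def parse_groups(lines: list[str]) -> dict[str, list[str]]:
    """Parse 'group_id: taxon|gene taxon|gene ...' lines into group -> taxa (one per gene)."""
    families = {}
    for line in lines:
        if not line.strip():
            continue
        group_id, members = line.split(":", 1)
        families[group_id.strip()] = [m.split("|", 1)[0] for m in members.split()]
    return families


def family_summary(families: dict[str, list[str]], focal: str) -> dict:
    """Families and genes of the focal taxon, families unique to it, and families shared per taxon."""
    focal_families = [g for g, taxa in families.items() if focal in taxa]
    unique = [g for g in focal_families if set(families[g]) == {focal}]
    shared = Counter(t for g in focal_families for t in set(families[g]) - {focal})
    return {
        "families": len(focal_families),
        "genes": sum(families[g].count(focal) for g in focal_families),
        "unique_families": len(unique),
        "shared_with": dict(shared),
    }


toy = ["OG1: ofr|g1 ofr|g2 oeu|x1", "OG2: ofr|g3", "OG3: oeu|x2 fex|y1", "OG4: ofr|g4 fex|y2 oeu|x3"]
print(family_summary(parse_groups(toy), "ofr"))
```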

**Fig. 2: Species tree and evolution of gene numbers.**

Synteny analysis

Protein sequences of O. fragrans were aligned against each other with BLASTP (E-value ≤ 1e−5) to identify conserved paralogs. MCScanX (http://chibba.pgml.uga.edu/mcscan2) was then used to identify collinear blocks in the genome. Using the Circos tool (http://www.circos.ca), we plotted gene density, GC content, Gypsy density, Copia density, and the average expression of genes expressed in flowers along individual chromosomes (Fig. 3).
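Collinear-block detection chains homologous gene pairs ("anchors") that stay in the same order on two chromosomes. MCScanX uses its own dynamic-programming score with match rewards and gap penalties; the sketch below is a simplified stand-in that finds the longest chain of anchors increasing on both axes with at most `max_gap` genes between consecutive anchors. The defaults of 25 and 5 are assumed values chosen to resemble commonly used MCScanX settings, and only same-orientation blocks are found (inverted blocks would need j to decrease).

```python
def best_collinear_chain(anchors: list[tuple[int, int]], max_gap: int = 25,
                         min_size: int = 5) -> list[tuple[int, int]]:
    """Longest chain of anchors increasing on both axes with gaps <= max_gap genes.

    anchors are (i, j) gene-order indices of homologous pairs on two chromosomes.
    Returns [] if the best chain is shorter than min_size.
    """
    pts = sorted(set(anchors))
    if not pts:
        return []
    chain_len = [1] * len(pts)
    prev = [-1] * len(pts)
    for b, (ib, jb) in enumerate(pts):
        for a in range(b):
            ia, ja = pts[a]
            # Extend the chain ending at a if b lies ahead of it, within max_gap, on both axes.
            if 0 < ib - ia <= max_gap and 0 < jb - ja <= max_gap and chain_len[a] + 1 > chain_len[b]:
                chain_len[b], prev[b] = chain_len[a] + 1, a
    end = max(range(len(pts)), key=chain_len.__getitem__)
    chain = []
    while end != -1:  # walk back through predecessors
        chain.append(pts[end])
        end = prev[end]
    chain.reverse()
    return chain if len(chain) >= min_size else []


toy = [(1, 10), (2, 11), (4, 13), (5, 15), (7, 16), (50, 3), (3, 90)]
print(best_collinear_chain(toy, max_gap=3, min_size=5))
```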

**Fig. 3: High-quality assembly of twenty-three chromosomes.**

Whole-genome duplication (WGD)

To determine the source of the high number of genes (>45,000) in O. fragrans, we analyzed WGD events, taking advantage of the high-quality O. fragrans genome. We applied four-fold synonymous third-codon transversion (4DTv) and synonymous substitution rate (Ks) estimation to detect WGD events. First, paralogous genes of O. fragrans, G. max, O. europaea, V. vinifera, and A. thaliana were identified with OrthoMCL. Then, the protein sequences of each plant were aligned against each other with BLASTP (E-value ≤ 1e−5) to identify conserved paralogs. Finally, the WGD events of each plant were evaluated based on their 4DTv (Fig. 4a) or Ks (data not shown) distributions. The WGD analysis suggested that O. fragrans, G. max, and O. europaea experienced WGD events within the last ~15 million years, whereas V. vinifera and A. thaliana have not experienced recent WGD events (Fig. 4a). We also compared the number of duplicated genes (Fig. 4b), the chromosome-level duplications (Fig. 4c), and the number of functional homologs of glycosyltransferase and bHLH-Myc transcription factor genes between O. fragrans and V. vinifera (Fig. 4d), which further supports the WGD events.
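The 4DTv statistic is the proportion of transversions at fourfold-degenerate third codon positions between two aligned coding sequences; a peak in its distribution across paralog pairs marks a duplication event. A minimal sketch of the raw (uncorrected) proportion for one pair, assuming the two CDSs are already codon-aligned. The authors' exact pipeline, including any substitution-model correction, is not described, and the function names are hypothetical.

```python
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CG", "TC", "CT"}  # Ala Gly Pro Thr Val Arg Ser Leu
PURINES = {"A", "G"}


def is_transversion(a: str, b: str) -> bool:
    """Purine <-> pyrimidine change."""
    return a != b and ((a in PURINES) != (b in PURINES))


def four_dtv(cds1: str, cds2: str) -> float:
    """Raw 4DTv: transversions / fourfold-degenerate third positions shared by two aligned CDSs."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = transversions = 0
    for k in range(0, len(cds1), 3):
        c1, c2 = cds1[k:k + 3].upper(), cds2[k:k + 3].upper()
        # Count the site only if both codons share the same fourfold-degenerate first two bases.
        if c1[:2] == c2[:2] and c1[:2] in FOURFOLD_PREFIXES and c1[2] in "ACGT" and c2[2] in "ACGT":
            sites += 1
            transversions += is_transversion(c1[2], c2[2])
    return transversions / sites if sites else float("nan")


# GCA/GCC (A->C, transversion), GGT/GGC (T->C, transition), CCC/CCA (C->A, transversion);
# ATG is not fourfold degenerate, so 2 transversions over 3 sites.
print(four_dtv("GCAGGTCCCATG", "GCCGGCCCAATG"))
```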

**Fig. 4: Evidence for whole-genome duplication events in O. fragrans.**

Determination of volatile aroma compounds

To make a direct connection between the biosynthetic genes and flower fragrance development, we analyzed the volatile aroma compounds. Headspace SPME combined with GC-MS identified over 40 volatile compounds, with linalool, dihydrojasmone lactone (2(3H)-furanone, 5-hexyldihydro-), 1-cyclohexene-1-propanol, α,2,6,6-tetramethyl-, and β-ocimene as the major components. Linalool was most abundant at the early flowering stage (S1) and decreased thereafter (Table 3).

Table 3 Identity and quantity of volatile aroma compounds in the various flowering stages of O. fragrans


Expression analysis

We also generated a comprehensive transcriptome dataset using both HiSeq and the Iso-Seq pipeline, and focused further analysis on identifying the specific genes responsible for floral development and the biosynthesis of volatile aroma compounds in O. fragrans. Members of the MADS-box transcription factor family, which control plant development, were highly expressed in all tissues tested. Among them, AG, AP3/PI, AP1, and SEP were predominantly expressed at the early flowering stage (S1), whereas the expression of the ANR1 gene family was highly specific to root tissue (Fig. 5b). Interestingly, O. fragrans had more ABCE genes than Fraxinus chinensis, a close relative of O. fragrans (Fig. 5a).
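The text does not say how tissue specificity (for example, root-specific ANR1 expression) was scored. As a swapped-in, commonly used metric, the tau index summarizes how concentrated a gene's expression is across tissues: 0 for uniform expression, 1 for expression in a single tissue. A minimal sketch, assuming non-negative expression values such as TPM (often log-transformed first); the tissue names and values in the example are hypothetical.

```python
def tau(values: list[float]) -> float:
    """Tissue-specificity index tau: 0 = uniform expression, 1 = expressed in one tissue only."""
    if len(values) < 2:
        raise ValueError("need at least two tissues")
    top = max(values)
    if top <= 0:
        return 0.0  # not expressed anywhere
    return sum(1 - v / top for v in values) / (len(values) - 1)


def specificity(expr: dict[str, float]) -> tuple[str, float]:
    """Return (tissue with the highest expression, tau) for one gene."""
    return max(expr, key=expr.get), tau(list(expr.values()))


# Hypothetical root-restricted gene: expressed only in root, so tau = 1.0
print(specificity({"root": 50.0, "leaf": 0.0, "stem": 0.0, "S1": 0.0}))
```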

**Fig. 5: MADS-box gene family in O. fragrans.**

Linalool, the major volatile compound in the floral scent of O. fragrans (Fig. 6), is known to be synthesized by terpene synthases (TPSs). We therefore compared the expression profiles of TPS genes and identified over 40 genes that contain the functional motifs of TPSs. Differential gene expression (DGE) analysis identified 7 TPS genes that are highly expressed in flowers compared with roots, leaves, and stems (Fig. 7).
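The text does not name the TPS motifs used for screening. Two widely cited hallmarks of class I terpene synthases are the aspartate-rich DDxxD motif and the NSE/DTE motif, often written (N,D)D(L,I,V)X(S,T)XXXE. A minimal regex-based sketch, assuming those two motifs; requiring both is a hypothetical screening rule, not the authors' stated criterion, and the example protein is synthetic.

```python
import re

# Assumed TPS hallmarks: the DDxxD motif and the NSE/DTE motif (N,D)D(L,I,V)X(S,T)XXXE.
DDXXD = re.compile(r"DD..D")
NSE_DTE = re.compile(r"[ND]D[LIV].[ST]...E")


def tps_motifs(protein: str) -> dict[str, list[int]]:
    """0-based start positions of each motif in a protein sequence (non-overlapping matches)."""
    seq = protein.upper()
    return {
        "DDxxD": [m.start() for m in DDXXD.finditer(seq)],
        "NSE/DTE": [m.start() for m in NSE_DTE.finditer(seq)],
    }


def looks_like_tps(protein: str) -> bool:
    """Hypothetical screen: keep proteins carrying both metal-binding motifs."""
    hits = tps_motifs(protein)
    return bool(hits["DDxxD"]) and bool(hits["NSE/DTE"])


toy = "MADDLYDGGGNDLVSAAAEK"  # synthetic sequence carrying one copy of each motif
print(tps_motifs(toy), looks_like_tps(toy))
```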

**Fig. 6: The top 7 secondary metabolites produced by the osmanthus flower measured by GC-MS.**

**Fig. 7: TPS (terpene synthase) gene family in O. fragrans.**

Discussion

Sweet osmanthus is one of the most beloved ornamental tree species in China and other parts of the world, and it has been cultivated in China for over 2000 years for its attractive traits: beautiful colors, unique aromas, a long flowering season, and medicinal efficacy. However, few studies have investigated the genetic basis of the phenotypic diversity of sweet osmanthus. Recently, a set of genetic markers was identified^14,15, and the construction of a genetic linkage map was reported¹⁶. Additionally, several transcriptomic studies identified a large number of genes that are differentially expressed in some cultivars with attractive traits^17,18. While these studies indirectly associate the diverse phenotypes with the genotypes of sweet osmanthus, there has been no genome information that can directly link specific genes to particular traits. We therefore sequenced, assembled, and annotated the genome of sweet osmanthus. Furthermore, by combining HiSeq- and Iso-Seq-based transcriptome analyses, we gained deep insight into the genes that control aroma compound synthesis in the flowers of O. fragrans.

The high-quality reference genome provides deep insights into the evolution of O. fragrans

Currently, there are still no comprehensive analyses combining genomic, transcriptomic, and metabolic approaches to explain the unique aroma of O. fragrans. Despite advances in second-generation sequencing, constructing a high-quality plant genome remains very challenging because plant genomes are often large and complex, with a high proportion of repeats and frequent polyploidy. We therefore combined second-generation short reads for high accuracy, third-generation long reads for de novo assembly, and Hi-C data to scaffold contigs into a chromosome-scale assembly. To ensure high-quality genome annotation, we combined de novo, homology-based, and experimental evidence from extensive transcriptomic data, including full-length transcripts. The resulting reference-quality genome has an unambiguous chromosome-scale assembly (N = 23), and 43,573 of the 45,542 genes of O. fragrans (95.68 %) were functionally annotated.

The gene count of 45,542 is high, exceeding that of some plants related to O. fragrans (Fig. 2). This may be attributed to repeated gene duplications that led to the expansion of gene families. The O. fragrans genome has a higher number of multicopy genes than the other plant species compared (Fig. 2). Furthermore, O. fragrans appears to have gained and retained a large number of genes through whole-genome duplication (Fig. 4). The majority of plant species have experienced genome duplications in their evolutionary past^52,53. The high gene number of O. fragrans might result from complex interactions among various factors, such as the rate of evolution, the number of duplication events, the level of gene retention, gene family expansion, and selection pressure. The recent (~15 MYA) WGD and high gene retention might explain the large gene number. The number of genes involved in secondary metabolism is particularly high in O. fragrans (unpublished observation), and these genes might have been retained and/or expanded after the whole-genome duplication. This may reflect continuous interaction between O. fragrans and environmental factors, which imposes constant pressure for adaptation⁵⁴.

The calculated level of heterozygosity (1.45 %) is high for O. fragrans var. rixianggui. O. fragrans has been selectively bred for desirable traits for over 2000 years in China, and more than 160 cultivars with diverse phenotypes have been selected. The high heterozygosity of var. rixianggui might reflect extensive crossing among cultivars throughout this history, although it is challenging to determine accurately the origin of the heterozygosity observed in this cultivar. Furthermore, because O. fragrans is an androdioecious species⁵⁵, the coexistence of selfing and outcrossing makes it harder to trace the origin of the high heterozygosity. Recently, the first genetic map of O. fragrans was created using the SLAF-seq method¹⁶, providing a framework for understanding genome organization. This linkage map has helped us assemble the reference genome and can help to investigate the origin of the high heterozygosity and the history of hybridization among O. fragrans cultivars.

The new genome can also serve as a reference for whole-genome resequencing of sweet osmanthus cultivars. Resequencing the whole genomes of multiple cultivars provides highly useful information on the potential drivers of phenotypic diversity, evolution, and population structure in a given species⁵⁶. Our preliminary genome resequencing of 30 different O. fragrans cultivars identified a large number of single nucleotide polymorphisms (SNPs), copy number variations (CNVs), insertions/deletions (InDels), structural variations (SVs), and other mutation sites (unpublished results). Using these variant loci as new molecular genetic markers, researchers can study the history of cultivation, population dynamics, and genetic diversity.

The whole-genome duplication and the tandem duplication of the biosynthetic genes are likely the cause of the strong sweet aroma of O. fragrans

Among TPS-family genes, the TPS-b and TPS-g subfamilies are known to synthesize monoterpenes⁵⁷. Linalool, the major aroma compound identified in our study, is produced by the monoterpene synthesis pathway in O. fragrans. Using the high-quality genome and deep transcriptome information, we found a significant expansion of the TPS family as a whole, and of subfamilies TPS-b and TPS-g specifically, compared with grape (V. vinifera), which has not undergone a recent whole-genome duplication (Fig. 5). In addition to TPS1, 2, 3, and 4, which have been functionally validated previously, we identified seven additional TPS genes that are specifically expressed at flower stages S1 and S3. Three TPS genes appear to be new, flower-specific genes, indicating that fragrance production is controlled by a complex network of multiple TPS genes acting in stage- and flower-specific manners. Our results suggest that the unique aromas of O. fragrans arise in part from the interplay between genome evolution, transcriptional regulation, and metabolic control. Our current work lays a solid foundation for further studies on the comparative genomics and the molecular and biochemical mechanisms of aroma development in O. fragrans.

Conclusion

We constructed a high-quality reference genome of O. fragrans by combining Illumina, PacBio, and Hi-C platforms. The genome of O. fragrans var. rixianggui is ~740 Mb and has a high heterozygosity of 1.45 %. A large number of genes (45,542) was predicted using gene models built from de novo, homology-based, and experimental evidence obtained from extensive transcriptome data. Our genome analysis provides evidence of a whole-genome duplication at ~15 MYA. This genome information should help the research community study genome structure, the genetic basis of genetic diversity, and the regulation of flowering and scent development in O. fragrans and other related plant species.

Materials and methods

Genome sequencing

For genome sequencing, leaf samples were collected from a male tree (O. fragrans var. rixianggui) on the campus of Nanjing Forestry University, Nanjing, China, and processed for genomic DNA isolation and library construction. Rixianggui (Semperflorens) is a distinctive cultivar because it has a strong aroma and blooms continuously except in the hot summer months, whereas other cultivars, for example Thunbergii, Latifolius, and Aurantiacus, bloom only in autumn. Genomic DNA was extracted using the CTAB method, size-fractionated with BluePippin (Sage Science, Inc, MA, USA), used for library construction following the PacBio SMRT library construction protocol, and sequenced on the PacBio Sequel platform (Pacific Biosciences, CA, USA). For Illumina library construction, the extracted DNA was fragmented and size-fractionated using g-TUBE and BluePippin, then used for paired-end library construction and sequenced on the HiSeq X Ten platform (Illumina Inc, CA, USA).

Hi-C sequencing

To ensure the quality of the Hi-C library, leaf samples were first examined for nuclear integrity by DAPI staining. Once high-quality nuclei were confirmed, the samples were processed following the Hi-C procedure^58,59,60. The Hi-C library was sequenced on the Illumina HiSeq X Ten platform (Illumina, CA, USA), generating 740 million Hi-C read pairs, which were submitted to the LACHESIS Hi-C scaffolding pipeline⁵⁸. Hi-C libraries contain different molecule types, including invalid pairs such as self-circles, dangling ends, and dumped pairs. Because these molecule types cause paired reads to align to the genome in different orientations, uniquely aligned read pairs were classified statistically, and only valid interaction pairs were retained in the final data. This classification used the positions of DpnII digestion sites in the reference genome, which also provide useful information on the structural organization of individual chromosomes.
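The classification of read pairs by restriction fragment and orientation can be sketched as follows. This is a simplified illustration of the general logic, not the authors' pipeline: each mate is given as (chromosome, 5′ position, strand); mates on different DpnII fragments count as valid interactions; inward-facing mates on one fragment are dangling ends (unligated fragments); outward-facing mates on one fragment are self-circles. Same-strand mates on one fragment are lumped into a hypothetical "other" class here. Production pipelines apply further filters (for example, on distance to the cut site) that this sketch omits.

```python
import bisect


def fragment_index(cut_sites: list[int], pos: int) -> int:
    """Index of the restriction fragment containing pos, given sorted DpnII cut positions."""
    return bisect.bisect_right(cut_sites, pos)


def classify_pair(cut_sites: dict[str, list[int]],
                  mate1: tuple[str, int, str],
                  mate2: tuple[str, int, str]) -> str:
    """Classify a uniquely mapped read pair; each mate is (chrom, 5' position, strand)."""
    (c1, p1, s1), (c2, p2, s2) = sorted([mate1, mate2], key=lambda m: (m[0], m[1]))
    if c1 != c2 or fragment_index(cut_sites[c1], p1) != fragment_index(cut_sites[c2], p2):
        return "valid"  # mates on different restriction fragments: a ligation product
    if s1 == "+" and s2 == "-":
        return "dangling_end"  # inward-facing mates on one unligated fragment
    if s1 == "-" and s2 == "+":
        return "self_circle"  # outward-facing mates: the fragment re-ligated to itself
    return "other"  # same-strand mates on one fragment; discarded here


sites = {"chr1": [100, 200, 300]}
print(classify_pair(sites, ("chr1", 120, "+"), ("chr1", 250, "-")))  # different fragments
```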

Transcriptome sequencing

To obtain evidence to support empirical gene annotation, full-length transcriptome sequencing was performed. Flowers at three blooming stages (S1: beginning, S2: middle, S3: late; Supplemental Fig. 3), leaves, stems, and roots were collected from the same tree described above and processed for library construction. Total RNA was extracted with TRNzol Universal Reagent (Cat# DP424, TIANGEN Biotech Co. Ltd, Beijing, China) according to the manufacturer's instructions. RNA quality and quantity were evaluated using a NanoDrop™ One UV-Vis spectrophotometer (Thermo Fisher Scientific, USA), a Qubit® 3.0 Fluorometer (Thermo Fisher Scientific, USA), and an Agilent Bioanalyzer 2100 (Agilent Technologies, USA). RNA samples with RNA integrity number (RIN) values close to 10 were used for cDNA library construction and sequencing. cDNA libraries were prepared using the TruSeq Sample Preparation kit (Illumina Inc, CA, USA) and the IsoSeq Library Construction kit (Pacific Biosciences, CA, USA), and 150-bp paired-end sequencing was conducted on a HiSeq X Ten platform (Illumina Inc, CA, USA).

Aroma compound analysis

Fresh flowers at three stages (S1: beginning, S2: middle, S3: late), defined by flower size (Supplemental Fig. 3), were picked from the same tree at the time of sample collection for the transcriptome studies described above. Sampling was replicated five times, and the samples were quickly placed in gas-impermeable polyethylene bags and stored frozen at −20 °C. Headspace solid-phase microextraction (SPME) combined with gas chromatography-mass spectrometry (GC-MS) was used to determine the identity and quantity of the aroma volatiles. Flowers (0.3 g) were placed in a 4 mL solid-phase microextraction vial (Supelco Inc, USA), 1 μL of 1000× diluted ethyl caprate (Macklin Inc, China) was added, and vials were capped with a 65 µm DB-5 ms extraction head (Supelco Inc, USA). The vial was then incubated for 40 min in a water bath at 45 °C to volatilize the aroma compounds into the headspace. After the adsorption period, the fiber was removed and introduced into the heated GC injector port for desorption at 250 °C for 3 min. The desorbed volatile compounds were analyzed on a Trace DSQ GC-MS (Thermo Fisher Scientific, USA) equipped with a 30 m × 0.25 mm × 0.25 µm TR-5 ms capillary column (Supelco Inc, USA). The oven temperature was held at 60 °C for 2 min, increased at 5 °C/min to 150 °C, and then increased at 10 °C/min to 250 °C, with the transfer line maintained at 250 °C. Helium was used as the carrier gas at a flow rate of 1.0 mL/min. The MS conditions were: source temperature 250 °C; electron impact (EI) mode at 70 eV; 4 scans/s over the mass range m/z 33–450 in a 1 s cycle. Volatile compounds were first auto-matched against mass spectra in the NIST98 database using ChemStation (Agilent, USA). A series of n-alkanes (C7–C30) (Sigma, St. Louis, MO) was injected into the GC-MS under the same conditions to obtain the linear retention indices of the volatile compounds. The data were also compared with published linear retention indices (NIST Chemistry WebBook, SRD 69). Peak-area normalization was used to calculate the quantities of the volatile aroma compounds.
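The two calculations named above can be sketched briefly. The linear retention index for temperature-programmed GC (the van den Dool and Kratz form) interpolates a compound's retention time between the bracketing n-alkanes: I = 100 × [n + (N − n)(t_x − t_n)/(t_N − t_n)], where n and N are the carbon numbers of the alkanes eluting just before and after the compound. Peak-area normalization expresses each compound as a percentage of the summed peak area. The retention times and areas below are illustrative values, and whether the ethyl caprate internal standard was excluded from the normalization is not stated (here it would simply be left out of the input dictionary).

```python
def linear_retention_index(rt: float, alkanes: dict[int, float]) -> float:
    """Linear (van den Dool-Kratz) retention index for temperature-programmed GC.

    alkanes maps carbon number -> retention time of that n-alkane; rt must fall
    between two alkanes of the series.
    """
    series = sorted(alkanes.items())
    for (n, t_n), (n_next, t_next) in zip(series, series[1:]):
        if t_n <= rt <= t_next:
            return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
    raise ValueError("retention time lies outside the n-alkane series")


def area_percent(peak_areas: dict[str, float]) -> dict[str, float]:
    """Peak-area normalisation: each compound's share (%) of the summed peak area."""
    total = sum(peak_areas.values())
    return {name: 100 * area / total for name, area in peak_areas.items()}


alkanes = {10: 10.0, 11: 12.0, 12: 14.5}  # illustrative retention times (min)
print(linear_retention_index(11.0, alkanes))  # midway between C10 and C11 -> 1050.0
print(area_percent({"linalool": 300.0, "other": 100.0}))
```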

Change history

19 April 2019
Since the publication of this article, the authors have noticed that the NCBI accession number is missing from the article.

References

Shang, F. D., Yin, Y. J. & Xiang, Q. B. The culture of sweet osmanthus in China. J. Henan Univ. Nat. Sci. 43, 136–139 (2003).
Google Scholar
Hao, R. M., Zang, D. K. & Xiang, Q. B. Investigation on natural resources of Osmanthus fragrans Lour. at Zhou luo cun in Hunan. Acta Hortic. Sin. 32, 926–929 (2005).
Google Scholar
Zang, D. K., Xiang, Q. B., Liu, Y. L. & Hao, R. M. The studying history and the application to International Cultivar Registration Authority of sweet osmanthus (Osmanthus fragrans Lour.). J. Plant Resour. Environ. 12, 49–53 (2003).
Google Scholar
Deng, C. H., Song, G. X. & Hu, Y. M. Application of HS-SPME and GC-MS to characterization of volatile compounds emitted from Osmanthus flowers. Ann. Chim. 94, 921–927 (2004).
Article CAS Google Scholar
Xin, H. P. et al. Characterization of volatile compounds in flowers from four groups of sweet osmanthus (Osmanthus fragrans) cultivars. Can. J. Plant Sci. 93, 923–931 (2013).
Article CAS Google Scholar
Cai, X. et al. & W, C.Y. Analysis of aroma-active compounds in three sweet osmanthus (Osmanthus fragrans) cultivars by gas-chromatography olfactometry and GC-mass spectrometry. J. Zhejiang. Univ. Sci. B 15, 638–648 (2014).
Article CAS Google Scholar
Hu, C. D. et al. Essential oil composition of Osmanthus fragrans varieties by GC-MS and heuristic evolving latent projections. Chromatographia 70, 1163–1169 (2009).
Article CAS Google Scholar
Wang, L. M. et al. Variations in the components of Osmanthus fragrans Lour. Essential oil at different stages of flowering. Food Chem. 114, 233–236 (2009).
Article CAS Google Scholar
Hu, B. F., Guo, X. L., Xiao, P. & Luo, L. P. Chemical composition comparison of the essential oil from four groups of Osmanthus fragrans Lour. flowers. J. Essent. Oil Plants 15, 832–838 (2012).
Lei, G. M. et al. Water-soluble essential oil components of fresh flowers of Osmanthus fragrans Lour. J. Essent. Oil Res. 28, 177–184 (2016).
Shang, F. D., Yin, Y. J. & Zhang, T. The RAPD analysis of 17 Osmanthus fragrans cultivars in Henan province. Acta Hortic. Sin. 31, 685–687 (2004).
Yuan, W. J., Han, Y. J., Dong, M. F. & Shang, F. D. Assessment of genetic diversity and relationships among Osmanthus fragrans cultivars using AFLP markers. Electron. J. Biotechnol. 14, 2–3 (2011).
Hu, W., Luo, Y., Yang, Y., Zhang, Z. Y. & Fan, D. M. Genetic diversity and population genetic structure of wild sweet osmanthus revealed by microsatellite markers. Acta Hortic. Sin. 41, 1427–1435 (2014).
Yuan, W. J., Li, Y., Ma, Y. F., Han, Y. J. & Shang, F. D. Isolation and characterization of microsatellite markers for Osmanthus fragrans (Oleaceae) using 454 sequencing technology. Genet. Mol. Res. 14, 17154–17158 (2015).
Han, Y. J. et al. cDNA-AFLP analysis on 2 Osmanthus fragrans cultivars with different flower color and molecular characteristics of MYB1 gene. Trees 29, 931–940 (2015).
He, Y. X., Yuan, W. J., Dong, M. F., Han, Y. J. & Shang, F. D. The first genetic map in sweet osmanthus (Osmanthus fragrans Lour.) using specific locus amplified fragment sequencing. Front. Plant Sci. 8, 1621 (2017).
Zhang, X. S., Pei, J. J., Zhao, L. G., Tang, F. & Fang, X. Y. RNA-Seq analysis and comparison of the enzymes involved in ionone synthesis of three cultivars of Osmanthus. J. Asian Nat. Prod. Res. 9, 1–13 (2018).
Yang, X. L. et al. Transcriptomic analysis of the candidate genes related to aroma formation in Osmanthus fragrans. Molecules 23, 1604 (2018).
Xu, C. et al. Cloning and expression analysis of MEP pathway enzyme-encoding genes in Osmanthus fragrans. Genes 7, 78 (2016).
Zeng, X. L. et al. Emission and accumulation of monoterpene and the key terpene synthase (TPS) associated with monoterpene biosynthesis in Osmanthus fragrans Lour. Front. Plant Sci. 6, 1232 (2015).
Baldermann, S. et al. Functional characterization of a carotenoid cleavage dioxygenase 1 and its relation to the carotenoid accumulation and volatile emission during the floral development of Osmanthus fragrans Lour. J. Exp. Bot. 61, 2967–2977 (2010).
Baldermann, S., Kato, M., Fleischmann, P. & Watanabe, N. Biosynthesis of α- and β-ionone, prominent scent compounds, in flowers of Osmanthus fragrans. Acta Biochim. Pol. 59, 79–81 (2012).
Han, Y. J., Liu, L. X., Dong, M. F., Yuan, W. J. & Shang, F. D. cDNA cloning of the phytoene synthase (PSY) and expression analysis of PSY and carotenoid cleavage dioxygenase genes in Osmanthus fragrans. Biologia 68, 258–263 (2013).
Han, Y. J. et al. Differential expression of carotenoid-related genes determines diversified carotenoid coloration in flower petal of Osmanthus fragrans. Tree Genet. Genomes 10, 329–338 (2014).
Zhang, C., Wang, Y. G., Fu, J. X., Bao, Z. Y. & Zhao, H. B. Transcriptomic analysis and carotenogenic gene expression related to petal coloration in Osmanthus fragrans ‘Yanhong Gui’. Trees 30, 1207–1223 (2016).
Mu, H. N. et al. Transcriptome sequencing and analysis of sweet osmanthus (Osmanthus fragrans Lour.). Genes Genom. 36, 777–788 (2014).
Han, Y. J. et al. Characterization of OfWRKY3, a transcription factor that positively regulates the carotenoid cleavage dioxygenase gene OfCCD4 in Osmanthus fragrans. Plant Mol. Biol. 91, 485–496 (2016).
Wang, L. et al. Analysis of the main active ingredients and bioactivities of essential oil from Osmanthus fragrans Var. thunbergii using a complex network approach. BMC Syst. Biol. 11, 144 (2017).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
Hu, X. et al. pIRS: Profile-based Illumina pair-end reads simulator. Bioinformatics 28, 1533–1535 (2012).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Pryszcz, L. P. & Gabaldón, T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113 (2016).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with Single-Copy Orthologs. Bioinformatics 31, 3210–3212 (2015).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 4, 10 (2004).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Nussbaumer, T. et al. MIPS PlantsDB: a database framework for comparative plant genome research. Nucleic Acids Res. 41, D1144–D1151 (2013).
Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46, D335–D342 (2017).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Lagesen, K., Hallin, P., Rødland, E. A., Staerfeldt, H. H. & Rognes, T. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Birney, E. & Durbin, R. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548 (2000).
Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Curr. Protoc. Bioinforma. 4, 1–28 (2007).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Bromberg, Y. & Rost, B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res. 35, 3823–3835 (2007).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C. & Kanehisa, M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182–W185 (2007).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
De Bodt, S., Maere, S. & Van de Peer, Y. Genome duplication and the origin of angiosperms. Trends Ecol. Evol. 20, 591–597 (2005).
Li, L., Stoeckert, C. J. Jr. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Cui, L. Y. et al. Widespread genome duplications throughout the history of flowering plants. Genome Res. 16, 738–749 (2006).
Casneuf, T., De Bodt, S., Raes, J., Maere, S. & Van de Peer, Y. Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana. Genome Biol. 7, R13 (2006).
Xu, Y. C. et al. The differentiation and development of pistils of hermaphrodites and pistillodes of males in androdioecious Osmanthus fragrans L. and implications for the evolution to androdioecy. Plant Syst. Evol. 300, 843–849 (2014).
Huang, X. et al. High-throughput genotyping by whole-genome resequencing. Genome Res. 19, 1068–1076 (2009).
Tholl, D. Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Curr. Opin. Plant Biol. 9, 297–304 (2006).
Kaplan, N. & Dekker, J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat. Biotechnol. 31, 1143–1147 (2013).
Marie-Nelly, H. et al. High-quality genome (re)assembly using chromosomal contact data. Nat. Commun. 5, 5695 (2014).
Jibran, R. et al. Chromosome-scale scaffolding of the black raspberry (Rubus occidentalis L.) genome based on chromatin interaction data. Hortic. Res. 5, 18008 (2018).

Acknowledgements

This work was supported by research grants provided by the National Natural Science Foundation (31870695 and 31601785), the Project of Key Research and Development Plan (Modern Agriculture) in Jiangsu (BE2017375), the Selection and Breeding of Excellent Tree Species and Effective Cultivation Techniques (CX(16)1005), the Project of Osmanthus National Germplasm Bank, and the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions.

Authors’ contributions

L.W. and Y.Y. designed and coordinated the whole project. X.Y., L.W., Y.Y., and F.C. together led and performed the whole project. J.C., F.C., T.S., H.L., and W.D. performed the analyses of genome evolution, gene families, and metabolites. M.S.P., J.C., F.C., Y.Y., and G.C. participated in manuscript writing and revision. All authors read and approved the final manuscript.

Author information

These authors contributed equally to this work and should be considered co-first authors: Xiulian Yang, Yuanzheng Yue

Authors and Affiliations

Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
Xiulian Yang, Yuanzheng Yue, Haiyan Li, Wenjie Ding, Gongwei Chen, Tingting Shi & Lianggui Wang
College of Landscape Architecture, Nanjing Forestry University, Nanjing, China
Xiulian Yang, Yuanzheng Yue, Haiyan Li, Wenjie Ding, Gongwei Chen, Tingting Shi & Lianggui Wang
State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, China
Junhao Chen & Fei Chen
Nextomics Bioscience Institute, Wuhan, China
Min S. Park

Corresponding authors

Correspondence to Min S. Park, Fei Chen or Lianggui Wang.

Ethics declarations

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

new Supplemental

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yang, X., Yue, Y., Li, H. et al. The chromosome-level quality genome provides insights into the evolution of the biosynthesis genes for aroma compounds of Osmanthus fragrans. Hortic Res 5, 72 (2018). https://doi.org/10.1038/s41438-018-0108-0

Received: 07 November 2018
Revised: 12 November 2018
Accepted: 13 November 2018
Published: 20 November 2018
DOI: https://doi.org/10.1038/s41438-018-0108-0
