Transcriptome profiles of three Muscat table grape cultivars to dissect the mechanism of terpene biosynthesis

Vitis vinifera is grown worldwide for winemaking and for use as table grapes. Some of the existing cultivars have a floral and fruity flavour, referred to as Muscat flavour. It is well documented that this flavour originates from a series of terpene compounds, but the mechanism underlying the differences in terpene content among Muscat-type cultivars remains unclear. Transcript and terpene metabolite profiles were integrated to elucidate the molecular mechanism of this phenomenon. In this research, three genotypes with different aromatic strengths were investigated by RNA sequencing. A total of 27 fruit samples, comprising three biological replicates at each of three stages, were sequenced on the Illumina HiSeq2000 platform; the stages corresponded to veraison, intermediate Brix value, and harvest ripeness. After quality assessment and data cleaning, a total of 254.18 Gb of data with more than 97% Q20 bases was obtained, or approximately 9.41 Gb per sample. These results provide a valuable dataset for investigating the mechanism of terpene biosynthesis.


Background & Summary
Aroma is one of the most important quality parameters of grapes and is a main concern of consumers buying grape products. For genetic improvement research and breeding, the biosynthesis of aromatic compounds and its regulation have attracted much attention. Terpenes are the typical aromatic compounds in Muscat grapes, and they are secondary metabolites [1][2][3][4] ; they have a low olfactory threshold and are easily perceived by humans. Terpenes occur mainly in the pericarp and, in some cultivars, in the flesh 5 , and their content is affected by genotype 6,7 , developmental stage 8,9 , environment and vineyard management [10][11][12][13] . Terpenes have two forms: the free form, which contributes directly to the aromatic flavour, and the glycosidically bound form, a pool of potential aromatic compounds that are converted to the free form by hydrolysis [14][15][16] .
The genetic mechanism of Muscat flavour in grapevines has been studied through quantitative trait loci (QTL) analysis in different F1 populations 23,24 and in selfing populations; VvDXS has been shown to be a structural candidate gene for geraniol, nerol, and linalool concentrations in wine grapes 25 . Battilana reported that single nucleotide polymorphism (SNP) mutations in VvDXS are the main cause of the Muscat flavour. The substitution of a lysine with an asparagine at position 284 of the VvDXS amino acid sequence affects the monoterpene content of Muscat-flavoured and neutral cultivars 26 .
Among Muscat grapes, some cultivars have a very strong flavour, while others have a moderate or light flavour. The terpene types and concentrations vary among cultivars. To date, terpene accumulation has been well documented in some wine grapes. Terpene accumulation in developing Gewurztraminer grapes has been shown to be correlated with an increase in the transcript abundances of early terpenoid pathway enzymes 27 . Some transcription factors involved in controlling terpene biosynthesis have been predicted in the grapevine cultivar Muscat Blanc à Petits Grains through gene co-expression network analysis 28 . A Nudix hydrolase was also found to be a component of a terpene synthase-independent pathway, with cytochrome P450 hydroxylase, epoxide hydrolase and glucosyltransferase genes potentially involved in monoterpene metabolism 29 . However, there are few reports on table grapes 30 .
In this study, we present the transcriptome analysis of three genotypes of table grapes. During berry development, 27 samples in total were sequenced on the Illumina HiSeq platform. After quality assessment and data cleaning, a total of 254.18 Gb of high-quality bases with more than 97% Q20 bases was obtained, or approximately 9.41 Gb per sample. In aggregate, a total of 776 million reads were generated, with an average of 31.66 million reads per sample. Furthermore, approximately 76.65% of the total reads were uniquely aligned to the grape genome (V2) 31 . These data will provide useful information for investigating terpene biosynthesis.

Methods
Overview of the experimental design. The berries of three genotypes were collected at three developmental stages. Approximately 300 grape berries were randomly collected for each replicate, with three replicates harvested for each stage. The experimental design and analysis pipeline are shown in Fig. 1.
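The sampling layout can be enumerated in a few lines. The following is a minimal sketch, assuming a full factorial design of the three cultivars, the three E-L stages and three biological replicates named in the Methods; the sample labels are hypothetical and are not the identifiers used in the deposited records.

```python
from itertools import product

# Cultivar and stage names are taken from the Methods text.
# Replicate numbering and the label format are assumptions for illustration.
CULTIVARS = ["Xiangfei", "Italia", "Zaomeiguixiang"]
STAGES = ["EL35", "EL36", "EL38"]
REPLICATES = [1, 2, 3]


def build_sample_sheet():
    """Return one record per library (cultivar x stage x replicate)."""
    return [
        {
            "cultivar": cultivar,
            "stage": stage,
            "replicate": rep,
            "label": f"{cultivar}_{stage}_rep{rep}",  # hypothetical label
        }
        for cultivar, stage, rep in product(CULTIVARS, STAGES, REPLICATES)
    ]


if __name__ == "__main__":
    sheet = build_sample_sheet()
    print(len(sheet), "libraries")
    print(sheet[0]["label"], "...", sheet[-1]["label"])
```

Three cultivars at three stages with three replicates each give the 27 libraries reported in this study.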

Materials and methods.
Plant materials. Three V. vinifera cultivars were used for the transcript study. 'Xiangfei' was registered by our team and has a strong Muscat flavour and a green to golden skin colour, while 'Italia', the famous mid-late season table grape cultivar that originated in Italy, has a moderate Muscat flavour. 'Zaomeiguixiang' has a purple-reddish colour and a strong Muscat flavour.
Sampling. The vines were grown in the experimental vineyard at the Beijing Academy of Forestry and Pomology Sciences in China (39°58′N, 116°13′E) under a plastic cover and were trained on a two-wire vertical trellis system with 2.5 m row spacing and 0.75 m plant spacing. In 2017, berry samples from three vines were harvested at the developmental stages corresponding to EL35, EL36, and EL38 32 . The berries begin to colour and soften at EL35 (about 5% of the berries had started to colour and soften), progress to complete veraison with an intermediate Brix at EL36, and reach harvest ripeness at EL38. At each stage, three replicates were harvested; approximately 300 grape berries were randomly collected for each replicate.

Fig. 1 Flowchart of the experimental design. Berry samples were collected at three developmental stages, and three biological replicates per sample were used for transcriptome sequencing. All raw reads were quality controlled and assessed. Then, the clean data were mapped to the V. vinifera reference genome (V2) by Hisat2. Gene expression levels were calculated with RSEM.
Physicochemical parameters. Fifty berries of each replicate were pressed and centrifuged to determine total soluble solids (TSS), pH and titratable acidity. TSS was measured with a digital refractometer (PAL-1, Atago, Tokyo, Japan). The pH was measured with a pH meter (FiveGo F2-Standard, Mettler Toledo, Switzerland).
RNA extraction and sequencing. Total RNA was extracted from the berries with a Plant RNA extraction kit (Aidlab Biotechnologies, Beijing, China). The quality of the RNA was verified by agarose gel electrophoresis, and the concentration and absorbance ratio (A260/A280, 1.8-2.0) were determined on an Implen P330 nanophotometer (Implen GmbH, Munich, Germany).
The RNA-Seq libraries were constructed from the 27 samples according to the methods of Wang 33 . The mRNA was enriched with oligo(dT) magnetic beads and then fragmented at 94 °C for 5 min. cDNA was synthesized with SuperScript® III Reverse Transcriptase, followed by purification, end repair and dA-tailing, and was then ligated to the sequencing adaptor. Afterwards, PCR amplification was conducted with indexed primers. The constructed libraries were quality checked on an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System and then sequenced on the Illumina HiSeq2000 platform at BGI Life Tech Co., Ltd. (Shenzhen, China). Low-quality reads (more than 20% of bases with a quality score lower than 10), reads containing adaptors and reads with unknown bases (more than 5% N bases) were removed to obtain clean reads, which were stored in FASTQ format. The clean reads were mapped onto the reference grapevine genome (V2) using Hisat2 34 .
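The three read-filtering criteria can be written as a single predicate. This is a minimal sketch and not the filtering software used in this study, which is not named here; it assumes per-base Phred scores supplied as integers and detects adaptors by plain substring matching, which is a simplification. The function name and the adaptor sequence in the usage lines are hypothetical.

```python
def passes_filter(seq, quals, adaptor=None,
                  low_q=10, max_low_q_frac=0.20, max_n_frac=0.05):
    """Return True if a read survives the three filters described in the text.

    seq     -- read bases as a string
    quals   -- per-base Phred quality scores (list of ints), same length as seq
    adaptor -- optional adaptor sequence; exact substring matching is an
               assumption made for this sketch
    """
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False  # malformed record

    # Filter 1: discard if more than 20% of bases have quality below 10.
    low_quality_bases = sum(1 for q in quals if q < low_q)
    if low_quality_bases / n > max_low_q_frac:
        return False

    # Filter 2: discard reads that contain the adaptor sequence.
    if adaptor and adaptor in seq:
        return False

    # Filter 3: discard if more than 5% of bases are unknown (N).
    if seq.upper().count("N") / n > max_n_frac:
        return False

    return True


if __name__ == "__main__":
    good = "ACGT" * 25                      # 100 bases, no N
    print(passes_filter(good, [30] * 100))  # high-quality read is kept
    print(passes_filter(good, [5] * 21 + [30] * 79))  # 21% low-quality bases
```

The thresholds are exposed as keyword arguments so that the same predicate can be reused with other cut-offs.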

Data Records
The RNA-Seq clean data of the 27 samples were deposited at the NCBI Sequence Read Archive under accession SRP184152 35 . The gene expression level files were deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE130386 36 . The information on the differentially expressed genes (DEGs) between samples was deposited in figshare 37 .

Table 1 Physicochemical parameters of the samples (columns: sample name, total soluble solids, titratable acidity (g/l), pH).

Technical Validation
Quality control. The physicochemical parameters of the samples are shown in Table 1. A total of 27 RNA samples were prepared and sequenced, with the sequencing depth ranging between 22.48 and 33.08 million reads; the Q20 values for the clean reads were above 97%, and the average mapping ratio was 84.72% (Online-only Table 1).
Analysis of RNA-Seq data. After novel transcript detection, novel coding transcripts were merged with the reference transcripts to obtain a complete reference. Clean reads were mapped to these transcripts using Bowtie2 38 . Gene expression levels were calculated with RSEM 39 . The distribution of reads, assessed by read coverage skewness, showed good fragmentation randomness (Fig. 2). The differentially expressed genes (DEGs) between samples were identified with the R package DESeq2 40 . DEGs with a |log2 ratio| ≥ 1 and a false discovery rate ≤ 0.001 were considered statistically significant. The DEG statistics are shown in Fig. 3.
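The significance rule for DEGs can be expressed as a short filter. This is a minimal sketch, assuming the DESeq2 results have been exported as (gene, log2 ratio, FDR) tuples; the gene identifiers and values below are made up for illustration, and labelling a positive log2 ratio as "up" assumes the ratio is defined as treatment over control.

```python
def is_significant(log2_ratio, fdr, min_abs_log2=1.0, max_fdr=0.001):
    """Apply the two thresholds stated in the text: |log2 ratio| >= 1, FDR <= 0.001."""
    return abs(log2_ratio) >= min_abs_log2 and fdr <= max_fdr


def summarise_degs(results):
    """Count up- and down-regulated DEGs in one pairwise comparison.

    results -- iterable of (gene_id, log2_ratio, fdr) tuples
    """
    up = down = 0
    for _gene_id, log2_ratio, fdr in results:
        if not is_significant(log2_ratio, fdr):
            continue
        if log2_ratio > 0:
            up += 1
        else:
            down += 1
    return {"up": up, "down": down}


if __name__ == "__main__":
    # Hypothetical values, not taken from the deposited dataset.
    example = [
        ("gene_A", 1.0, 0.001),    # on both thresholds: counted as up
        ("gene_B", -2.5, 0.0001),  # counted as down
        ("gene_C", 3.0, 0.01),     # FDR too high: not significant
        ("gene_D", 0.5, 0.0),      # fold change too small: not significant
    ]
    print(summarise_degs(example))
```

Both thresholds are inclusive, matching the "≥" and "≤" in the criteria above.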

Usage Notes
The RNA-Seq fastq.gz files were deposited at Gene Expression Omnibus and can be downloaded using the fastq-dump tool of the SRA Toolkit (https://www.ncbi.nlm.nih.gov). The V2 reference genome of V. vinifera and its annotation file can be retrieved from http://genomes.cribi.unipd.it/grape/.

Fig. 3 Statistics of differentially expressed genes. The x-axis represents the comparisons between groups and the y-axis represents DEG numbers. The red colour represents upregulated DEGs, and the blue colour represents downregulated DEGs.
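For the download step described in the Usage Notes, the fastq-dump call can be assembled programmatically. This is a minimal sketch, assuming the SRA Toolkit is installed and on the PATH; the run accession shown is a placeholder, since individual run accessions under SRP184152 are not listed here, and whether the libraries are paired-end (and hence whether --split-files is appropriate) is not stated and is left as an option.

```python
import shlex


def fastq_dump_command(run_accession, outdir="fastq", split_files=False):
    """Build a fastq-dump command line for a single SRA run.

    run_accession -- an SRR identifier (the one used below is a placeholder)
    outdir        -- output directory, passed via --outdir
    split_files   -- add --split-files; only meaningful for paired-end runs,
                     which is an assumption not confirmed by the text
    """
    cmd = ["fastq-dump", "--gzip", "--outdir", outdir]
    if split_files:
        cmd.append("--split-files")
    cmd.append(run_accession)
    return cmd


if __name__ == "__main__":
    cmd = fastq_dump_command("SRR0000000")  # placeholder accession
    print(shlex.join(cmd))
    # To execute, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list avoids shell quoting problems when it is later handed to subprocess.run.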