Background & Summary

Aroma is one of the most important quality traits of grapes and a main concern for consumers buying grape products. The biosynthesis of aromatic compounds and its regulation have therefore attracted much attention in genetic improvement research and breeding. Terpenes are the typical aromatic compounds in Muscat grapes and are secondary metabolites1,2,3,4; they have a low olfactory threshold and are easily perceived by humans. Terpenes occur mainly in the pericarp and, in some cultivars, in the flesh5, and their content is affected by genotype6,7, developmental stage8,9, environment and management of the grape10,11,12,13. Terpenes exist in two forms: the free form, which contributes directly to the aromatic flavour, and the glycosidically bound form, in which potential aromatic compounds are converted to the free form by hydrolysis14,15,16.

In plants, terpene compounds are synthesized by two pathways, the methyl-erythritol-4-phosphate pathway (DXP/MEP) in the plastid and the mevalonate pathway (MVA) in the cytoplasm17, with terpenes located in the mesocarp and pericarp18. In the MEP pathway, 1-deoxy-D-xylulose-5-phosphate synthase (DXS), the entry enzyme, converts pyruvic acid and glyceraldehyde 3-phosphate into 1-deoxy-D-xylulose-5-phosphate, which is then converted through six enzymatic reactions into geranyl-diphosphate (GPP). Geranyl-diphosphate is the substrate for all the terpenes. A series of terpene synthases then converts GPP into hemiterpenes (C5), monoterpenes (C10), sesquiterpenes (C15) or diterpenes (C20)19,20,21,22.

The genetic mechanism of Muscat flavour in grapevines has been studied through quantitative trait locus (QTL) analysis in different F1 populations23,24. In selfing populations, VvDXS has been shown to be a structural candidate gene for geraniol, nerol, and linalool concentrations in wine grapes25. Battilana reported that single nucleotide polymorphisms (SNPs) in VvDXS are the main cause of the Muscat flavour: the substitution of a lysine with an asparagine at position 284 of the VvDXS amino acid sequence affects the monoterpene content of Muscat-flavoured and neutral cultivars26.

In Muscat grapes, some cultivars have a very strong flavour, while others have a moderate or light flavour, and the terpene types and concentrations vary among cultivars. To date, terpene accumulation has been well documented in some wine grapes. Terpene accumulation in developing Gewurztraminer grapes has been shown to be correlated with an increase in the transcript abundances of early terpenoid pathway enzymes27. Some transcription factors involved in controlling terpene biosynthesis have been predicted in the grapevine cultivar Muscat Blanc à Petits Grains through gene co-expression network analysis28. A Nudix hydrolase was also found to be a component of a terpene synthase-independent pathway, with cytochrome P450 hydroxylase, epoxide hydrolase and glucosyltransferase genes potentially involved in monoterpene metabolism29. However, there are few reports on table grapes30.

In this study, we present a transcriptome analysis of three table grape genotypes. In total, 27 samples collected during berry development were sequenced on the Illumina HiSeq platform. After quality assessment and data cleaning, a total of 254.18 Gb of high-quality bases with more than 97% Q20 bases was obtained, approximately 9.41 Gb per sample. In aggregate, 776 million reads were generated, with an average of 31.66 million reads per sample. Furthermore, approximately 76.65% of the total reads were uniquely aligned to the grape genome (V2)31. These data will provide useful information for investigating terpene biosynthesis.

Methods

Overview of the experimental design

The berries of three genotypes were collected at three developmental stages. Approximately 300 grape berries were randomly collected for each replicate, with three replicates harvested for each stage. The experimental design and analysis pipeline are shown in Fig. 1.
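The design described above (three genotypes, three developmental stages, three replicates) can be laid out as a sample sheet. The following is a minimal sketch, not the authors' pipeline: the cultivar names and EL stages come from the Methods, while the label format (`cultivar_stage_repN`) and the function name `build_sample_sheet` are hypothetical.

```python
from itertools import product

# Cultivars and EL stages are taken from the Methods text;
# the sample label format below is a hypothetical convention.
CULTIVARS = ["Xiangfei", "Italia", "Zaomeiguixiang"]
STAGES = ["EL35", "EL36", "EL38"]
REPLICATES = [1, 2, 3]


def build_sample_sheet():
    """Return one record per library (cultivar x stage x replicate)."""
    return [
        {"sample": f"{c}_{s}_rep{r}", "cultivar": c, "stage": s, "replicate": r}
        for c, s, r in product(CULTIVARS, STAGES, REPLICATES)
    ]


samples = build_sample_sheet()
print(len(samples))  # 27, matching the number of sequenced samples
```

Enumerating the full cross in this way makes it easy to check that every cultivar and stage combination has its three replicates before downstream analysis.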

Fig. 1

Flowchart of the experimental design. Berry samples were collected at three developmental stages, and three biological replicates per sample were used for transcriptome sequencing. All raw reads were subjected to quality control and assessment. The clean data were then mapped to the V. vinifera reference genome (V2) with Hisat2. Gene expression levels were calculated with RSEM.

Materials and methods

Plant materials

Three V. vinifera cultivars were used for the transcriptome study. ‘Xiangfei’ was registered by our team and has a strong Muscat flavour and a green to golden skin colour. ‘Italia’, a famous mid-late season table grape cultivar that originated in Italy, has a moderate Muscat flavour. ‘Zaomeiguixiang’ has a purple-reddish colour and a strong Muscat flavour.

Sampling

The vines were grown in the experimental vineyard at the Beijing Academy of Forestry and Pomology Sciences in China (39°58′N and 116°13′E) under a plastic cover and were trained to a two-wire vertical trellis system with a 2.5 m row spacing and a 0.75 m plant spacing. In 2017, berry samples from three vines were harvested at the developmental stages corresponding to EL 35, EL 36, and EL 3832. The berries begin to colour and soften at EL 35 (about 5% of the berries had started to colour and soften), reach complete veraison with an intermediate Brix at EL 36, and reach harvest ripeness at EL 38. At each stage, three replicates were harvested; approximately 300 grape berries were randomly collected for each replicate.

Physiochemical parameters

Fifty berries of each replicate were pressed and centrifuged to determine total soluble solids (TSS), pH value and titratable acidity. TSS was measured with a digital refractometer (PAL-1, Atago, Tokyo, Japan). The pH value was measured with a pH meter (FiveGo F2-Standard, Mettler Toledo, Switzerland). Titratable acidity was determined by titration with NaOH (0.1 M) to an end point of pH 8.2 and expressed as tartaric acid equivalents in accordance with the National Standard of the People’s Republic of China (GB/T15038-2006, 2006). The remaining berries were then frozen in liquid nitrogen and stored at −80 °C.
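The conversion from NaOH titrant volume to tartaric acid equivalents can be sketched as follows. This is an illustrative calculation under stated assumptions rather than the procedure text of GB/T15038-2006: it assumes tartaric acid is fully titrated as a diprotic acid (molar mass 150.09 g/mol, hence about 75.04 g per equivalent), and the function name and the example volumes are hypothetical.

```python
# Assumption: tartaric acid is diprotic, so one mole of NaOH neutralises
# half a mole of acid (150.09 g/mol / 2 = ~75.04 g per equivalent).
TARTARIC_EQ_WEIGHT = 75.04  # g per equivalent


def titratable_acidity_g_per_l(naoh_ml, sample_ml, naoh_molarity=0.1):
    """Titratable acidity as tartaric acid equivalents (g/L).

    naoh_ml       -- volume of NaOH consumed to reach pH 8.2 (mL)
    sample_ml     -- volume of juice titrated (mL)
    naoh_molarity -- NaOH concentration (mol/L); 0.1 M as in the text
    """
    if sample_ml <= 0:
        raise ValueError("sample volume must be positive")
    # mL * mol/L = mmol of NaOH; * g/eq = mg of acid; / mL of juice = g/L
    return naoh_ml * naoh_molarity * TARTARIC_EQ_WEIGHT / sample_ml


# Hypothetical example: 4.0 mL of 0.1 M NaOH for 5.0 mL of juice
print(round(titratable_acidity_g_per_l(4.0, 5.0), 2))  # 6.0
```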

RNA extraction and sequencing

Total RNA was extracted from the berries with a Plant RNA extraction kit (Aidlab Biotechnologies, Beijing, China). RNA quality was verified by agarose gel electrophoresis, and the concentration and purity (A260/A280 absorbance ratio, 1.8–2.0) were determined on an Implen P330 nanophotometer (Implen GmbH, Munich, Germany).

The RNA-Seq libraries were constructed from the 27 samples according to the methods of Wang33. mRNA was enriched using oligo(dT) magnetic beads and then fragmented at 94 °C for 5 min. cDNA was synthesized with Superscript® III Reverse Transcriptase, followed by purification, end repair and dA-tailing, and was then ligated to the sequencing adaptor. PCR amplification was then conducted with indexed primers. The constructed libraries were quality checked on an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System and then sequenced on the Illumina HiSeq2000 platform at BGI Life Tech Co., Ltd. (Shenzhen, China). Low-quality reads (more than 20% of bases with a quality score lower than 10), reads containing adaptors and reads with unknown bases (more than 5% N bases) were removed to obtain clean reads, which were stored in FASTQ format. The clean reads were mapped onto the reference grapevine genome (V2) using Hisat234.
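The three read-filtering criteria above can be expressed as a simple predicate. The sketch below is not the filtering software actually used; it only restates the stated thresholds (more than 20% of bases below quality 10, more than 5% N bases, presence of adaptor sequence). The function names are hypothetical, adaptor detection is simplified to an exact substring match, and Phred+33 quality encoding is assumed.

```python
def phred33_to_scores(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual_string]


def passes_filter(seq, quals, adaptor=None,
                  min_qual=10, max_low_qual_frac=0.20, max_n_frac=0.05):
    """Return True if a read survives the three filters described in the text.

    seq     -- read sequence
    quals   -- per-base quality scores, same length as seq
    adaptor -- optional adaptor sequence (exact match is a simplification)
    """
    if not seq or len(seq) != len(quals):
        return False
    # 1. Discard reads that contain adaptor sequence.
    if adaptor and adaptor in seq:
        return False
    # 2. Discard reads with more than 5% unknown (N) bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # 3. Discard reads where more than 20% of bases have quality < 10.
    low_frac = sum(q < min_qual for q in quals) / len(quals)
    return low_frac <= max_low_qual_frac


# Hypothetical 10-base read with uniformly high quality
print(passes_filter("ACGTACGTAC", phred33_to_scores("IIIIIIIIII")))  # True
```

Real adaptor trimming tools tolerate mismatches and partial overlaps at read ends, so the exact-match test here would miss many contaminated reads in practice.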

Data Records

The RNA-Seq clean data of the 27 samples were deposited at the NCBI Sequence Read Archive under accession SRP18415235. The gene expression level files were deposited in NCBI’s Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE13038636. Information on the differentially expressed genes (DEGs) between samples was deposited in figshare37.

Technical Validation

Quality control

The physiochemical parameters of the samples are shown in Table 1. A total of 27 RNA samples were prepared and sequenced, with the sequencing depth ranging between 22.48 and 33.08 million reads; the Q20 values for the clean reads were above 97%, and the average mapping ratio was 84.72% (Online-only Table 1).

Table 1 Physiochemical parameters for each sample.

Analysis of RNA-Seq data

After novel transcript detection, novel coding transcripts were merged with the reference transcripts to obtain a complete reference. Clean reads were mapped to the transcripts using Bowtie238. Gene expression levels were calculated with RSEM39. The distribution of reads along transcripts, assessed by read coverage skewness, showed good fragmentation randomness (Fig. 2). The differentially expressed genes (DEGs) between samples were identified with the R package DESeq240. DEGs with a |log2ratio| ≥ 1 and a false discovery rate probability ≤ 0.001 were considered statistically significant. The statistical analyses of DEGs are shown in Fig. 3.
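The significance thresholds above can be illustrated with a small classifier. This sketch only applies the two stated cut-offs (|log2ratio| ≥ 1 and false discovery rate ≤ 0.001) to values that a tool such as DESeq2 would already have produced; it does not reproduce the DESeq2 statistics. The function names, gene labels and numbers are hypothetical.

```python
def classify_deg(log2_ratio, fdr, min_abs_log2=1.0, max_fdr=0.001):
    """Label a gene 'up', 'down' or 'ns' (not significant)."""
    if fdr > max_fdr or abs(log2_ratio) < min_abs_log2:
        return "ns"
    return "up" if log2_ratio > 0 else "down"


def count_degs(results):
    """Count up- and downregulated DEGs in (gene, log2_ratio, fdr) tuples."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for _gene, log2_ratio, fdr in results:
        counts[classify_deg(log2_ratio, fdr)] += 1
    return counts


# Hypothetical results for one pairwise comparison
example = [
    ("gene_a", 2.3, 1e-5),    # up: passes both thresholds
    ("gene_b", -1.0, 0.001),  # down: exactly on both thresholds
    ("gene_c", 0.8, 1e-6),    # ns: fold change too small
    ("gene_d", -3.0, 0.01),   # ns: FDR too high
]
print(count_degs(example))  # {'up': 1, 'down': 1, 'ns': 2}
```

Counts of this kind, computed per pairwise comparison, are what a bar chart of up- and downregulated DEGs summarises.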

Fig. 2

Read distribution along transcripts. The x-axis represents the position along transcripts, and the y-axis represents the number of reads.

Fig. 3

Statistics of differentially expressed genes. The x-axis represents the pairwise comparisons between groups, and the y-axis represents the number of DEGs. Red represents upregulated DEGs, and blue represents downregulated DEGs.

Usage Notes

The RNA-Seq fastq.gz files were deposited at Gene Expression Omnibus and can be downloaded using the fastq-dump tool of the SRA Toolkit (https://www.ncbi.nlm.nih.gov). The V2 reference genome of V. vinifera and its annotation file can be retrieved from http://genomes.cribi.unipd.it/grape/.
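As a minimal sketch of the download step, the snippet below assembles a fastq-dump command line in Python. It assumes the SRA Toolkit is installed and on the PATH. fastq-dump operates on individual run accessions rather than on the study accession, so `SRRXXXXXXX` is a placeholder that must be replaced with a real run accession from the study record. The helper name and output directory are hypothetical, and the library layout is not stated here; paired-end runs would additionally need `--split-files`.

```python
import shlex


def fastq_dump_command(run_accession, outdir="fastq"):
    """Build a fastq-dump call that writes gzip-compressed FASTQ to outdir."""
    return ["fastq-dump", "--gzip", "--outdir", outdir, run_accession]


# Placeholder accession: replace with a real run accession from the study.
cmd = fastq_dump_command("SRRXXXXXXX")
print(shlex.join(cmd))

# To actually run the download (requires the SRA Toolkit):
# import subprocess
# subprocess.run(cmd, check=True)
```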