RNA-seq transcriptomic profiling of crassulacean acid metabolism pathway in Dendrobium catenatum

The regulation of the crassulacean acid metabolism (CAM) pathway has recently become a topic of intensive research and has been explored from several perspectives, including phylogenetics, genomics, and transcriptomics. Orchidaceae, which contains approximately 9,000 CAM species, is one of the largest lineages using this special photosynthetic pathway. However, no comprehensive transcriptomic profiling focused on CAM regulation in orchid species had previously been performed. In this report, we present two Illumina RNA-seq datasets, comprising a total of 24 mature leaf samples with 844.4 million reads, from Dendrobium catenatum (Orchidaceae), a facultative CAM species. The first dataset was generated from a time-course experiment based on the typical CAM phases over a diel cycle. The second was derived from an experiment on drought stress and stress removal. A series of quality assessments were conducted to verify the reliability of the datasets. These transcriptomic profiling datasets will be useful for exploring and understanding CAM regulation.


Background & Summary
Crassulacean acid metabolism (CAM) is the most important photosynthetic adaptation by which plants cope with seasonally water-limited habitats. A CAM plant takes up CO2 through open stomata during the night and uses it in the Calvin cycle behind closed stomata during the following day. This physiological process reduces evaporative demand because the stomata open during the cooler nighttime instead of during the daytime, when the transpiration rate would be higher. Hence, plants engaged in CAM have higher water use efficiency than their C3 and C4 counterparts 1 and are considered to be important contributors to agriculture in semiarid and arid regions [2][3][4].
Recently, two CAM genomes from orchid species, Phalaenopsis equestris and Dendrobium catenatum (=D. officinale), have been published 9,21,22, laying a foundation for CAM research. P. equestris is an obligate CAM plant 9 with low metabolic plasticity to environmental changes 23. In contrast, D. catenatum is a facultative CAM plant 24 that can adjust the intensity of the CAM pathway in response to external stresses, such as drought 25, which allows researchers to compare graded levels of physiological activity by manipulating experimental conditions 26. Consequently, the latter orchid should be considered a better candidate for CAM studies than the former. To date, several RNA-seq datasets for D. catenatum have been constructed to examine gene expression in chilling stress 27, alkaloid biosynthesis 28, polysaccharide synthesis 29,30, seed development 31, and organ-specific regulation 32, but not in CAM processes.
In this study, we conducted two experiments on D. catenatum to generate two RNA-seq datasets. The first experiment aimed to collect the gene expression profile (Dataset I) for CAM and the operation of associated pathways over a diel cycle with four typical phases (Fig. 1a); the second aimed to record the gene expression profile (Dataset II) during the alternation of day and night under drought stress and upon stress removal (Fig. 1b). These experiments yielded 24 samples with a total of 844.4 million reads of transcriptome data from mature leaves (Data Citation 1). Additionally, quality assessments of the data were conducted to verify their reliability (Fig. 1c). We believe that these profiles will help to deepen the understanding of CAM regulation.

Methods
Design and sample collection of experiment I
Clones of D. catenatum were cultivated in white and transparent pots (5.0 cm in diameter) with sphagnum moss as the matrix. The plants were grown in a greenhouse with temperatures from 22 to 28°C and relative humidity from 40 to 60%. The experiments were initiated with strong eight-month-old individuals (clones; 12-14 cm in height) grown in matrix maintained at approximately 30% volumetric water content, which ensured that these plants did not undergo drought or waterlogging.
During the period when leaf samples were collected, we measured the net CO2 exchange rates with a Li-6400XT (Li-COR Biosciences Inc., Lincoln, NE, USA) to confirm the four CAM phases 5 of the plants over a diel cycle. The following parameters were set for the equipment with a double-sided transparent leaf cuvette: photosynthetic photon flux density (PPFD), natural light; temperature inside the cuvette, synced with the outside; CO2 concentration, ambient; flow rate, 200 μmol s−1; and cuvette fan speed, fast. According to the curve of CO2 exchange rates, leaf samples were collected at 00:30, 06:30, 13:00 and 17:30 (Fig. 1a), which represented Phases I, II, III and IV, respectively. The third and fourth mature leaves from the apex of each individual were harvested and combined as one sample. Three biological replicates were collected for Phase I, five for Phase II, three for Phase III and six for Phase IV. These excised leaves were frozen immediately in liquid nitrogen and stored at −80°C.

Design and sample collection of experiment II
Tissue-cultured D. catenatum plants from seeds were grown in plastic pots (8.0 cm in diameter) filled with a substrate mix of composted pine bark and small stones. The plants were kept in a growth chamber with a temperature of 28/22°C (day/night), a photoperiod of 12/12 h (day/night), a light intensity of 100 μmol m−2 s−1, a relative humidity of 50/70% (day/night), and watering every two days at 15:30. Vigorous eight-month-old plants with a height of ~12 cm were chosen for the follow-up experiment. Irrigation was performed on the first day, omitted from the second to the seventh day, and recommenced on the eighth day (Fig. 1b). The mature fourth leaf from the apex of each individual was harvested at both 06:30 and 18:30 (half an hour after lights on and off, respectively) on the second, seventh, and ninth days and at 18:30 on the eighth day (Fig. 1b). Each sampling time point included only one biological replicate. These samples were frozen immediately in liquid nitrogen and stored at −80°C.
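The sampling design described above can be written out explicitly to check that it yields seven samples. The following is a minimal sketch; the function names and the status labels are illustrative assumptions and are not taken from the original study.

```python
# Sketch of the Experiment II sampling schedule as described in the text.
# Function names and status labels are hypothetical, chosen for illustration.

def irrigation_status(day):
    """Irrigation on day 1, withheld on days 2-7, resumed on day 8."""
    if 2 <= day <= 7:
        return "irrigation withheld"
    # Assumption: days after the eighth are simply labeled as after rewatering.
    return "irrigated" if day == 1 else "rewatered"

def sampling_schedule():
    """Return sorted (day, clock time) pairs at which a leaf was harvested."""
    schedule = []
    for day in (2, 7, 9):
        schedule.append((day, "06:30"))  # half an hour after lights on
        schedule.append((day, "18:30"))  # half an hour after lights off
    schedule.append((8, "18:30"))        # the eighth day was sampled only once
    return sorted(schedule)

for day, clock in sampling_schedule():
    print(f"day {day} {clock}: {irrigation_status(day)}")
```

With one biological replicate per time point, the seven entries correspond to the seven samples of Dataset II.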

Data filtering and gene quantification
The raw RNA-seq reads were cleaned using the Fastq_clean procedure by Zhang et al. 34, which included trimming adapters and low-quality bases and removing rRNA and viral sequences. The quality control criteria for this filter were set as follows: (1) low-quality bases below phred quality 20 were trimmed from both ends of reads; (2) after the low-quality bases were trimmed, reads containing more than two "N" bases were removed; (3) reads shorter than 75 (for Dataset I) or 50 (for Dataset II) bases were removed; and (4) BWA 35 related parameters were set as recommended. The filtering results are listed in Table 1. The clean reads were evaluated using FastQC v0.11.7 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and the assessment results were summarized and visualized using MultiQC v1. the recommended configuration. Salmon ver. 0.9.1 37 was used to map the clean reads against the primary CDS from the genome (GenBank Assembly ID ASM160598v2) published by Zhang et al. 22 to quantify the gene abundance as read counts. The default settings for Salmon were used. We applied the DESeq2 38 R package to normalize the read counts.
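Criteria (1) to (3) above can be illustrated with a short sketch. This is not the Fastq_clean implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and adapter trimming and the BWA-based removal of rRNA and viral reads (criterion 4) are not covered.

```python
# Illustrative per-read filter following criteria (1)-(3) in the text.
# Not the Fastq_clean code; names and the Phred+33 encoding are assumptions.

def decode_phred33(quality_string):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]

def trim_low_quality_ends(seq, quals, min_q=20):
    """Criterion 1: trim bases with quality below min_q from both ends."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]

def filter_read(seq, quals, min_len, min_q=20, max_n=2):
    """Return the trimmed (seq, quals) pair, or None if the read is rejected."""
    seq, quals = trim_low_quality_ends(seq, quals, min_q)
    if seq.upper().count("N") > max_n:   # criterion 2: more than two "N"
        return None
    if len(seq) < min_len:               # criterion 3: too short after trimming
        return None
    return seq, quals

# min_len would be 75 for Dataset I and 50 for Dataset II.
read = "ACGT" * 20
scores = [10] * 3 + [35] * 77
print(filter_read(read, scores, min_len=75) is not None)
```

Applying the checks after trimming mirrors the order given in the text, in which the "N" and length criteria are evaluated on the trimmed read.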

Principal component analysis and heatmap illustration
The normalized read count values of each sample in the two datasets were employed in principal component analysis (PCA) with an online tool, three-dimension PCA (http://www.omicshare.com/tools/Home/Soft/seniorpca), using the default parameters. A heatmap for sample clustering of Dataset I was drawn with the R package PoiClaClu 39.
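For readers who want to reproduce the idea of the PCA offline, the sketch below computes first-principal-component scores for a samples-by-genes matrix of normalized counts. It is an assumption-laden illustration and not the OmicShare tool: any transformation or scaling applied by that tool's defaults is unknown here, the function names are hypothetical, and only the first component is computed (by power iteration on the sample-by-sample Gram matrix).

```python
import math

# Illustrative PCA sketch (first component only); not the OmicShare implementation.

def center_columns(matrix):
    """Subtract each gene's (column's) mean across samples (rows)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def first_pc_scores(matrix, iterations=200):
    """Return each sample's score on PC1 (the sign of the axis is arbitrary)."""
    x = center_columns(matrix)
    n = len(x)
    # Gram matrix X X^T: its leading eigenvector gives the PC1 scores up to scale.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        w = [sum(g * vj for g, vj in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:                   # no variance in the data
            return [0.0] * n
        v = [c / norm for c in w]
    eigenvalue = sum(vi * sum(g * vj for g, vj in zip(row, v))
                     for vi, row in zip(v, gram))
    return [vi * math.sqrt(max(eigenvalue, 0.0)) for vi in v]

# Toy data: two samples per hypothetical condition, three genes.
toy = [[100, 10, 50], [110, 12, 50], [10, 100, 50], [12, 110, 50]]
print(first_pc_scores(toy))
```

In the toy data, the first two samples receive scores of one sign and the last two receive scores of the opposite sign, which is the kind of grouping a PCA plot makes visible.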

Data Records
The RNA-seq raw data of the two datasets were deposited at the NCBI Sequence Read Archive with BioSample accessions SAMN09267369-SAMN09267385 (Dataset I; Data Citation 1) and SAMN09269105-SAMN09269111 (Dataset II; Data Citation 1). Data Citation 2 contains the R scripts used in this study. The files of gene abundance for the two datasets are deposited in Figshare (Data Citation 3). The heatmap for sample clustering of Dataset I is available in Figshare (Data Citation 4).
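The accession ranges above can be expanded into individual BioSample identifiers, for example when preparing a download list. The sketch below assumes that each range is contiguous, as the notation implies; the function name is hypothetical.

```python
import re

# Expand a BioSample accession range into individual accessions.
# Assumes a contiguous range with a shared letter prefix.

def expand_accession_range(text):
    """Expand e.g. 'SAMN09267369-SAMN09267385' into a list of accessions."""
    first, last = text.split("-")
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"not a simple accession range: {text!r}")
    prefix = m_first.group(1)
    width = len(m_first.group(2))        # preserve leading zeros
    lo, hi = int(m_first.group(2)), int(m_last.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]

dataset_1 = expand_accession_range("SAMN09267369-SAMN09267385")
dataset_2 = expand_accession_range("SAMN09269105-SAMN09269111")
print(len(dataset_1), len(dataset_2), len(dataset_1) + len(dataset_2))
```

Under this assumption the ranges contain 17 and 7 accessions, matching the 3 + 5 + 3 + 6 replicates of Dataset I, the seven time points of Dataset II, and the 24 samples reported in total.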

RNA qualities
The quality of the total RNA was assessed using an Agilent Bioanalyser 2100 (Agilent Technologies). The RNA samples with RNA integrity numbers higher than 7.0 were used to prepare RNA-seq libraries in this study. The RNA quality evaluations are listed in Table 2.

Quality validation
We applied FastQC to assess the RNA-seq clean data, including the mean per base quality scores, per sequence quality scores, and per sequence GC content. Summary plots are presented in Fig. 2.
For both datasets, the quality scores per base were higher than phred quality 30, and almost all sequences had a quality over 20. The GC contents of the samples from both Datasets I and II showed a similar normal distribution. Moreover, the RNA-seq data had high mapping rates, ranging from 87.13% to 90.82% (Table 1). These statistics indicate that high-quality RNA-seq reads were obtained for downstream analysis. The PCA result (Fig. 3a) showed that the samples in Dataset I clustered into four groups corresponding to the four CAM phases, which was also supported by a sample clustering analysis (Data Citation 4). In Dataset II, the drought stress samples, including DR7, DR8 and DR10, clustered closely, and DR5, DR6 and DR11 were neighbors in the PCA plot (Fig. 3b). The distinct grouping of the samples in these analyses indicates that the transcriptome profiles are valuable for understanding the CAM pathway.

Table 2. RNA sample quality in this study.