Transcriptomic profiling for prolonged drought in Dendrobium catenatum

Orchid epiphytes, a group containing at least 18,000 species, thrive in habitats that often undergo periodic drought stress. However, few global gene expression profiling datasets have been published for studies addressing the drought-resistant mechanism of this special population. In this study, an experiment involving the effect of continuous drought treatments on an epiphytic orchid, Dendrobium catenatum, was designed to generate 39 mature-leaf-tissue RNA-seq sequencing datasets with over two billion reads. These datasets were validated by a series of quality assessments including RNA sample quality, RNA-seq read quality, and global gene expression profiling. We believe that these comprehensive transcriptomic resources will allow a better understanding of the drought-resistant mechanisms of orchid epiphytes.


Background & Summary
In response to prolonged water deficit stress, plants have evolved coping mechanisms to increase their drought tolerance through physical adaptations, molecular regulations, and environmentally suitable metabolic pathways [1][2][3] . Most studies concerning drought stress mechanisms have been performed in Arabidopsis thaliana and other drought-intolerant C 3 plants 2 . Studying a highly resistant plant that has been shaped by natural selection is the most direct and effective way to extract crucial genes and determine the main metabolic pathways of the drought stress procedure.
In the wild, most epiphytic orchids, a prosperous group containing over 18,000 species, take root on the surface of tree bark or rocks 4,5 . Due to the poor moisture supply in these habitats 6 , these plants usually suffer periodic water shortage 7 . While adapting to harsh habitats, some orchid species have evolved succulent storage-organs, such as pseudobulbs 8,9 , thick leaves 10 , and crassulacean acid metabolism (CAM) 11 , a photosynthetic pathway with high water-use efficiency 12 . Morphological and anatomical studies show that orchid plants possess desirable qualities for mitigating drought stress 10,13,14 . By measuring physiological indexes and secondary metabolites of Dendrobium moniliforme 15 , Wu et al. found that increasing antioxidant enzyme activities and osmolytes play an important role in protecting plants under drought stress. Although several physiological traits might provide clues for the mechanism of drought resistance, there is no large data set that allows holistic understanding. Unfortunately, few comprehensive transcriptomic profiling studies that address drought resistance have been published.
Comparing the two published genomes from epiphytic orchid species 16,17 , Phalaenopsis equestris and Dendrobium catenatum, the latter possesses more Heat-shock protein 70 family members and R genes 17 , which suggests that D. catenatum can tolerate a much wider variety of environments and has superior qualities for adverse resistance. A previous study demonstrates that D. catenatum uses the facultative CAM pathway as a drought-enduring process 11 . Hence, this species can be considered as droughtresistant material useful for elucidating mechanisms of mitigating drought stress in epiphytic orchids. Previous studies show that the circadian clock modifies responsiveness to environmental input and stress according to the time of day [18][19][20][21] . With regard to the correlation between CAM and circadian rhythm 22 . the conventional sampling tactics that focus on a single time point per day should be abandoned as, if the daylight sampling time is fixed, some important clues to key resistance genes could be missed.
In the current study, D. catenatum plants were subjected to continuous drought treatments by simulating their natural environment under controlled conditions. Sampling time points were set for both day and night during the drought procedure. A dataset containing 39 RNA-seq with over 41 million sequence reads per sample was generated using the Illumina HiSeq 2500 platform. We assessed RNA sample quality, RNA-seq read quality, and the global gene profile (Fig. 1) Figure 1. Overview of the experimental design and analysis pipeline. The raw data were filtered using the package Fastq_clean, and clean data were assessed using FastQC and MultiQC. The clean reads were mapped to the D. catenatum genome (GenBank Assembely ID ASM160598v2) using Hisat2. The package ReSQC was used to calculate RNA-seq reads coverage over the gene body. Gene abundance was quantified using DESeq2. our dataset. We believe that these transcriptomic profiles will contribute to a comprehensive understanding of the mechanism of drought resistance in D. catenatum.

Plant material and experimental design
Clones of D. catenatum were planted in transparent plastic pots (5.0 cm in diameter) with sphagnum moss as the matrix. Eight-month-old plants were transferred into a phytotron chamber (12/12 h light/ dark, light intensity~100 μmol m −2 s − 1 ; 28/22°C day/night; and relative humidity 60/70% day/night) and adapted to the controlled conditions for 10 days before being used for the follow-up experiment. The experiments were conducted on initially healthy individuals (~12 cm height). Plants were irrigated on the first day and then water was withheld to mimic drought stress. We collected leaf samples when the volumetric water content of the base material declined to~30-35%,~10-15, and~0%, respectively, at both 09:00 h and 21:00 h (Fig. 1). The fourth and fifth leaves (mature leaf) from the apex of each plant were harvested and mixed to create one sample. These samples were immediately frozen in liquid N 2 and stored at À 80°C.

RNA isolation and sequencing
Total RNA was extracted from the samples mentioned above (

Data filtering and assessment
The raw data (raw reads; Data Citation 1) were filtered using Fastq_clean v2.0 23 . Sequencing adapters, low-quality bases, viral sequences, and rRNA sequences were cleaned. The criteria for this filtering procedure were set as follows: (1) Table 1), respectively; (2) bases with a phred quality score below 20 were clipped from both ends of reads; (3) after low-quality bases were trimmed, reads containing over two "N" were discarded; (4) reads with a length shorter than 75 nt were discarded; and (5) the parameters for BWA v0.5.7 24 were set as recommended according to Fastq_clean instructions. The statistics of clean reads are listed in Table 1.

Gene quantification and detection of read coverage skewness
The clean reads were mapped to the D. catenatum genome 17 (GenBank Assembly ID ASM160598v2) using Hisat2 26 with default parameters. Salmon v0.9.1 27 was used to estimate gene abundance as read counts in the alignment-based mode. The raw read counts were imported into the R package DESeq2 28 for normalization. We used the package ReSQC 29 to assess RNA-seq read coverage skewness over the gene body based on the above mapping results.

Assessment of sample composition
A heatmap for cluster relationships among samples representing Poisson distance were generated with raw read counts. The R package PoiClaClu 30 was used for the calculation of Poisson distance, and the R package Pheatmap (https://cran.r-project.org/web/packages/pheatmap/index.html) for visualization. A principal component analysis (PCA) was also employed to assess sample relationships based on rlogtransformed values of raw read counts.

Gene hierarchical clustering and Gene Ontology (GO) analysis
To determine the highly correlated genes in this prolonged drought experiment, weighted gene coexpression network analysis (WGCNA) 31 was used to detected gene clusters (modules) on normalized read counts (Data Citation 2) using the WGCNA v1.63 32,33 package in R. This analysis generated a topological overlap matrix plot (Fig. 2) that illustrated the relationships among gene clusters. To give an

Code availability
The R scripts for reads count filtration and normalization, heatmap illustration, PCA and WGCNA are available in Figshare (Data Citation 4).

Data Records
The RNA-seq raw data of 39 samples are deposited at the NCBI Sequence Read Archive (Data Citation 1). Supplementary materials are available on the Figshare data management platform (Data Citations 2-4). Data Citation 2 provides expression profiles of raw read counts and normalized read counts; Data Citation 3 contains WGCNA results, GO annotation for all genes, and GO enrichment for gene clusters. Data Citation 4 is dedicated to the R scripts in this study.

RNA quality control
The quality of total RNA is a critical parameter for the construction of sequencing libraries and the follow-up quantitative analyses. In particular, RNA integrity (RIN) is positively correlated on uniquely mapped reads in RNA-Seq 34 , which means low RIN would lead to a bias in gene expression profiles. In this study, RNA samples with a RIN value >6.5 were employed for RNA-seq library construction, which meant that high-quality reads were obtained for subsequent studies. The quality values for RNA samples, including RIN, are listed in Table 2.

Quality validation
The high clean data rate (Table 1), ranging from 98.73% to 99.56%, indicated that both RNA-seq libraries and raw RNA-seq data obtained in this study were of high quality. Results of clean reads assessment by FastQC are illustrated in Fig. 3. The per base quality scores were >30, and most per sequence quality scores were >20, suggesting a high sequence quality. The per sequence GC contents had pattern curves similar to a normal distribution indicating the sequencing data were free of contamination. In addition, we examined read-mapping qualities of the 39 samples, including mapping rates and read distribution on reference genes. The mapping rates to the reference genome were superior, with a range from 83.36% to 88.02% (Table 1). The distribution of reads based on the detection of read coverage skewness showed good fragmentation randomness (Fig. 4), which reflected that each part of the gene was sequenced evenly.
Both the heatmap (Fig. 5a) and PCA (Fig. 5b)   distinctly separate from the groups with water content of 10-15% and 25-30%. However, the clustering of samples with 10-15% and 25-30% water content overlapped. The explanation for this is that, for a CAM plant, moderate drought would not result in a significant change in gene expression because of its strong ability to adapt to drought.