Selection of suitable reference genes for qRT-PCR normalisation under different experimental conditions in Eucommia ulmoides Oliv

Normalisation of data, by choosing the appropriate reference genes, is fundamental for obtaining reliable results in quantitative real-time PCR (qPCR). This study evaluated the expression stability of 11 candidate reference genes with different varieties, developmental periods, tissues, and abiotic stresses by using four statistical algorithms: geNorm, NormFinder, BestKeeper, and RefFinder. The results indicated that ubiquitin-conjugating enzyme S (UBC) and ubiquitin-conjugating enzyme E2 (UBC E2) could be used as reference genes for different E. ulmoides varieties and tissues, UBC and histone H4 (HIS4) for different developmental periods, beta-tubulin (TUB) and UBC for cold treatment, ubiquitin extension protein (UBA80) and HIS4 for drought treatment, and ubiquitin-60S ribosomal protein L40 (UBA52) and UBC E2 for salinity treatment. UBC and UBC E2 for the group “Natural growth” and “Total”, UBA80 and UBC for the group “Abiotic stresses”. To validate the suitability of the selected reference genes in this study, mevalonate kinase (MK), phenylalanine ammonia-lyase (PAL), and 4-coumarate-CoA ligase (4CL) gene expression patterns were analysed. When the most unstable reference genes were used for normalisation, the expression patterns had significant biases compared with the optimum reference gene combinations. These results will be beneficial for more accurate quantification of gene expression levels in E. ulmoides.

Gene expression analysis is an important part of molecular biology research. Quantitative real time polymerase chain reaction (qRT-PCR) has become a popular technology for studying gene expression patterns 1,2 . Absolute and relative quantification are two methods of presenting quantitative gene expression. The absolute quantification is done by comparing the quantification cycle (Cq) value of the sample with the standard curve 3 . As for the relative quantification, the qPCR data of target genes requires reference genes for calibration 4 . since the copy number of genes usually does not have any biological significance, researchers are more concerned about differential expression in gene analysis. Therefore, relative quantification has become the main method of gene expression analysis 5 .
In order to improve the reliability and accuracy of gene expression quantification, the standardisation of gene expression data is necessary. Calibration can eliminate system errors associated with the experimental process (i.e. the errors of sample quantification or between samples) 6,7 . Generally, the most common method for normalising data in gene expression experiments is to use reference genes as internal controls. Ideally, the reference gene expression profiles are not influenced by experimental conditions, however, a reference gene with universally stable expression under all experimental conditions (different varieties, tissues or organs, developmental periods, under biological or abiotic stresses etc. ) has not yet been discovered. Therefore, screening and validating appropriate reference genes for different experimental conditions are critical for target gene expression data normalisation [8][9][10] . Usually, genes associated with maintaining basic cell functions (primary metabolism or cell structure) are selected as candidate reference genes [11][12][13] , such as 18S ribosomal RNA (18S RNA), actin (ACT), actin 97 (ACT97), histone H2B (HIS2B), histone H4 (HIS4), alpha-tubulin (TUA), beta-tubulin (TUB), ubiquitin-60S

Results
Selection of reference genes, amplification specificity and efficiency, and cloning. Based on the E. ulmoides transcriptome data, we cloned 11 candidate reference genes (18S rRNA, ACT, ACT97, HIS2B, HIS4, TUA, TUB, UBA52, UBA80, UBC, and UBC E2) from "Huazhong4". The sequences of these genes and the primers used for cloning are shown in Supplementary Figs S1-S11 and Supplementary Table S1. The primer pairs of all candidate reference genes and target genes were designed for qRT-PCR, and the amplicon lengths were controlled between 59 and 200 bp. Agarose gel electrophoresis ( Supplementary Fig. S12) and melting curve analysis ( Supplementary Fig. S13) were used to determine primer specificity. The amplification efficiency of qRT-PCR across all 11 reference genes varied from 89.1 to 106.4%, with R 2 varying from 0.991 to 0.997 (Table 1 and Supplementary Fig. S14).
Expression profile of the reference genes. Cq values were used to quantify the expression levels of candidate reference genes; lower Cq values mean higher expression levels. The raw Cq values for all samples in this study were listed in Supplementary Table S2 (there were no Cq values in the negative controls), and a box and whiskers plot was used to describe the raw Cq value distribution ( Fig. 1 26.32). In addition, every reference gene had different coefficients of variation (lower values represent less variability) in this study, as shown in Fig. 1; TUA had the most variation whereas TUB had the least variation, which indicated that TUA was the most unstable gene and TUB was the most stable gene of all the reference genes.
Expression stability analysis of reference genes. In this study, six experimental conditions were performed, including different E. ulmoides varieties, different developmental periods, different tissues, cold treatment, drought treatment and salinity treatment. Furthermore, these experimental conditions were sorted into three different groups: "Natural growth" (Varieties, Periods, Tissues), "Abiotic stresses" (Cold, Drought, Salinity) and "Total" (all experimental conditions). In order to select appropriate reference genes for these experimental conditions and groups, four software programs (geNorm 28 , NormFinder 29 , BestKeeper 30 , and RefFinder 31 ) were used to evaluate the stability of the 11 candidate reference genes. GeNorm analysis. The geNorm analysis results were presented in Table 2 The optimal number of reference genes for normalisation depends on pair-wise variation (V n/n+1 ). When V n/n+1 < 0. 15, it suggests that an extra reference gene is not necessary for normalisation. For six experimental conditions (Varieties, Periods, Tissues, Cold, Drought, Salinity), two reference genes were sufficient for accurate normalisation (Fig. 2); the most stable genes pairs for these conditions were UBA80 and UBC, UBC and UBCE2, Figure 1. Cq values distribution of eleven candidate reference genes across all experimental samples of E. ulmoides. The outside box is determined from 25 th to 75 th percentiles, and the inside box represents the mean value. The line across the box is the median. The whiskers represent percentiles from 5 th to 95 th , and asterisks represents outliers.  Fig. S4). For the group of "Abiotic stresses" and "Total", V 2/3 > 0.15 and V 3/4 < 0.15 (Fig. 2). Therefore, the genes UBC, UBC E2, and ACT97, and UBA80, UBC, and UBC E2 were chosen, respectively ( Supplementary Fig. S15). However, for the "Natural growth" group, where V 5/6 < 0.15 (Fig. 2), five reference genes were needed.
NormFinder analysis. The stability value of each candidate reference gene was also analysed by NormFinder, wherein a lower stability value indicates higher expression stability. In this research, the results analysed by NormFinder were similar to the analysis by geNorm (  BestKeeper analysis. The analysis results of BestKeeper are also shown in Table 2. The results indicated that all candidate reference genes were remarkably stable when expressed under most experimental conditions (different varieties, periods, cold treatment, and drought treatment) in this study. The rankings by BestKeeper analysis showed that the most stable reference genes were UBC E2 (CV ± SD = 0.61 ± 0.14) and UBC (CV ± SD = 0.98 ± 0.24) for different varieties of E. ulmoides. HIS4 (CV ± SD = 0.98 ± 0.20) and UBC (CV ± SD = 1.68 ± 0.42) were the most stable genes for leaf blade developmental periods. For different tissues, only four reference genes were expressed stably; the most stable genes were UBC E2 (CV ± SD = 3.26 ± 0.75) and UBC (CV ± SD = 3.88 ± 0.95). TUB (CV ± SD = 0.51 ± 0.13) and UBC (CV ± SD = 0.60 ± 0.16) showed the most stable expression under cold treatment. UBC (CV ± SD = 0.32 ± 0.08) and HIS4 (CV ± SD = 0.64 ± 0.15) were the best reference genes under drought treatment. Ten reference genes displayed significantly stable expression in the salinity treatment; ACT97 (CV ± SD = 0.71 ± 0.19) and ACT (CV ± SD = 0.97 ± 0.24) were the most stable genes, while 18S rRNA (CV ± SD = 4.70 ± 1.16) was considered not suitable for gene expression normalisation. Seven reference genes had remarkably stable expression in the "Natural growth" group, in which UBC E2 (CV ± SD = 2.15 ± 0.50), ACT (CV ± SD = 2.44 ± 0.60), and UBC (CV ± SD = 2.45 ± 0.60) were the most stable genes ( Table 2). For the "Abiotic stresses" group, all of the 11 reference genes were stably expressed; of these, UBA80 (CV ± SD = 1.40 ± 0.32), UBC (CV ± SD = 1.29 ± 0.34), and TUB (CV ± SD = 1.39 ± 0.34) were the most stable genes (Table 2). For the "Total" group, seven reference genes presented remarkably stable expression, in which ACT (CV ± SD = 2.50 ± 0.61) and TUB (CV ± SD = 2.70 ± 0.66) were the most stable genes ( Table 2).

RefFinder analysis.
We estimated the geomean of ranking values obtained from geNorm, NormFinder, and BestKeeper programs using RefFinder software. This allowed us to generate a recommended comprehensive ranking of reference genes for accurate transcript normalisation in each experimental set. The results indicated that UBC and UBC E2 were the most stable genes for different varieties and tissues, that UBC and HIS4 were the most stable genes for different development stages, that TUB and UBC were the most stable genes for cold treatment, that UBA80 and HIS4 were the most stable genes for drought treatment, and that UBA52 and UBC E2 were the most stable genes for salinity treatment. UBC and UBC E2 were the most stable genes for the groups "Natural growth" and "Total", and UBA80 and UBC were the most stable genes for the group "Abiotic stresses" ( Table 2).

Reference gene validation.
To validate the accuracy of selected reference genes, the relative expression levels of MK, PAL, and 4CL were analysed in all the experimental conditions involved in this study. For each experiment condition, the two most stable and two unstable reference genes, according to RefFinder and the reference genes combination according to geNorm, were selected for normalisation. Among the four varieties, there was no significant difference in the expression of MK. MK has the highest expression in "Daye", followed by "Xiaoye" and "Yanci"; the lowest was "Huazhong4" (Fig. 3A). In the five leaf developmental stages, MK was up-regulated with approximately 1.8-fold changes in the third period and down-regulated in the second, fourth and fifth periods (Fig. 3B). Among the five tissues, MK has the highest expression in the leaves, followed by the bark (0.85-fold changes); the lowest was the root (0.2-fold changes) (Fig. 3C). Under cold treatment, MK was down-regulated at 2 h, 6 h, and 12 h, but especially at 2 h (0.05-fold changes) (Fig. 3D). Under drought treatment, the expression of MK was down-regulated, and the lowest  (Fig. 3E). Under the salinity treatment, the expression levels of MK first decreased but then increased with treatment time. The lowest expression was at 6 hours after treatment (0.28-fold changes) (Fig. 3F).
Among the four varieties, PAL has the highest expression in "Huazhong4", followed by "Yanci", and the lowest expression in "Daye" (Fig. 4A). During leaf development, the expression level of PAL first increased with leaf growth and then decreased, with the highest expression levels in period three (Fig. 4B). Among the five tissues, PAL has the highest expression in leaves, followed by branches, and was not expressed in fruits (Fig. 4C). Under cold treatment, the expression of PAL first increased sharply, then decreased with the prolongation of the treatment, but slightly increased 12 h after treatment (Fig. 4D). In the drought and salinity treatments, the expression levels of PAL decreased with the prolongation of treatment time (Fig. 4E, F).
Among the four varieties, 4CL has the highest expression in "Yanci", followed by "Huazhong4", and the lowest expression levels in "Daye" (Fig. 5A). During leaf development, the expression level of 4CL increased with leaf development, but decreased in period five (Fig. 5B). In different tissues, the expression level of 4CL was highest in leaves, followed by branches, but it was not expressed in fruits (Fig. 5C). In the first six hours of the cold treatment, the expression level of 4CL increased with the prolongation of the treatment time, decreased sharply in the 9 th hour of treatment, and increased slightly in the 12 th hour of treatment (Fig. 5D). During drought treatment,  the expression of 4CL was up-regulated at first, but then decreased gradually with treatment time (Fig. 5E). The expression of 4CL slowly increased as the salt treatment time was prolonged (Fig. 5F).
Our results confirm that using different reference genes for normalisation causes great differences among the expression patterns of MK, PAL and 4CL.When the stable reference genes and optimum reference gene combinations were used for normalisation, the expression patterns of MK, PAL and 4CL were similar. However, when the most unstable reference genes were used for normalisation, the expression patterns of MK, PAL and 4CL had significant biases compared with the optimum reference gene combinations. The results illustrate that a stably expressed reference gene was essential to an accurate normalisation of target gene expression.

Discussion
Gene expression pattern analysis in different experimental conditions is necessary for the functional analysis of genes 32 . Presently, many methods can be used to study gene expression levels, but qRT-PCR has become a powerful technology to research gene expression patterns because of its accuracy and sensitivity 33,34 . In qRT-PCR analysis, reference genes with stable expression levels and suitable expression abundance are preconditions that ensure the accuracy of gene expression analysis in different experimental conditions or species 35 . Ideal reference genes should be stably expressed in all experimental conditions. Many studies have emphasised that there is neither a universal reference gene nor a defined number of genes that should be used; thus, it is necessary to experiment in order to determine the appropriate reference gene or gene combination 36 . Reliable reference genes have been determined in many plant species under different cultivars, developmental stages, biotic stresses and abiotic stresses. For instance, selected suitable reference genes have been found for Coffea arabica 37 , peach 38 ,carrot 39 , berry 40 , celery 41 , pepper 42 , maize 43 , and so on. However, to the best of our knowledge, the selection of reference genes has only been carried out in transgenic E. ulmoides. In this study, 11 commonly used reference genes (18S rRNA, ACT, ACT97, HIS2B, HIS4, TUA, TUB, UBA52, UBA80, UBC and UBC E2) were selected as candidate reference genes to analyse under three natural growth conditions and three abiotic stress conditions. All candidate reference genes used in this study presented a suitable expression abundance (19 < Cq < 29), which can further evaluate their expression stability. To date, this study is the first report of a systematic analysis of reference genes in different varieties, tissues, developmental stages and environmental conditions in E. ulmoides.
In order to avoid the one-sidedness of an algorithm for the analysis of the stability of reference genes, several statistical methods are usually simultaneously used to analyse the best reference genes in different experimental conditions 44,45 . In the present study, three commonly used statistical programs (geNorm, NormFinder, and BestKeeper) were employed to evaluate and determine suitable reference genes. Similar to other studies, different statistical methods produced different stability rankings in each experimental condition, but the results were roughly the same. As reported in other studies, the most discrepant results in the gene stability ranking were obtained with BestKeeper 36 . In this study, for the "Total" group, UBC, UBA80, and UBC E2 were identified as the most stable genes by geNorm and NormFinder, BestKeeper showed ACT and TUB to be the best reference genes despite the fact that, ACT and TUB were ranked as the 8 th and 9 th genes by both geNorm and NormFinder. Therefore, it is very important for this study to use RefFinder to comprehensively analyse the results of geNorm, NormFinder and BestKeeper. The results of RefFinder are based on the geometric mean of the three software programs and the delta CT method to obtain the final ranking.
Using a single reference gene for normalisation will lead to deviations in the results 28,46 . Thus, two or more reference genes for standardization purposes will reduce the experimental error 47 . In the present study, geNorm was employed to determine the optimal number of reference genes for calibration in different experimental conditions. Our results showed that under different varieties, tissues, developmental stages and environmental conditions, the pair-wise variation was V 2/3 < 0.15, which indicated that two reference genes were sufficient for optimal normalisation. But in the groups "Natural growth", "Abiotic stresses" and "Total", V 2/3 > 0.15, which indicated that more genes were needed. However, although using multiple reference genes can make the results more accurate, it is not a required standard 28 .
The suitability of the selected reference genes has been assessed by analysing the expression levels in three target genes that related to trans-polyisoprene (Eu-rubber) biosynthesis (MK) and CGA biosynthesis (PAL and 4CL). MK is a key enzyme-coding gene related to trans-polyisoprene biosynthesis 26 ; PAL and 4CL are the upstream key enzymes of CGA 27 . In our study, the expression levels of MK, PAL and 4CL were different in different varieties, different tissues, different developmental stages and abiotic stresses. Indeed, the contents of Eu-rubber and CGA in different varieties, tissues and developmental stages of E. ulmoides were different. In addition, the expression levels of MK, PAL and 4CL were largely different in cold, drought, and salinity treatments. This is possibly due to the fact that these abiotic stresses are related to the content of Eu-rubber and CGA.
Additionally, we used both the most stable and the most unstable reference genes for normalisation to compare with the optimal reference gene combination for normalisation, the results are quite different. When the stable reference genes and optimum reference gene combinations were used for normalisation, the expression patterns of MK, PAL and 4CL were similar. However, when the most unstable reference genes were used for normalisation, the expression patterns of MK, PAL and 4CL had significant biases compared with the optimum reference gene combinations. This indicates that the reference genes screened in this study are reliable.
The selected stable reference genes in this study will be beneficial for more accurate quantification of gene expression levels in E. ulmoides for different varieties, developmental stages, tissues and environmental conditions.

Methods
Plant materials and treatments. For non-stress treatments, the third leaves from the base of the E. ulmoides branches were collected on April 9 th , 2016 to evaluate expression stability in four different varieties: "Xiaoye", "Daye", "Huazhong4" and "Yanci". Leaves of "Huazhong4" at five developmental stages were collected every 10 days from March 31 st to May 9 th , 2016 to evaluate expression stability in different leaf blade developmental periods. These five periods include leaves from germination to maturity; blade widths were 0.5 cm, 2.5 cm, 4 cm, 5.5 cm and 7.3 cm for each period, respectively. The third leaves from the base branches, barks from annual branches, one-year-old branches, and fibril roots of "Huazhong4" plants were harvested to evaluate expression stability in different tissues. All of the above materials were collected from the nursery of the College of Forestry, Northwest A & F University in Yangling, Shaanxi, China. For stress treatment, one-year-old potted plants of "Huazhong4", kept in the natural environment, were carefully removed from soil, and the roots were gently washed by distilled water. For drought and salinity treatments, the plants were immersed in complete medium containing 15% PEG 6000 and 200 mM NaCl, respectively, for 0, 2, 6, 9 and 12 h. For cold treatment, the plants were immersed in complete medium and were transferred at 4 °C for 0, 2, 6, 9 and 12 h. All treatments were performed in our laboratory. The leaf blade samples (the third leaves from the top of the plants) were separately collected and immediately frozen in liquid nitrogen, and then were stored at −80 °C. Each experimental condition had three biological replicates.
RNA isolation and cDNA reverse transcription. Total RNA was extracted using the Plant RNA Kit (OMEGA, Omega Bio-Tek, Shanghai, China) and treated with RNase-free DNase I according to the manufacturer's instructions. RNA concentration and purity were measured by the NanoDrop Nano-200 (All For Life Science, Hangzhou, Zhejiang, China), and RNA integrity was estimated by 1.2% agarose gel electrophoresis. cDNA (10 μL) was synthesised from 500 ng of total RNA using the PrimeScript ™ RT reagent Kit (TaKaRa Biotech Co., Ltd., Dalian, China). Random 6 mers and the Oligo dT Primer were used together according to the manufacturer's instructions.
Candidate reference genes selection, primer design, and gene cloning. The sequences of 10 candidate reference genes (18S rRNA, ACT, ACT97, HIS2B, HIS4, TUA, TUB, UBA52, UBA80, and UBC) originated from our E. ulmoides transcriptome (not published), and the primer sequences of UBC E2 originated from the study of Chen et al. 48 . The primers were designed by Primer Premier 5.0 software. The primer sequences of candidate reference genes used in this study were embodied in Table 1 and Supplementary Table S1. The primer specificities and amplicons size were verified by 4% agarose gel electrophoresis. A five-fold cDNA dilution series with three replicates per concentration was used to made standard curves for estimation of amplification efficiency (E = (10 [−1/slope] −1) × 100%) and the correlation coefficient (R 2 ) 49 . The sequences of 11 candidate reference genes from E. ulmoides were cloned using 2 × Taq Plus Master Mix (Vazyme Biotech Co., Ltd., Nanjing, China) as the polymerase. The PCR reaction (50 μL) contained 25 μL of 2 × Taq Plus Master Mix, 19 μL of dd H 2 O, 2 μL of the template cDNA, and 2 μL of each primer (10 nmol·m L −1 ). The amplification conditions were as follows: 3 min at 94 °C for denaturation; 35 cycles of 30 s at 94 °C (denaturation), 30 s at 55 °C (annealing), and 60 s at 72 °C (extension); and a final step of 10 min at 72 °C for extension. PCR products were gel-purified, ligated into the pMD 19-T vector, and then transformed into Escherichia coli. The bacterial liquids were sequenced by Gen Script Corporation (Nanjing, China).
Quantitative real-time PCR assay. qRT-PCR reactions were performed in a CFX96 Connect Real-time PCR Detection System (Bio-Rad Laboratories, Inc., Hercules, CA, USA) using SYBR Premix Ex Taq (TaKaRa Biotech Co., Ltd., Dalian, China). Each PCR reaction mixture (20 μL) contained 2 μL of diluted cDNA (20 × dilution), 10 μL of SYBR Green II Mix, 0.8 μL of each primer (10 nmol.mL −1 ), and 6.4 μL of ddH 2 O. The amplification conditions were as follows: 95 °C for 30 s to pre-denaturation, 40 cycles at 95 °C for 10 s to denaturation, and 58 °C for 20 s to annealing and extension. Melting curves were analysed from 60 °C to 95 °C to confirm primer SCieNtiFiC REPORTS | (2018) 8:15043 | DOI:10.1038/s41598-018-33342-w specificity and lack of primer dimers. Each reaction was repeated three times. The negative controls were performed on each plate and for each sample, with ddH 2 O and total RNA to replace the cDNA. Data analysis. Cq values were obtained by setting the baseline threshold to a mean of 75.55. The raw Cq data are shown in Supplementary Table S2. Four widely used software: geNorm 28 , NormFinder 29 , BestKeeper 30 , and RefFinder 31 were used to analyse the candidate reference gene's expression stability. When using the geNorm and NormFinder algorithms for analyses, the raw Cq data needs to be transformed into relative quantities. However, when using the BestKeeper and RefFinder software, the Cq values need not to be converted.
GeNorm calculates the expression stability measure (M) and analyzes the pair-wise variation (V) for each candidate reference genes, then excludes the most unstable genes which with highest M-value progressively. In addition, pair-wise variation V n /V n+1 (0.15 recommended threshold), determines the optimal number of reference genes for normalization 36,41 .
NormFimder calculates the expression stability value (SV) on the basis of intra-and inter-group for each reference gene 29 . The high expression stability of this gene is reflected in a low SV-value.
BestKeeper calculates the stability of candidate reference genes based on standard deviation (SD), Pearson correlation coefficient (r), and coefficient of variation (CV) with the Cq data of all candidate genes. The most stable gene is with the lowest SD and CV values. The range of variation of SD should be below 1 36,41 .
RefFinder can generate a comprehensive ranking of candidate reference genes in each experimental condition 31 .
Validation of reference genes. To validate the reliability of selected reference genes, two most stable and two unstable reference genes and optimum internal reference gene combinations were used to normalize the relative expression patterns of MK, PAL, 4CL in each experimental condition. The relative expression levels were calculated by 2 −△△Ct method 5 .