Evaluation of Angelica decursiva reference genes under various stimuli for RT-qPCR data normalization

Angelica decursiva is one of the lending traditional Chinese medicinal plants producing coumarins. Notably, several studies have focused on the biosynthesis and not the RT-qPCR (quantitative real-time reverse transcription polymerase chain reaction) study of coumarins. This RT-qPCR technique has been extensively used to investigate gene expression levels in plants and the selection of reference genes which plays a crucial role in standardizing the data form the RT-qPCR analysis. In our study, 11 candidate reference genes were selected from the existing transcriptome data of Angelica decursiva. Here, four different types of statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were used to calculate and evaluate the stability of gene expression under different external treatments. Subsequently, RefFinder analysis was used to determine the geometric average of each candidate gene ranking, and to perform comprehensive index ranking. The obtained results showed that among all the 11 candidate reference genes, SAND family protein (SAND), protein phosphatase 2A gene (PP2A), and polypyrimidine tract-binding protein (PTBP) were the most stable reference genes, where Nuclear cap binding protein 2 (NCBP2), TIP41-like protein (TIP41), and Beta-6-tubulin (TUBA) were the least stable genes. To the best of our knowledge, this work is the first to evaluate the stability of reference genes in the Angelica decursiva which has provided an important foundation on the use of RT-qPCR for an accurate and far-reaching gene expression analysis in this medicinal plant.

target gene, the RNA yield, quality, and the reverse transcription efficiency that is present in different samples. Therefore, to overcome experimental errors and assure accuracy of experimental results, a certain number of internal reference genes, also called housekeeping genes, are often selected for calibration and standardization 14,15 . There genes are usually used as a reference for either tissues or cells when they are stably expressed in a diversity of experimental conditions, and when changes are detected in the expression level of a target gene. Among the most commonly used internal reference genes for RT-qPCR are Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Elongation factor-1α (EF-1α), Tubulin (TUB), β-Actin (ACT ), 18S ribosomal RNA protein (18S), Polyubiquitin 10 (UBQ10), etc. These housekeeping genes play a significant role in the cell structure maintenance and during primary metabolic activities. Moreover, more internal reference genes have been identified from the gene expression chip data. They include protein phosphatase 2A (PP2A), SAND family protein (SAND), Tap42interacting protein of 41 kDa (TIP41), etc. Consequently, the reference gene screening and evaluation have been accomplished for most species, such as rice 16 , Lycoris aurea 17 , radish 18 , potato 19 , Peucedanum praeruptorum 20 , etc. However, no study has demonstrated a systematic selection of reference genes in A. decursiva under external challenges (abiotic stress and hormone treatments). The study of the coumarin biosynthesis pathway is one of our primary research interests relates. Here, previous studies have characterized some candidate genes that associate with A. decursiva 8 . However, it is imperative to find suitable reference genes, that enhance the detection of differential gene expression levels under various experimental conditions in A. decursiva using the RT-qPCR application.
In our study, 11 candidate reference genes (SAND, PP2A, PTBP, ACT , CYP2, EXP-1, GAPDH, TUBA, NCBP2, UBQ10, and TIP41) were selected about the transcriptome sequencing datasets of A. decursiva. Besides, 7 different forms of treatments such as cold (4 °C), drought (20% PEG 6000), methyl jasmonate (25 mM MeJA), salt (600 mM NaCl), oxidative (50 mM H 2 O 2 ), ultraviolet (UV) induction, and metal (500 mM CuSO 4 ) were set. Next, the application of four statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) to evaluate their expression stability for normalization, and comprehensive stability ranking was also performed by RefFinder. In conclusion, our work provides a basis for further studies on gene expression profiling and the regulation mechanisms of coumarin biosynthesis in A. decursiva under diverse experimental conditions.

Results
Selection of candidate reference genes, evaluation of amplification specificity and PCR efficiency. Here, 11 candidate reference genes were selected based on the transcriptome data of A. decursiva (unpublished). Table 1 lists all the candidate reference gene names and abbreviations, homologous genes from Arabidopsis, primer sequences, amplification length, annealing temperature (°C), PCR efficiency (E), and correlation coefficient (R 2 ). Next, conventional PCR and RT-qPCR were used to verify the primer-specific amplification of all candidate reference genes. As illustrated in Fig. S1, based on agarose gel electrophoresis and melting curve analysis which showed a peak, the 11 primer pairs were amplified by a single specific amplicon. Here, the amplification efficiency range of CYP2 and PP2A was 1.711 and 1.880 respectively whereas the correlation coefficient range of UBQ10 was 0.888, and that of PP2A, PTBP, TIP41, and ACT was 1.000.  Cyclophilin 2  AT4G33060  CGC CAC TTT TTT TGT TCT CT/  TTG CGG ATT ATA TTC CGA CA  108  82.9  1.711  0.971   EXP-1   S-adenosyl-l-methionine-dependent  methyltransferases superfamily  protein   AT2G32170  CCA AGT AGG AGC TTG GGA TG/  CAC ACC ACC GTC CTT TAG AA  109  81.7  1.789  0.958   GAPDH  Glyceraldehyde 3-phosphate dehydrogenase  AT1G42970  TGG TTC CCT TAA CGA TAC CA/  CTT ACG TTG TTG GCG TGA AT  138  www.nature.com/scientificreports/ Expression profile of candidate reference genes. Here, the cycle threshold values (Cp) showed the number of cycles when the generated fluorescent signal reached a level that could be detected. Therefore, in this study, the expression profile of the candidate reference genes were directly determined using the obtained Cp values. As shown in Fig. 1, the average Cp values of these 11 reference genes are distributed between 10 and 30, and a majority of them are distributed between 23 and 27. Among all the candidate reference genes, NCBP2 had the lowest average Cp values, whereas CYP2, PTBP, and TUBA had the highest average Cp values. Moreover, each reference gene had a different coefficient of variation under different conditions. Also, it was observed that SAND and PTBP had the lowest variability, whereas TUBA, which had the highest Cp value, had the highest variability. This value ranged between 25 and 32. Therefore, it may not be qualified as a reference gene. Through the Cp value combined with box-plot not only display the expression profile of the reference gene, but also give us a glimpse in their stability (Table S2). However, considering the complexity of their surroundings, we need to check the proper use of the references in every particular experimental condition. Thus, more statistical tools and further analyses are required.
Expression stability of candidate reference genes. Based on the relative expression levels of the 11 candidate reference genes, four algorithms such as BestKeeper, geNorm, NormFinder, and Delta Ct were used to examine their expression stability. Subsequently, the RefFinder tool was employed to sequence the expression stability of all these candidate reference genes and select the most suitable ones. geNorm analysis: The geNorm analysis evaluated the stability of all the 11 candidate reference genes using the M value (reference expression stability measure). These M values were calculated from the average variation of the gene relative against other candidate reference genes, and the lower M values indicated a higher gene expression stability 21 . As illustrated in Fig. 2, all the candidate reference genes had different levels of stability under different treatments. Here, the M value of PP2A and SAND was the lowest in most treatments and was deliberated as the most stable reference genes. Besides, ACT showed good stability under H 2 O 2 , MeJA, NaCl, and UV treatments, whereas CYP2 was one of the most stable reference genes in the CuSO 4 and UV treatment groups. On the other hand, the stability of NCBP2 seemed to be less satisfactory than other candidate reference genes. Subsequently, the geNorm algorithm can also determine the optimal number of normalized reference genes 21 by calculating pairwise mutations (V n/n+1 ). Normally, the ideal paired variation (V) score is less than 0.15, which means that the addition of any other gene will not have a substantial impact on standardization. In our study, in the NaCl, Cold, UV, H 2 O 2 , CuSO 4 , and WT subsets, their pairwise variation values (V 2/3 ) were all less than 0.15. This showed that the addition of a third reference gene lacked significant effects on the normalization of the target gene, thus, the number of the optimal reference genes that were determined was two. In contrast, as shown in Fig. 3 and Table S3, the pairwise variation values (V 3/4 ) of the PEG and MeJA subsets, was also less than 0.15. This indicated that the number of reference genes that were most suitable for these two treatments was three.
NormFinder analysis: To normalize raw data and measure the expression stability of candidate reference genes through intra-and inter-group variation, the NormFinder analysis uses the 2 −ΔCt method. Like the geNorm analysis, its lower values show higher stability. Table 2 shows the ranking of all candidate genes as calculated using the NormFinder algorithm. Among the PEG, Cold, and 'total' treatment subsets, it was observed that SAND was the most stable candidate gene, and also ranked higher in other subsets. After MeJA treatment (0.078) on all sample subsets, TUBA was deliberated as the most stable candidate gene. Besides, among the other five groups (NaCl, UV, H 2 O 2 , CuSO 4 , and WT), PTBP (0.670), TIP41 (0.307), GAPDH (0.495), CYP2 (0.483), and UBQ10 (0.451) were observed to be the most stable genes, respectively. Similar to the geNorm analysis, our study found that the NCBP2 was the most unstable candidate gene. www.nature.com/scientificreports/ BestKeeper analysis: Here, the raw data of the Ct value was directly calculated and the more stable candidate reference genes was evaluated by calculating the standard deviation (SD) and the Ct value 22 . Notably, lower SD and Ct values indicated higher gene expression stability, particularly when the SD was greater than 1 which indicated that the reference gene was unstable 23 and could not be utilized for normalization. Table 3 shows that in the NaCl, MeJA, UV, WT, and 'total' treatment subsets, the PTBP gene was considered the most stable. However, the MeJA and 'total' treatment subsets were postulated to be expression-insensitive since their SD values were exceed 1 and could not be used for normalization. For this reason, such values should be excluded. Besides,   Table 3, it was observed that the NCBP2 gene was still the most unstable under all treatments. Delta Ct analysis: This method evaluated gene expression stability by calculating the mean standard deviation (SD) of each gene. Here, the smaller the value, the higher the stability 24 . As shown in Table S4, the results of this analysis are consistent with the geNorm analysis. The only difference is that in the Cold and H 2 O 2 subsets, TUBA and PP2A candidate genes are the most stable respectively. According to the geNorm analysis, the two most stable candidate genes are SAND and ACT . Hence, SAND and PP2A are the most qualified reference genes.
RefFinder analysis: As shown in Fig. 4, we further calculated the geometric mean of the ranking of each candidate gene using the RefFinder algorithm (http:// 150. 216. 56. 64/ refer enceg ene. php#). This was based on the results obtained from the three statistical algorithms such as geNorm, NormFinder, and BestKeeper. Tables S5 and S6, show the comprehensive index ranking, whereby, the smaller the index, the more stable the gene expression 19 . Figure 3. Determination of the optimal numbers of reference genes for normalization by pairwise variation (V n/n+1 ). Different treatments are marked as square frame with different colors. The cut-off value is 0.15 and used to determine the optimal number of candidate reference genes for qRT-PCR normalization. www.nature.com/scientificreports/ This study showed that SAND and PP2A ranked the highest in most subsets, whereas NCBP2 and TUBA ranked the lowest, making them the most unstable reference genes. In contrast, in the MeJA and Cold subsets, TUBA seemed to be a relatively stable reference gene. Despite the different assessment methods, this resulting difference is reasonable and acceptable. In summary, the stability of these 11 candidate reference genes from the highest to the lowest is: SAND, PP2A, PTBP, ACT , CYP2, EXP-1, GAPDH, UBQ10, TIP41, TUBA, and NCBP2. These results were similar to those obtained from the geNorm and NormFinder analysis, but slightly different from those of the BestKeeper analysis.

Discussion
Currently, a majority of medicinal plants enhance the bioactivity of their medicinal components by inducing and regulating functional genes present in their biosynthetic pathway 25 . Several reports have shown that in plants, there is a significant correlation between the synthesis and accumulation of secondary metabolites and the expression levels found in the functional pathways of their biosynthetic pathways 26,27 . Coumarins are one of the main active components extracted from A. decursiva, which is one of the main sources of coumarin found in China and is listed as a special coumarin resource by the Chinese Pharmacopoeia. However, there exist no previous reports on RT-qPCR studies, which confines the research on the biosynthetic pathway of coumarin compounds from A. decursiva.
Here, we screened a total of 11 potential reference genes from the transcriptome data of A. decursiva (unpublished). First, as shown in Table 1 and S1, their primer specificity and PCR experimental conditions were confirmed, whereby the single peak of the dissolution curve exposed the specificity of these 11 pairs of primers (Fig. S1). As illustrated in Fig. 1 and Table S2, the average Ct-values for the 11 candidate reference genes is between 8.04 and 32.65, and the majority of these values were found to be between 23 and 27, which means that most of the 11 candidate reference genes were likely to afford accurate normalization 20 . The SAND and PTBP genes have the lowest variability, which suggests that they could have stable expression levels under different treatments. Conversely, considering the complexity of the ambient environment, the expression stability of these reference genes under different conditions must be investigated further 28 . Therefore, there is a need to use more statistical tools and conduct further analyses.
To ensure the accuracy and reliability of experimental data, four commonly used statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were combined for reference gene selection. Subsequently, RefFinder analysis was used to calculate the geometric ranking mean of each candidate reference gene, and then a comprehensive index ranking was done. Here, the results from this analysis showed that the ranking results acquired using different statistical algorithms were not similar, and the candidate reference genes also showed different levels of stability under different treatments. For example, Table 2 shows that TUBA is the best reference gene in the MeJA subset, and this is consistent with the results obtained when Atropa belladonna 29 was exposed to different hormone treatments which showed that it had the most stable expression. On the other hand, when this gene was exposed to other treatments, its stability was unsatisfactory. This is most likely because some genes belonging to the tubulin family and that express isomers in specific developmental stages or tissues are regulated by different developmental processes 30 . Similar to the results from tomato 31 and Arabidopsis thaliana 32 , UBQ10 is the most stable gene in the WT group (Table 2). Also, the PTBP gene performed well in the UV subset, unlike in the Cold, PEG, and CuSO 4 which is illustrated on Table 2 and S5). This result contrasted with the most stable performance of the PTBP gene in Cold, PEG, and CuSO 4 subsets in Peucedanum praeruptorum 20 . Table 3 shows that under cold stress (4 °C) and the 'total' group, the TUBA and PTBP genes were the most stable reference genes, respectively, but this was different from the results obtained from geNorm and NormFinder analyses which showed that SAND was the gene with the best stability. Since various calculation principles are inconsistent, even under the same conditions, the ranking of these genes could be different. Therefore, the single software analysis has certain disadvantages when evaluating the stability of the reference gene. Our study recommends that a comprehensive evaluation and analysis need to be used to ensure the reliability and accuracy of the results. Besides, it is necessary to flexibly select the most appropriate reference gene for different experimental treatments.
Also, bearing in mind that a single reference gene could cause errors in the expression level of the target gene 33,34 , the geNorm statistical algorithm calculates the pairwise variation (V n/n+1 ) to attain the optimal number of reference genes 35 . Notably, the ideal pairwise variation value must be lower than the critical value of 0.15 33,36 . For instance, as illustrated in Fig. 3 and Table S3, the pairwise variation V 2/3 in the Cold, NaCl, CuSO 4 , H 2 O 2 , UV, WT, and the 'total' treatment subsets, is less than 0.15. This demonstrated that only two genes are required to complete the gene expression normalization under these conditions. Consequently, it was observed that the PP2A and ACT are relatively stable reference genes under the treatment of MeJA, their pairing value V 3/4 was < 0.15, and their optimal number of reference genes was three.
Taken together, the analysis results from the three algorithms (geNorm, NormFinder, and Delta Ct) seemed to be consistent, whereas those of BestKeeper are different. Figure 4 and Table S5, confirm that under most external conditions, the stability of SAND and PP2A genes are significantly better than that of other reference genes, whereas NCBP2 and TUBA are nearly always defined as the most unstable reference genes. Of note, the SAND and PP2A are new reference genes that were identified through screening the data from the Arabidopsis gene chip. To date, many studies choose the SAND and PP2A genes for their reference gene standardization studies. For instance, studies that used Arabidopsis thaliana 32 , tomato 37 , Petunia hyrbrida 38 , Peucedanum praeruptorum 20 , etc. Conventionally, reference genes, such as TUBA, GAPDH, UBQ10, and ACT play a housekeeping role in the maintenance of cell structure or primary metabolic activities. Nevertheless, increasing research indicates that the expression levels in most of these traditional reference genes can somehow vary greatly and in many species, they are unsuitable for gene normalization under specific conditions 39,40 . Therefore, given a subset of experimental conditions, the housekeeping genes used as reference genes in each different species should be handled carefully 41,42 . Primer design and RT-qPCR conditions. Here, a total of 11 candidate reference genes were selected from the transcriptome data of A. decursiva (unpublished). Next, the built-in program in the Bioedit Sequence Alignment Editor (v7.0.9) was used to screen and select potential single genes using local blast (TBLASTN). Subsequently, the corresponding homologs of these reference genes were selected from the database of The Arabidopsis Information Resource (TAIR) (http:// www. arabi dopsis. org), and only single genes with lower E-values and higher scores were selected for subsequent analysis. Table S1, shows the reference gene IDs, homologous loci, gene sequences, and the different expression levels (FPKM, Fragments Per Kilobase per Million) of all the 11 candidate reference genes.

Materials and methods
To avoid amplification of any contaminating genomic DNA, primers were designed to cross at least one intron/exon border that contained both donor and acceptor sites, and then exon analysis was performed using the AlignX program in the vector NTI advance 11.5 package. Subsequently, the Primer3Plus (http:// prime r3plus. com/ cgi-bin/ dev/ prime r3plus. cgi) was used to design primers with the following characteristics: amplicon length was 100 to 150 bp, GC content was between 40 and 60%, primer length was 18 to 24 bp, temperature difference of each primer pair was less than 1 °C, and the melting temperature (Tm) was between 55 and 60 °C. Consequently, all the primer pairs were tested using conventional PCR to determine the combination of forward and reverse primers that performed optimally and the resulting products were examined using 1.0% agarose gel electrophoresis. Besides, from a series of standard curves of five different cDNA dilutions, the amplification efficiency (E) and correlation coefficient (R 2 ) were calculated. Table 1 lists all the gene-specific primer pairs that were designed and used in the RT-qPCR analysis.

Stability evaluation of candidate reference genes.
To obtain the RT-qPCR data, three biological and technical replicates were done for each sample, and all data presented as mean ± standard error of the mean (SEM). Consequently, statistical analyses were performed using the Student's t-test. Next, representative graphs were generated using OriginPro 9.1 (OriginLab Corporation, Northampton, MA, USA). Subsequently, data analysis was performed using GeNorm (ver.3.5), NormFinder (ver.0.953), BestKeeper (ver.1.0), and the Delta Ct method following the instructions from the manufacturer. Lastly, a comprehensive stability ranking analysis was performed using RefFinder (http:// 150. 216. 56. 64/ refer enceg ene. php).

Conclusions
Selecting appropriate reference genes is a significant prerequisite for quantifying gene expression using RT-qPCR. This is a conducive research on the key enzyme genes that are involved in the biosynthesis of coumarins and other interesting secondary metabolites found in A. decursiva. So far, this is the first systematic screening experiment of the most suitable reference genes for A. decursiva under various external treatments, like cold stress (4 °C), drought stress (20% PEG), Methyl jasmonate (MeJA), salt stress (0.5 M NaCl), oxidative stress (H 2 O 2 ), ultraviolet (UV) induction, metal stress (CuSO 4 ), untreated (WT), and 'total' (all treated samples). Our results have exposed that the 11 candidate genes have different stability in A. decursiva when exposed to different experimental treatments. From the overall stability ranking, SAND is the most stable candidate reference gene, followed by PP2A and then PTBP. On the other hand, the NCBP2 gene has the lowest stability making it unsuitable for further research. In summary, the reference genes evaluated in this study can be helpful for accurate normalization of the RT-qPCR data and any other future work on the gene expression of coumarin synthesis present in A. decursiva.

Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.