Introduction

Angelica decursiva (Miq.) Franch. et Sav. is an important traditional medicinal plant form the genus Angelica, family Umbelliferae. The major bioactive components of this herbal drug are coumarins and this drug has been listed as one of the main special coumarin sources by the Chinese Pharmacopoeia. Coumarin has a core structure (2H-1-benzopyran-2-one core), that widely exists in higher plant species like the Rutaceae, Leguminosae, and Umbelliferae. Besides, they are slightly distributed in both animals and microorganisms1. Recent pharmacological studies have proved that the 70% ethanol extract of A. decursiva has an anti-inflammatory activity2,3, antioxidant4,5, inhibitory effect on C6 rat glioma cells6 and inhibits the proliferation of osteogenic sarcoma cells7. Hence, certifying the supply of coumarin compounds from the drug has been a long-term practice8. Conversely, there is a huge shortage of coumarin raw materials, which has been attributed to low abundance, season, or region-dependent sourcing. Also, because of its complicated structure and multiple chiral centers9, there lacks a feasible method that has the capacity of mass chemical synthesis. Therefore, genetic and metabolic engineering strategies used in the production of coumarin compounds have become the main developing research direction. Subsequently, the study of functional genes that are present in their biosynthetic pathways is a study of interest.

In recent years, the rapid development of next-generation sequencing (NGS) technologies such as Roche / 454 and Illumina HiSeq platforms, has made it possible to study the distribution of mRNA and their expression levels in different tissues and cells. Currently, due to its high sensitivity, accurate quantification, and strong specificity, the RT-qPCR technology has become one of the most powerful tools for studying gene transcription levels. Besides, it quantifies the relative abundance and improves the quantitative accuracy of target genes10,11,12. Nonetheless, the accuracy of this technique is affected by several factors13, like the amplification efficiency of the target gene, the RNA yield, quality, and the reverse transcription efficiency that is present in different samples. Therefore, to overcome experimental errors and assure accuracy of experimental results, a certain number of internal reference genes, also called housekeeping genes, are often selected for calibration and standardization14,15. There genes are usually used as a reference for either tissues or cells when they are stably expressed in a diversity of experimental conditions, and when changes are detected in the expression level of a target gene. Among the most commonly used internal reference genes for RT-qPCR are Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), Elongation factor-1α (EF-1α), Tubulin (TUB), β-Actin (ACT), 18S ribosomal RNA protein (18S), Polyubiquitin 10 (UBQ10), etc. These housekeeping genes play a significant role in the cell structure maintenance and during primary metabolic activities. Moreover, more internal reference genes have been identified from the gene expression chip data. They include protein phosphatase 2A (PP2A), SAND family protein (SAND), Tap42-interacting protein of 41 kDa (TIP41), etc. Consequently, the reference gene screening and evaluation have been accomplished for most species, such as rice16, Lycoris aurea17, radish18, potato19, Peucedanum praeruptorum20, etc. However, no study has demonstrated a systematic selection of reference genes in A. decursiva under external challenges (abiotic stress and hormone treatments). The study of the coumarin biosynthesis pathway is one of our primary research interests relates. Here, previous studies have characterized some candidate genes that associate with A. decursiva8. However, it is imperative to find suitable reference genes, that enhance the detection of differential gene expression levels under various experimental conditions in A. decursiva using the RT-qPCR application.

In our study, 11 candidate reference genes (SAND, PP2A, PTBP, ACT, CYP2, EXP-1, GAPDH, TUBA, NCBP2, UBQ10, and TIP41) were selected about the transcriptome sequencing datasets of A. decursiva. Besides, 7 different forms of treatments such as cold (4 °C), drought (20% PEG 6000), methyl jasmonate (25 mM MeJA), salt (600 mM NaCl), oxidative (50 mM H2O2), ultraviolet (UV) induction, and metal (500 mM CuSO4) were set. Next, the application of four statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) to evaluate their expression stability for normalization, and comprehensive stability ranking was also performed by RefFinder. In conclusion, our work provides a basis for further studies on gene expression profiling and the regulation mechanisms of coumarin biosynthesis in A. decursiva under diverse experimental conditions.

Results

Selection of candidate reference genes, evaluation of amplification specificity and PCR efficiency

Here, 11 candidate reference genes were selected based on the transcriptome data of A. decursiva (unpublished). Table 1 lists all the candidate reference gene names and abbreviations, homologous genes from Arabidopsis, primer sequences, amplification length, annealing temperature (°C), PCR efficiency (E), and correlation coefficient (R2). Next, conventional PCR and RT-qPCR were used to verify the primer-specific amplification of all candidate reference genes. As illustrated in Fig. S1, based on agarose gel electrophoresis and melting curve analysis which showed a peak, the 11 primer pairs were amplified by a single specific amplicon. Here, the amplification efficiency range of CYP2 and PP2A was 1.711 and 1.880 respectively whereas the correlation coefficient range of UBQ10 was 0.888, and that of PP2A, PTBP, TIP41, and ACT was 1.000.

Table 1 Details of primers used in this study.

Expression profile of candidate reference genes

Here, the cycle threshold values (Cp) showed the number of cycles when the generated fluorescent signal reached a level that could be detected. Therefore, in this study, the expression profile of the candidate reference genes were directly determined using the obtained Cp values. As shown in Fig. 1, the average Cp values of these 11 reference genes are distributed between 10 and 30, and a majority of them are distributed between 23 and 27. Among all the candidate reference genes, NCBP2 had the lowest average Cp values, whereas CYP2, PTBP, and TUBA had the highest average Cp values. Moreover, each reference gene had a different coefficient of variation under different conditions. Also, it was observed that SAND and PTBP had the lowest variability, whereas TUBA, which had the highest Cp value, had the highest variability. This value ranged between 25 and 32. Therefore, it may not be qualified as a reference gene. Through the Cp value combined with box-plot not only display the expression profile of the reference gene, but also give us a glimpse in their stability (Table S2). However, considering the complexity of their surroundings, we need to check the proper use of the references in every particular experimental condition. Thus, more statistical tools and further analyses are required.

Figure 1
figure 1

The RT-qPCR Cp values for 11 candidate reference genes in all samples. The expression data is shown as the Cp value of each reference gene in the samples of A. decursiva. Boxes indicate the 25th/75th percentiles, the lines represent the median, squares represent the means and whiskers represent the maximum and minimum values.

Expression stability of candidate reference genes

Based on the relative expression levels of the 11 candidate reference genes, four algorithms such as BestKeeper, geNorm, NormFinder, and Delta Ct were used to examine their expression stability. Subsequently, the RefFinder tool was employed to sequence the expression stability of all these candidate reference genes and select the most suitable ones.

geNorm analysis: The geNorm analysis evaluated the stability of all the 11 candidate reference genes using the M value (reference expression stability measure). These M values were calculated from the average variation of the gene relative against other candidate reference genes, and the lower M values indicated a higher gene expression stability21. As illustrated in Fig. 2, all the candidate reference genes had different levels of stability under different treatments. Here, the M value of PP2A and SAND was the lowest in most treatments and was deliberated as the most stable reference genes. Besides, ACT showed good stability under H2O2, MeJA, NaCl, and UV treatments, whereas CYP2 was one of the most stable reference genes in the CuSO4 and UV treatment groups. On the other hand, the stability of NCBP2 seemed to be less satisfactory than other candidate reference genes. Subsequently, the geNorm algorithm can also determine the optimal number of normalized reference genes21 by calculating pairwise mutations (Vn/n+1). Normally, the ideal paired variation (V) score is less than 0.15, which means that the addition of any other gene will not have a substantial impact on standardization. In our study, in the NaCl, Cold, UV, H2O2, CuSO4, and WT subsets, their pairwise variation values (V 2/3) were all less than 0.15. This showed that the addition of a third reference gene lacked significant effects on the normalization of the target gene, thus, the number of the optimal reference genes that were determined was two. In contrast, as shown in Fig. 3 and Table S3, the pairwise variation values (V 3/4) of the PEG and MeJA subsets, was also less than 0.15. This indicated that the number of reference genes that were most suitable for these two treatments was three.

Figure 2
figure 2

Average expression stability values (M) of the 11 candidate reference genes using geNorm software. Expression stability was evaluated in samples from A. decursiva was submitted to cold stress, drought stress (20% PEG), Methyl jasmonate (MeJA) stress, salt stress (0.5 M NaCl), oxidative stress (H2O2), ultraviolet (UV) induction, Metal stress (CuSO4), untreated (WT) and 'total' (all treated samples). The least stable genes are on the left with higher M-value and the most stable genes on the right with lower M-value.

Figure 3
figure 3

Determination of the optimal numbers of reference genes for normalization by pairwise variation (Vn/n+1). Different treatments are marked as square frame with different colors. The cut-off value is 0.15 and used to determine the optimal number of candidate reference genes for qRT-PCR normalization.

NormFinder analysis: To normalize raw data and measure the expression stability of candidate reference genes through intra- and inter-group variation, the NormFinder analysis uses the 2−ΔCt method. Like the geNorm analysis, its lower values show higher stability. Table 2 shows the ranking of all candidate genes as calculated using the NormFinder algorithm. Among the PEG, Cold, and 'total' treatment subsets, it was observed that SAND was the most stable candidate gene, and also ranked higher in other subsets. After MeJA treatment (0.078) on all sample subsets, TUBA was deliberated as the most stable candidate gene. Besides, among the other five groups (NaCl, UV, H2O2, CuSO4, and WT), PTBP (0.670), TIP41 (0.307), GAPDH (0.495), CYP2 (0.483), and UBQ10 (0.451) were observed to be the most stable genes, respectively. Similar to the geNorm analysis, our study found that the NCBP2 was the most unstable candidate gene.

Table 2 Stability rank of 11 candidate genes by NormFinder.

BestKeeper analysis: Here, the raw data of the Ct value was directly calculated and the more stable candidate reference genes was evaluated by calculating the standard deviation (SD) and the Ct value22. Notably, lower SD and Ct values indicated higher gene expression stability, particularly when the SD was greater than 1 which indicated that the reference gene was unstable23 and could not be utilized for normalization. Table 3 shows that in the NaCl, MeJA, UV, WT, and 'total' treatment subsets, the PTBP gene was considered the most stable. However, the MeJA and 'total' treatment subsets were postulated to be expression-insensitive since their SD values were exceed 1 and could not be used for normalization. For this reason, such values should be excluded. Besides, under H2O2, CuSO4, Cold, and PEG treatments, the most stable genes were observed to be CYP2, SAND, TUBA, and PP2A. Generally, as illustrated in Table 3, it was observed that the NCBP2 gene was still the most unstable under all treatments.

Table 3 Rank of 11 candidate genes by BestKeeper.

Delta Ct analysis: This method evaluated gene expression stability by calculating the mean standard deviation (SD) of each gene. Here, the smaller the value, the higher the stability24. As shown in Table S4, the results of this analysis are consistent with the geNorm analysis. The only difference is that in the Cold and H2O2 subsets, TUBA and PP2A candidate genes are the most stable respectively. According to the geNorm analysis, the two most stable candidate genes are SAND and ACT. Hence, SAND and PP2A are the most qualified reference genes.

RefFinder analysis: As shown in Fig. 4, we further calculated the geometric mean of the ranking of each candidate gene using the RefFinder algorithm (http://150.216.56.64/referencegene.php#). This was based on the results obtained from the three statistical algorithms such as geNorm, NormFinder, and BestKeeper. Tables S5 and S6, show the comprehensive index ranking, whereby, the smaller the index, the more stable the gene expression19. This study showed that SAND and PP2A ranked the highest in most subsets, whereas NCBP2 and TUBA ranked the lowest, making them the most unstable reference genes. In contrast, in the MeJA and Cold subsets, TUBA seemed to be a relatively stable reference gene. Despite the different assessment methods, this resulting difference is reasonable and acceptable. In summary, the stability of these 11 candidate reference genes from the highest to the lowest is: SAND, PP2A, PTBP, ACT, CYP2, EXP-1, GAPDH, UBQ10, TIP41, TUBA, and NCBP2. These results were similar to those obtained from the geNorm and NormFinder analysis, but slightly different from those of the BestKeeper analysis.

Figure 4
figure 4

Comprehensive ranking of BestKeeper, NormFinder, and geNorm. Using BestKeeper, NormFinder, and geNorm algorithms, the 11 candidate reference genes were ranked comprehensively, and the geometric mean of each reference gene was calculated.

Discussion

Currently, a majority of medicinal plants enhance the bioactivity of their medicinal components by inducing and regulating functional genes present in their biosynthetic pathway25. Several reports have shown that in plants, there is a significant correlation between the synthesis and accumulation of secondary metabolites and the expression levels found in the functional pathways of their biosynthetic pathways26,27. Coumarins are one of the main active components extracted from A. decursiva, which is one of the main sources of coumarin found in China and is listed as a special coumarin resource by the Chinese Pharmacopoeia. However, there exist no previous reports on RT-qPCR studies, which confines the research on the biosynthetic pathway of coumarin compounds from A. decursiva.

Here, we screened a total of 11 potential reference genes from the transcriptome data of A. decursiva (unpublished). First, as shown in Table 1 and S1, their primer specificity and PCR experimental conditions were confirmed, whereby the single peak of the dissolution curve exposed the specificity of these 11 pairs of primers (Fig. S1). As illustrated in Fig. 1 and Table S2, the average Ct-values for the 11 candidate reference genes is between 8.04 and 32.65, and the majority of these values were found to be between 23 and 27, which means that most of the 11 candidate reference genes were likely to afford accurate normalization20. The SAND and PTBP genes have the lowest variability, which suggests that they could have stable expression levels under different treatments. Conversely, considering the complexity of the ambient environment, the expression stability of these reference genes under different conditions must be investigated further28. Therefore, there is a need to use more statistical tools and conduct further analyses.

To ensure the accuracy and reliability of experimental data, four commonly used statistical algorithms (geNorm, NormFinder, BestKeeper, and Delta Ct) were combined for reference gene selection. Subsequently, RefFinder analysis was used to calculate the geometric ranking mean of each candidate reference gene, and then a comprehensive index ranking was done. Here, the results from this analysis showed that the ranking results acquired using different statistical algorithms were not similar, and the candidate reference genes also showed different levels of stability under different treatments. For example, Table 2 shows that TUBA is the best reference gene in the MeJA subset, and this is consistent with the results obtained when Atropa belladonna29 was exposed to different hormone treatments which showed that it had the most stable expression. On the other hand, when this gene was exposed to other treatments, its stability was unsatisfactory. This is most likely because some genes belonging to the tubulin family and that express isomers in specific developmental stages or tissues are regulated by different developmental processes30. Similar to the results from tomato31 and Arabidopsis thaliana32, UBQ10 is the most stable gene in the WT group (Table 2). Also, the PTBP gene performed well in the UV subset, unlike in the Cold, PEG, and CuSO4 which is illustrated on Table 2 and S5). This result contrasted with the most stable performance of the PTBP gene in Cold, PEG, and CuSO4 subsets in Peucedanum praeruptorum20. Table 3 shows that under cold stress (4 °C) and the 'total' group, the TUBA and PTBP genes were the most stable reference genes, respectively, but this was different from the results obtained from geNorm and NormFinder analyses which showed that SAND was the gene with the best stability. Since various calculation principles are inconsistent, even under the same conditions, the ranking of these genes could be different. Therefore, the single software analysis has certain disadvantages when evaluating the stability of the reference gene. Our study recommends that a comprehensive evaluation and analysis need to be used to ensure the reliability and accuracy of the results. Besides, it is necessary to flexibly select the most appropriate reference gene for different experimental treatments.

Also, bearing in mind that a single reference gene could cause errors in the expression level of the target gene33,34, the geNorm statistical algorithm calculates the pairwise variation (Vn/n+1) to attain the optimal number of reference genes35. Notably, the ideal pairwise variation value must be lower than the critical value of 0.1533,36. For instance, as illustrated in Fig. 3 and Table S3, the pairwise variation V 2/3 in the Cold, NaCl, CuSO4, H2O2, UV, WT, and the 'total' treatment subsets, is less than 0.15. This demonstrated that only two genes are required to complete the gene expression normalization under these conditions. Consequently, it was observed that the PP2A and ACT are relatively stable reference genes under the treatment of MeJA, their pairing value V 3/4 was < 0.15, and their optimal number of reference genes was three.

Taken together, the analysis results from the three algorithms (geNorm, NormFinder, and Delta Ct) seemed to be consistent, whereas those of BestKeeper are different. Figure 4 and Table S5, confirm that under most external conditions, the stability of SAND and PP2A genes are significantly better than that of other reference genes, whereas NCBP2 and TUBA are nearly always defined as the most unstable reference genes. Of note, the SAND and PP2A are new reference genes that were identified through screening the data from the Arabidopsis gene chip. To date, many studies choose the SAND and PP2A genes for their reference gene standardization studies. For instance, studies that used Arabidopsis thaliana32, tomato37, Petunia hyrbrida38, Peucedanum praeruptorum20, etc. Conventionally, reference genes, such as TUBA, GAPDH, UBQ10, and ACT play a housekeeping role in the maintenance of cell structure or primary metabolic activities. Nevertheless, increasing research indicates that the expression levels in most of these traditional reference genes can somehow vary greatly and in many species, they are unsuitable for gene normalization under specific conditions39,40. Therefore, given a subset of experimental conditions, the housekeeping genes used as reference genes in each different species should be handled carefully41,42.

Materials and methods

Samples preparation and treatments

Here, one-year-old plants of A. decursiva were collected from Ningguo City, Anhui Province, China (longitude: 118.95E, latitude: 30.62 N), and was identified as Angelica decursiva (Miq.) Franch. et Sav. by Zhang Ning, school of pharmacy, Jiangsu Health Vocational College, and deposited in the herbarium of the medicinal botanical garden (ID: JSJK-AD-021). The planting field of A. decursiva was a private land and the landowner has allowed us to collect plant materials for further study. The collection of plant material is in full compliance with relevant institutions, national and international guidelines and legislation. Then, these accessions were transplanted into plastic pots that contained a mixture of vermiculite, perlite, and peat moss (1:1:1 v/v). Next, the plants were grown in a greenhouse at a temperature of 25 °C, a long photoperiod of 16 h light and 8 h darkness, 40–65% relative humidity, and 3000 lx light intensity until treated. For drought treatments, a 200 mL solution of 25% PEG 6000 (w / v, polyethylene glycol, Sangon, China) was used to treat the plants for one week. For salt stress treatment, an approximate amount of 200 mL (600 mM) of NaCl was used to water the plants for 7 days. For cold stimulation treatments, the plants were incubated at 4 °C for 48 h. To study hormone therapy, 25 mM MeJA was applied for 6 h. To assess heavy metal stress, 500 mM of Copper sulfate (CuSO4) was applied to the plants for 24 h. In the case of oxidative stress, 50 mM H2O2 was used for 24 h. To induce ultraviolet (UV) light, these plants were irrigated using distilled water (100 mL) and then exposed to ultraviolet light for 24 h. Notably, all the above treatments had three biological replicates. The remaining plants that received no treatment served as controls. Lastly, all samples from each treatment were washed with MINIQ- filtered water, quickly frozen in liquid nitrogen, and stored at − 80 °C.

Total RNA isolation and cDNA synthesis

About 100 mg of the different frozen tissue samples were used to extract total RNA using the Spectrum Plant Total RNA kit (Sigma, USA). Next, the quality and purity of the extracted total RNA was determined using the NanoDrop spectrophotometer 2000 (Thermo Scientific, USA), and its integrity confirmed on a 1.5% agarose gel. Here, only RNA with A260/280 ratios between 1.8 and 2.2 and an A260/230 above 2.0 were used for cDNA synthesis. Subsequently, RNA samples were pre-treated with RNase-free DNase I (Takara Biotechnology, Dalian, China) to remove contaminating traces of genomic DNA and then used for reverse transcription. Lastly, following the instructions from the HiScript Q RT SuperMix for qPCR (Vazyme, China) was used for first-strand cDNA synthesis with oligo (dT) as the primer.

Primer design and RT-qPCR conditions

Here, a total of 11 candidate reference genes were selected from the transcriptome data of A. decursiva (unpublished). Next, the built-in program in the Bioedit Sequence Alignment Editor (v7.0.9) was used to screen and select potential single genes using local blast (TBLASTN). Subsequently, the corresponding homologs of these reference genes were selected from the database of The Arabidopsis Information Resource (TAIR) (http://www.arabidopsis.org), and only single genes with lower E-values and higher scores were selected for subsequent analysis. Table S1, shows the reference gene IDs, homologous loci, gene sequences, and the different expression levels (FPKM, Fragments Per Kilobase per Million) of all the 11 candidate reference genes.

To avoid amplification of any contaminating genomic DNA, primers were designed to cross at least one intron/exon border that contained both donor and acceptor sites, and then exon analysis was performed using the AlignX program in the vector NTI advance 11.5 package. Subsequently, the Primer3Plus (http://primer3plus.com/cgi-bin/dev/primer3plus.cgi) was used to design primers with the following characteristics: amplicon length was 100 to 150 bp, GC content was between 40 and 60%, primer length was 18 to 24 bp, temperature difference of each primer pair was less than 1 °C, and the melting temperature (Tm) was between 55 and 60 °C. Consequently, all the primer pairs were tested using conventional PCR to determine the combination of forward and reverse primers that performed optimally and the resulting products were examined using 1.0% agarose gel electrophoresis. Besides, from a series of standard curves of five different cDNA dilutions, the amplification efficiency (E) and correlation coefficient (R2) were calculated. Table 1 lists all the gene-specific primer pairs that were designed and used in the RT-qPCR analysis.

Stability evaluation of candidate reference genes

To obtain the RT-qPCR data, three biological and technical replicates were done for each sample, and all data presented as mean ± standard error of the mean (SEM). Consequently, statistical analyses were performed using the Student’s t-test. Next, representative graphs were generated using OriginPro 9.1 (OriginLab Corporation, Northampton, MA, USA). Subsequently, data analysis was performed using GeNorm (ver.3.5), NormFinder (ver.0.953), BestKeeper (ver.1.0), and the Delta Ct method following the instructions from the manufacturer. Lastly, a comprehensive stability ranking analysis was performed using RefFinder (http://150.216.56.64/referencegene.php).

Conclusions

Selecting appropriate reference genes is a significant prerequisite for quantifying gene expression using RT-qPCR. This is a conducive research on the key enzyme genes that are involved in the biosynthesis of coumarins and other interesting secondary metabolites found in A. decursiva. So far, this is the first systematic screening experiment of the most suitable reference genes for A. decursiva under various external treatments, like cold stress (4 °C), drought stress (20% PEG), Methyl jasmonate (MeJA), salt stress (0.5 M NaCl), oxidative stress (H2O2), ultraviolet (UV) induction, metal stress (CuSO4), untreated (WT), and 'total' (all treated samples). Our results have exposed that the 11 candidate genes have different stability in A. decursiva when exposed to different experimental treatments. From the overall stability ranking, SAND is the most stable candidate reference gene, followed by PP2A and then PTBP. On the other hand, the NCBP2 gene has the lowest stability making it unsuitable for further research. In summary, the reference genes evaluated in this study can be helpful for accurate normalization of the RT-qPCR data and any other future work on the gene expression of coumarin synthesis present in A. decursiva.