Spatio-temporal selection of reference genes in the two congeneric species of Glycyrrhiza

Glycyrrhiza, a genus of perennial medicinal herbs, has been traditionally used to treat human diseases, including respiratory disorders. Functional analysis of genes involved in the synthesis, accumulation, and degradation of bioactive compounds in these medicinal plants requires accurate measurement of their expression profiles. Reverse transcription quantitative real-time PCR (RT-qPCR) is a primary tool, which requires stably expressed reference genes to serve as the internal references to normalize the target gene expression. In this study, the stability of 14 candidate reference genes from the two congeneric species G. uralensis and G. inflata, including ACT, CAC, CYP, DNAJ, DREB, EF1, RAN, TIF1, TUB, UBC2, ABCC2, COPS3, CS, R3HDM2, were evaluated across different tissues and throughout various developmental stages. More importantly, we investigated the impact of interactions between tissue and developmental stage on the performance of candidate reference genes. Four algorithms, including geNorm, NormFinder, BestKeeper, and Delta Ct, were used to analyze the expression stability and RefFinder, a comprehensive software, provided the final recommendation. Based on previous research and our preliminary data, we hypothesized that internal references for spatio-temporal gene expression are different from the reference genes suited for individual factors. In G. uralensis, the top three most stable reference genes across different tissues were R3HDM2, CAC and TUB, while CAC, CYP and ABCC2 were most suited for different developmental stages. CAC is the only candidate recommended for both biotic factors, which is reflected in the stability ranking for the spatio (tissue)-temporal (developmental stage) interactions (CAC, R3HDM2 and DNAJ). Similarly, in G. inflata, COPS3, R3HDM2 and DREB were selected for tissues, while RAN, COPS3 and CS were recommended for developmental stages. For the tissue-developmental stage interactions, COPS3, DREB and ABCC2 were the most suited reference genes. In both species, only one of the top three candidates was shared between the individual factors and their interactions, specifically, CAC in G. uralensis and COPS3 in G. inflata, which supports our overarching hypothesis. In summary, spatio-temporal selection of reference genes not only lays the foundation for functional genomics research in Glycyrrhiza, but also facilitates these traditional medicinal herbs to reach/maximize their pharmaceutical potential.

www.nature.com/scientificreports/ genes under different experimental conditions and between the two congenic Glycyrrhiza species, and finally, we summarized the reference genes previously used within Leguminosae plants.

Materials and methods
Plant materials and growth conditions. Two-year-old licorice plants (G. uralensis and G. inflata) were grown in the test field at the Northwest Biological Agricultural Center, Chinese Academy of Sciences (Ningxia, China). Roots, rhizomes and leaves were collected in April (returning green), May (rapid growth and flowering), July (seed setting), and October (aging) (Fig. 1). All samples were flash frozen in liquid nitrogen, shipped to Guangzhou in dry ice and stored at -80 ℃ for RNA extraction. All experiments were carried out with three biological replicates.
RT-qPCR analysis. The  Selection of optimal reference genes under different experimental conditions. The RefFinder, a comprehensive system to integrate the currently available major computational programs (geNorm, Normfinder, BestKeeper, and Delta Ct method) to compare and rank the stability of candidate reference genes, was used for the overall ranking of the candidate reference genes 34 . Based on the rankings from the Microsoft Excelbased computational programs, RefFinder assigns an appropriate weight to an individual gene and calculates the geometric mean of their weights for the overall final ranking.
Comparison of the suitable reference genes under different experimental conditions and between the two congenic Glycyrrhiza species. The top three most suitable reference genes selected by RefFinder under the conditions of different tissues, developmental stages, and tissue × developmental stages were compared; and the suitable reference genes under the same experimental condition between the two congenic Glycyrrhiza species were also analyzed. The results were visualized by Venn Diagrams, and it was plotted using the OmicShare tools, a free online platform for data analysis (www.omics hare.com/tools ).

Validation of recommended reference genes.
To confirm the suitability of the reference genes recommended in the present study, we measured the differential expression of a specific licorice gene with a known expression profile under different tissues. Licorice β-amyrin synthase (β-AS, GenBank Accession Number: FJ627179), is a key gene in glycyrrhizin biosynthesis and mainly expressed in root organs 8,27 , we thus chose to validate the reliability of the selected reference genes in different treatment conditions. The primers of β-AS used for RT-qPCR were listed in Table 1. The expression level of β-AS was analyzed by seven normalization ways, including the most stable, the top two most stable, the top three most stable, the least stable, the top two least stable, the top three least stable, and all the candidate reference genes. One way ANOVA was carried out to evaluate the expression level of β-AS under different normalization conditions (SPSS statistics 22.0 software, IBM, United States).

Survey of the reference genes used within Leguminosae plants.
Reference genes reported previously in Leguminosae species was selected by searching "NCBI" or "Web of Science" with the key words "Legu-

Results
Primer specificity and RT-qPCR amplification efficiency. The specificity of the primers for all 14 candidate reference genes (ACT , CAC , CYP, DNAJ, DREB, EF1, RAN, TIF1, TUB, UBC2, ABCC2, COPS3, CS, R3HDM2) were determined by a single PCR product of expected size, and further confirmed by a single peak in the melting-curve analysis. The efficiency of the PCR amplification (E) was calculated from the standard curve by making a dilution series with mixed samples. The E value of the reported reference genes ranged from 67.28% (CAC ) to 116.84% (CYP) and correlation coefficients varied from 0.9526 (CAC ) to 0.9976 (CYP) ( Table 1). Three of them, CAC , TIF1 and TUB, presented unmatched E values which did not meet the requirements (90%-110%) for RT-qPCR analysis. The E value of the selected candidate reference genes from RNA-Seq data ranged from 90.63% (CAC ) to 107.58% (CYP) and correlation coefficients varied from 0.9887 (CS) to 0.9963 (COPS3) ( Table 1), and all these primers are suitable for RT-qPCR analysis 32 .
Expression profiling of candidate reference genes. The expression levels of the candidate reference genes were determined as quantification cycle (Cq) values, and the transcripts of these genes showed different levels of abundance in G. uralensis and G. inflata (Tables S1 and S2). The mean Cq values of the genes ranged from 20-27, with the majority lying between 23 and 26 across all tested samples in G. uralensis, and the mean Cq values of the genes ranged from 19-25, with the majority lying between 22 and 25 across all tested samples in G. inflata (Fig. 2, Tables S3 and S4). So the expression level of these candidate genes were much higher in tested samples in G. inflata than in G. uralensis, and they were more stable in G. inflata than in G. uralensis. TIF1 had the lowest Cq both in G. uralensis (mean Ct of 20.30) and G. inflata (mean Cq of 19.94), indicating the highest expression level of TIF1 in the two species, while EF1 (mean Cq of 27.50) or DREB (mean Cq of 25.56) was expressed at low levels in G. uralensis or G. inflata, respectively (Tables S3 and S4).
CAC showed the least gene expression variation both in G. uralensis (coefficient of variation (CV) of 6.74%) and G. inflata (CV of 2.92%), while surprisingly, a commonly used reference gene, ACT , was the most variable across all samples (CV of 10.45%) in G. uralensis (Table S3), and TIF1 (9.60%) was the most variable across all samples in G. inflata (Table S4).
Expression stability of candidate reference genes. The expression profiles of the 14 candidate reference genes in G. uralensis and G. inflata roots, rhizomes, and leaves across all experiments in this study were analyzed using geNorm, NormFinder, BestKeeper, Delta CT, and RefFinder (Tables 2 and 3).
For different developmental stages, four key growth periods of licorice were selected and measured, including Returning green stage in April; Rapid growth and flowering stage in May; Seed setting stage in July; and Senescence stage in October (Fig. 1). In G. uralensis, CAC , CYP, COPS3 were the most stable reference genes recommended by geNorm, whereas CAC , ABCC2, CYP by NormFinder, DNAJ, COPS3, CAC by the BestKeeper,   (Table 2). In G. inflata, the top three most stable candidate reference genes were RAN, COPS3, CS identified by all four methods, geNorm, NormFinder, BestKeeper and Delta CT (Table 3). For different tissues, three tissues (the roots, rhizomes and leaves) were tested (Fig. 1). In G. uralensis, the top three most stable reference genes recommended by geNorm and NormFinder were CAC , R3HDM2 and TUB, while COPS3, RAN and TIF1 by BestKeeper, R3HDM2, TUB and CAC by Delta CT (Table 2). In G. inflata, DREB, R3HDM2 and TUB were the most stable reference genes recommended by geNorm, COPS3, ABCC2, R3HDM2 by NormFinder, CAC , UBC2 and DNAJ by BestKeeper, COPS3, R3HDM2 and ABCC2 by Delta CT (Table 3).

Cq value
Because the growth, development, and metabolite accumulation of all the living organisms are certainly influenced by multiple factors, therefore, we also studied the optimal reference genes under the spatial-temporal

Selection of optimal reference genes under different experimental conditions. Based on
RefFinder, a web-based software, comprehensive ranking of reference genes integrating all four software was  (Tables 2 and 3). At different experimental stages, CAC , CYP, ABCC2 were identified as the top three most stable reference genes in the G. uralensis (Fig. 3A, Table 2), while RAN, COPS3, CS were identified in G. inflata (Fig. 3B, Table 3). For the different tissues, R3HDM2, CAC , TUB were identified as the most stable reference genes in the G. uralensis (Fig. 3A, Table 2), while COPS3, R3HDM2, DREB were identified in the G. inflata (Fig. 3B, Table 3). Under the spatial-temporal interaction conditions in G. uralensis, CAC , R3HDM2, DNAJ were identified as the most stable reference genes in the G. uralensis (Fig. 3A, Table 2), while COPS3, DREB, ABCC2 were identified in the G. inflata (Fig. 3B, Table 3). The optimal number of reference genes under each experimental conditions required for reliable normalization in two species were predicted by geNorm software with the V values (cutoff = 0.15). When pairwise variations Vn/n + 1 < 0.15, it means that an addition reference gene (n + 1) is not necessary. For all the experimental conditions in G. uralensis the first V-value less than 0.15 occurred at V2/3, suggesting that two reference genes were adequate to correctly normalize gene expression. But for spatial-temporal interaction conditions in the G. inflata, more than two reference genes was necessary suggested by V-value for accurate normalization, the first V-value less than 0.15 occurred at V4/5 (Fig. 4).

Comparison of the suitable reference genes under different experimental conditions and between the two congenic Glycyrrhiza species. The summary of the top three most suitable reference
genes under all experimental condition showed that seven genes appeared in the top three list in G. uralensis, among which CAC and R3HDM2 showed the highest recommended frequency (33.33% and 22.22%, respectively) (Fig. 3A). In G. inflata, seven genes were also appeared in the top three lists, and COPS3 and DREB presented the highest recommended frequency, with the frequency of 33.33% and 22.22%, respectively (Fig. 3B).
Comparison of the suitable reference genes under different experimental conditions showed only one of the top three candidates was shared between the individual factors and their interactions, specifically, CAC in G. uralensis and COPS3 in G. inflata. Therefore, CAC was the most stable reference gene in G. uralensis under all the experimental conditions tested, while COPS3 was the most stable in G. inflata.
For the comparison of the suitable reference genes between the two congenic Glycyrrhiza species, we found R3HDM2 was the only suitable reference gene shared between G. uralensis and G. inflata in different tissues, and no consistent reference gene was found under different developmental stage and tissue and developmental stage interactions between the two congenic Glycyrrhiza species. So the optimal reference genes for different species are variable, even for the two proximal species in the same genus.

Validation of recommended reference genes.
A root and rhizome-specific gene, β-AS, a key gene in glychrizin biosynthesis was used to validate the selected reference genes. To validate the reference geneto study expressin pattern in different tissues, expression of β-AS was normalized to both the most and the least stablecandidate reference genes, both in G.uralensis and G. inflata. When the recommended reference genes were used, the expression levels of β-AS in the roots and the rhizomes were similar and both high, while its expression was significantly reduced in the leaf. However, when normalized to the least suited reference genes, the expression pattern of β-AS changed, or the ratio of expression levels between roots/rhizomes and leaves were significantly enlarged or narrowed (Fig. 5). Unstable reference genes really confuse the results.
Survey of the reference genes used within Leguminosae plants. Reference    The most suited The least suited One Two Three One Two Three All Leaves Roots Rhizomes Figure 5. Validation of the recommended reference genes. Expression profiles of β-AS gene among different tissues were investigated using seven different normalization factors. The most suited, the top two most suited, the top three most suited, the least suited, the top two least suited, the top three least suited, and all the candidate reference genes. Bars represent the means ± standard error of biological replications. Different letters indicate significant differences (P < 0.05). www.nature.com/scientificreports/ A total of ten species (including G. uralensis and G. inflata) had been studied under different developmental stages, among the 22 reference genes recommended, eukaryotic elongation factor (EF, EF1α, and ELF1B) and tubulin (TUA1, TUA2, TUA5, and TUB) were the most choices, and the frequency of recommendation in the 10 species were 15.15% and 12.12%, respectively (Fig. 6). For the different tissues, a total of twelve species had been studied. Among the 20 reference genes recommended, ACT , EF, and UBQ performed particularly well, and they presented recommended frequency of 16.66%, 14.58% and 12.50%, respectively (Fig. 6).

Discussion
Licorice are herbaceous perennial plants with great medical and ecological values. Licorice root extracts have been proved to have anti-carcinogenic 2,3 , anti-inflammatory, anti-fungal, anti-piroplasmic and cytotoxic activities 4 . More recently, glycyrrhizin, the most important bioactive triterpenoid saponin in licorice roots, is under the consideration for treating COVID-19 infection caused respiratory syndrome 5 . Accumulation of its many bioactive compounds is spatio-temporal dependent. To study the functional genes involved in development and biosynthesis of these bioactive compounds, spatio-temporal expression pattern of these genes provides an important piece of information. Here we report a set of reference genes for RT-qPCR analysis to study spatiotemporal expression pattern of genes in two congenic licorice species, G. uralensis and G. inflata.
Candidates screening and optimal reference genes selection. In this study, CAC and COPS3 was one of the top candidates for all three experimental conditions tested in G. uralensis and G. inflata, respectively. COPS3 is a new candidate of reference gene screened from RNA-seq, and it was proved to be the most stable gene under different developmental stages in G. inflata. This result supports the idea that it is feasible to screen reference genes through RNA-seq dataset 35 . The current and emerging RNA-seq data may provide a bigger and even more reliable pool to select candidate reference genes other than the traditional housekeeping genes. Such discoveries are of great significance and should enable greater accuracy of normalization, particularly across diverse plant organs and in other experimental conditions where traditional housekeeping genes display variability in expression.
The suitable reference genes for spatio-temporal gene expression is different from that for individual conditions. Our results showed that different reference genes were selected to study spatial, temporal expression as well as spatial-temporal expression patterns in both species. The most stable reference genes varied in different experiments. In G. uralensis, the top three most stable reference genes across different tissues were R3HDM2, CAC and TUB, while CAC , CYP and ABCC2 were most suited for different developmental stages. Similarly, in G. inflata, COPS3, R3HDM2 and DREB were selected for tissues, while RAN, COPS3 and  www.nature.com/scientificreports/ CS were recommended for developmental stages (Tables 2 and 3). In addition, the optimal reference genes under the condition of two-factor interaction (developmental stages × tissues) are also different from those under single factor conditions. For the tissue-developmental stage interactions, CAC , R3HDM2 and DNAJ were the most suited reference genes in G. uralensis, while COPS3, DREB and ABCC2 in G. inflata. Only one of the top three candidates was shared between the individual factors and their interactions, specifically, CAC in G. uralensis and COPS3 in G. inflata (Fig. 3). Because the expression of genes is constantly under the influences of multiple factors/dimensions, so it is essential for gene function analysis to investigate gene expression under the interacting factors. Our results in this study illustrated that the optimal reference gene for spatio-temporal gene expression is different from that for individual conditions, so every gene expression analysis should begin with validation of reference genes in a given sample set under specific experiment conditions, either under single factor or under multiple factors interacting conditions. The validation of selected reference genes was done by normalizing the expression of β-AS, a key gene involved in glycyrrhizin biosynthesis, and its expression has been proven to be mainly in roots and rhizomes 8,9 . Generally, the expression pattern of β-AS should not be affected by reference gene selection, because the M values of the 14 candidate genes selected in our study were all below 1.5 by geNorm. However, our results showed that the expression pattern of β-AS was quite different when using the unstable reference genes for homogenization compared with the stable reference genes (Fig. 5). Therefore, our results showed that unstable reference genes would confuse the expression pattern while the stable reference genes gave reliable results, and the optimal reference genes screened in this study are reliable.

8 S r R N A A B C T A c ti n C A C C O P S 3 C Y P D R E B E F F b o x G A P D H H E L M s c 2 7 P P 2 A P T B R 3 H D M 2 T 9 7 T u b u li n U B C U K N V P S
The optimal reference genes within Leguminosae species are drastically different. In this study, we also summarized the validated reference genes for different development stages in Leguminosae (including G. uralensis and G. inflata tested in this study). From the results of our survey, we found EF1, TUB and UBQ are commonly used housekeeping genes, which have been identified as the most suitable ones at different developmental stages in several species. EF had been selected as the most suitable reference gene at different developmental stages in E. songoricum 36 , G. max 37 , and C. arietinum 38 . TUB was the optimal reference gene at different experimental stages in E. songoricum 36 , G. max 37 , and H. coronarium 39 . While UBQ was validated as the most stable reference gene at different developmental stages in C. tetragonoloba 40 , H. coronarium 39 , L. angustifolius 41 . For the different tissues in Leguminosae, the most stable reference gene candidates were ACT , EF and UBQ. Among them, ACT was recommended as a stable reference gene in different tissues of C. tetragonoloba 40 , E. songoricum 36 , G. max 37,42 , P. vulgaris 43 . EF was recommended in C. obtusifolia 44 , G. max 37 , M. sativa 45 , and UBQ was recommended in G. inflata (Table 3), E. songoricum 36 , G. max 37 , and L. angustifolius 41 . So, the optimal reference genes for different species in the same family are variable, even for the two proximal species in the same genus (G. uralensis and G. inflata). We found CAC was the most stable reference gene when all factors taken account in G. uralensis, and COPS3 was the optimal reference gene with the highest recommended frequency in G. inflata (Fig. 3). However, EF1 and TIF1 in G. uralensis, and ACT and TIF1 G. inflata were the most unstable reference genes respectively, and it has been proved that it will cause false results using the unstable reference gene for expression normalization (Fig. 5). Among them, EF1 and ACT are commonly used housekeeping gene and have been identified as the most suitable reference genes in several studies 40,43,46 . So our results indicated that it is always necessary to validate reference genes for reliable gene expression analysis. The summary and analysis of the reported legume reference genes will serves as a guide for the subsequent selection of reference genes in Leguminosae plants.

Summary and perspectives.
In this study, we evaluated the expression of 14 candidate reference genes across different tissues (root, rhizome, leaf) at various developmental stages (returning green, April; rapid growth and flowering, May; seed setting, July; and senescence stage, October) in the two congeneric medicinal plants, G. uralensis and G. inflate, respectively. Based on previous research and our preliminary data, we hypothesized that internal references for spatio-temporal gene expression are different from the reference genes suited for individual factors. In G. uralensis, the top three most stable reference genes across different tissues were R3HDM2, CAC and TUB, while CAC , CYP and ABCC2 were most suited for different developmental stages. CAC is the only candidate recommended for both biotic factors, which is reflected in the stability ranking for the spatio (tissue)-temporal (developmental stage) interactions (CAC , R3HDM2 and DNAJ). Similarly, in G. inflata, COPS3, R3HDM2 and DREB were selected for tissues, while RAN, COPS3 and CS were recommended for developmental stages. For the tissue-developmental stage interactions, COPS3, DREB and ABCC2 were the most suited reference genes. In both species, only one of the top three candidates was shared between the individual factors and their interactions, specifically, CAC in G. uralensis and COPS3 in G. inflata, which supports our overarching hypothesis.
In addition, we also documented the reference genes that have been used in RT-qPCR analyses among 12 different Leguminosae plants under the same biotic conditions with current study, i.e., tissue and/or developmental stage. Among the 115 genes have been tested, even the routinely used reference genes showed variable expressions under different experimental conditions. Therefore, to avoid the misinterpretation of RT-qPCR results, a thorough evaluation of reference genes is strongly recommended. More importantly, given that biosynthesis of bioactive compounds is typically spatio-temporal dependent, the selection of suitable reference genes should follow suit. Based on previous studies and our current results, we concluded (1) transcriptome is a rich reservoir for selecting stably expressed candidate reference genes, (2) customized design, especially the interaction among the experimental conditions, is warranted for searching suitable reference genes in any given species, and (3) without validation study, gene(s), including housekeeping genes, could lead to ambiguous results, especially in non-model species. Finally, spatio-temporal selection of reference genes not only lays the foundation www.nature.com/scientificreports/ for functional genomics research in Glycyrrhiza, but also facilitates these traditional medicinal herbs to reach/ maximize their pharmaceutical potential.

Data availability
Data will be available upon request. RNA-seq datasets used in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https ://www.ncbi.nlm. nih.gov/, PRJNA574093.