Systematic screening and validation of reliable reference genes for qRT-PCR analysis in Okra (Abelmoschus esculentus L.)

Quantitative real-time polymerase chain reaction (qRT-PCR) is a sensitive and widely used technique for quantifying gene expression levels, and its accuracy depends on the reference genes used for data normalization. To date, no reference gene has been reported in the nutritious and functional vegetable okra (Abelmoschus esculentus L.). Herein, 11 candidates of reference genes were selected and evaluated for their expression stability in okra in different tissues at different developmental stages by using three software algorithms (geNorm, NormFinder, BestKeeper) and a web-based tool (RefFinder). Among them, eukaryotic initiation factor 4 alpha (eIF4A) and protein phosphatase 2A (PP2A) showed the highest stability, while TUA5 had the lowest stability. The combined usage of these two most stable reference genes was sufficient to normalize gene expression in okra. Then, the above results were further validated by normalizing the expression of the cellulose synthase gene CesA4. This work provides appropriate reference genes for transcript normalization in okra, which will facilitate subsequent functional gene research on this vegetable crop.

www.nature.com/scientificreports/ molecular basis research, especially the rapid aging mechanism of okra pods. Therefore, it is necessary to evaluate the stability of candidate reference genes in different tissues and at different development stages in A. esculentus. Herein, we aim to identify reliable reference genes for qRT-PCR data normalization in okra. Eleven genes includingactin 2 (ACT2), protein phosphatase 2A (PP2A), polyubiquitin 10 (UBQ10), 18S ribosomal RNA protein (18SrRNA), eukaryotic initiation factor 4 alpha (eIF4A), Low expression of osmotically responsive genes 1 (Los1), tubulin alpha 5 (TUA5), heterogeneous nuclear ribonucleoprotein (hnRNP), elongation factor 1-alpha (EF1α), SAND family protein (SAND), and yellow leaf specific 8 (YLS8) were selected as candidate reference genes based on RNA-seq data from our lab. Their expression stabilities in the roots, stems, leaves, flowers and pods were systematically evaluated using geNorm 15 , NormFinder 16 , BestKeeper programs 17 and a web tool RefFinder (http:// blooge. cn/ RefFi nder). In addition, a targeted gene, involved in cellulose synthesis, namely AeCesA4, was used to validate the above reference genes.

Results
Verification of primer specificity and Cq values of candidate reference genes. A total of 11 candidate reference genes were selected for qRT-PCR normalization (Table 1). To check the specificity of primers used in PCR reactions, agarose gel electrophoresis (1.8% w/v) and melting curve analyses were performed. The results showed that a single band was obtained in each lane, yielding a single amplification product with expected size (Fig. 1). Meanwhile, the melting curve analysis showed that all of the primers amplified single major peaks (Fig. S1). These results indicate that all the primers pairs are highly specific.
The expression levels of candidate genes were detected in all samples according to the quantification cycle values (Cq values) obtained by qRT-PCR, and the mean Cq values of these candidates were between 7.94 (18S rRNA) and 28.23 (SAND), showing a wide range of expression levels (Fig. 2). Since gene expression levels are negatively correlated to Cq values, 18S rRNA was the most expressed gene with the lowest mean Cq value, while SAND was the least abundant gene with the highest mean Cq value among the 11 candidate reference genes.
Expressing stability analysis. Expression stability of 11 candidate genes was analyzed by geNorm, Nor-mFinder and BestKeeper independently and the ranking of their stability was obtained separately. Then we got a comprehensive ranking using the web tool, RefFinder that integrates aforementioned three algorithms plus the Delta CT method. geNorm analysis. Based on the geNorm analysis, the mean (M) values of all candidates ranging from 0.155 to 0.928 (Table 2), were lower than the cutoff value of 1.5 in all samples, indicating that all of the candidate genes were relatively stable in okra. In pod and leaf groups, eIF4A and LOS1 were found to be most stable, while 18S www.nature.com/scientificreports/ rRNA and TUA5 exhibited low stability and were ranked as the least stable ones in leaf and pod group, respectively. For different tissues of young seedlings, LOS1 and PP2A with the lowest value of 0.155 showed the best stability, whereas UBQ10 displayed the worst stability. For different organs in the fruiting period, the two most stable genes were eIF4A and PP2A, while TUA5 with the highest value of 0.979 was the most unstable. For the all samples, YLS8 and PP2A were the most stable genes, followed by hnRNP, but TUA5 was the least stable one. Among all the groups, eIF4A and LOS1 showed higher stability, whereas TUA5 had the lowest stability in most groups.
The geNorm program was also used to analyze pairwise variation values of Vn/Vn + 1 for the assessment of the minimal number of reference genes required for normalization. For the total samples, a minor variation was found between V2/3 (0.135) and V3/4 (0.105), suggesting that the two reference genes (YLS8 and PP2A) would be suitable for normalization. For the other groups, both V2/3 and V3/4 values were less than 0.15 (Fig. 3), indicating that the use of the top two reference genes was sufficient for normalization in qRT-PCR.

NormFinder analysis.
Expression stability values, intra-and inter-group variances of candidate genes in groups 1 and 2, groups 1 and 3, and groups 1 and 4 analyzed by NormFinder are shown in Table 2 and Table S1. Among all the groups, eIF4A got the top rank, which was somewhat different from geNorm results. For example, according to the geNorm analysis, PP2A showed the highest stability in seedling group, whereas its stability ranked third in the NormFinder analysis. Nevertheless, the most unstable reference genes in all groups were consistent with the results of the geNorm analysis. In general, eIF4A exhibited the best expression stability, while TUA5 and UBQ10 performed poorly across all groups.
BestKeeper analysis. BestKeeper assesses expression stability by measuring the standard deviation (SD) and coefficient of variance (CV). The more stable reference gene possessed the lower SD (i.e., usually < 1) value. For different tissues in the fruiting period and total samples group, UBQ10 and hnRNP were the most stable reference genes, whereas TUA5 with a SD value > 1 was considered as an unstable gene. In the leaf group, all reference genes had lower SD values (SD ≤ 0.71), and eIF4A and ACT2 were considered as the most suitable reference   www.nature.com/scientificreports/ genes, and 18S rRNA obtained the lowest stability. For pod samples, EF1-α and UBQ10 were the optimal reference genes, while TUA5 was unacceptable owing a higher SD value of 1.58. For different tissues in the seedling group, 18S rRNA and LOS1 were placed as the best reference genes, while UBQ10 as the worst one.

RefFinder analysis.
RefFinder, an online tool for expression stability of reference genes, was used to calculate and recommended comprehensive ranking of 11 candidates based on the three previously described algorithms and delta-Ct 18 ( Table 2). The comprehensive rankings from RefFinder showed that eIF4A and PP2A had the highest stability, while EF1-α and TUA5 had the least stability across all samples. For different tissues in the fruiting period, eIF4A and hnRNP were the two most stable reference genes. For pods and leaves at different developmental stages and different tissues of the young seedlings, the top two genes were eIF4A and LOS1, while UBQ10, TUA5 and 18S rRNA was ranked as the most unstable gene in the seedling stage, pods and leave groups, respectively. Taken together, eIF4A was defined as the most stably expressed gene, while TUA5 was the least stable in most groups.
Validation of the stability of reference genes. To test and verify the reliability of the screened reference genes, a target gene needs to be selected for qRT-PCR amplification. The relative expression pattern of gene CesA4, which encodes an enzyme essential for cellulose biosynthesis in plants, was tested in pods at different developmental stages, as well as in different tissue samples. And its relative expression levels were normalized using two most stable genes (eIF4A and PP2A for different tissues, eIF4A and LOS1 for pods), and the least stable reference gene (TUA5 and EF1-α for different tissues, TUA5 and 18S rRNA for pods), as well as one moderately stable reference gene ACT2 based on the results of RefFinder. The qRT-PCR analysis showed that AeCesA4 expression was the highest in 9 DAF pods, followed by stems at the young seedling phase, but lower in flowers and leaves (Fig. 4A). In different tissues, the expression patterns of AeCesA4 were similar when normalized using eIF4A and PP2A alone or in combination, but the relative expression levels of AeCesA4 decreased significantly in roots, stems, and 9 DAF pods (p < 0.05), when normalized with EF1-α and TUA5 (Fig. 4A). On the other hand, when TUA5 was used as an internal gene, the relative expression level of AeCesA4 in 9 DAF pods was much higher than those with stable genes (eIF4A and LOS1) (p < 0.05) (Fig. 4B). When normalized by ACT2 and 18S rRNA independently, however, the relative expression level of AeCesA4 was lower compared to normalization by the optimal genes (Fig. 4B). In the pods group, data normalization using the most widely used reference genes ACT2, the relative expression level of AeCesA4 in 6 DAF pod samples were significantly underestimated (p < 0.05) (Fig. 4B), thus highlighting the importance of selecting suitable reference genes.

Discussion
Presently, qRT-PCR is regarded as the best choice for accurately analyzing gene expression levels in different samples. However, due to its high sensitivity, this technique is highly subjected to manipulation level and samples' variations. When inappropriate reference genes are used for normalization analysis, these changes can severely affect results. Therefore, the selection of suitable reference genes is crucial to ensure the accuracy of qRT-PCR. However, systematic screening of reference genes of okra (A. esculentus) has not been reported.
Okra is an important vegetable which is popular all over the world. Despite of its high nutritional and medicinal effects, little attention has been paid to its molecular function and gene expression. Until now, its genome has not been sequenced, and little is known about the molecular mechanisms of the pod growth and development.  www.nature.com/scientificreports/ Also, a set of reliable reference genes for qRT-PCR assay is still lacking. Fortunately, RNA sequencing (RNAseq) is now a powerful approach for transcriptome analysis of differential gene expression. And it provides a resource for the identification of reference genes in non-model plants without genome information. Here, we used RNA-seq approach to identify the suitable reference genes for accurate normalization of the transcript levels by qRT-PCR analyses in okra.
In the present study, 11 candidate internal reference genes (ACT2, LOS1, TUA5, hnRNP, SAND, EF1-α, eIF4A, YLS8, PP2A, UBQ10, 18S rRNA) were identified from our transcriptome data. The three most extensively used software packages (geNorm 15 , NormFinder 16 , BestKeeper 17 ) and one web tool RefFinder were used to assess the expression stability of the candidate reference genes. Four programs showed a few differences in results, for example, according to the NormFinder evaluation, eIF4A and ACT2 were the most two stable reference genes in all of the samples examined, whereas their stability rankings were fourth and sixth in the geNorm analysis, respectively. However, results from BestKeeper analyses showed that UBQ10 and hnRNP were the most suitable reference genes in all the tested samples. Analysis using geNorm and Normfinder resulted in different orders of most stable genes but the least stable reference genes were the same ones. In general, stability ranking of reference genes generated by BestKeeper was quite different from those of the other two algorithms, similar to the results of previous reports 19,20 . It was difficult to determine the stable reference genes in A. esculentus using only one algorithm. Therefore, we used RefFinder which integrates the other computational algorithm to counteract bias and to obtain a comprehensive ranking of gene expression stability. Unexpectedly, eIF4A was defined as the most stably expressed gene in all tissues and specific tissue groups examined in this study. Moreover, previous studies have proved that eIF4A was suitable for normalization in gene expression studies in Avena sativa L. and Eleusineindica 14,21 . Following eIF4A, PP2A also displayed particularly excellent stability among all samples and it has been reported as a stably expressed gene in other species 8,13,22,23 . For these reasons, eIF4A and PP2A recommended by the above-mentioned software could be accepted as reference genes in this work. In contrast, TUA5 was the least recommended reference gene in most groups of this study, while TUA5 exhibits highly stable expression across development in soybean and in different tissues of Suaeda glauca 12,22 .
While for accurate normalization of qRT-PCR results, a single reference gene usually cannot meet the requirements 15,24 . The optimal number and choice of reference genes must be determined experimentally and methodically 25 . In the current study, the pairwise variation parameters from geNorm calculated indicated that a combination of two top stable reference genes may be a better option for gene expression normalization in all cases. Based on the comprehensive ranking of RefFinder, the combination of eIF4A and PP2A was the most stable reference gene set for all samples in our research. The best reference gene set for developing pods, young seedling samples and leaf, was eIF4A plus LOS1, and the optimal reference gene set for different tissues in the fruiting period was eIF4A plus hnRNP.
Cellulose, the main component of plant cell walls, plays a vital role in the growth and development of plants. The gene cellulose synthase A (CesA), encoding cellulose synthases, is responsible for cellulose biosynthesis in www.nature.com/scientificreports/ plant cell walls. Currently, CesA genes have been extensively studied in model plants such as rice and Arabidopsis 26,27 . Nevertheless, the regulatory mechanisms of CesA expression are not well investigated in A. esculentus.
To confirm the expression stability of reference genes in the current study, the relative expression patterns of AeCesA4, one of secondary cell wall-associated cellulose synthase genes, was analyzed by qRT-PCR. The results showed large differences in the quantification of AeCesA4 expression level when normalized using the best reference gene compared to the least stable one. For instance, when the least stable reference gene TUA5 was used, AeCesA4 expression levels were underestimated significantly in roots, stems and leaves (p < 0.05) (Fig. 4A), whereas the opposite results were displayed in 9 DAF pods (p < 0.05) (Fig. 4B). Actually, the mean Cq values of TUA5 ranging from 19.33 (3 DAF pods) to 22.96 (9 DAF pods) in pod group displayed relatively high variation around 3.33 cycles, indicating its expression levels decreased dramatically in later stages of pod development. Therefore, we are not surprised that when the least stable gene TUA5 was used for normalization, the expression level of AeCesA4 significantly increased compared to that of eIF4A, LOS1, or the combination of eIF4A + LOS1 in 9 DAF pods (Fig. 4B).
Previous studies published on qRT-PCR in okra, usually ACT were used as a single internal control for qRT-PCR analysis 28,29 , and their stability has not yet been reported. However, in the present experiments, ACT2 was ranked as moderately stable candidate reference gene according to RefFinder analysis. When ACT2 was used for normalization analysis, the expression of AeCesA4 in 6 DAF pods was significantly changed compared with the stable reference genes, was very similar to that in Arabidopsis pumila 30 (p < 0.05) (Fig. 4B). Another most commonly used reference gene, 18S rRNA, whose transcript abundance in okra was too high with Ct values less than 11, thus may affect the quantitative accuracy of the target gene. Similarly, misinterpretation was also observed in previous study 31 . Therefore, 18S rRNA should be excluded according to the selected reference genes criteria proposed by Beillard 32 . Hence, ACT2 and 18S rRNA, although the most commonly used, are not the appropriate reference genes for okra. Our results indicated that the expression stability of commonly used reference gene may vary significantly across different tissues and different development stages, and further proved the importance of validating the normalizing reference genes before conducting gene expression analysis.
Importantly, the expression patterns of AeCesA4 gene in okra's tissues exhibited the higher expression in fast-growing pods and stems than other tissues. Similar expression patterns in Miscanthus × giganteus have been reported 33 ; thus, it is consistent with its biological role of CesAs responsible for the secondary cell wall synthesis. Validation of gene expression revealed that AeCesA4 showed similar expression patterns when using the single most stable reference gene and the most stable reference genes combinations, whereas the expression levels were significantly different when normalized using the most unstable reference genes, suggesting that the identified reference genes are reliable.
Note that the combinations of multiple reference genes can be expected to be more precise than a single one 9,24 . Based on validation results of target gene AeCesA4 expression among different tissues, although its expression patterns is almost the same when normalized with the optimal gene or the combination of the two top stable reference genes, we recommend that using the appropriate combinations of two genes for more accurate and reliable qRT-PCR results for okra.

Conclusions
This is the first systematic study to validate a set of candidate reference genes for normalization of qRT-PCR data in okra using four algorithms. Different sets of reference genes were recommended to normalize gene expression data in different tissues and at different development stages. For the total samples group, the combination of eIF4A and PP2A was the most stable reference gene set. The best reference gene set for developing pods, seedling samples and leaves of different developmental stages was eIF4A + LOS1, and the optimal reference gene set for different tissues in the fruiting period was eIF4A + hnRNP. Additionally, the expression patterns of target gene AeCesA4 was determined to confirm the reliability of the selected reference genes. Our findings will benefit the qRT-PCR-based studies of gene expression in okra.

Materials and methods
Plant materials. The okra variety 'lüwuxing' used in this study was formally identified by associate research fellow Wei-Xia Liu (Chinese Academy of Tropical Agricultural Sciences). The voucher specimen of A. esculentus has been deposited in Shanghai Natural History Museum (Branch of Shanghai Science & Technology Museum) (Herbarium ID 92068). All experimental procedures were in accordance with local and national regulations. Okra seeds were placed on filter paper in 150 mm petri dishes, and an appropriate amount of distilled water was added. The dishes were placed in an incubator at 30°C for about 30 h. The sprouting seeds were sown in trays containing a mixture of peat soil, vermiculite, and perlite (1:1:1, v/v/v) and grown in a greenhouse under natural conditions for 1 month. The seedlings were moved outside of green house for hardening off in the open air for 1 week and then transplanted into the field (103°49′30.6″ E; 30°48′52.25″ N), Chengdu, China. Tissue samples of roots, stems, and young leaves were collected from 5-week-old seedlings. Mature and senescent leaves, beforeblooming and full-blooming flowers, and pods (3, 6 and 9 days after flowering (DAF)) were harvested from the fruiting stage of plants. All of the samples were collected from three plants, and frozen in liquid nitrogen, then stored at − 80°C until RNA extraction. To analyze the stability of candidate reference genes, samples were divided into four groups. Group one, young seedlings, contains three different tissues (roots, stems, and leaves) from young seedlings. Group two, leaves, contains three developmental stages of leaves (young, mature, and senescent leaves). Group three, Pod, contains three developmental stages of pods (3-, 6-, and 9-day-old ones). Group four, fruiting period, contains five developmental stages of fruit (buds, flowers, 3-, 6-, and 9-day-old pods). Evaluation of reference genes. The raw data of qRT-PCR were obtained by the CFX equipment software, the average Cq values were used for further analyses. The expression stability of candidate reference genes was evaluated with three algorithms namely geNorm v3.5 (https:// genorm. cmgg. be/) 15 , NormFinder v0.953 (https:// moma. dk/ normfi nder-softw are) 16 , and BestKeeper v1.0 (https:// www. gene-quant ifica tion. de/ bestk eeper. html) 17 , and then a comprehensive ranking was obtained by the RefFinder program (http:// blooge. cn/ RefFi nder). The analysis methods of these programs were the same as those described in previous study 13 .
Validation of reference genes. To verify the stability of reference genes, qRT-PCR was performed to detect the expression patterns of Cellulose synthase gene AeCesA4 in different tissue samples (root, stem, leaf, flower and pod) and pods at different developmental stages. The relative expression level of AeCesA4 was calculated by 2 −ΔΔct method 34 . One-way analysis of variance (ANOVA) test was applied to analyze significant differences among the reference genes using SPSS statistical software 19 (p < 0.05, Duncan's multiple range tests) 30 .

Data availability
The raw data of transcriptome sequencing were submitted to NCBI Sequence Read Archive (SRA) with accession number Bioproject: PRJNA861710. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.