Selection and validation of reference genes by RT-qPCR under photoperiodic induction of flowering in sugarcane (Saccharum spp.)

Although reference genes have previously been used in the expression analysis of genes involved in sugarcane flowering they had not been experimentally validated for stability and consistency of expression between different samples over a wide range of experimental conditions. Here we report the analysis of candidate reference genes in different tissue types, at different temporal time-points, in both short and long day photoperiodic treatments. The stability of the candidate reference genes in all conditions was evaluated with NormFinder, BestKeeper, and RefFinder algorithms that complement each other for a more robust analysis. As the Normfinder algorithm was more appropriate for our experimental conditions, greater emphasis was placed on Normfinder when choosing the most stable genes. UBQ1 and TUB were shown to be the most stable reference genes to use for normalizing RT-qPCR gene expression data during floral induction, whilst 25SrRNA1 and GAPDH were the least stable. Their use as a reference gene pair was validated by analyzing the expression of two differentially expressed target genes (PIL5 and LHP1). The UBQ1/TUB reference genes combination was able to reveal small significant differences in gene expression of the two target genes that were not detectable when using the least stable reference gene combination. These results can be used to inform the choice of reference genes to use in the study of the sugarcane floral induction pathway. Our work also demonstrates that both PIL5 and LHP1 are significantly up-regulated in the initial stages of photoperiodic induction of flowering in sugarcane.

www.nature.com/scientificreports/ patterns of genes involved in the sugarcane flowering pathway may help identify key points of this pathway that can be manipulated to control flowering. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) has become the most common, inexpensive and reliable method for gene expression studies. Accurate RT-qPCR analysis requires proper normalization of gene expression data through the use of reference genes whose expression should be constant in different tissues, organs and developmental stages, regardless of the experimental conditions. Until now no reference genes have been reported to be stable under all experimental conditions, the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) 6 guidelines state that the validation of the stability of the reference gene expression patterns for different tissues under the specific conditions in which the experiment was conducted needs to be done before using them to normalize the expression levels of the target genes of interest [6][7][8] . The use of inappropriate reference genes can lead to errors and consequently misinterpretation of results and erroneous conclusions. In fact it is recommended to use a combination of two or more stable reference genes to get the most reliable results 9 .
There are several algorithms with mutually complementary approaches that are available for the analysis of candidate reference genes that can be used to obtain information on the best reference genes to use.
The most commonly used programs for the analysis of reference genes are BestKeeper, NormFinder, Ref-Finder, and GeNorm 9-12 . BestKeeper infers expression stability combining in one index those with the best geometric means, showing the correlation between each candidate reference gene and the index. The Pearson correlation (r), coefficient of determination (r 2 ) and P value shows the relationship between candidate reference genes and the index 10 . The NormFinder algorithm provides a stability value for each gene with lower stability values indicating greater stability. This algorithm compares the relative expression of gene pairs for each sample in an experiment, and ranks the reference gene for its stability according to the repeatability of the expression difference between all samples 11 . RefFinder is an online tool (https ://omict ools.com/reffi nder-tool) that integrates the major algorithms such as geNorm, Normfinder, BestKeeper, and the comparative Delta-Ct method to compare and sort the candidate reference genes taking into account the results of all the algorithms analyzed together generating a comprehensive ranking 12 .
To date, reference genes have not been well defined for developmental time course gene expression studies of floral induction in sugarcane, despite the relationship between flowering and yield in this crop. Few reference genes were reported for expression studies in sugarcane, such as: studies on sucrose accumulation 13 ; lignin metabolism pathway 14 and abiotic stress [15][16][17][18][19][20] , however none of these have been validated for expression studies related to flowering. Here we describe a set of reference genes with stable expression profiles evaluated in two leaf tissues (mature and immature leaf), at two time points (corresponding to the early stages of floral induction and of floral primordia formation) in a commercial sugarcane cultivar (IACSP96-7569) grown in either short day (SD) or long day (LD) photoperiods. The selected reference genes were validated by analyzing the expression of two target genes PHYTOCHROME INTERACTING FACTOR 3-LIKE 5 (PIL5) and LIKE HETEROCHROMA-TIN PROTEIN 1 (LHP1), involved in the photoperiodic induction pathway 21 and previously identified as being differentially expressed by RNA sequencing (RNA-Seq) experiment. PIL5 is involved in the light perception pathway 22 while LHP1 regulates flowering time and is involved in epigenetic regulation of FLOWERING LOCUS T (FT) and FLOWERING LOCUS C (FLC) expression in Arabidopsis thaliana 21 .

Results
RT-qPCR data of candidate reference genes. After the assessment of the RT-qPCR data obtained from separate cDNA pools of samples from mature leaves and spindle leaves, primer specificity was confirmed by obtaining single melting curve peaks for all the candidate reference genes primer pairs together with the absence of any amplification in the negative controls ( Supplementary Fig. S1 online). The best melting curve peaks were obtained at a primer concentration of 0. 25 Table 2).
The BestKeeper algorithm estimates the stability of the reference genes according to the standard deviation from the cycle threshold (Ct) raw data. Although standard deviation greater than 1 indicates inconsistent expression of the candidate reference gene 10 , De Spiegelaere et al. 23 proposed excluding reference genes showing a standard deviation greater than 1.5. Using this calculation, the most stable genes were UBQ2 for spindle leaf and LD samples. Although RPL gene was the most stable for the mature leaf, 7th time point and SD samples, it was the least stable for spindle leaf samples. EF1 was the most stable for the 13th time point sample whereas GAPDH the least stable for mature leaf and SD samples ( Table 3, Supplementary Table S2 online).
However, according to the NormFinder algorithm, TIPS-41 was the most stable gene for two experimental samples (spindle leaf and LD) while TUB, RPL, UBQ2 and UBQ1 was the most stable for the mature leaf, 7th and 13th time points, and SD treatment, respectively. The least stable gene was 25SrRNA1 for six of the seven conditions while GAPDH was for the 7th time point (Table 3). The NormFinder algorithm also provides a combination of the two best reference genes that can be used as the most stable gene combination. UBQ1/TUB was the most stable gene combination for (7th time point and LD) and RPL/EF1 (mature leaf, 13th time point and SD) ( Table 3, Supplementary Table S2 online).
According to the RefFinder comprehensive ranking UBQ2 was the most stable reference gene for 13th time point samples, while TIPS-41 was for spindle leaf and LD samples, and EF1, TUB and UBQ1 for the 7th time point, mature leaf, and SD samples, respectively (Table 4, Supplementary Tables S3 and S4 online).
Finally, as the NormFinder algorithm also enables estimation of gene expression variation between sample subgroups of the sample set 11 which is particularly appropriate for our experimental set-up (different tissues, time points and photoperiod treatments), greater weight was giving to the selection of the candidate reference genes by this program. Therefore, the candidate reference genes UBQ1 and TUB are the best to use for normalization for most of the experimental conditions, including that in which the RNA-Seq experiment was performed (mature leaf, 7th time point and SD treatment). In addition, the pairwise UBQ1/TUB were proven as the best reference gene combination at 7th time point and LD treatment. Also TUB was the most stable gene at mature leaf, the tissue used for the RNA-Seq experiment.
Validation of reference genes. The gene expression data of two target genes (PIL5 and LHP1), was normalized using either the most stable (UBQ1/TUB) and least stable (GAPDH/25SrRNA1) reference gene pairs in several contrasting experimental conditions: 7th and 13th time points, SD and LD treatments, and different tissues (Mature leaf and Spindle leaf) to evaluate the effect of using good and bad reference gene pairs for normalization. Their expression was evaluated using three biological replicates, each with three technical replicates, for the two experimental conditions, assuming the inductive treatment (SD) as treated and the non-inductive treatment (LD) as the untreated, for the analysis using REST2009.
In the mature leaf at the 7th time point (early stage of induction), both PIL5 and LHP1 were up-regulated and significantly more expressed (P < 0.05 and P < 0.001 respectively) in SD than LD when expression data was normalized with the most stable reference genes (Fig. 1A,C). However when the expression data normalization was done with the least stable reference genes (GAPDH/25SrRNA1) no significant difference was observed between SD and LD (Fig. 1B,D). In the mature leaf at the 13th time point (floral primordia stage), neither of the target genes PIL5 and LHP1 showed a significant difference in expression between SD and LD when normalized with  www.nature.com/scientificreports/ either of the reference gene pairs. Similarly the spindle leaf, at the 7th time point, no significant difference in relative expression for PIL5 and LHP1 between SD and LD was detected. However, at the 13th time point, PIL5 and LHP1 were both significantly up-regulated in SD than LD samples, the relative difference in expression being large enough to be detected even when normalization is done using the least stable reference genes (Fig. 1B,D). The expression data of the target genes given by FPKM (fragments per kilobase of exon model per million reads mapped) from the RNA-Seq was converted in log fold change (SD/LD) and compared to the RT-qPCR relative expression data, both in the 7th timepoint mature leaf samples which were used for the RNA-Seq experiment. PIL5 had a FPKM of 2.59 and 1.20, respectively in SD and LD and a FPKM log fold change (SD/LD) of 0.33 ( Fig. 2A) indicating up-regulation in SD. This gene showed a significant up-regulation with a relative expression of 1.75 by RT-qPCR only when the data was normalized with the most stable reference genes (Fig. 2B) and not with the least stable reference gene pair (Fig. 2D). FPKM values for LHP1 at SD and LD were 33.84 and 40.36 respectively with a log fold change (SD/LD) of − 0.07 ( Fig. 2A) indicating a down-regulation in SD. However, LHP1 RT-qPCR relative expression data showed it was significantly up-regulated when normalized with the most stable reference genes (Fig. 2C), the least stable reference genes were not able to detect any significant difference (Fig. 2E).

Discussion
Flowering is an important process beginning with the shift from vegetative to reproductive development marked by changes in the shoot apex morphology. In sugarcane, flowering takes several months to complete and can be divided into various stages each one having specific photoperiod requirements 24 . RNA-Seq and RT-qPCR are important tools to determine the expression of genes involved in the flowering process as well as to understand their role in various stages of reproduction.
The selection of reliable reference genes is an essential step in the normalization of quantitative real-time PCR gene expression data as it impacts directly on the analysis and interpretation of the RT-qPCR data.
Whilst reference genes have previously been used in the expression analysis of genes involved in sugarcane flowering, for example GAPDH and ACTIN were used to normalize the expression of ScFT1 and ScTFL1 in sugarcane 25 , Medeiros et al. 26 used EF1-α for normalization of differentially expressed genes in early and late   20 as housekeeping gene to normalize genes related with photoperiod cycling over a 24-h period, these reference genes had not been experimentally validated for stability and consistency of expression between different samples at photoperiodic controlled conditions. There are several algorithms available for the selection of reference genes, each one based on a unique set of assumptions. The use of at least three different algorithms is recommended for the validation of reference genes in qPCR assays, although, conflicting results among these algorithms can cause difficulties in the choice of the best reference genes 27 . Regarding the algorithms applied in our analyses: Bestkeeper uses standard deviation and coefficient of variation to assess the stability of the reference gene, while RefFinder integrates several different algorithms (geNorm, NormFinder and Bestkeeper) and generates a comprehensive ranking of the reference genes. However, the individual primer efficiency of each reference gene is not considered by RefFinder 23 which assumes an efficiency of 100%. Although, BestKeeper and RefFinder can be complementarily, the ranking of www.nature.com/scientificreports/ stable reference genes by these algorithms in each experimental condition were not assumed to be the most appropriate option when compared to the reference genes defined by NormFinder. It is noteworthy that Nor-mFinder algorithm not only assesses the stability of individual reference genes but also indicates the best pair of reference genes (not provided by BestKeeper and RefFinder) considering the respective primer efficiency of each gene. The most stable individual reference genes will not necessarily be selected as the best pair by the NormFinder algorithm, as selecting reference genes from the same pathway (and thus possibly co-regulated) should be avoided as the best pair 11 . In situations involving large data sets, the ANOVA of inter-and intra-groups among different conditions performed by Normfinder is more appropriate in our case for the determination of the stability values, and thus to indicate the most, and least, stable reference genes 11 . After assessing the stability of eight different candidate reference genes using three different algorithms (Best-Keeper, NormFinder and RefFinder), it was found that UBQ1 and TUB were the most suitable and reliable for normalization of RT-qPCR expression data of target genes involved in the sugarcane floral induction pathway. According to the NormFinder, TUB and UBQ1 stand out as the most stable genes and were observed together and in combination with other genes to compose the best reference candidate gene pairs.
Although TIPS-41 was the most stable for immature leaf and long day conditions, it does not appear as part of a best pair in any of the conditions. This may indicate that the gene has some stability but cannot be fitted as a best pair with other candidates (due to possible co-regulation in similar molecular pathways) 27,28 . On the other hand, TUB and UBQ1 were the most stable reference genes for mature leaf and short day conditions, respectively. Also, these candidates form the best pair of reference genes for the seventh time point and long day conditions. These results agree with those from RefFinder where TUB and UBQ1 were the best overall candidates in the comprehensive ranking for mature leaf and short day conditions, respectively. Whether as the most stable individually, or as the best pair, TUB/UBQ1 was able to cover the broad range of experimental conditions used in the RNA-Seq experiment (mature leaf, 7th time point, SD and LD).
The results of Bestkeeper and NormFinder, separately, together with the integrated results presented by Ref-Finder, point to common candidate reference genes in some experimental conditions. However, as mentioned before each algorithm uses its own intrinsic calculations. Thus, not always the most stable reference genes point out by each algorithm separately will be selected as the most stable by the RefFinder. Despite this, in the majority of the experimental conditions the different algorithms pointed to the same genes as being stable reference genes. Comparing the results from BestKeeper to that from the BestKeeper data evaluated through RefFinder (Supplementary Table S4 online), three out of six experimental conditions (mature leaf, immature leaf and LD) showed the same reference genes. In the same comparison for NormFinder, 5 out of 6 conditions (mature leaf, spindle leaf, 13th time point, SD and LD) are in agreement between the algorithms, which makes NormFinder more robust and was therefore used as our main algorithm in the selection of stable reference genes.
It is important to note that the analysis of sugarcane reference genes reported in this paper has not previously been performed over such wide experimental conditions which in this case encompasses two tissues, at two-time points in SD and LD photoperiodic treatments.
In order to demonstrate and validate the reliability of the selected reference genes they were used to normalize the expression of two target genes, PIL5 and LHP1, both were differentially expressed in a previous RNA-Seq experiment performed by our research group. The expression of these two target genes was evaluated using three biological replicates, each with three technical replicates, for the three experimental conditions. By contrasting the expression data of the target genes normalized with the most stable (UBQ1/TUB) as well the least stable (GAPDH/25SrRNA) gene combination, it was verified that UBQ1/TUB are the best combination of reference genes to use for normalization as they were able to distinguish even small significant differences in gene expression (between PIL5 and LHP1 in mature leaves at the 7th sampling) which was not detected when using the GAPDH/25SrRNA1 reference gene combination. Such small but significant differences in gene expression may be crucial in understanding physiological processes at the molecular level and underlines the need to select good reference genes and validate their expression stability in appropriate conditions before analyzing the expression of target genes.
As for the comparison between RNA-Seq expression data (FPKM) and relative expression determined by RT-qPCR, the importance of gene validation by RT-qPCR is evident not only to confirm differential gene expression (and whether it is up or down-regulated), but also the significance of the change in expression level which was only possible through the use of stable reference genes (PIL5 up-regulation at the 7th time point on SD was confirmed as significant only when the RT-qPCR data was normalized with the most stable reference genes). The down-regulation of LHP1 according to the RNA-Seq data was shown to be erroneous by RT-qPCR validation as it is in fact significantly up-regulated when the data is normalized using the most stable reference genes. Although both target genes (PIL5 and LHP1) used to validate the selected reference genes interact with flowering regulators to control flowering time 21,29,30 , no information regarding the expression of these two target genes is available for sugarcane. Nevertheless, our RT-qPCR results suggest that PIL5 and LHP1 are significantly up-regulated in the initial stage of photoperiod induction, albeit functional studies in sugarcane will be necessary to determine their role in sugarcane photoperiodic induction. The reference genes UBQ1 and TUB were shown to be the most stable reference genes to use to normalize RT-qPCR gene expression data during floral induction in sugarcane. On the other hand, 25SrRNA1 and GAPDH reference genes were the least stable in our conditions. The validation of the differential expression of genes identified by RNA-Seq analysis should be carried out by RT-qPCR relative gene expression analysis using stable reference genes for correct data interpretation.  18 ; ELONGATION FACTOR 1-ALPHA (EF1-α) 26 and TONOPLAST INTRINSIC PROTEIN (TIPS-41) 15 . Two of them GAPDH 25 and EF1-α 26 have already been used in flowering related studies in sugarcane. Primers used for the candidate reference genes were those published and are shown in Supplementary Table S1 online. Analysis of primer efficiency was performed by RT-qPCR using a pooled cDNA sample composed of equal parts of cDNAs taken from all the spindle leaf samples (including biological replicates) and compared with a similar pool made from mature leaf samples, with the results analysed using LinRegPCR 7.5 software 31 . Three primer concentrations (0.25, 0.4 and 0.8 µM) were used in the optimization step and the best concentration established from the melting curve peak and the primer efficiency value. Following this, the reference genes were tested in pooled cDNA samples (1:20 dilution) of three biological replicates of each sampled condition of the photoperiod experiment (see Supplementary Fig. S3 online) to allow the "in silico" comparison between the different experimental variables: different tissues (all mature leaf samples vs. all spindle leaf samples); developmental stage (7th vs. 13th time point); and photoperiodic conditions (SD vs. LD). The reference genes were tested by RT-qPCR and the Cycle threshold (Ct) and Efficiency (E) data were analyzed by three statistical algorithms (NormFinder, BestKeeper and RefFinder) to define the most stable reference genes. The experimental variables were analysed "in silico" using these algorithms (Sup.Mat.RT-qPCR data to Perform Analysis on Algorithms.xls).
Selection of analysed genes and primer design. RNA-Seq experiment was performed on mature leaf samples from the 7th week timepoint in both SD and LD in order to find differentially expressed transcripts associated with flowering induction (data not shown).Two target genes PIL5 and LHP1 that are known to be differentially expressed in mature leaves at the 7th timepoint according to our RNA-Seq data were used for reference gene validation, only in 7th time point sample at mature leaf (corresponding to RNA-Seq experiment). The FPKM values from RNA-Seq data of these target genes were compared with relative expression data. The web tool PrimerQuestTool (https ://www.idtdn a.com/Prime rQues t/Home/Index ) was used to design the primers for these two genes adopting as conditions: primer length ranging between 17 and 22 bp, GC content between 35 and 65 (%), and melting temperature (Tm) between 59º and 65 °C with an amplicon ranging between 100 and 250 bp. The primers PIL5 and LHP1 were designed aiming to avoid conserved domains, the web tool NetPrimer (www.premi erbio soft.com/netpr imer) was used to analyze primer quality for secondary structure formation.
Total RNA extraction and cDNA synthesis. Total RNA extraction was performed with PureLink RNA Mini Kit (ThermoFisher), quantified using a NanoDrop 2000 (Thermo Fisher Scientific, Wilmington DE, USA), and RNA integrity was verified in 1.0% agarose gel electrophoresis (3 to 5 mV/cm) stained with ethidium bromide (1 µg mL-1). All RNA samples showed 260/280 ratio of around 1.8-2.0. The cDNA synthesis was performed with QuantiNova Reverse Transcription kit (QIAGEN Strasse 1, 40724 Hilden, GERMANY) following manufacturer's instructions (genomic DNA treatment included) using 1 µg of RNA and both oligo-dT and random primers (42ºC by 15 min and 95ºC by 3 min). The cDNA was set to a final dilution of 1:20 for RT-qPCR analysis.
RT-qPCR conditions. The RT-qPCR assays were performed using a Bio-Rad IQ5 machine using GoTaq qPCR Master Mix Kit (Promega, USA). The reaction was conducted in a final volume of 10 µl containing 5 µL of SYBR Green 2x, 3 µL of cDNA (1:20) and primer pairs at their respective adjusted concentration. Amplification conditions were: 95ºC for 3 min, followed by 40 cycles of 10 s at 95 °C and 30 s at 60ºC, followed by a melting curve from 55° to 95 °C. Both primer efficiency and concentration optimization analysis was performed using two technical replicates, whilst the evaluation of the cDNA pools under experimental conditions (tissues, time points and photoperiod treatments) as well as the validation of the reference genes were performed in triplicate (i.e. three technical replicates). Relative expression data and statistical analysis were performed using the software REST 2009 with 2000 iterations 32 and differences considered significant when P < 0.001 (***), P < 0.01 (**), www.nature.com/scientificreports/ and P < 0.05 (*). The inductive treatment (SD) was considered "treated" and non-inductive treatment (LD) was considered "untreated" for the purpose of expression calculations in the software. For a summarized flowchart of the methods see Supplementary Fig. S4 online.