Transcriptome-based identification and validation of reference genes for plant-bacteria interaction studies using Nicotiana benthamiana

RT-qPCR is a widely used technique for the analysis of gene expression. Accurate estimation of transcript abundance relies strongly on a normalization that requires the use of reference genes that are stably expressed in the conditions analyzed. Initially, they were adopted from those used in Northern blot experiments, but an increasing number of publications highlight the need to find and validate alternative reference genes for the particular system under study. The development of high-throughput sequencing techniques has facilitated the identification of such stably expressed genes. Nicotiana benthamiana has been extensively used as a model in the plant research field. In spite of this, there is scarce information regarding suitable RT-qPCR reference genes for this species. Employing RNA-seq data previously generated from tomato plants, combined with newly generated data from N. benthamiana leaves infiltrated with Pseudomonas fluorescens, we identified and tested a set of 9 candidate reference genes. Using three different algorithms, we found that NbUbe35, NbNQO and NbErpA exhibit less variable gene expression in our pathosystem than previously used genes. Furthermore, the combined use of the first two is sufficient for robust gene expression analysis. We encourage employing these novel reference genes in future RT-qPCR experiments involving N. benthamiana and Pseudomonas spp.


Results
selection of stably expressed genes based on tomato and Nicotiana benthamiana RNA-seq data. In order to identify genes with low expression variation to be used as reference in RT-qPCR experiments, we took advantage of a previous analysis of gene stability based on RNA-seq data from tomato leaves with different treatments (37 treatments/time points with an average of 3 biological replicates generated in independent experiments) 31 . This large set of tomato data was narrowed down to 50 genes with the most stable expression across all treatments. Performing BlastX analysis using these tomato genes as input and N. benthamiana proteins as database, we identified the closest putative orthologs and hypothesized they would also be stably expressed genes. To assist in the selection of reference genes, we performed RNA-seq analysis with N. benthamiana leaves vacuum-infiltrated with a suspension of Pseudomonas fluorescens 55 and MgCl 2 as a mock. This bacterial treatment, that results in a strong PTI induction and large transcriptomic changes at 6 h after infiltration (hai) 7 , allowed the identification of 10,300 differentially expressed genes from the 57,139 predicted genes in N. benthamiana (Supplementary Table S1). Using this information generated from N. benthamiana tissue, we calculated the coefficient of variation (CV) of the 50 selected genes, using the expression values (RPKMs, reads per kilobase of transcript per million mapped reads) of each biological replicate individually. The lower the CV is, the more stable the expression of the gene is across the conditions. In this way we were able to select 9 genes with the lowest CV (ranging from 5.14% to 10.27%) for analysis (Supplementary Table S2). We also selected a gene named NbPP2a (Niben101Scf09716g01002.1) previously validated as a stable reference gene in virus-infected N. benthamiana plants 25 and two traditional plant reference genes NbEF1α and NbGADPH.
Analysis of candidate reference genes expression profiles indicated high efficiencies and unique transcript amplifications. We evaluated the amplification efficiencies of each selected gene performing RT-qPCR using cDNA dilutions (1:5, 1:10, 1:100, 1:1000). Amplification efficiency E was measured as 10 −1/slope and expressed in percentage (Supplementary Table S3). All the primers showed high E values ranging from 92% to 100%. We also analyzed the specificity of the amplification through melting curves for all pairs of primers used and in all cases observed a single peak accounting for a single PCR product ( Supplementary  Fig. S1). We checked the presence of contamination and primer dimers with the analysis of the non-template control melting curves. Only in one of the three technical replicates corresponding to NbTspan gene, we observed a small peak. Analysis in detail of its corresponding Cq value (35.25) showed that was nearly 10 Cq-values lower  35 . Two genes were the exception, with mean Cq values of 34.4 (NbLip) and 30.6 (NbP5βR), and were consequently excluded from further analysis. Among the 10 remaining genes, NbEF1α and NbTspan had the largest (5.2) and smallest (1.4) difference between maximum and minimum Cq values, respectively.
Based on different algorithms, three newly identified reference genes are the most stably expressed. To estimate gene expression stability, we analyzed RT-qPCR data with three different software tools. We first used geNorm software 36 to establish the average expression stability value M. This program determines the pairwise variation of each gene with all other analyzed genes under the same experimental conditions. The lower the M value, the more stable the gene is. Three genes presented the highest variability, with M values over the usually proposed cutoff value of M ≤0.5 37 . These genes were NbKLC (M = 0.659), NbEF1α (M = 0.618) and NbGADPH (M = 0.556). This software also selects an optimal pair of reference genes and in this experiment the most stable ones were NbCENPO and NbUbe35 with M value of 0.306 ( Fig. 2A). We then estimated the minimal number of reference genes to be used. To achieve this, we determined the pairwise variation (V) of a normalization factor (NF) calculated by introducing reference genes one by one, starting from the two least variable and adding the rest in a decreasing stability order until the whole set was included 36 . We decided to analyze our data as a whole, only including PTI activation (Pf 55 and mock), only including ETI activation (Pst DC3000 and Pst DC3000 ΔhopQ1-1), only including samples taken at 6 hai (6 h) and only including samples taken at 12 hai (12 h) (Fig. 2B). Regardless of using the complete dataset, the defense response or time-point subsets, the results were very similar. In all of the cases, the V2/3 value obtained was smaller than the proposed cut-off of 0.15 36 , suggesting that only the two most stable reference genes (NbCENPO and NbUbe35) identified by geNorm software are sufficient for a good normalization of RT-qPCR data, regardless of the type of response evaluated (PTI or ETI) or time-point (6 or 12 h).
Another algorithm that also calculates an M index is NormFinder 38 . To calculate this index, NormFinder estimates the intragroup (within each sample/treatment) and then the intergroup (within different groups of samples/ treatments) variation. Similarly to geNorm analysis, NormFinder selected NbEF1α, NbKLC and NbGADPH as the most variable genes, with M values of 0.402, 0.387 and 0.352, respectively. However, the most suitable reference genes derived from NormFinder analysis were NbPP2a (M = 0.167), NbErpA (M = 0.216) and NbNQO (M = 0.242) (Fig. 3).
To further analyze the candidate gene stability, we used BestKeeper 39 . This tool allows the analysis of 10 genes in two steps. First, it calculates different statistical parameters and then, a coefficient of correlation (r) is obtained by comparing a BestKeeper index with each particular gene ( Table 2). According to standard deviation (SD) values all the genes under study were suitable to be considered as reference genes (SD [±Cq] <1 and SD [±x-fold] <2) 39 . We analyzed their variation parameters (SD [±Cq] and CV [% Cq]), and observed that NbTspan, NbUbe35 and NbNQO were the most stable genes and NbEF1α, NbGADPH and NbKLC, the most variable ones. The calculated r values for the comparison of each gene with the BestKeeper index were inconsistent with the SD-based analysis described above. For example, according with SD [±Cq] value, NbEF1α is the most variable gene but its  (Table 2). This approach is employed in the RefFinder tool 40 as a ranking method for the BestKeeper output.
When we compared the outputs of the statistical programs used, we observed a certain degree of discrepancy mainly in the selection of the genes with lower variability across the experiment. This degree of divergence among the stability ranking generated by geNorm, NormFinder and BestKeeper has been previously reported and could attributed to the fact that these tools are based on different algorithms 31,[41][42][43] . In order to analyze the results globally, we calculated the arithmetical mean of the ranking value obtained for each gene using all three algorithms 31,41,42,44 . As a result, NbUbe35 was rated as the most stable with a mean ranking value of 2.33 (Table 3). With this overall stability ranking we reanalyzed the pairwise variation in order to establish the minimum number of reference genes for normalization, considering the proposed threshold of 0.15 36 . Based on this analysis, the combination of NbUbe35 and NbNQO is sufficient for accurate normalization (Supplementary Fig. S2) and we consequently used this combination for further experiments.
Validation of the selected genes confirmed their suitability as reference genes. Using BlastP we identified two putative N. benthamiana orthologs of a previously described tomato gene (Solyc02g069960) that is induced by PTI 8 : Niben101Scf04323g01009.1 and Niben101Ctg15860g00004.1 (NbNAC042). According to our RNA-seq data the first one had very low expression (0-0.46 RPKM), while NbNAC042 was induced by PTI with RKPM values ranging between 0 and 20.53 (Supplementary Table S4). For this reason we chose NbNAC042 to put to test the two most stable genes described here (NbUbe35 and NbNQO) and compare their performance to  a middle-ranked gene (NbTspan) and a traditionally used reference gene (NbEF1α). Regardless of the reference genes used the trend of transcript abundance increase of NbNAC042 upon PTI induction (6 and 12 hai), was similar (Fig. 4A). Nevertheless, the combined use of NbUbe35 and NbNQO resulted in lower standard deviation values allowing establishing statistically significant differences at both time-points and at a lower significance level. To our surprise NbNAC042 gene expression increased with ETI activation (Pst DC3000 vs. Pst DC3000 ΔhopQ1-1), given Solyc02g069960 in tomato is not affected by ETI at 6 hai 8 . Again, the use of the combination of NbUbe35 and NbNQO lead to lower deviations and lower significance levels (Fig. 4B). These results highlight the relevance of the selection of accurate reference genes for the experimental system under study.

Discussion
Due to its amenability to genetic transformation, virus induced gene silencing and transient protein expression, Nicotiana benthamiana has become very popular in the plant biology field 18 . Particularly, its susceptibility to a wide variety of pathogens made this model plant one of the most widely used in molecular studies of plant-pathogen interactions 19 . RT-qPCR is a frequently used technology for detection and quantification of gene expression, but accurate data interpretation highly depends on the use of appropriate reference genes whose expression should have minimal variations in the tissue, treatment or condition to be analyzed 35  RNA-seq has become a powerful technology used for transcriptomic analysis in different organisms and treatments 27,28,48 . The information generated using this technique was used in the plant research field for the selection of new and more robust RT-qPCR reference genes in grape, soybean, potato, Lycoris and tomato 31,[49][50][51][52] . In this work we used a new approach for the selection of novel stably expressed genes that can be used in N. benthamiana-bacterial interaction studies. We have taken advantage of previously generated tomato RNA-seq information 7,8,31 . These studies include transcriptional changes of tomato leaves for studying PTI and ETI activation and the influence of bacterial effectors on plant defenses, through the infiltration of MAMPs and bacterial strains and mutants along with untreated tomato plants. Together these experiments constitute a robust and large set of data that allowed the identification of novel reference genes in the tomato-Pseudomonas pathosystem 31 . Due to the closeness between the two species, we were able to find the N. benthamiana gene orthologs of the stably expressed tomato genes previously reported. This strategy was earlier useful to find reference genes in pepper using microarray information generated from tomato 53 . In this work we produced new RNA-seq data from N. benthamiana leaves infiltrated with Pseudomonas fluorescens 55, whose comparison with a mock treatment, accounts for a strong transcriptional PTI induction 7 . In agreement with this, we were able to identify 10,300 genes differentially expressed in N. benthamiana when challenged with this strain (Supplementary  Table S1), This new dataset assisted in the selection of 9 novel genes with low coefficient of variation in the N. benthamiana-Pseudomonas pathosystem. This strategy led to the identification of a set of stably expressed genes in N. benthamiana that mostly differs from those previously found in tomato by an RNA-seq approach 31 , further confirming that it is necessary to evaluate reference genes for every system.
Using different algorithms such as geNorm 36 , NormFinder 38 and BestKeeper 39 we identified 3 reference genes (NbUBE35, NbNQO, NbErpA) that are more stably expressed compared to the commonly used ones that we selected for comparison (NbEF1α and NbGAPDH). We strongly recommend the use of NbUbe35 jointly with NbNQO as reference genes for expression studies that involve N. benthamiana leaf tissue infiltrated with Pseudomonas spp. or related experiments. According to our pairwise variation analysis, the use of two of these genes is sufficient to obtain accurate results (Supplementary Fig. S2). We also included in our analysis the NbPP2a gene, previously found to be the most stably expressed gene in N. benthamiana-virus interactions 25 . Although this gene performed fairly well in our study system, we found 3 genes whose overall expression was more stable (Table 3). When we put to test the combination of NbUbe35/NbNQO against a middle-ranked gene (NbTspan) or NbEF1α as reference, we found discordant results in terms of the capability of identifying statistically significant differences. Although the expression trend was similar, the high standard deviation obtained using either NbTspan or NbEF1α as reference prevented the detection of significant differences between the samples (Fig. 4).  These results highlight the importance of the selection and validation of reliable reference genes for an accurate interpretation of the results. We therefore encourage the use of the information generated in this work in future RT-qPCR experiments involving N. benthamiana-Pseudomonas spp. pathosystems.
In all the experiments, three biological replicates per infiltration were used. Details of the experiments are shown in Table 1.
RNA-seq library preparation and analysis. Total RNA was isolated using TRIzol reagent (Life Technologies, NY, USA) and libraries prepared as described previously 8 . Barcoded libraries were multiplexed by 12 in each lane and sequenced on an Illumina HiSeq 2000 equipment with 101 bp pair-end read mode. Sequence reads generated in this work have been deposited in the NCBI sequence read archive (SRA) under accession number SRP118889. Analysis of the RNA-seq data was performed as described previously 8 . selection of the candidate genes and primer design. Taking advantage of tomato gene expression stability ranking previously generated based on RNA-seq data 31 , using BlastX analysis from Sol Genomics Network 54 , we identified 50 N. benthamiana orthologs which we hypothesized had stable gene expression. Using the newly generated RNA-seq data (N. benthamiana leaves challenged with P. fluorescens 55) we discarded those with low expression (RPKM <3) and ranked them based on their coefficient of variation (CV) across treatments and biological replicates. We then selected the 9 most stably expressed genes for validation (Supplementary Table S2). Additionally, two traditional reference genes used in N. benthamiana RT-qPCR experiments (NbGADPH and NbEF1α) and NbPP2a, the most stably expressed gene identified in a previous report using virus-infected N. benthamiana plants 25 , were included for the analysis.
We used BlastP to identify two putative N. benthamiana orthologs of a previously reported tomato gene (Solyc02g069960) that is induced by PTI 8 . From the two closest found, Niben101Scf04323g01009.1 and Niben101Ctg15860g00004.1 (NbNAC042), we selected NbNAC042 based on its gene expression level and PTI induction (Supplementary Table S4), to test the performance of the most stable reference genes described in this work.
The nucleotide sequence of each gene was downloaded from the Sol Genomics Network webpage 54 and primers were designed using PrimerQuest tool (Integrated DNA Technologies). Primer efficiencies were checked by RT-qPCR using different cDNA dilutions. This information, along with the accession number of all N. benthamiana genes used in this work, is show in Supplementary Table S3. Dissociation curves were performed to confirm amplification specificity (Supplementary Fig. S1).
RNA isolation and cDNA preparation. Total RNA was isolated using the Tri-Reagent (Sigma Aldrich) following the manufacturer's instructions. RNA integrity was assayed by 1% agarose gel electrophoresis. Total RNA (8 μg) was processed with RQ1 RNase-free DNase (Promega) for 60 minutes at 37 °C to eliminate potential DNA contamination and then purified using a chloroform:octanol mix (24:1). RNA concentration and purity was determined using a CLARIOstar microplate reader (BMG Labtech). Purified RNA (2.4 μg) was used to prepare cDNA using M-MLV reverse transcriptase (Promega) with random primers according to the manufacturer's instructions.
Rt-qpCR reactions. RT-qPCR was performed as described previously 55 in 96-well plates (Thermo Fisher Scientific) on a StepOnePlus system (Applied Biosystems). Primer sequences and characteristics are shown in Supplementary Table S3. The reaction mix was performed using: 5 μl of FastStart Universal SYBR Green Master (Rox) (Roche Life Sciences), 2 μl of 2 μM primer mix, 2 μl of a diluted 1:10 cDNA and water to complete a final volume of 10 μl. Cycling conditions were 95 °C for 10 min, and 40 cycles of 95 °C for 15 s, 60 °C for 1 min. All RT-qPCR experiments were performed using three biological and three technical replicates. evaluation and validation of reference gene expression stability. Data obtained from the RT-qPCR experiments were analyzed using three statistical programs: geNorm 36 , NormFinder 38 and BestKeeper 39 . The relative expression of NbNAC042 gene was expressed as E −ΔΔCq , where E corresponds to the primer efficiency value. When a pair of reference genes were used (NbUbe35/NbNQO) the geometric Cq mean and efficiency average were employed.