Transcriptomic studies and assessment of Yersinia pestis reference genes in various conditions

Reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR) is a very sensitive widespread technique considered as the gold standard to explore transcriptional variations. While a particular methodology has to be followed to provide accurate results many published studies are likely to misinterpret results due to lack of minimal quality requirements. Yersinia pestis is a highly pathogenic bacterium responsible for plague. It has been used to propose a ready-to-use and complete approach to mitigate the risk of technical biases in transcriptomic studies. The selection of suitable reference genes (RGs) among 29 candidates was performed using four different methods (GeNorm, NormFinder, BestKeeper and the Delta-Ct method). An overall comprehensive ranking revealed that 12 following candidate RGs are suitable for accurate normalization: gmk, proC, fabD, rpoD, nadB, rho, thrA, ribD, mutL, rpoB, adk and tmk. Some frequently used genes like 16S RNA had even been found as unsuitable to study Y. pestis. This methodology allowed us to demonstrate, under different temperatures and states of growth, significant transcriptional changes of six efflux pumps genes involved in physiological aspects as antimicrobial resistance or virulence. Previous transcriptomic studies done under comparable conditions had not been able to highlight these transcriptional modifications. These results highlight the importance of validating RGs prior to the normalization of transcriptional expression levels of targeted genes. This accurate methodology can be extended to any gene of interest in Y. pestis. More generally, the same workflow can be applied to identify and validate appropriate RGs in other bacteria to study transcriptional variations.

. Workflow for validation of RGs and transcriptomic studies. Blue section presents the three steps to obtain suitable cDNA and control its quality. Orange section describes the method to choose pertinent candidates RGs and to obtain suitable PCR conditions from the primers design to the PCR parameters' optimization (E = 2 corresponds to 100% efficiency and E = 1 to 0%). Purple section shows the selection of the appropriate number of stable enough RGs using multiple algorithms. Green section displays the normalization of the gene of interest's expression and the use of the appropriate statistic tests to identify differences. Results expression level of the candidate RGs. A total of 29 candidate RGs were selected (Table 1) on the basis of our laboratory experience and of an extended literature screening of genes designated as "housekeeping" gene in Yersinia genus 28,29 and other genes commonly used as RGs in other bacteria species and present in Y. pestis genome [24][25][26]30 . In order to avoid co-regulation induced bias, we chose genes with various functions, distributed throughout the entire chromosome (Fig. S1) and not belonging to the same operon. The quantification cycle (C q ) values from each of the 29 candidate RGs were generated by RT-qPCR using RNA extracted from exponential and stationary-phase of Y. pestis grown at 21 °C, 26 °C and 37 °C in Luria-Bertani (LB) broth. Results are represented in a box-and-whiskers plot (Fig. 2). Position and dispersion parameters are represented in Table S1. Analysis showed consequent differences between each selected candidates, indicating a differential expression depending on the experimental conditions and an internal variability. The C q value medians ranged from 20.95 for rpoS to 31.52 for ribD except for 16S rRNA for which the median was found at 10.83. A C q value interquartile gap was observed from 0.42 [10.65; 11.07] for 16S rRNA to 3.31 [20.47; 23.78] for rpoA. A C q value variation of 1 corresponds to the doubling of RNA quantity and only five genes (16S rRNA, nadB, cysG, rpoD and tmk) displayed an interquartile gap equal or less than 1. The range varied between 2.3 [27.16; 29.46] for rpoB and 6.97 [18.62; 25.59] for rpoA. The C q value of only five genes (rpoB, nadB, mutL, gmk and 16S rRNA) had a range 3, that corresponds to a 8-fold increased RNA quantity. Under all growing conditions, C q value variations remain important even for genes with the lowest dispersion parameters.
For each temperature (21; 26 and 37 °C), RNAs have been extracted at exponential and stationary phases generating 6 groups. The intra-group variability was observed as drastically decreased with a C q value interquartile gap between 0. 16 [23.14; 27.15]). The C q value of five genes (16S rRNA, rpoD, fabD, rho and gyrB) had an interquartile gap equal or less than 0.5 in each sub-group (Fig. S2). Position and dispersion parameters in each sub-group are represented in Table S2.
Evaluation of candidate RGs stability. The position and dispersion parameters provided an overview of the disparity in expression level and internal variation of genes expression. However, these parameters based on absolute data are not sufficient to evaluate the stability of the expression of each candidate RG. Hence, relative expression has to be taken into account by statistical analysis in order to assess gene stability 31 . Four methods (GeNorm 32 , BestKeeper 33 , NormFinder 34 , and the delta-Ct method 35 ) have been used to rank the 29 candidate RGs. BestKeeper is the only one, which evaluates the stability of each gene independently of each other. Each algorithm provides a different ranking (Table 2). However, several genes were considered to be stable by different methods with overlapping ranks. Indeed, based on the results from GeNorm 32 and considering that a M-value lower than 0.7 is the cut-off value to select a suitable reference gene, 15 genes were considered stable: rpoD/proC, fabD, gmk, rho, thrA, adk, ribD, nadB, mutL, tmk, rpoB, tpiA, ftsZ and secA (in descending order with regard to stability). With the exception of secA, all the above cited genes as well as cysG, hcaT, recA, cysK and gapA were also found to be suitable using BestKeeper algorithm and based on the authors recommendations (cut-off value of the standard deviation lower than 1) 33 . Although there is no cut-off value or recommended number of genes for the NormFinder 34 algorithm, ranking using this latter algorithm was mainly in accordance with results obtained from the GeNorm algorithm. Indeed, the 15 genes previously identified as suitable RGs by GeNorm occupied the first 16 positions of the NormFinder ranking. Moreover, the 5 most stable genes (rpoD/proC, fabD, gmk, and rho) were the same according to these two algorithms and the delta-Ct method 35 ranked in a different order. Lastly, in order to determine a global ranking, we used the RefFinder 36 web-based tool to calculate the geometric mean of the four previous rankings. We found that gmk is the most stably expressed gene followed by proC, fabD, rpoD, nadB, rho, thrA, ribD, mutL rpoB, adk and tmk. Those twelve genes were considered stable enough to be employed as RGs by the four previously detailed methods and therefore were the best candidates for RT-qPCR experiment in Y. pestis. These selected genes were among those with the lowest dispersion C q parameters.

Set of stable RGs for RT-qPCR.
According to the MIQE guidelines 13 , the assessment of the stability of the RGs for each experimental condition should be performed prior to conduct the transcriptomic study. Furthermore, only one gene may not be sufficient unless its invariant expression has been proven. Vandesompele et al. 32 proposed to test up to five candidate RGs and to determine how many are necessary for a correct normalization. Thus, we propose the panel of the five following genes gmk, proC, fabD, rpoD and nadB, as an universal panel for transcriptional studies in Y. pestis. However, even if these five genes had been validated in each condition of temperature and growth phase (data not shown) and could be proposed for all studies, they might not be the best one for a normalization process in specific conditions. Therefore, we performed the RefFinder analysis by grouping the data according to the temperature or the state of growth (Table S3). By picking the five best-ranked genes belonging to the 12 previously validated genes in each condition, an optimal panel can be proposed for each condition (Table 3). These five genes are ranked from the most stable to the less stable in our conditions but should be used together and reevaluated for each new experiment.
Comparison between normalized expression level for the genes of interest. In Yersinia enterocolitica, the expression of acrAB (YPO3132-33) gene encoding a RND efflux pump has been reported to be regulated www.nature.com/scientificreports www.nature.com/scientificreports/  37 . In Y. pestis, in addition to acrAB, five putative RND efflux pumps genes have been identified: YPO0420-21, YPO1000-01, YPO2847-48-49, YPO3043 and YPO3482-83. We assessed the influence of the temperature and the growth phase on their expression.
Based on the expression of five ranked most stable genes gmk, proC, fabD, rpoD and nadB given by RefFinder according to the GeNorm method, the pairwise variation (V n /V n+1 ) was calculated between the normalization factors NF n and NF n+1 to determine the optimal number of required RGs. The genes inclusion has been stopped when the pairwise variation reached the cut-off value below 0.15, in our case after the fourth ranked gene. Therefore, gmk, proC, fabD and rpoD have been chosen as RGs (Fig. 3). The normalized gene of interest's Relative Quantity (RQ) position and dispersion data have been reported in Table S4 and represented in Fig. 4. Due to the non-normal distribution of the data the non-parametric Kruskal and Wallis test followed by a Holm-Bonferroni procedure (α risk set at 5%) have been used.
The results obtained showed that the regulation of acrAB in Y. pestis is function of temperature and state of growth and that RND efflux pumps genes expression is sensitive to the same stimuli. However, these trends that have been observed only from transcriptomic data do not allow us to extrapolate any phenotypic change or make any biological interpretation. For example, locus tag YPO1001 gene (Fig. 4B) transcription diminished with temperature with a 2.5-fold median decrease at the exponential phase and a 2.2-fold median decrease at the exponential phase between 21 °C and 37 °C. At the opposite, locus tag YPO3482 gene (Fig. 4F) showed a moderate increase in function of the temperature (only significant at the stationary phase). The other genes expression    Table 3. Five recommended reference genes in function of the state of growth. The five recommended reference genes correspond to the five best-ranked genes belonging to the 12 previously validated genes in each condition. They are recommended in exponential or stationary phase regardless the temperature and for each temperature regardless the state of growth. Five genes are also proposed for each specific condition of temperature and state of growth.

Effect of normalizing with non-validated RGs.
To illustrate the impact of the choice of the most appropriate RGs, we performed two alternative normalizations of the gene expression data. On one hand, we used the expression of the 16S RNA gene, the most commonly used normalizer 30 , and on the other hand we used the geometrical mean of the expression of our five last ranked genes, namely rpsL, dnaK, cysK, rpoS and rpoA. The results are presented respectively in Figs S3 and S4. By these two normalizations, the number of statistically significant results dramatically dropped from 32 to 14 and 16 respectively. Furthermore, three differences have been found as statistically significant with the 16S RNA normalization, but not with our RGs.

Discussion
The selection and validation of RGs are the most critical steps for transcriptomic studies 13 . For the highly pathogenic bacterium Y. pestis, a set of validated RGs might be proposed for exploring gene expression in many experimental conditions. However our results do not support the established practices and are in accordance with Bustin et al. who demonstrated that publications using RT-qPCR have deficiencies in the validation of experimental designs even in high impact factor journals 15 . Indeed, we observed that several genes such as 16S RNA, gyrA, recA or rpoA, which are commonly used to normalize the expression of targeted genes 30 are unsuitable for RT-qPCR in Y. pestis. We showed that growth conditions increased internal variability even in the most stable expressed genes and that the use of not-suitable RGs exposes the author to the risk of misinterpretation of results. It emphasizes the absolute necessity of choosing the right RGs to increase the signal-to-noise ratio and to validate results in the specific experimental condition. To our best knowledge, this is the first study validating RGs in Y. pestis on various growth conditions. More generally, we highlight that the previous validation of RGs before performing transcriptomic studies on prokaryotes is not standard enough while this step should be the prior requirement. This methodology is more conducted on eukaryotes, reason why all the algorithms we used were developed for humans, animals or plants.
The case of 16S RNA deserves to be more scrutinized because it is the most used gene on RT-qPCR in bacteria 30 . Even if this gene was found as the most stable gene by the BestKeeper algorithm, it did not reach the minimum requirements to be accepted by the other algorithms. In our study, the normalization using 16S RNA expression did not allow us to highlight as many differences as our validated standardization and even more identified differences that most probably do not exist. The suitability of 16S RNA as a RG was already questioned 38 . Moreover, while the MIQE recommendations state that the abundances should show a strong correlation with the total amounts of mRNA present in the samples 13 , 16S RNA has a very high level of transcription with about a 10 5 -fold ratio between its transcripts and the targeted genes mRNA. For these reasons, 16S RNA cannot be recommended as a suitable RG at least under our conditions. This illustrates the necessity of performing validation tests for each experimental condition and using several algorithms to perform this test.
The need for suitable RGs does not limit to RT-qPCR experiments but it may be extend to the recent advances in molecular biology as reverse transcription digital PCR (RT-dPCR). Through "The Digital MIQE Guidelines", Huggett et al., who signed the MIQE Guidelines in 2009, emphasized the necessity of justifying the number and the choice of RGs 39 for RT-dPCR. Even if absolute quantification obtained by dPCR have been proven to be more precise and reproducible than qPCR, these two technologies have been recently qualified as complimentary to obtain the best results for any experiment 40 . www.nature.com/scientificreports www.nature.com/scientificreports/ In conclusion, this methodology allowed us to observe slight variations of efflux pump gene's expression in Y. pestis. These differences could not be highlighted in previous transcriptomic studies using microarray expression data in Y. pestis [41][42][43] . The risk of bias is limited by using a validated set of RGs to normalize RT-qPCR data and following the MIQE recommendations with rigorous quality controls. This experimental workflow will authorize us to explore not only the interplay between the different active efflux systems by measuring their expression under various conditions but more widely all the transcriptomic changes in Y. pestis during its complete cycle of life.

Methods
All steps are designed to follow the MIQE guidelines 13 . Graphical representation has been made using Prism ® 6 (GraphPad Software, Inc.).

Bacterial strain and growth conditions. All experiments with live agents were realized in a Biosafety
Level 3 laboratory. Yersinia pestis CO92 strain 44 (Table S5). A spectrophotometer BioPhotometer ® Plus (Eppendorf) was used for OD determination. The MIQE guidelines 13 identified the sample acquisition as the first potential source of experimental variability in RT-qPCR experiment. In order to control sample-to-sample variation, we processed five independent biological replicates for each condition (30 in total).

RNA isolation and cDNA preparation.
At each defined growth's state, RNA was stabilized with the RNAprotect ® Cell Reagent (Qiagen). Bacteria were lysed and RNA was purified with the RNeasy ® Lipid Tissue Mini Kit (Qiagen). Samples were also treated with the RNase-Free DNase Set (Qiagen) for 10 min in the column. Extraction products have been stored at −80 °C. RNA concentration [150.8 ng/µL; 375.1 ng/µL] and purity was determined in duplicate using a NanoDrop 2000 spectrophotometer (Thermo Scientific) with 1.5 µL of product. Both OD 260/280 [2.13 and 2.20] and OD 260/230 [1.80; 2.49] ratios around 2 were considered as "pure" according to the manufacturer's recommendations. RNA integrity has been verified using a Bioanalyzer ® 2100 (Agilent) with a RNA Nano Chip (Agilent) and the RNA integrity number [9.8; 10] showed nucleic acid quality close to the maximal value 45 (Table S5). 1 µg RNA in a total reaction volume of 50 µL was used to prepare cDNA using the Reverse Transcription Core kit 500 RT-RTCK-05 (Eurogentec) with random nonameric primers. The initial step lasted 10 min at 25 °C and was followed by the RT step 30 min at 48 °C and the inactivation step 5 min at 95 °C. cDNA were stored at −20 °C. Because the RT step is known to be an important source of external variation 21 , three separated reverse transcriptions were performed for each extraction product (90 in total) and only one manufactured kit was used for all the procedure. A pool of cDNA was constituted by adding the same quantity of each www.nature.com/scientificreports www.nature.com/scientificreports/ RNA extract. This pool has been used as an internal calibrator for the qPCR experiment. A no-RT control with no reverse transcriptase has been performed to validate the absence of genomic DNA contamination.

Selection of candidate RGs, genes of interest and primer design.
For each identified gene, a pair of primers was designed using the Oligo7 primer analysis software (DBA Oligo Inc.) 46 based on the published genome 44 (Table 3). The genes of interest were those coding all potential RND efflux pumps in Y. pestis, based on a homology study of the AcrB protein 47,48 using the COG database 49 . The specificity of the primers were verified in silico using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) then in vitro as well as the lack of primer dimers by the presence of a single melting curve point after a qPCR, showing a single product for each pair of primers 50 .
Quantitative pCR. qPCR was performed in a LightCycler ® 480 (Roche) using the LightCycler ® 480 SYBR Green I Master (Roche) in 96 wells white PCR microplate and a sealing film (Fisher scientific). The reaction mixture, which was 20 µL in total volume, contained 10 µL of LightCycler ® 480 SYBR Green I Master 2 × (Roche), an equal quantity depending on the reaction of forward and reverse primer at 10 µmol/L and 1 µL of cDNA at the appropriate dilution. The mixture was filled to 20 µL by PCR grade water. The PCR program was started from one step of pre-incubation at 95 °C for 5 min, followed by 45 amplification cycles. Each cycle was started by a denaturation step at 95 °C for 10 s and followed by an annealing step at a specific temperature for a specific time depending on the reaction. The amplification step lasted 5 s at 72 °C. The last step was a melting curve analysis. For each run, 2 No Template Controls (NTC) have been performed. The quantity of primers and cDNA as well as the annealing time and temperature were determined to optimize the PCR efficiency (Table S6) as explained below.

Determination of the optimal quantitative real-time PCR conditions and the reaction efficiency.
We choose to express data obtained from the LightCycler ® 480 Software release 1.5.1.62 (Roche) as C q values 51 calculated by the 2(-Delta Delta C(T)) method 52 . PCR conditions for each candidate reference gene were determined to optimize the PCR efficiency by the generation of a three-point standard curve based on a two-fold dilution series of the cDNA pool. The efficiency was calculated from the slope of the standard curve generated for each run in the following equation E = 10 (−1/slope) , where E = 2 corresponds to 100% efficiency and E = 1 to 0% 22 . The efficiency for a specific dilution ratio was obtained from the standard curve passing by the two-fold upper and lower dilution. Because low PCR efficiency and high differences between PCR efficiencies can lead to dramatic quantification error, the E value should be as close as 2 as possible and homogenous for all the studied genes 53 . A value between 1.900 and 2.100 (90 and 110% respectively) is usually considered as suitable 26 . PCR parameters including the dilution of the cDNA sample were determined for each candidate reference gene to reach an efficiency of the reaction as close as possible from 2. All PCR parameters are reported in Table S6. The calculated efficiencies for the candidate RGs varied from 1.966 to 2.025 (96.6% to 102.5%) and from 1.957 to 2.007 (95.7% to 100.7%) for the genes of interest. The mean for the candidate reference genes was 1.986 with a standard deviation of 0.013 and respectively 1.989 and 0.026 for the genes of interest showing a high PCR efficiency level (close to 100%) and a good homogeneity between the different reactions. It is important to note that efficiency close to 2 is very important to obtain because some of algorithms as BestKeeper and Delta-Ct methods as well as the RefFinder web-tool use non-transformed data. Efficiency too different to 100% could induce some bias. The internal variability, estimated by calculating the difference of C q values between two identical and representative samples represented by the pool previously described was always lower than 0.30.
Evaluation and validation of candidate RGs stability. In order to avoid inter-run variability 51 , the samples were tested in one unique run in addition of four wells containing cDNA pool and two wells of negative control. No replicate has been performed at this step. The stability of reference gene candidates has been evaluated using four methods: Genorm 32 , NormFinder 34 , BestKeeper 33 and the comparative Delta-Ct method 35 combined in the online package RefFinder 36 .
• The GeNorm algorithm calculates for each candidate reference gene the pairwise variation with all other reference gene candidates. The control gene stability measurement M is defined as the average pairwise variation of one gene with all the other candidates. With the stepwise exclusion of the highest M value, the two most stable genes can be selected. Vandesompele et al. recommended to use at least the three most stable genes and to stepwise include the next most stable gene until the inclusion has no significant effect anymore 32 . • The NormFinder algorithm combines the intra and inter-group variations to estimate the systematic error, which is induced by the use of one reference gene candidate. Andersen et al. assumed that the average of the candidate genes expression would show less systematic variation than the individual candidate genes and advised to use several genes for normalization 34 . • The BestKeeper algorithm discards the less stable candidate RGs by calculating the standard deviation (SD) and a coefficient of variance of the derived crossing points. Pfaffl et al. advised to use the geometric mean of the best candidate RGs to normalize the expression data of the genes of interest. All those best genes are combined into an index. The Pearson correlation coefficient (r) and the probability p are calculated to describe the relation between the index and each candidate 33 . • For Silver et al., the Delta-Ct algorithm is based on the comparison of the expression of all possible pairs of genes within each pair-wise sample. The stability of the candidate RGs is estimated from the range and the percentiles of this difference and used to rank the candidate RGs 35 .
For using GeNorm and NormFinder, expression data had to be preliminary transformed into relative quantities (RQ) by taking into account the previously described efficiency according to the equation RQ = E ∆Ct(control-sample) . On the opposite side, BestKeeper and Delta-Ct methods used non-transformed data. On (2019) 9:2501 | https://doi.org/10.1038/s41598-019-39072-x www.nature.com/scientificreports www.nature.com/scientificreports/ the RefFinder 36 algorithm, Xie et al. used the C q data of each sample for each gene and processed them with the four previous algorithms. The global comprehensive ranking of the candidate RGs is based on the geometric mean of the ranking from the four algorithms. Expression data are not transformed and efficiency is supposed to be very close to 2.
Application to the study of efflux pumps. In order to study the expression of the genes of interest, we used the same methodology for the primer design and for the optimization of the qPCR than those developed for the candidate RGs. However, the RGs have been determined by using only the GeNorm algorithm with a cut-off value at 0.15. Indeed, this algorithm is the only one, which takes into account the PCR efficiency, indicates the number of references gene to use and provides a cut-off control value. The RQ of the genes of interest have been normalized using the geometrical mean of the RQ of the selected RGs. Statistical analysis has been performed by employing R ® software (The R foundation).

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.