Introduction

The ability to measure variation in gene expression due to host and pathogen differences, between and within individuals, and over time, is crucial for identifying genes involved in disease progression. Fluorescence-based reverse transcription quantitative polymerase chain reaction (RT-qPCR) is the gold standard procedure for quantifying gene expression. It has become a robust and widely used technique owing to its high sensitivity, specificity, reliability, reproducibility, speed, ease of use and high data throughput in comparison to other quantification procedures such as microarray, northern blotting or ribonuclease protection analysis1,2,3,4. In recent years, next generation sequencing (NGS) technology has emerged as the favoured method for global quantification of gene expression. However, qPCR is still regarded as the gold standard for confirmation of NGS results and is the standard method in clinical diagnostics and in studies focussed on smaller numbers of genes, and there is frequently a need to confirm that the two methods provide comparable results5. RT-qPCR is widely accepted as a robust technique for quantifying gene expression, but the reliability of its results depends on multiple factors such as RNA integrity and quantity, accurate reverse transcription, primer efficiency and, most importantly, selection of a suitably stable internal gene for normalization6. To minimize inaccuracies in quantifying the expression of genes of interest, a stable reference (housekeeping) gene is required as an endogenous control to normalize technical variation within the experimental conditions7. A reference gene with unstable expression may generate misleading results and inaccurate conclusions.

Historically, many RT-qPCR studies in humans and various animal species have been normalised using the reference genes glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and beta-actin (ACTB). However, these genes can have variable expression stabilities across tissue types and experimental conditions8,9. Ideally, the expression level of a reference gene should remain stable across developmental stages10,11,12, tissue types13,14,15, cancer progression16, the influence of physiological hormones17 and different environmental and health conditions18,19. However, no single reference gene remains stable in all conditions. Therefore, experimental validation of reference genes should be carried out for each type of tissue, disease state and other relevant variable. The guidelines for the minimum information required for the publication of quantitative real-time PCR experiments (MIQE) were developed to evaluate the acceptability of RT-qPCR data, including the requirements for standardizing experiments20.

This study was part of a larger-scale project investigating the pathogenesis of koala retrovirus (KoRV). The koala (Phascolarctos cinereus) is an arboreal herbivorous marsupial species and a popular icon of Australia. The population is under threat from multiple factors to the extent that koalas are now nationally listed as vulnerable to extinction21. Habitat loss, dog attacks and disease are key drivers of koala population decline with particular threats being koala retrovirus (KoRV) and chlamydiosis22. Lymphoid neoplasia has long been recognised as a common malignancy of koalas, and because retroviruses are known to cause neoplasia in other vertebrate species23, it has been hypothesised that KoRV plays a role in this disease in koalas24,25,26. Moreover, since retroviruses are also associated with immunodeficiency in their hosts, it is speculated that KoRV may be involved in the susceptibility of koalas to opportunistic infections such as chlamydiosis27. Across their distribution, koala populations differ in their KoRV proviral and viral RNA loads and health status, with the incidence of chlamydiosis and malignancy being higher in northern populations of koalas compared to the southern populations22,28,29,30.

A greater understanding of the association between KoRV gene expression and disease is needed. One of the barriers to measuring KoRV gene expression has been the lack of suitable reference genes for normalisation of expression values in RT-qPCR experiments. To the best of our knowledge, there are no published data available on stable expression analyses of reference genes in koala tissues. The objective of this study was to identify stable reference genes in lymph node tissue in koalas from a northern and a southern koala population.

Results

Assessment of RNA quality

Total RNA was extracted from lymph node tissues of 19 koalas from Queensland (QLD, n = 11) and South Australia (SA, n = 8). The absorbance ratio at 260 and 280 nm (A260/280) of all purified RNA samples was between 2.0 and 2.2, and the A260/230 ratio of all samples was greater than 2.0 (Table S1).

Verification of primer specificity and PCR efficiency for qPCR

Expression profile of the candidate reference genes

The stability of mRNA expression for each of the 13 candidate reference genes in koala lymph node tissues was analysed using Ct values. With regard to expression level, only ACTB was highly abundant, with average Ct values ranging from 13.17 to 15.85 (Fig. 1). Ten other candidate reference genes were expressed at a medium level, with average Ct values ranging from 20.98 to 25.41, whilst Smap and Sec22b showed very low levels of expression (mean Ct = 30.27 and 29.25, respectively). The descriptive statistics of the Ct values across all samples for each gene are available in the supplementary information (Table S2).

Figure 1

Threshold cycle (Ct) values for the 13 analysed genes obtained using qPCR in lymph node tissues of QLD and SA koalas. Boxes show the 25th and 75th percentiles; the line across each box indicates the median; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots; n = 19 sample points.

Expression Stability of Candidate Reference Genes

The expression stability of the 13 candidate reference genes was evaluated using geNorm, BestKeeper, NormFinder and the comparative ΔCt method. Overall stabilities were then ranked using the comprehensive RefFinder tool for the QLD, SA and combined koala tissue sets.

geNorm analysis

The geNorm algorithm evaluated the stability of reference genes based on the expression stability value (M), as shown in Fig. 2A–C. All evaluated housekeeping genes had an M value below 1.5, which is the recommended geNorm cut-off value for stable gene selection in RT-qPCR analysis9. This result indicates that the candidate reference genes were stable across lymph node tissues from different koalas. With the lowest M values, Grk2 and Hmg20a were the most stably expressed genes among SA koalas, whereas Hmg20a and Ndufaf3 were the most stable genes in QLD koalas and when the groups were combined. Overall, Hmg20a was the most stable gene, and Pdap1, with the highest M value, was the least stable gene in the separate QLD and SA population evaluations and in the combined group analysis.

Figure 2

Evaluation of the stability of expression for 13 candidate reference genes in koalas using geNorm (A–C), BestKeeper (D–F), NormFinder (G–I), the comparative ΔCt method (J–L) and the RefFinder tool (M–O). QLD represents the Queensland population; SA represents the South Australian population; Both represents the combination of the QLD and SA populations. The most stable reference genes have the lowest expression stability values.

To determine the optimal number of genes needed for RT-qPCR normalization, the average pairwise variation (V) was calculated between two consecutive normalization factors NFn and NFn+1 across individual QLD and SA koala groups and in both populations. Generally, an additional reference gene is included at each step until V drops below 0.15. Below this point, additional reference genes are unlikely to have a beneficial impact for further improvement of data normalization12,31. In our study, the mean pairwise variation for the expression of genes ranked 2 and 3 was <0.15 and remained <0.15 for the addition of all other reference genes across both QLD and SA subgroups and the combined group, as shown in Fig. 3.
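The pairwise variation calculation described above can be illustrated with a minimal Python sketch. It is not the qbase+ implementation: it assumes 100 % amplification efficiency, so that the log2 normalization factor of the top n genes equals minus their mean Ct (up to a constant), and the gene names and Ct values are hypothetical.

```python
import statistics

def pairwise_variation(ct, ranked_genes):
    """Return {"Vn/n+1": value} for genes ranked from most to least stable.

    ct maps gene -> list of Ct values (same sample order for every gene).
    Assumption: 100 % efficiency, so log2(NF_n) = -(mean Ct of top n genes) + const.
    """
    n_samples = len(ct[ranked_genes[0]])

    def log2_nf(n):
        # log2 of the geometric mean of relative quantities of the top n genes
        return [-sum(ct[g][i] for g in ranked_genes[:n]) / n for i in range(n_samples)]

    v = {}
    for n in range(2, len(ranked_genes)):
        a, b = log2_nf(n), log2_nf(n + 1)
        # V is the standard deviation of log2(NF_n / NF_n+1) across samples
        v[f"V{n}/{n + 1}"] = statistics.stdev([x - y for x, y in zip(a, b)])
    return v

# Hypothetical Ct values for four genes in four samples
ct = {
    "geneA": [20.0, 21.0, 20.5, 21.5],
    "geneB": [22.0, 23.0, 22.5, 23.5],
    "geneC": [25.0, 26.2, 25.4, 26.3],
    "geneD": [24.0, 22.0, 26.0, 23.0],
}
v = pairwise_variation(ct, ["geneA", "geneB", "geneC", "geneD"])
for step, value in v.items():
    print(step, round(value, 3), "below 0.15" if value < 0.15 else "above 0.15")
```

In this toy data set, adding the third gene changes the normalization factor very little, whereas adding the noisy fourth gene does, which is the pattern the 0.15 cut-off is meant to detect.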

Figure 3

geNorm pairwise variation analysis for choosing the optimal number of reference genes required for normalization, with a sequentially increasing number of genes. QLD: QLD population only; SA: SA population only; Both: combined QLD and SA populations. A value < 0.15 indicates that adding a further reference gene would not substantially affect the normalization.

BestKeeper analysis

In the BestKeeper evaluation, Bet1 and ACTB demonstrated the highest stability in the QLD, SA and combined evaluations, a finding that was completely discordant with results from the other four algorithms. Smap and Pdap1 exceeded the standard deviation (SD) threshold value of 1 across all geographical groups of this study and were therefore regarded as unsuitable for subsequent gene expression normalization (Fig. 2D–F). Sec22b and Pdap1 also crossed the SD threshold value in the SA population and in the combined group. Overall, using this analytical approach, Bet1, ACTB, Stx12, GAPDH, Hmg20a, Ndufaf3, Tmem97, Grk2 and Nckap1l remained stable in all sample sets.

NormFinder analysis

Based on NormFinder, the top five ranking candidate reference genes for QLD koalas were Tmem97 > Stx12 > ACTB > Nckap1l > Hmg20a. For SA animals, they were Tmem97 > Stx12 > Nckap1l > Ndufaf3 > ACTB. For the combination of both populations, they were Tmem97 > Stx12 > Nckap1l > ACTB > Hmg20a. As shown in Fig. 2G–I, NormFinder identified Tmem97 as the most stable gene and Stx12 as the second most stable gene across all experimental groups, in contrast to the geNorm output. In agreement with the geNorm and BestKeeper results, however, Pdap1 was identified as the least stable gene in the QLD, SA and combined groups.

The comparative ΔCt analysis

According to the comparative ΔCt method, Tmem97 was the most stable gene and Stx12 the second most stable gene in the QLD group and in the combined group analysis. These results were very similar to those of NormFinder, but contrasted with the results generated by the geNorm and BestKeeper algorithms (Fig. 2J–L). For the SA population, Nckap1l was the most stable gene and Tmem97 the second most stable. Pdap1 was the most variable gene across all groups, a finding that was consistent across all tested algorithms.

RefFinder evaluation

The four different algorithms produced divergent rankings of the stable genes. Such variation has been noted in previous studies32. Comprehensive stability was therefore evaluated with the RefFinder tool, which ranked the candidate reference genes from 1 (most stable) to 13 (least stable) based on the geometric mean of each gene's ranking values (Fig. 2M–O). For QLD animals, the comprehensive ranking of candidate reference genes from 1 to 5 was Tmem97 > Hmg20a > Stx12 > ACTB > Ndufaf3. For SA animals, it was Tmem97 > Hmg20a > Stx12 > Nckap1l > Ndufaf3. For the combination of both populations, it was Tmem97 > Hmg20a > Ndufaf3 > Stx12 > Nckap1l. Overall, RefFinder ranked Tmem97 and Hmg20a as the most stable genes in all of the samples, as these had the lowest geometric means of the ranking values.

Discussion

Quantitative real-time PCR has been used widely in molecular biology research to evaluate gene expression under defined conditions. To deliver biologically relevant data, normalization of gene expression with a reference gene is crucial. The selection of endogenous control genes is critical, because their expression should be stable across experimental conditions and tissue types. In reality, no single universally stable gene is consistent in all experimental conditions. Consequently, reference gene stability needs to be verified prior to RT-qPCR studies for each tissue type. This study represents the first evaluation of this type in koala lymph node tissue. The stability of 13 candidate reference genes was assessed in koala submandibular lymph nodes across northern and southern koala populations. As shown in Fig. 1, not all of the 13 analysed candidate genes had similar patterns of expression, suggesting variation in transcript abundance.

In this study, four different statistical algorithms were used to calculate the expression stability of 13 candidate reference genes in QLD and SA koala lymph node tissues, with the QLD and SA populations analysed separately and in combination. The comparative ΔCt method9,33 and the geNorm9 algorithm evaluate stability through intragroup differences and mean pairwise variation. BestKeeper calculates the standard deviation of Ct values to determine stable genes and also uses intragroup variation34, whereas NormFinder uses both intra- and intergroup variation. The RefFinder tool is widely used to calculate the geometric mean of the rankings from the four algorithms35. The outcomes from the various algorithms are often dissimilar because each uses a different approach to evaluate stability.

While the results varied across populations and algorithms, a stable set of seven genes that would be acceptable for use as reference genes (Tmem97, Hmg20a, Ndufaf3, Stx12, Nckap1l, GAPDH and ACTB) could be identified, along with a smaller subset (Pdap1, Sec22b and Smap) that was not suitable according to several algorithms. For the QLD population, Tmem97 and Stx12 were selected as the most stable genes by the ΔCt method and NormFinder. Bet1 was ranked highest by BestKeeper. Hmg20a and Ndufaf3 were selected as the best pair of genes by the geNorm pairwise variation evaluation. For the SA population, ranking positions differed from those of the QLD population. In the SA population, Nckap1l and Tmem97 were selected as the most stable genes by the comparative ΔCt method, while Grk2 and Hmg20a were chosen as the best pair of stable genes by geNorm. The BestKeeper and NormFinder results for the two most stable genes in SA were similar to the QLD population ranking. In the combined population analysis, the ranking positions of the two most stable genes were identical to those in the QLD population across the different algorithms. Overall, the multiple algorithms produced dissimilar results, and RefFinder was used to resolve this discordance. In this comprehensive ranking, Tmem97 and Hmg20a were selected as the two most stable genes for the subgroup analyses and the overall combined analysis of lymph node tissues.

ACTB and GAPDH, the most commonly used reference genes for normalization of quantitative expression studies in koalas30,36,37, were not ranked as the most stable genes by any of the algorithms or in any experimental group. The BestKeeper algorithm ranked ACTB as the second most stable gene in all experimental groups, but in the other algorithms and the final comprehensive analysis its ranking position indicated lower stability. GAPDH was ranked third by the geNorm algorithm, but its stability was lower with the other statistical algorithms and RefFinder.

In the geNorm evaluation, all mean pairwise variation values were <0.15, indicating that using more than the top two ranked genes would not improve the accuracy of data normalisation. However, in general three reference genes are recommended for accurate normalization in gene expression studies. Based on this study, the top ranked genes overall in the RefFinder analysis in all conditions (Tmem97 and Hmg20a) would be good choices, potentially in combination with any other gene from the top seven ranked candidates for all groups analysed (Ndufaf3, Stx12, Nckap1l, GAPDH or ACTB), as the stability of these genes was very similar across all analyses.

Tmem97 and Hmg20a are rarely used as endogenous controls to normalize expression levels in any species and there is no literature on their use in koala studies. Tmem97, a conserved integral membrane protein coding gene, is associated with cholesterol level maintenance38 and Hmg20a is involved in neuronal differentiation39 in mice. Transcripts of both were abundant in this study (Fig. 1). Consequently, Tmem97 and Hmg20a should be used in future normalisation studies on lymph node tissue in koalas. In addition, there may be further suitable reference genes that have not yet been explored. The use of NGS-generated transcriptomics data available for lymph node tissue to pre-select candidate genes with little variation in RNA expression for RT-qPCR normalisation appears to have been highly successful, in that nearly all the genes selected this way would be suitable for use in future studies. With essential tools for marsupial gene normalisation (at least in koala lymph node tissue) now in place from this study, future studies on relative gene expression and pathogen abundance have a more robust method available for comparative RT-qPCR gene expression studies. Given that lymphocytes are the likely target cell for KoRV, a prevalent pathogen of koalas, lymph node expression studies are vital to understand whether KoRV can induce neoplasia or immunosuppression in koalas.

Methods

Sample collection

Submandibular lymph nodes were collected from 19 wild adult koalas from northern (n = 11, South-East Queensland) and southern (n = 8, Lofty Ranges, South Australia) populations. The animals were euthanized due to sickness or injury of various causes (full details are in supplementary information Table S1). Prior to euthanasia, koalas were anaesthetised with 0.25 ml Zoletil (tiletamine/zolazepam) (Virbac) intramuscularly. Euthanasia was performed with an intravenous injection of Lethabarb (pentobarbital) (Virbac). Tissues were collected within 3 hours of euthanasia, placed in RNAlater (Qiagen) and stored at −80 °C until RNA extraction. All procedures were approved by the University of Queensland Animal Ethics Committee (Animal ethics number ANFRA/SVS/461/12 and ANRFA/SVS/445/15), Queensland Government Department of Environment and Heritage Protection (Scientific Purpose Permit WISP11989112), University of Adelaide Animal Ethics Committee (Animal ethics number S-2013-198) and SA Government Department of Environment, Water and Natural Resources (Scientific Purpose Permit Y26054). All work was performed in accordance with the guidelines and regulations of these bodies.

RNA extraction and cDNA synthesis

Total RNA was extracted using the Qiagen RNeasy Plus Universal mini kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. RNA was further purified using an RNeasy mini kit (Qiagen, Hilden, Germany) with on-column DNase digestion (Qiagen, Hilden, Germany) as per the manufacturer's guidelines. The quantity and quality of extracted RNA were assessed using a Nanodrop 1000 spectrophotometer (Thermo Scientific, Australia) and an Agilent 2100 Bioanalyzer. cDNA was synthesized from 200 ng RNA using the QuantiTect Reverse Transcription kit (Qiagen, Germany) with the manufacturer-provided oligo (dT) primer in a final volume of 20 μl. The cDNA was diluted five-fold with RNase-free water and stored at −20 °C until required.

Selection of candidate reference genes

These animals are a subset of those included in a larger transcriptomics study described elsewhere40. Data from that study (available at the European Nucleotide Archive with the accession number PRJEB21505; assembled transcriptome at https://doi.org/10.6084/m9.figshare.5492512) were accessed for this study. Normalised abundances (TPM) for annotated genes were determined using Stringtie41 following mapping of reads using Hisat242. Mean expression and variation across all replicates were determined and genes were ranked for (a) higher level of expression and (b) low variation across all samples. The list of candidate stable genes is presented in Supplementary dataset 1 with their identifiers in the above data set, and the sequences of the RNA transcripts of the selected genes are presented in Supplementary data set 2. A selection of the top 25 of these genes was then chosen for analysis of suitability as RT-qPCR normalisation genes following the MIQE guidelines. In addition to the stable genes identified from the transcriptome data, ACTB and GAPDH were also studied as these are the most commonly used reference genes in multiple koala studies.
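The pre-selection step described above (favouring higher expression and low variation across samples) can be sketched in Python as follows. This is only an illustration under stated assumptions: the minimum mean TPM cut-off, the use of the coefficient of variation as the variation measure, the function name and the TPM values are all hypothetical, since the source does not specify the exact criteria.

```python
import statistics

def rank_candidates(tpm, min_mean_tpm=10.0, top_n=25):
    """Rank genes by low variation among those with sufficient expression.

    tpm maps gene -> list of TPM values across samples.
    min_mean_tpm and top_n are illustrative parameters, not values from the study.
    """
    rows = []
    for gene, values in tpm.items():
        mean = statistics.fmean(values)
        if mean < min_mean_tpm:
            continue  # drop weakly expressed genes
        cv = statistics.stdev(values) / mean  # coefficient of variation
        rows.append((gene, mean, cv))
    rows.sort(key=lambda row: row[2])  # lowest variation first
    return rows[:top_n]

# Hypothetical TPM values for three genes in four samples
tpm = {
    "geneA": [50.0, 52.0, 49.0, 51.0],    # well expressed, low variation
    "geneB": [5.0, 6.0, 5.0, 4.0],        # too weakly expressed
    "geneC": [100.0, 40.0, 160.0, 90.0],  # well expressed, high variation
}
candidates = rank_candidates(tpm)
for gene, mean, cv in candidates:
    print(gene, round(mean, 1), round(cv, 3))
```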

Primer design and amplification efficiency analysis for RT-qPCR

Gene-specific primers for RT-qPCR were designed using the NCBI Primer-BLAST tool (https://www.ncbi.nlm.nih.gov/tools/primer-blast/). The primers are detailed in Table 1. The assay was performed using a Rotor Gene Corbett 6000 quantitative real-time PCR system (Qiagen). The size of the PCR products was between 95 and 145 bp and the annealing temperature was optimized to 60 °C. cDNA was amplified using the PowerUp SYBR Green Master mix (ThermoFisher Scientific) following the manufacturer's instructions, with a final reaction volume of 20 μl in each well containing 4 μl cDNA, 10 μl SYBR Green Master mix, 1 μl (10 μM) of each sense and anti-sense primer and 4 μl PCR-grade water. The PCR reaction was carried out with a hold at 50 °C for 2 min and 95 °C for 2 min, followed by 40 cycles of 95 °C for 15 s, 60 °C for 15 s and 72 °C for 1 min. To increase accuracy and produce reliable results, all qPCR analyses were conducted with three replicates. For each primer pair, amplifications included a no-template control to check for contamination or primer dimers. Melting curve analysis was performed at the end of each PCR to verify primer specificity. Amplicons from each primer pair were tested by 2% agarose gel electrophoresis to verify the size of the products and the absence of non-specific bands. The qPCR efficiency of each gene was determined by slope analysis with a linear regression model. Undiluted cDNA samples were used to calculate the PCR efficiency and correlation coefficient (R²) for each primer pair based on the standard curve method. The standard curve was generated with five-fold serial dilutions of cDNA. The corresponding qPCR efficiencies (E) were calculated according to the equation43 $E\,(\%) = \left(10^{-1/\mathrm{slope}} - 1\right) \times 100$. Amplicons from the newly designed primers were Sanger sequenced and aligned with transcriptome data to confirm the identity of the product.
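As a worked illustration of the standard curve method, the following Python sketch fits Ct against log10 input by least squares and converts the slope to a percentage efficiency with E (%) = (10^(−1/slope) − 1) × 100. The function name and the Ct values are hypothetical; the dilution series is five-fold, as in the protocol above, and the Ct values are constructed to represent perfect doubling per cycle.

```python
import math

def pcr_efficiency(log10_quantities, ct_values):
    """Return (slope, efficiency in %, R^2) for a Ct versus log10(input) standard curve."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx                      # least-squares slope
    r_squared = sxy ** 2 / (sxx * syy)     # squared Pearson correlation
    efficiency = (10 ** (-1 / slope) - 1) * 100
    return slope, efficiency, r_squared

# Five-fold serial dilution: relative input 1, 1/5, 1/25, 1/125, 1/625
log_q = [math.log10(5 ** -i) for i in range(5)]
# Hypothetical Ct values: with perfect doubling, each five-fold step adds log2(5) cycles
cts = [20 + i * math.log2(5) for i in range(5)]

slope, eff, r2 = pcr_efficiency(log_q, cts)
print(f"slope = {slope:.3f}, efficiency = {eff:.1f} %, R^2 = {r2:.3f}")
```

A slope close to −3.32 corresponds to an efficiency close to 100 %; shallower or steeper slopes give efficiencies above or below that value.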

Table 1 Description of primers used in this study for RT-qPCR analysis.

Determination and validation of expression stability of reference genes

Ct values for all samples were exported into an Excel spreadsheet using Rotor-Gene Q 2.3.1.49 software. The average Ct values of three replicates were used for further analysis. Reference gene expression stability was analysed with the commonly used statistical tools geNorm9 (accessed as part of the qbase+ analysis software from Biogazelle), NormFinder44, BestKeeper34 and the comparative ΔCt method33 (accessed via the RefFinder web-based tool, http://150.216.56.64/referencegene.php).

The geNorm algorithm evaluates the expression stability value (M) of each housekeeping gene based on the mean pairwise variation between that gene and the other candidate reference genes. The ratio of the expression levels of two ideal reference genes should remain constant across all samples. The most stable gene shows the lowest M value, and reference genes are ranked through stepwise elimination of the gene with the highest M value. This statistical algorithm also evaluates the pairwise variation (Vn/Vn+1) to identify the optimal number of genes required for accurate normalization of real-time PCR data9.
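The M value and stepwise elimination described above can be sketched in Python as follows. This is a simplified illustration, not the qbase+ implementation: it assumes 100 % amplification efficiency so that the log2 expression ratio of two genes equals their Ct difference, and the gene names and Ct values are hypothetical.

```python
import statistics

def genorm_m(ct):
    """Return {gene: M}, where M is the mean SD of log2 ratios with every other gene.

    ct maps gene -> list of Ct values (same sample order for every gene).
    Assumption: 100 % efficiency, so log2(ratio) is the Ct difference.
    """
    m = {}
    for j in ct:
        sds = []
        for k in ct:
            if k == j:
                continue
            log_ratios = [ck - cj for cj, ck in zip(ct[j], ct[k])]
            sds.append(statistics.stdev(log_ratios))
        m[j] = sum(sds) / len(sds)
    return m

def genorm_rank(ct):
    """Repeatedly drop the gene with the highest M until two genes remain.

    Returns (genes in elimination order, least stable first; final best pair).
    """
    remaining = dict(ct)
    eliminated = []
    while len(remaining) > 2:
        m = genorm_m(remaining)
        worst = max(m, key=m.get)
        eliminated.append(worst)
        del remaining[worst]
    return eliminated, sorted(remaining)

# Hypothetical Ct values for four genes in four samples
ct = {
    "geneA": [20.0, 21.0, 20.5, 21.5],
    "geneB": [22.0, 23.0, 22.5, 23.5],
    "geneC": [25.0, 26.2, 25.4, 26.3],
    "geneD": [24.0, 22.0, 26.0, 23.0],
}
eliminated, best_pair = genorm_rank(ct)
print("eliminated (least stable first):", eliminated)
print("best pair:", best_pair)
```

Because M is defined relative to the other candidates, geNorm ends with a pair of genes rather than a single best gene, which is why the results above report the most stable genes in pairs.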

NormFinder determines expression stability through assessment of intra- and intergroup variation employing an ANOVA based approach. Genes with lower stability values exhibit higher expression stability in this algorithm44.

BestKeeper ranks candidate reference genes using Pearson's correlation coefficient together with the standard deviation (SD) and coefficient of variation (CV) of the average Ct values. Genes with SD values above 1 are considered unstable. In the Pearson correlation coefficient (R) analysis, values closest to 1 indicate the most stable genes34.
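The BestKeeper criteria can be illustrated with a short Python sketch. It is an approximation under stated assumptions rather than the BestKeeper spreadsheet itself: the sample standard deviation is used as the dispersion measure (the tool's own formula may differ), the index is taken as the per-sample geometric mean of the candidates' Ct values, and the gene names and Ct values are hypothetical.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bestkeeper_summary(ct, sd_threshold=1.0):
    """Return per-gene SD, CV (%), correlation with the index, and a stability flag."""
    genes = list(ct)
    n_samples = len(ct[genes[0]])
    # Index: geometric mean of all candidates' Ct values in each sample (assumed form)
    index = [statistics.geometric_mean([ct[g][i] for g in genes])
             for i in range(n_samples)]
    summary = {}
    for g in genes:
        sd = statistics.stdev(ct[g])
        mean = statistics.fmean(ct[g])
        summary[g] = {
            "sd": sd,
            "cv_percent": 100 * sd / mean,
            "r": pearson(ct[g], index),
            "stable": sd <= sd_threshold,  # SD above 1 is treated as unstable
        }
    return summary

# Hypothetical Ct values for three genes in four samples
ct = {
    "geneA": [20.0, 21.0, 20.5, 21.5],
    "geneC": [25.0, 26.2, 25.4, 26.3],
    "geneD": [24.0, 22.0, 26.0, 23.0],
}
summary = bestkeeper_summary(ct)
for gene, stats in summary.items():
    print(gene, {k: round(v, 3) if isinstance(v, float) else v for k, v in stats.items()})
```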

The comparative ΔCt method compares the relative expression value of possible reference genes in pairs within each studied sample and ranks based on reproducibility of gene expression variation among experimental samples33.

Each algorithm might rank the candidate reference genes differently. To resolve this, the ranking produced by RefFinder was taken as the final ranking, because this tool integrates the results of all four statistical algorithms and ranks the genes based on the geometric mean of their ranking values. Analyses were done in three separate groupings: (a) Queensland koalas (QLD, northern population), (b) South Australian koalas (SA, southern population) and (c) a combination of both groups.
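The geometric mean ranking can be illustrated with a minimal Python sketch. It assumes an unweighted geometric mean of each gene's rank position across the four methods; any additional weighting applied by the RefFinder web tool is not described here and is not modelled. The gene names and rankings are hypothetical.

```python
import statistics

def comprehensive_ranking(rankings):
    """Combine per-method rankings into one ordering by geometric mean of ranks.

    rankings maps method name -> list of genes ordered from most to least stable.
    Returns a list of (gene, geometric mean rank), most stable first.
    """
    genes = next(iter(rankings.values()))
    scores = {}
    for gene in genes:
        ranks = [order.index(gene) + 1 for order in rankings.values()]
        scores[gene] = statistics.geometric_mean(ranks)
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical rankings of four genes from the four methods
rankings = {
    "geNorm":     ["geneB", "geneA", "geneC", "geneD"],
    "NormFinder": ["geneA", "geneC", "geneB", "geneD"],
    "BestKeeper": ["geneC", "geneB", "geneA", "geneD"],
    "deltaCt":    ["geneA", "geneB", "geneC", "geneD"],
}
final = comprehensive_ranking(rankings)
for gene, score in final:
    print(gene, round(score, 3))
```

Using the geometric rather than the arithmetic mean reduces the influence of a single discordant method, which is relevant here because BestKeeper ranked the genes quite differently from the other algorithms.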