Innovative GenExpA software for selecting suitable reference genes for reliable normalization of gene expression in melanoma

Melanoma is characterized by a high mutation rate, and selection of the proper reference for RT-qPCR is a serious problem. The algorithms commonly used to select the best stable reference gene or pair of genes in RT-qPCR data analysis have their limitations; this affects the interpretation of the results and the drawing of conclusions. For reliable assessment of changes in B4GALT gene expression in melanoma, and for comparison with their expression levels in melanocytes, we implemented our innovative GenExpA software, selecting the best reference by combining the NormFinder algorithm with progressive removal of the least stable gene from the candidate genes in a given experimental model and in the set of daughter models assigned to it. The reliability of references is validated based on the consistency of the statistical analyses of normalized target gene expression levels through all models, described by the coherence score (CS). The use of the CS value imparts an absolutely new quality to qPCR analysis, because it clarifies how low the stability value of reference must be in order for biologically correct conclusions to be drawn. GenExpA works in a manner independent of the experimental model and the normalizer. GenExpA is available at https://github.com/DorotaHojaLukowicz/GenExpA or https://www.sciencemarket.pl/baza-programow-open-source#oferty. Highlights GenExpA – next-generation software for normalizer selection and validation The GenExpA tool defines how low the stability value of the reference should be The GenExpA tool determines the level of gene expression analysis robustness The GenExpA tool increases the speed of analysis and reduces cost


Introduction
Achieving reliable results for normalization of the transcript level of a target gene requires an internal reference gene or pair of genes (housekeeping genes, HKGs) with stable expression between the analyzed samples and under different experimental conditions. Moreover, it is believed that the transcript levels of the reference and target genes should be expressed at roughly the same level [1]. However, the transcript levels of many of the HKGs typically used as references may change significantly under physiological or pathological conditions as well 3 as across tissue types and experimental conditions [1,2,3,4,5,6,7,8]. Therefore the selection and validation of the normalizer are the most critical stages in qPCR data analysis.
Typically, one or more of five algorithms (geNorm [3], NormFinder [9], BestKeeper [10], the comparative ΔCt method [11] and RefFinder; http://150.216.56.64/referencegene.php) are used to select the best stable single or combination of reference genes from a panel of candidate genes. The normalizer with the lowest stability value is considered the best and is then used for calculation of target gene expression in a group of samples. We have recently shown that this commonly accepted approach seems not enough for accurate and reliable target gene transcript normalization and that it may lead to biologically incorrect conclusions [12]. We demonstrated that beyond the parent model of samples of interest (experimental model), it is necessary to construct daughter models (auxiliary models built from the same samples as the experimental model but with fewer samplescombinations of samples of interest without repetition), followed by selection and validation of an appropriate normalizer for each model. Here we have shown that validation of the normalizer is done by determining the coherence score for target gene expression analyses performed for all given models (experimental and daughter models). If the statistical analysis produces inconsistent results for the expression level of a target gene based on a comparison of individual sets of two samples through all models, we have to search for new references with improved stability values via progressive removal of the least stable candidate reference gene in each sample, followed by re-selection and re-validation of new normalizers each time [12]. However, this proposed approach of RT-qPCR data analysis has been a very time-consuming task, requiring the use of several specialist software packages. This problem led us to automate these calculations by developing the GenExpA tool (Gene Expression Analyzer). It is a comprehensive tool based on a previously described workflow for quantified as well as raw qPCR data [12]. Based on the selected reference gene or pair of genes, GenExpA calculates the relative target gene 4 expression level, performs statistical analyses and generates results in the form of graphs and tables. The innovative aspect of GenExpA is that it estimates the coherence of results for each target gene in all designed models. It determines the robustness of normalization and works out the coherence score for the individual target gene. The advantage of this strategy is its ability to determine relative target gene expression based on the consistency of the results in all tested samples, regardless of the experimental model and the reference gene or pair of genes used. Here, based on qPCR data for B4GalT1-B4GalT7 transcripts in melanocytes and melanoma cells, we show how to perform the analysis step by step, and how to solve problems using this new methodological approach.

Results
A flow chart of target gene expression and statistical analysis of qPCR data using the GenExpA tool is shown in Fig. 1. At the beginning, we uploaded raw qPCR data for candidate reference (HPRT1, PGK1, RPS23, SNRPA; set 1) and target (B4GALT1-B4GALT7) genes (Supplementary Table S1) as well as quantified qPCR data for candidate reference genes (Supplementary Table S2) obtained for five human cell lines (melanocytes and melanoma cells) from different stages of oncogenic progression) comprising the experimental model. It is important to note that here "cell line" means "sample". Since all PCR reactions were performed in three biological and three technical replicates, a set of nine Ct values (for each of the genes) and a set of nine quantified values (for each of the candidate reference genes) were used as inputs for a given cell line. The uploaded quantified values were calculated based on the calibration curve previously prepared for all candidate reference genes in all analyzed cell lines. Working with the GenExpA interface, the potential reference genes were selected from the pool of all analyzed genes presented in the 'Available reference genes' window, and a panel of 26 possible models (the experimental model and 25 daughter models which are combinations, without repetition, of 5 cell lines taken 2 or 3 or 4 or 5 at a time) was Overview of the GenExpA tool workflow. (a) (A) The user uploads a table with Ct values for 'n' candidate reference genes and 'y' target genes measured in 'x' samples. The number of Ct values for a given gene in a given sample is equal to the quotient of the biological and technical repeats of the PCR reaction. The user uploads a table with calculated quantified values for 'n' candidate reference genes. (B) In the setting mode (not shown in this paper; see the GenExpA manual), the user separates 'n' potential reference genes from a list of all genes. After clicking on 'Generate combinations', the GenExpA tool automatically creates a set of possible 'z' models, i.e., 'z' models composed of samples of interest, which are combinations without repetitions (order does not matter) of two or more samples from the pool of 'x' samples. The 'z' models consist of the experimental model and its 'z-1' daughter models (auxiliary models); for example, if the experimental model consists of 5 samples, the number of daughter models is 25 (10 models of 2 samples, 10 models of 3 samples, and 5 models of 4 samples). The user unchecks the statistical model, wherein the non-parametric Mann-Whitney test or the ANOVA Kruskal-Wallis test with subsequent post hoc Dunn's test are recommended for models composed of two unmatched samples or three or more unmatched samples, respectively, in the case of a non-normal distribution. Also, the user can apply only the ANOVA Kruskal-Wallis test without or with a subsequent post hoc Dunn's test for two or more samples, respectively. Alternatively, the pairwise t-test with Holm adjustment is recommended as a statistical model for multiple pairs of observations in the case of a normal distribution (to determine the significance of a difference in gene expression between two or more samples) [22). In the setting mode, the user determines the P value indicating the level of statistical significance (default P<0.05), and single measurement repetition (default 3) (not shown in this paper; see the GenExpA manual). The user starts the automated data analysis by clicking on 'Run calculations'; it causes the implemented NormFinder algorithm to select, in each of the analyzed models, the best references between 'n' potential reference genes (purple line) according to the stability values of their expression in the experimental model and its auxiliary models. The best stability value is referred to the minimal combined inter-and intrasample expression variation of the gene [9).  Table S3). By choosing the 'Export graph' option, box-plots representing the medians of the obtained RQ values with genes in one of the analyzed models (graphs for all models are compiled in Supplementary   Fig. S1). In this analysis, GenExpA calculated the average coherence score at level 0.94. This value resulted from unreliable/uncertain normalization of four target genes (B4GALT3, B4GALT5, B4GALT6, B4GALT7), which reached coherence scores below 1 Table 1. Reference genes and their stability values in a panel of the experimental model and daughter models selected by NormFinder software on the base of 4 (part A) and 5 (part B) potential reference genes before (0) and after removing one (1) or two (2) the least stable genes from a given set of 4 and 5 reference genes. ( Supplementary Fig. S2). Next, to improve the estimation of the expression of these target genes, the least stable reference gene in each model was removed by setting 1 in the 'Remove repetitions' box. Also, marking 'Select best remove for models' caused GenExpA to choose the reference from the level of removal for which the stability value was lower in a given model.  Table S8; Supplementary Fig. S8). Sequential removal of the two least stable potential reference genes (digit 2 entered in 'Remove repetitions' box and choosing 'Select best remove for models' box) allowed us to assign better stability values to 11 16 of the 26 models (Table 1, Supplementary Fig. S19).
However, starting the analysis with a higher number of potential reference genes creates an opportunity to select the reference gene with a lower stability value in the experimental model as well as auxiliary models, and therefore gives a more robust analysis of target gene  Fig. S19). Therefore, adding a fifth housekeeping gene is mandatory to confirm the robustness of an analysis conducted with the above-mentioned sets of four candidate reference genes. Interestingly, the lower the HKG stability value, the better the CS value, that is, the more robust the analysis, potentially much more correct biologically.
In the case of gene B4GALT6, none of the sets of four potential reference genes was designed well enough to conduct a reliable analysis. Only the application of five potential reference genes with double rejection of the weakest gene allowed us to determine a reliable normalizer in each model and to obtain a coherent analysis (box-plot 78 in Supplementary   Fig. S19). On the other hand, selection of the normalizer based on five potential reference genes and with 'Remove repetition' set at 2 led to an inconsistent analysis for the relative expression of gene B4GALT5 (box-plot 65 in Supplementary Fig. S19   Medians of relative quantity (RQ) for target genes B4GALT1-B4GALT7 in the experimental model normalized to a reference gene or pair of reference genes with the best stability values resulting from GenExpA analysis based on a set of five candidate reference genes and remove repetition level 2 (a) or remove repetition level 1 (b) (see Table 1, part B). The statistical analyses used the Mann-Whitney test for models composed of two cell lines, or the Kruskal-Wallis test (ANOVA for models built of three or more cell lines) with post-hoc test. Red line represents statistical significance, P<0.05.

Discussion
Obtaining reliable results from the PCR reaction is the outcome of many factors related to handling of the material, beginning with the step of RNA/DNA isolation and ending in analysis of the results. That is why the MIQE guidelines were proposed for transparent reporting of experimental details in all publication prepared with the use of qPCR technique [13]. The reaction needs to be standardized in molecular analyses of all kinds of biological materials [14,15,16,17]. should be stressed that each of these algorithms has its limitations which affect the ranking of candidate reference genes and finally the selection of the best normalizer [18,19]. Recent studies have proposed the use of at least three different methods, but the problem of how to interpret conflicting results obtained from the use of different statistical methods still puzzles researchers.
They try to integrate a few algorithms to average the stability ranks; for example, the RefFinder algorithm calculates the geometric mean of ranking values obtained by four other algorithms [18]. But these statistical methods differ from each other; averaging the ranks can lead to a suboptimal assessment of stable reference genes [19]. It is commonly accepted that the lower the stability value of the reference, the greater the certainty of correct determination of the relative expression of the target gene. Yet it has not been clarified how low the stability value must be. In this paper we showed that the procedure for choosing a suitable reference involves 16 two steps: applying an algorithm for best reference gene selection (here, NormFinder), and checking whether the stability level of the selected reference is low enough to lead to a correct biological interpretation of relative target gene expression (here, determination of CS for the analysis). Through the application of these two steps, our method validates the chosen reference based on determination of the coherence score for the normalized target gene. In our method we used the model-dependent NormFinder algorithm, and by simulating the removal of the least stable gene from a set of candidate genes in the experimental as well as daughter models we were able to obtain more stable normalizers for each model. Previously, this approach helped us to exclude incorrect normalizations and biological misinterpretations concerning the alteration of ERAP1 gene expression during melanoma progression [12]. We chose the NormFinder algorithm because it calculates the stability of a reference gene or pair of genes by analysing the variation of its expression within and between samples. However, genes with high overall variation influence the stability ranking of candidate reference genes. Herein we have demonstrated that simple application of NormFinder ('Remove repetition' 0) may lead to biological misinterpretation of the results, because it depends on the HKG set used to select the reference. Our method eliminates the influence of the most variable gene or genes on the stability rankings of all other candidate reference genes. In addition, breaking down the experimental model into daughter models (auxiliary models) allows us to check whether a uniform trend of target gene expression among particular cell lines is maintained. This trend decides the consistency of the analysis and is described by coherence score CS=1. Obtaining full consistency of the analysis is tantamount to its completion. As we suggest here, based on our B4GALT gene expression analysis, the lower the stability value, the better the coherence of the obtained results, which, we propose, is related to their biological correctness. If, despite the application of our strategy, a uniform trend of target gene expression among particular cell lines has not been achieved, introducing a successive HKG/s, along with repeated rejection of the weakest gene, can lead to selection of the better reference and, in turn, biologically correct conclusions. It is important to note that once selected, a reference for a given model will not necessarily be selected once again in an experiment repeated for this model after some time, even if the pool of candidate reference genes used is exactly the same. This is because the gene expression pattern at the time point of sample collection depends on the current rate of cell mass growth and on microenvironmental conditions [20,21].

Conclusions
Here we have presented an expanded version of our already published method for RT-qPCR data analysis implemented in the GeneExpA toolsoftware for reference gene validation and automatic calculation of target gene expression. GenExpA will assign a CS value ranging from 0 to 1 to each analysis; a fully coherent analysis is described by a CS value of 1. We suggest starting the analysis with a set of at least four potential HKGs. If the results are not satisfactoryif the CS value is below 1introducing another candidate HKG/s to the set of candidate reference genes may improve them. However, even if CS is equal to 1 for a given target gene, enlarging the pool of candidate reference genes can lead to an analysis that is more robust in terms of between-sample statistical significance in the experimental model. Interestingly, our workflow, including full target gene analysis in order to validate normalizer, shows that the lower stability value of the selected reference gene or pair of genes is related to the biological correctness of the results. Moreover, our workflow is the first one that allows the user to define the required stability value needed to draw biologically correct conclusions. For clarity of the biological conclusions drawn, the description of the results should give stability values assigned to the experimental model and auxiliary models; this may help other researchers to compare their results and understand any differences between their outcomes and yours. GenExpA software can carry out an analysis of many genes independently at the same time.

Cell culture
Human epidermal melanocytes (adult, HEMa-LP) were obtained from Gibco (Life Technologies Cancer Center, Poznań, Poland. Cell lines were cultured as described previously [12]. All cultures were verified mycoplasma-free using the MycoAlert mycoplasma detection kit (Lonza).

Reverse transcription qPCR
Total RNA extraction, the reverse-transcription reaction and real-time qPCR reactions were performed as previously described [12]. RNA and cDNA concentrations as well as their quality were assessed with a NanoDrop 2000 spectrophotometer (Supplementary Data).
Housekeeping (GUSB, HPRT1, PGK1, RPS23, SNRPA) and target (B4GALT1-7) genespecific mRNAs were amplified with the use of TaqMan Gene Expression Assays, as listed in Table 2. All reactions were performed in three biological and three technical replicates. The reaction results were analyzed with StepOne Software ver. 2.0. Raw Ct values were calculated using StepOne ver. 2.0, applying automatic threshold and baseline settings.

RT-qPCR data analysis
The GenExpA tool was used for fully automated qPCR data analysis. The gene or combination of two HKGs with the lowest stability value were selected as the best reference by combining the model-based variance estimation with progressive removal of the least stable reference gene. For statistical analysis, the non-parametric Mann-Whitney or the 19 ANOVA Kruskal-Wallis test with subsequent post hoc Dunn's test were used for models comprised of two unmatched samples or three or more unmatched samples, respectively. P values less than 0.05 were taken to indicate statistical significance (P<0.05).