Quantifying changes in DNA and RNA levels is essential in numerous molecular biology protocols. Quantitative real-time PCR (qPCR) techniques have become commonplace; however, data analysis includes many time-consuming and cumbersome steps, which can lead to mistakes and misinterpretation of data. To address these bottlenecks, we have developed open-source Python software to automate the processing of result spreadsheets from qPCR machines, employing calculations usually performed manually. Auto-qPCR is a tool that saves time when computing qPCR data, helping to ensure reproducibility of qPCR experiment analyses. Our web-based app (https://auto-q-pcr.com/) is easy to use and does not require programming knowledge or software installation. Using Auto-qPCR, we provide examples of data treatment, display and statistical analyses for four different data processing modes within one program: (1) DNA quantification to identify genomic deletion or duplication events; (2) assessment of gene expression levels using an absolute model; and relative quantification (3) with or (4) without a reference sample. Our open-access Auto-qPCR software saves the time of manual data analysis and provides a more systematic workflow, minimizing the risk of errors. Our program constitutes a new tool that can be incorporated into bioinformatic and molecular biology pipelines in clinical and research labs.
Polymerase chain reaction (PCR) identifies a nucleic acid fragment of interest by increasing its proportion relative to others1. Initially the technique was primarily used to visualize DNA fragments for cloning2,3 or genotyping4,5,6, but can now be used to investigate genetic polymorphisms and mutations7,8, copy number variants (CNVs)9, single nucleotide variants (SNVs), point mutations, and genetic deletion/duplication events10. With the development of fluorogenic probes and dyes capable of binding newly synthesized DNA, PCR became more quantitative, leading to innovative tools for quantifying relative transcript levels for one or more genes, now referred to as quantitative PCR (qPCR). With these technological advancements, qPCR is now used to quantify messenger RNA (mRNA)11, long non-coding RNA12, microRNAs13,14, DNA–protein interactions15 and epigenetic modifications16,17. Thus, the advent of PCR has revolutionized our ability to analyze and quantify nucleic acids and has made qPCR a standard technique.
qPCR experiments are already automated at the data acquisition stage, with thermocycler software providing “by default” pre-processing procedures18. However, several steps (data exclusion, normalization, data display and differential analyses) required for full data interpretation are heterogeneous, and the data processing and display methods and options vary widely across available licensed qPCR programs. Commercially available software that provides data summaries and statistical output does not systematically allow for user selections and is not necessarily transparent as to the processes and settings being used. Not knowing the conditions for data flagging or exclusion and normalization can lead to misinterpretation of the results. Also, not all qPCR software provides a statistical output. Analysis of qPCR data is still highly time-consuming and error-prone, especially when processing large numbers of data points. The user must intervene to include or exclude replicates, which, without guidelines or standardized procedures, can potentially introduce “user-dependent” variation and errors. To both simplify and accelerate this data analysis step for qPCR datasets, we have created a Python-based, open-source, user-friendly web application, “Auto-qPCR”, to process exported qPCR data and to provide summary tables, visual representations of the data, and statistical analysis. The program can be found at https://auto-q-pcr.com/. Furthermore, the program can be installed locally and then run offline.
The program can work with the two commonly used molecular biology approaches: (i) absolute quantification, where all RNA estimations rely on orthogonal projection of the samples of interest onto a calibration curve19, and (ii) relative quantification that relies on difference of cycle threshold (CT) values between the gene of interest and endogenous controls20.
Here we use Auto-qPCR to analyze qPCR datasets and illustrate four distinct computational methods. Overall, Auto-qPCR provides an all-in-one solution for the user, going from datasets to graphs, within one web-based software package. Unlike other software, the intermediate and final results are output by the program, allowing a full review of the data and accurate statistical treatment based on the experimental design. Auto-qPCR was conceived to build logical links between the experimental design and required statistics for differential analyses of each mode, which is rarely found in other qPCR programs. While other open-source qPCR analysis software programs and web apps21,22,23 are available, they are only able to normalize, compare and display qPCR data generated with one of the two quantification modes19,20. In contrast, Auto-qPCR provides a comprehensive data analysis package for a wide variety of qPCR experiments. Using the web app does not require prior programming knowledge, account creation or desktop installation. Additionally, the program has been designed to assist the user at each step of the analysis once the exported data files have been collected from the qPCR system.
Auto-qPCR can be used to analyse qPCR data in a reproducible manner, simplifying data analysis, avoiding potential human error, and saving time. In this manuscript, we describe some of the uses of the software and outline the steps required, from entering an individual dataset to complete statistical analysis and graphical presentation of the data.
Culture of iPSC lines
To illustrate the four different models of quantification managed by the Auto-qPCR program, we used 11 different iPSC lines whose properties are presented in Table S1. Quality control profiling for the iPSCs used was outlined previously24.
The use of iPSCs in this research is approved by the McGill University Health Centre Research Ethics Board (DURCAN_IPSC/2019-5374).
For the cell lines GM25952, GM35953, GM25974, GM25975, fibroblasts were ordered from the Coriell Institute and reprogrammed at the Montreal Neurological Institute. The NCRM1 iPSC line was requested from the NIH Center for Regenerative Medicine (NIH CRM, http://nimhstemcells.org/crm.html). The KYOUDXR0109B iPSC line was ordered from ATCC company. For the following iPSC cell lines—AiW001-2, AiW002-2, AJG001-C4, AJC001-5 and 522-2666-2—somatic cells were collected and reprogrammed at the Montreal Neurological Institute.
The iPSCs were seeded on Matrigel-coated dishes and expanded in mTESR1 (StemCell Technologies) or Essential 8 (ThermoFisher Scientific) media. Cells were seeded at 10–15% confluency and incubated at 37 °C in a 5% CO2 environment. The media was changed daily until the cultures reached 70% confluency. Cells harbouring irregular borders, or transparent centres were manually removed from the dish prior to dissociation with Gentle Cell Dissociation media (StemCell Technologies). The iPSCs were then seeded and differentiated into cortical or dopaminergic neuronal progenitors or neurons.
Generation of cortical and dopaminergic neurons
The induction of cortical progenitors was performed as described previously25. The media used for cortical differentiation are described in the standard operating procedure published on the Early Drug Discovery Unit (EDDU) website24. Once neural progenitor cells (NPCs) attained 100% confluency, they were passaged and seeded on poly-ornithine-laminin coated dishes to be differentiated into neurons. Cells were switched for 24 h to 50% Neurobasal (NB) medium, and 24 h later placed in 100% NB medium with AraC (0.1 µM) (Sigma) to reduce levels of dividing cells. After the third day of differentiation, cells were maintained in 100% NB medium without AraC for four days before being collected for RNA extraction. iPSCs were induced into dopaminergic NPCs (DA-NPCs) according to methods previously described26, modified according to methods used within the group27. DA-NPCs were subsequently differentiated into dopaminergic neurons (DANs), with immunostaining and qPCR analysis performed at four and six weeks of maturation from the NPC stage28.
DNA and RNA extraction
iPSCs were dissociated with Gentle Cell Dissociation Reagent (Stem Cell Technologies), while Accutase® Cell Dissociation Reagent (Thermo Fisher Scientific) was used to dissociate NPCs and iPSC-derived neurons. After a 5 min incubation at 37 °C with the indicated dissociation agent, cells were collected and harvested by centrifugation for 3 min at 1200 rpm. Cell pellets were resuspended in lysis buffer and stored at −80 °C before DNA or total RNA extraction with the Genomic DNA Mini (Blood/Culture Cell) (Genesis) or mRNAeasy (Qiagen) kits, respectively.
cDNA synthesis, quantitative PCR, and data export
Reverse transcription reactions were performed on 400 ng of total RNA extract to obtain cDNA in a 40 μl total volume containing 0.5 μg random primers, 0.5 mM dNTPs, 0.01 M DTT and 400 U/µl-MMLV RT (Carlsbad, CA, USA). The qPCR reactions were conducted in singleplex, in a 10 µl total volume containing 2 × TaqMan Fast Advanced Master Mix, 20 × TaqMan primers/probe set (Thermo Fisher Scientific), 1 µl of diluted cDNA and RNAse-free H2O. Real-time PCR (RT-PCR) was performed on QuantStudio 3 or QuantStudio 5 machines (Thermo Fisher Scientific). Primers/probe sets from Applied Biosystems were selected from the Thermo Fisher Scientific web site. Two endogenous controls (beta-actin and GAPDH) were used for normalization (Table S2).
Data generated from the QuantStudio machine were extracted using QuantStudio design and analysis software, either (i) as Excel files (*.xls or *.xlsx extensions), from which the results tab was saved as a ‘comma delimited’ csv file, or (ii) as a txt file that contained only the results tab. Excel files should be used with care, since gene names (notably those whose numbers can be recognized as potential dates) can be modified by automatic changes in cell formatting29. We suggest exporting data in txt or csv file format.
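A quick scripted check can catch gene names that a spreadsheet program has already converted to dates before the file is analyzed. The sketch below is illustrative only and is not part of Auto-qPCR; the helper names `looks_like_excel_date` and `flag_suspect_targets` are hypothetical, and the pattern assumes the common "day-month" or "month-day" text form (for example "2-Sep" in place of SEPT2).

```python
import re

# Month abbreviations that appear when a gene name is auto-converted to a date.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Assumed pattern: "2-Sep" or "Sep-2" (one or two digits), case-insensitive.
_DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({_MONTHS})|({_MONTHS})-\d{{1,2}})$", re.IGNORECASE
)


def looks_like_excel_date(target_name: str) -> bool:
    """Return True if a target name resembles an auto-converted date."""
    return bool(_DATE_LIKE.match(target_name.strip()))


def flag_suspect_targets(target_names):
    """Return the sorted unique target names that look like dates."""
    return sorted({name for name in target_names if looks_like_excel_date(name)})
```

Running `flag_suspect_targets` on the Target Name column of an exported file gives a short list of entries to verify against the plate layout.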
Collection of external data set
An external qPCR data set was obtained from an earlier published study30, which quantified levels of Nrxns and Nlgn transcripts in the subcortical areas of the brains of mice subjected to conditioned place preference (CPP) with cocaine. Briefly, subcortical areas (subthalamic nucleus, globus pallidus and substantia nigra) of sectioned mouse brains were isolated by laser capture microdissection. RNA was extracted with the Arcturus PicoPure kit and reverse transcription was performed as above. The qPCR experiments were performed according to an absolute quantification design on the Opticon 2 PCR machine (Bio-Rad). β2-Microglobulin (B2M) was used as the endogenous control. Data were re-extracted from the Opticon Monitor 2 files as csv files and analyzed by Auto-qPCR.
Program development and structure
Program function—input data processing and quantification
The Auto-qPCR program reads the raw data in the form of a results spreadsheet (via the user's file navigator) and reformats it into a data frame in Python. The information that the user enters into the web app is read as arguments by the software. See Table S4 for a list of all the user inputs and Figure S2 for examples of the input files. The input spreadsheet must be organized such that samples are found in rows and values in columns. The required columns are Well, Sample Name, Target Name, Task and CT (Figure S2); the column names do not need to match exactly. The values for the reference genes/targets (ACTB, GAPDH) are calculated separately for each sample (cell line, time point, treatment condition) and technical replicate.
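The input requirements above can be illustrated with a small reader. This is a minimal sketch using only the Python standard library, not the Auto-qPCR code itself: the function `read_results` is hypothetical, and the tolerant header matching (ignoring case, spaces and punctuation) is an assumption about what "do not need to match exactly" could mean in practice.

```python
import csv
import io

REQUIRED = ["Well", "Sample Name", "Target Name", "Task", "CT"]


def _key(name: str) -> str:
    """Normalize a header for matching: lowercase, letters and digits only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def read_results(text: str, delimiter: str = ","):
    """Parse an exported results table into row dicts keyed by REQUIRED names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lookup = {_key(h): h for h in reader.fieldnames or []}
    missing = [c for c in REQUIRED if _key(c) not in lookup]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for raw in reader:
        row = {c: (raw[lookup[_key(c)]] or "").strip() for c in REQUIRED}
        # Wells without a numeric CT (no amplification) are kept as None.
        try:
            row["CT"] = float(row["CT"])
        except ValueError:
            row["CT"] = None
        rows.append(row)
    return rows
```

For a tab-delimited txt export, the same function can be called with `delimiter="\t"`.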
To detect outliers, the CT standard deviation (CT-SD) of the technical replicates for a given sample is calculated. If the CT-SD is greater than the cut-off (the default value is 0.3), the technical replicate furthest from the sample mean is removed. The process is repeated until the CT-SD is less than the cut-off or the maximum number of outliers is reached. This maximum is set by the parameter ‘Max Proportion’; the default of 0.5 means that outliers will be removed until two technical replicates remain. With the option ‘preserve highly variable replicates’, replicates are preserved if the CT-SD is higher than 0.3 but the absolute value of (mean − median)/median is less than 0.1. This accounts for cases lacking a clear outlier, where two of three replicates are close to equally distributed around the mean.
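The outlier rule described above can be sketched as follows. This is an illustration written from the description in the text, not the Auto-qPCR source: the function name and argument names are hypothetical, the sample standard deviation is assumed, and the guard that never reduces a set below two replicates is an assumption.

```python
import statistics


def remove_outliers(cts, cutoff=0.3, max_proportion=0.5,
                    preserve_variable=False, preserve_tol=0.1):
    """Split technical-replicate CT values into (kept, removed)."""
    kept = list(cts)
    removed = []
    # Largest number of replicates that may be dropped, e.g. 1 of 3.
    max_removed = int(len(kept) * max_proportion)
    while len(kept) > 2 and len(removed) < max_removed:
        if statistics.stdev(kept) <= cutoff:
            break  # replicates agree closely enough
        mean = statistics.mean(kept)
        median = statistics.median(kept)
        if preserve_variable and abs(mean - median) / median < preserve_tol:
            break  # no clear outlier: values spread evenly around the mean
        # Drop the replicate furthest from the mean and re-check.
        worst = max(kept, key=lambda ct: abs(ct - mean))
        kept.remove(worst)
        removed.append(worst)
    return kept, removed
```

For example, `remove_outliers([20.0, 20.1, 22.0])` keeps the two close replicates and removes 22.0, whereas `remove_outliers([20.0, 21.0, 22.0], preserve_variable=True)` keeps all three because the mean and median coincide.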
Model-dependent processing: The absolute model calculates the ratio between the gene of interest and each control. For each gene/target of interest, the normalized value is calculated against the mean of each control target separately, and then the mean of the values normalized to each control is calculated. The relative ΔCT model, without a calibration sample, calculates the ΔCT by subtracting the CT value of each endogenous control from the CT value of the target, then takes the mean of the resulting deltas. The relative ΔΔCT model and the genomic stability model calculate the ΔCT for the target individually in the test sample and in the reference/calibration sample(s), then calculate the ΔΔCT by subtracting the reference ΔCT from the test sample ΔCT. For all models, the mean value of technical replicates is calculated for each target.
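The three calculations can be written compactly. The functions below are a minimal sketch of the arithmetic as described in the text, with hypothetical names; inputs are assumed to be means of technical replicates (estimated quantities for the absolute model, CT values for the relative models).

```python
from statistics import mean


def absolute_normalized(target_qty, control_qtys):
    """Absolute model: mean of target/control ratios, one per control."""
    return mean(target_qty / q for q in control_qtys)


def delta_ct(target_ct, control_cts):
    """Mean of (target CT - control CT) over the endogenous controls."""
    return mean(target_ct - c for c in control_cts)


def rq_delta_ct(target_ct, control_cts):
    """Relative model without a calibrator: RQ = 2^(-dCT)."""
    return 2 ** -delta_ct(target_ct, control_cts)


def rq_delta_delta_ct(test_target_ct, test_control_cts,
                      ref_target_ct, ref_control_cts):
    """Relative model with a calibrator: RQ = 2^(-(dCT_test - dCT_ref))."""
    ddct = (delta_ct(test_target_ct, test_control_cts)
            - delta_ct(ref_target_ct, ref_control_cts))
    return 2 ** -ddct
```

With two controls at CT 18 and 20, a target at CT 25 has a ΔCT of 6; if the same target is at CT 24 in a test sample with identical controls, the ΔΔCT is −1 and the test sample is reported at twice the level of the calibrator.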
For the relative models, values of reference genes are calculated separately for each input file. The data from one input file will not be applied to another file. For the absolute model, qPCR output for each gene is found in a separate file and the selected endogenous controls will be applied to all the data input in one analysis. For all models, two spreadsheets are created that can be opened in Excel: (1) “clean_data.csv” contains the ΔCT calculated for each technical replicate, including outliers, which are indicated by “TRUE” in the column “Outlier”; and (2) “summary_data.csv” contains the mean, standard deviation (SD) and standard error (SE) for each sample calculated from the included technical replicates. This output can easily be analyzed in another statistical program (R, SAS, Prism). All the input and output data are cleared after processing and no user data is stored in the web app.
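The relationship between the two output tables can be sketched as a grouping step: replicates flagged as outliers are skipped, and the mean, SD and SE are computed per sample and target. The function below is hypothetical and assumes the sample SD and SE = SD/√n; the tuple layout is chosen for illustration and is not the actual column layout of “clean_data.csv”.

```python
import math
import statistics
from collections import defaultdict


def summarize(rows):
    """rows: iterable of (sample, target, value, is_outlier) tuples.

    Returns {(sample, target): {"n", "mean", "sd", "se"}} computed from
    the replicates that were not flagged as outliers.
    """
    groups = defaultdict(list)
    for sample, target, value, is_outlier in rows:
        if not is_outlier:
            groups[(sample, target)].append(value)
    summary = {}
    for key, values in groups.items():
        n = len(values)
        # SD is undefined for a single remaining replicate.
        sd = statistics.stdev(values) if n > 1 else float("nan")
        summary[key] = {
            "n": n,
            "mean": statistics.mean(values),
            "sd": sd,
            "se": sd / math.sqrt(n),
        }
    return summary
```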
Program function—statistical analysis
For testing differential gene expression, the user selects the statistics option and fills in a form to indicate the conditions of the experiment. The user selects either a paired test (t test) or multiple comparisons (one-way ANOVA, or two-way ANOVA to investigate interaction effects). The names of the variables to be grouped by must be within either the ‘sample names’ column in the input file or an additional column created during the qPCR setup. A column can also be added manually into the results input file(s), although this adds a risk of copy/paste errors and lengthens the analysis process. See Table S5 for the list of which analysis is applied for each setting. All default settings are maintained for statistical functions (for details see the Pingouin documentation at https://pingouin-stat.org/); the output has been reformatted to be more easily read and interpreted by users and for consistency across statistical outputs.
The plotting scripts were written using the Matplotlib bar chart function. The labels and axis settings were all adjusted directly within the script (plot.py). The user can dictate the gene/target order and the sample order (cell lines, treatments, time points) in the web app by entering the orders into the appropriate input box. The order variables will be grouped for the summary plots. All the plots are automatically generated and saved as png files. If statistics are applied, two summary bar charts of the mean values are generated, grouped by the selected variable. For two-way ANOVA analysis, the summary bar chart will group the first variable on the x-axis and the second variable will be visualized in different colours and indicated in the legend.
Data availability and reproducibility
All raw csv input data files and the output files used in plots are available at https://github.com/neuroeddu/Auto-qPCR, along with a user guide. The example input (Input Data) and output files (Output Data) are all available and organized by figure name. The parameters used for each figure can be found in the document “Notes_on_Datasets.docx”, and screenshots of the completed web app form for each figure are in the Supplementary Figures. The example output will be replicated identically if the same conditions are entered.
The Auto-qPCR program functions with the workflow of a qPCR experiment
A qPCR experiment includes multiple steps that can be divided into two categories: (1) sample preparation to conduct the qPCR reaction, and (2) data analysis, visually represented in the schematic in Fig. 1. Nucleic acids are extracted from biological samples (RNA, which is converted to cDNA for quantifying gene expression levels, or genomic DNA). Prior to performing qPCR in vitro, the user must generate the in-silico experimental layout using software that monitors the biochemical reaction. The user defines the experimental design (absolute or relative quantification), the method for detecting DNA synthesis (TaqMan or SYBR Green) and the location of each sample within the plate. Finally, at the end of the qPCR program, the recorded data is exported and would normally be analyzed manually. In our workflow, the data is exported from the PCR machine and saved as a spreadsheet in the form of a txt or csv file (Supplementary Figure S2). The file is then uploaded into the Auto-qPCR web app and the user enters their experimental settings.
Auto-qPCR will remove technical replicates according to the selected criteria, normalize to an endogenous control, and create a clean data table, a summary data table and graphs of all the results. If the user selects the statistical analysis, differential expression analyses will be performed on the designated groups. The program was designed for the most common uses of qPCR: detecting DNA fragment duplications or deletions, and quantifying gene expression levels according to the absolute or relative quantification models.
A relatively new application for qPCR detects small changes within the genome, from a deletion to a duplication of a DNA segment. DNA regions known to be highly susceptible to such events can be quantified using a genomic instability qPCR test. In induced pluripotent stem cell (iPSC) research, genomic instability tests are critical for quality control to screen for duplication/deletion events that can arise during reprogramming and prolonged cell passaging31,32. We performed a qPCR test for genomic stability, where for each cell line, the signal from each DNA region of interest was compared to the endogenous control region.
We uploaded the data into the Auto-qPCR web app and selected the genomic instability model (Fig. 2B). The endogenous control used to normalize the data was an amplicon of a region on chromosome 4 (CHR4), a location of the genome known not to contain any instabilities. As the calibrator (Normal), we used reference DNA known not to have any instabilities (Fig. 2A). The genomic instability model has two steps of normalization in its general formula. This formula and the variables used in the example calculation are shown in Fig. 2B,C. First, the CT values from the control region (i.e., CHR4) for each cell line are subtracted from the CT values of each region of interest. Next, the ∆CT from the Normal DNA control is subtracted from the ∆CT calculated for each cell line sample. Finally, the mean is calculated from the multiple technical replicates included in the plate design for each sample. The ∆∆CT values are then expressed as “Relative Quantification” according to the following formula: RQ = 2^(−∆∆CT). If the sample has no abnormalities (deletions or duplications), the values obtained should be equal or close to 1, except for targets on the X chromosome in a male individual, for which the ratio would be expected to be 0.5. As the DNA used for PCR amplification may come from a mixed population of cells, where only some cells carry a deletion or duplication, we set an acceptable range of variation as 0.3 above and below the expected value of 1; DNA regions with RQ values between 0.7 and 1.3 are considered normal. Values below 0.7 indicate a deletion and values above 1.3 indicate an insertion. For ease of analysis, we have included a column in the output file from the Auto-qPCR program that indicates normal, insertion or deletion (Supplementary Table S6). We found that the RQ values for all seven chromosomal regions in the four cell lines tested were between 0.7 and 1.3, and we concluded that no duplications or deletions were present (Fig. 2D and Supplementary Fig. S3B).
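The two normalization steps and the 0.7 to 1.3 window can be illustrated with a short sketch. The function names are hypothetical and the exact label strings written by Auto-qPCR are an assumption; the thresholds are the ones stated in the text.

```python
def relative_quantity(region_ct, control_ct, normal_region_ct, normal_control_ct):
    """RQ = 2^(-ddCT), with dCT = CT(region) - CT(control region, e.g. CHR4)
    and ddCT = dCT(sample) - dCT(Normal calibrator DNA)."""
    ddct = (region_ct - control_ct) - (normal_region_ct - normal_control_ct)
    return 2 ** -ddct


def classify(rq, low=0.7, high=1.3):
    """Label an RQ value using the acceptance window around 1."""
    if rq < low:
        return "deletion"
    if rq > high:
        return "insertion"
    return "normal"
```

A region that amplifies one cycle later than in the calibrator, relative to CHR4, gives RQ = 0.5 and is labelled a deletion; one cycle earlier gives RQ = 2 and is labelled an insertion.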
Overall, we demonstrated how Auto-qPCR can be used to analyse the data from a genomic instability qPCR assay, and that the app effectively processed the data, creating a summary table and graph of the data.
For absolute quantification experiments, the quantities of RNA transcripts for a gene of interest and the endogenous controls are first estimated with a calibration curve (Fig. 3A) to provide a mathematical relationship between the CT values and the RNA concentration or quantity. The relationship is described by the equation CT = a·log2[RNA] + b, where “a” is the slope and “b” is the Y-intercept (Fig. 3C)33. The expression levels of the RNA molecule of interest are then given by the ratio of the estimated amount of RNA for a select transcript to the estimated amounts of endogenous controls (Fig. 3C). Consequently, the values given as “Normalized Expression Levels” depend on the levels of transcript within the biological material used to set the calibration curves. We used Auto-qPCR to compare the expression of three gene transcripts across six different cell lines at four different stages in the differentiation of neurons from iPSCs (Fig. 3B and Supplementary Fig. S4). The calibration curve was made from a mix of the cDNAs generated by reverse transcription of RNA from the four timepoints in the differentiation process, and consisted of eight four-fold serial dilutions, covering a linear relationship over a dynamic range from 1- to 16,384-fold dilution (Fig. 3A). Raw data was normalized with two endogenous controls (ACTB and GAPDH) (Fig. 3D–H and Supplementary Fig. S4A). The Auto-qPCR app provides several graphical representations of the normalized expression values. The means of technical replicates are provided for each gene (Fig. 3D). Bar charts were generated for all gene and sample observations plotted together (grouped by gene, Fig. 3E, and by sample, Fig. 3G), allowing for an overview of the data and visualization of the biological variation between cell lines at a given stage.
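The calibration step can be illustrated with an ordinary least-squares fit of CT against log2 of the relative quantity, followed by inversion of the fitted line. This is a sketch under stated assumptions: in practice the fit is typically produced by the qPCR machine software, the function names are hypothetical, and the CT values are made up to represent an ideal assay in which each two-fold dilution adds one cycle.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of CT = a*log2(quantity) + b; returns (a, b)."""
    xs = [math.log2(q) for q in quantities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, cts))
    a = sxy / sxx
    return a, my - a * mx


def quantity_from_ct(ct, a, b):
    """Invert the curve: quantity = 2^((CT - b) / a)."""
    return 2 ** ((ct - b) / a)


# Eight four-fold serial dilutions: relative quantities from 1 to 1/16,384.
dilutions = [4.0 ** -k for k in range(8)]
# Made-up CT values for an ideal assay (two cycles per four-fold dilution).
standard_cts = [20.0 + 2.0 * k for k in range(8)]
a, b = fit_standard_curve(dilutions, standard_cts)
```

With these illustrative standards the fit returns a slope of −1 and an intercept of 20, so an unknown sample at CT 23 corresponds to a relative quantity of 0.125. The normalized expression level is then the ratio of the quantity estimated for the transcript of interest to that of each endogenous control.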
We used the statistical module in Auto-qPCR to test for changes in gene expression over the different stages of neuronal differentiation; the different cell lines were considered as biological replicates (Supplementary Fig. S5). As there are more than two groups, the Auto-qPCR software runs a one-way repeated-measures ANOVA for each gene. Two summary plots (Fig. 3F,H) and two statistical output tables were generated: one for the ANOVAs and one for the secondary measures (Supplementary Tables S7 and S8). There was a significant effect of the differentiation stage on the expression of synaptic markers. The t tests with false discovery rate (FDR) correction for pairwise comparisons of each stage showed that iPSCs have significantly lower expression of each synaptic marker than DANs differentiated for 4 and 6 weeks (Supplementary Table S8), indicating that the differentiation protocol was successful for all cell lines tested, with each iPSC line differentiating into progenitors and ultimately DANs (Supplementary Figure S5). We show that raw absolute qPCR data was effectively processed by Auto-qPCR, creating summary data, visualization and statistics for differential gene expression between conditions.
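To make the FDR correction of the pairwise t tests concrete, the sketch below implements the Benjamini-Hochberg adjustment in plain Python. The text states only that an FDR correction is applied, so the choice of the Benjamini-Hochberg variant is an assumption, and the function is an illustration rather than the routine called by Auto-qPCR.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For four made-up p-values 0.01, 0.04, 0.03 and 0.20, the adjusted values are 0.04, 0.0533, 0.0533 and 0.20: the correction raises each p-value in proportion to the number of comparisons and its rank.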
In addition to absolute quantification, the Auto-qPCR software also enables the processing of qPCR data obtained according to a relative quantification design. In contrast to absolute quantification, relative quantification does not require a calibration curve, and quantification (of transcripts) is based on the CT difference between a transcript of interest and one or more endogenous controls (Fig. 4A). Relative qPCR is optimal for two kinds of comparisons: (1) detecting a difference in gene expression between two different conditions, and (2) detecting a difference between two transcripts within the same condition. Relative quantification can be expressed either as RQ = 2^(−∆CT), where samples are normalized to internal control(s), or as RQ = 2^(−∆∆CT), where a given sample is considered as a calibrator for the unknown samples (Fig. 4B,C).
To illustrate the functions of the program, we compared the expression levels of two different control cell lines at two developmental stages, indicated as D0 (neural precursor cells) and D7 (7 days of differentiation into cortical neurons). We measured the expression levels of the progenitor marker PAX6 and two markers of neuronal differentiation (GRIN1 and CAMK2A) and normalized to the housekeeping genes ACTB and GAPDH.
We used the Auto-qPCR app to process the same data twice, for a direct comparison of the two distinct relative quantification options (Supplementary Fig. S6). Figure 4D shows the mean expression from technical triplicates calculated by selecting RQ = 2^(−∆CT). The ∆CT approach (not using a sample as calibrator) allows a comparison of the expression levels for the three different transcripts. We observed that, relative to the endogenous controls, the D0 expression values for each transcript varied widely between the two cell lines tested. However, as expected for both cell lines, PAX6 expression was higher at the D0 stage compared to D7. Conversely, both GRIN1 and CAMK2A exhibited higher expression at the D7 stage compared to D0. Using the statistics module in the Auto-qPCR app, we compared the mean levels of each gene transcript at D0 and D7 using paired t tests for each gene (Fig. 4E,F). We found that although there were clear differences in expression, they were not significant between D0 and D7, likely a result of there only being two samples for each time point (Supplementary Table S9 and Supplementary Fig. S6A and S7). Interestingly, we found that the CAMK2A RQ∆CT at D7 was twice that of GRIN1 (Fig. 4F).
We next analysed this dataset with the RQ∆∆CT model (indicated as ΔΔCT) in the web app (Supplementary Fig. S6B), where transcript levels are compared to both control gene expression (in this case ACTB and GAPDH) and a calibration sample; in this case we arbitrarily set one sample, AIW002-02-D0, as the reference sample (Fig. 4G). Here we can easily compare expression in a test condition relative to a control condition by displaying the results as fold change in expression. All decreases are displayed as values between 0 and 1 and all increased expression levels are above 1 (Fig. 4C). With the double normalization (RQ∆∆CT), all values were expressed as a variation compared to the calibrator (AIW002-2-D0), as seen in Fig. 4G–I. As in the RQ∆CT model, the changes in gene expression from D0 to D7 were not significant (Supplementary Table S10). Although the ratio of expression for a given gene in each cell line between D0 and D7 remained unchanged, differential expression between genes can no longer be analysed. The RQ∆∆CT shown in Fig. 4H showed that PAX6 expression was higher at D0 than D7 and that CAMK2A and GRIN1 expression were both higher at D7 than D0, as seen in Fig. 4E using the RQ∆CT model. However, with the double normalization, the increase in GRIN1 expression from D0 to D7 appears much larger than the increase in CAMK2A expression (Fig. 4H,I), which is the opposite of the result from the single normalization model (RQ∆CT) (Fig. 4E,F). Our findings highlight the need to analyze data with attention to the biological question. Using only the RQ∆∆CT analysis, one might mistakenly believe the increase in GRIN1 expression is greater than that of CAMK2A. With Auto-qPCR we provide a quick and easy option to process the exported qPCR data with two different relative models. We show the same gene expression ratios between the two time points, but different gene expression levels, using the different relative quantification models.
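A small numeric illustration shows why the two relative models agree on within-gene fold changes but not on between-gene comparisons. The ∆CT values below are made up for illustration and are not taken from the dataset; they are chosen so that the direction of the effect mirrors the one described above, and the helper names are hypothetical.

```python
# Made-up dCT values (target CT minus mean control CT); not data from this study.
delta_ct = {
    ("GRIN1", "D0"): 12.0, ("GRIN1", "D7"): 8.0,
    ("CAMK2A", "D0"): 9.0, ("CAMK2A", "D7"): 7.0,
}


def rq_dct(gene, stage):
    """Single normalization: RQ = 2^(-dCT)."""
    return 2 ** -delta_ct[(gene, stage)]


def rq_ddct(gene, stage, calibrator="D0"):
    """Double normalization: RQ = 2^(-(dCT - dCT of the calibrator))."""
    return 2 ** -(delta_ct[(gene, stage)] - delta_ct[(gene, calibrator)])


# The within-gene fold change D7/D0 is the same under both models.
fold = {g: rq_dct(g, "D7") / rq_dct(g, "D0") for g in ("GRIN1", "CAMK2A")}
# Only the single normalization keeps the between-gene difference in level.
ratio_dct = rq_dct("CAMK2A", "D7") / rq_dct("GRIN1", "D7")
ratio_ddct = rq_ddct("CAMK2A", "D7") / rq_ddct("GRIN1", "D7")
```

With these numbers, GRIN1 rises 16-fold and CAMK2A 4-fold from D0 to D7 under both models. Under the single normalization CAMK2A is still twice as abundant as GRIN1 at D7 (`ratio_dct` is 2), whereas after calibration to D0 it appears four times lower (`ratio_ddct` is 0.25), because each gene has been rescaled to its own D0 level.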
Auto-qPCR produces the same results as manual processing of a previously published dataset
One of our objectives was to provide a tool for analyzing data from qPCR experiments generated with different qPCR machines. We reanalyzed a published dataset generated by the Gorwood lab30 on a different machine (Opticon 2, Bio-Rad). The original study measured gene expression in three subcortical areas (subthalamic nucleus (STN), substantia nigra (SN) and globus pallidus (GP)) of mice subjected to a cocaine place preference paradigm30. Manual processing shows a significant increase in Nrxn3 expression in the cocaine-treated group compared to control, specifically in the GP (Fig. 5A).
We next processed the raw data using the Auto-qPCR web app absolute quantification pipeline and normalized to B2M (Fig. 5B and Supplementary Figure S8A). This summary data closely matched the manually calculated data (Supplementary Table S11). The standard method of removing outliers from technical replicates is to remove the replicate most different from the mean if the CT standard deviation (CT-SD) is above 0.3. The threshold can be adjusted under ‘Options for removing technical replicates’ in the Auto-qPCR software. During manual analysis, each set of technical replicates is inspected when the CT-SD value is above 0.3; when one replicate is clearly different from the other two, the divergent value is removed. There are some instances in manual processing where no replicates are removed when the CT-SD is greater than 0.3, because the triplicate values are evenly distributed. Auto-qPCR has an option to account for this type of data when the user selects ‘preserve highly variable values’. With this option, a replicate is only removed if the median is far from the mean. We processed the Nrxn3 expression data with a range of CT-SD cut-off values, with or without preserving highly variable replicates, to display the difference in outcomes (Supplementary Table S11). We compared the variances generated by the differences between the expression values from manual treatment and from Auto-qPCR using a CT-SD cut-off of 0.3 with or without preserving highly variable replicates.
We found that the ‘preserve highly variable values’ option combined with a cut-off of 0.3 generates a 20% decrease in the variance between manual and automatic treatments (Supplementary Table S12) and preserved values falsely estimated as outliers by manual processing, which illustrates the subjectivity of the user with respect to the decision to retain or exclude a value based on criteria of divergence. Our analysis suggests that applying two rules of data filtering provides a more systematic data analysis method and minimizes interindividual bias. Here we applied the standard cut-off of 0.3 and preserved highly variable replicates, appropriate for the highly variable and RNA level experimental samples we are analyzing.
Auto-qPCR also permits statistical groups to be designated in the sample name or in a specific group column, which can be added into the qPCR data during the plate set up or later in the results spreadsheet. To allow for statistical analysis of this data, we added a grouping column into the raw data files (Supplementary Table S13) and using the Auto-qPCR statistics module, we reanalysed the effect of drug treatment and brain regions on expression of Nrxn3 across several parameters. We first compared the overall effect of cocaine on expression after pooling the three brain regions and found that although the expression of Nrxn3 was increased across brain regions with cocaine treatment, there was no overall significant effect of drug treatment (Fig. 5C, Supplementary Fig. S9A and Supplementary Table S14). Comparing the three brain regions while pooling together control and cocaine treatment showed a significant difference in expression across brain regions. Post-hoc analysis revealed Nrxn3 expression in the STN was significantly lower than in the GP and SN (Fig. 5D, Supplementary Fig. S10A and Supplementary Table S15). When we considered each brain region with and without treatment as independent conditions, and individual mice as biological replicates and used a one-way ANOVA followed by post hoc tests using multiple t test with a correction for multiple comparisons we find cocaine significantly increased Nrxn3 expression specifically in the GP and not in the SN or STN (Fig. 5E and Supplementary Table S16). To apply the identical statistical treatment as originally presented, we performed a two-way ANOVA followed by a repeated measures t tests with FDR correction on the interaction variable between treatment and brain region, using Auto-qPCR, and found the same results as the one-way ANOVA (Fig. 5F, Supplementary Fig. S10B and Supplementary Table S17) and a t test of the GP alone (Fig. 5G), all in agreement with the originally published results30. 
Together, these data show that the Auto-qPCR software is capable of processing data generated by another machine and that the results match those processed manually.
This paper presents Auto-qPCR, a new web app for qPCR analysis and provides examples of the functionalities of the software applied to qPCR experimental datasets generated from DNA (genomic instability assay), cDNA amplification, and RNA transcripts (absolute and relative quantification data). We have also summarized the computational bases of relative and absolute quantifications performed by Auto-qPCR, which is important for users to understand during experimental design. The Auto-qPCR web app also provides a statistical module that will be applicable to the majority of qPCR analysis experiments, and provides a correction across multiple tests, when more than two samples are compared, to mitigate against false positives. As not all experimental designs require differential analyses, the user can use Auto-qPCR without statistical analysis, calculating normalized RNA concentrations, and a summary table and graphs will be generated. Furthermore, the web app can be used with no installation or login requirements. We have created an easy-to-use program that is completely free and open source, able to process data from different qPCR machines and all common experimental designs, that will be advantageous for any lab performing qPCR experiments.
Given the importance of qPCR in molecular biology, other programs are available to perform many steps of the qPCR data treatment18,21,22,23,34. The Q-PCR and PIPE-T programs were designed to treat and display qPCR data generated according to a relative quantification model23,34. SATQPCR is a web app that treats qPCR data using the relative quantification model and performs differential analyses. However, it does not take the exported results files directly from the qPCR machine and requires manual preformatting of the data before analysis22. ELIMU-MDx is a web-based interface conceived to collect specific information regarding qPCR assays for diagnostic purposes. ELIMU-MDx functions as a data management system, processes qPCR data generated using the absolute quantification method and requires an account and login information21. Finally, another web app, “Do my qPCR calculations”, requires no login and provides relative quantification results, but requires manual preformatting of an Excel sheet to upload or entering values directly35. The main specifications of these programs relative to ours are presented in Supplementary Table S18 for side-by-side comparison.
Reviewing different software published to serve similar purposes highlights the unique characteristics of Auto-qPCR, as no other web app combines all the features we have included in our software. First, as a web app, Auto-qPCR does not require installation or a user login and can be accessed from any device connected to the internet. Furthermore, for users who want to work on their analysis off-line, we also provide the option to install the program onto their computer, which entirely reproduces the environment of the web app. Second, data processed by Auto-qPCR do not require any manual preformatting of the results file. Instead, once the qPCR experiment is complete, our program takes the csv or txt export file directly from the thermocycler, so there is no copy/paste or formatting step to be done by the user. Third, Auto-qPCR can manage the data from multiple separate absolute files at once, as well as batch process multiple results files from a relative quantification. The program creates a clean data set (with all technical replicates) and a summary data table. Fourth, unlike the other software mentioned above, Auto-qPCR includes three different models, conceived to support qPCR data generated from absolute and two methods of relative quantification designs. No other program provides the option of choosing between the two relative quantification methods. Fifth, we provide normalization to multiple reference genes and calculate the mean normalized value for each replicate, and not the sample mean, an important feature implemented in relatively few other programs. This avoids the RNA quantity value being influenced by extreme values. Sixth, we extend the use of the program to suit qPCR data from DNA quantification. Finally, we provide an extensive statistics module for calculating differential gene expression that requires no additional input files.
The statistics module includes options for experimental designs that compare two or more samples (t test, one- and two-way ANOVA and the equivalent non-parametric tests) and automatically generates bar charts for data visualization and summary tables with the statistical results. In summary, we have created a unique, easy-to-use qPCR analysis program that can benefit any researcher or lab that needs to analyze qPCR data on a regular basis, by saving time, avoiding errors and generating reproducible, figure-ready plots.
Auto-qPCR provides users the option for relative quantification by two methods: expression relative to endogenous control genes only (∆CT method) or relative to endogenous genes and also normalized to a control condition (∆∆CT method). Although the ∆∆CT method is considered the gold standard to express, in one number, the variation in gene expression between two conditions and the amplitude of that change in expression36, it does not account for inter-gene expression variation within the control condition37. The differences between quantifying relative expression with or without a control condition used as a calibrator are clearly demonstrated above (Fig. 4). Expression levels of GRIN1 and CAMK2A calculated with either relative quantification model were increased at seven days of differentiation (D7) compared to day zero (D0). However, we also found that GRIN1 and CAMK2A had different levels in the baseline condition (∆CT); thus, information is lost when using a ∆∆CT normalization. For relative quantification using a ∆∆CT normalization, we measured a fold change compared to a control condition for a given gene38, but information about differences in expression between the two genes in the control condition was not observed (Fig. 4F). We have provided both the gold standard method of relative quantification and a method to calculate gene expression without a reference sample, to allow users to quickly determine expression changes without losing information about the level of expression in control conditions.
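The loss of baseline information under ∆∆CT normalization can be illustrated with a short worked example. The sketch below is not the Auto-qPCR implementation: it assumes 100% amplification efficiency (a factor of 2 per cycle), averages the CT values of the reference genes before subtraction, and uses hypothetical gene names and CT values chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def delta_ct_expression(ct_target, ct_references):
    """Relative quantity 2^-(CT_target - mean reference CT); assumes 100% efficiency."""
    return 2 ** -(ct_target - mean(ct_references))


def delta_delta_ct_fold_change(expr_sample, expr_calibrator):
    """Fold change of a sample relative to the calibrator (control) condition."""
    return expr_sample / expr_calibrator


# Hypothetical CT values: (target CT, [CTs of two reference genes]).
cts = {
    ("GENE_A", "D0"): (28.0, [18.0, 20.0]),
    ("GENE_A", "D7"): (26.0, [18.0, 20.0]),
    ("GENE_B", "D0"): (24.0, [18.0, 20.0]),
    ("GENE_B", "D7"): (22.0, [18.0, 20.0]),
}

# Delta-CT mode: expression relative to the reference genes only.
expr = {key: delta_ct_expression(t, refs) for key, (t, refs) in cts.items()}

# Delta-delta-CT mode: each gene is additionally divided by its own D0 value.
fold = {
    gene: delta_delta_ct_fold_change(expr[(gene, "D7")], expr[(gene, "D0")])
    for gene in ("GENE_A", "GENE_B")
}

baseline_ratio = expr[("GENE_B", "D0")] / expr[("GENE_A", "D0")]
print(fold)            # both genes show the same fold change
print(baseline_ratio)  # yet their baseline levels differ
```

In this example both genes show a four-fold increase at D7 in ∆∆CT mode, while the ∆CT values reveal that the second gene is expressed sixteen times more strongly than the first at D0, a difference that the fold change alone cannot show.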
Reprocessing the external dataset highlighted two main advantages of treating qPCR datasets with a program. First, manual analysis of qPCR data is time consuming. Second, comparing both data treatments (manual and program-assisted) showed that one important source of variation between results of manual analysis is the inconsistent rules used for data exclusion. Although removing one outlier from technical replicates, in the vast majority of cases, improves the CT standard deviation (CT-SD) by decreasing it under the commonly accepted threshold of 0.3, in many cases researchers decide to keep a technical replicate even if the CT-SD value is above 0.3. These judgement calls frequently occur when transcripts have low expression levels and the high variance between technical replicates does not permit a decision based on the adjustment of the CT-SD. To account for these situations, we incorporated a second rule for data inclusion/exclusion based on the distance between the arithmetic mean and the median value of technical replicates to determine the most acceptable set of technical replicates. Replacing the user’s judgement with such an algorithm removes variability and potential bias in the resulting normalized gene expression levels. We were able to reprocess external data using Auto-qPCR and acquired the same summary output, reaching the same conclusions as the initial study. We showed that Auto-qPCR can process data from different PCR machines and matched the expected outcome from manual processing without the risk of bias or errors. Using a double rule for data inclusion/exclusion for highly variable signal between technical replicates, the program provides a unique treatment that will considerably reduce the risk of variability and mistakes generated by and between users during manual data processing.
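The double rule described above can be sketched as follows. This is an illustration of the idea rather than the Auto-qPCR code: the exact criteria used by the program are not detailed in this section, so the function name `filter_replicates`, the choice to drop the replicate farthest from the median, and the `gap_cutoff` threshold on the mean-median distance are all assumptions made for the example.

```python
from statistics import mean, median, stdev


def filter_replicates(cts, sd_cutoff=0.3, gap_cutoff=0.1):
    """Return the technical replicates to keep for one sample/target pair."""
    # Consistent replicates, or too few values to drop one: keep everything.
    if len(cts) < 3 or stdev(cts) <= sd_cutoff:
        return list(cts)
    # Second rule (assumed form): when the mean sits close to the median,
    # the spread is roughly symmetric and no single replicate stands out,
    # so the highly variable replicates are preserved.
    if abs(mean(cts) - median(cts)) < gap_cutoff:
        return list(cts)
    # First rule: remove the single replicate farthest from the median.
    centre = median(cts)
    outlier = max(cts, key=lambda ct: abs(ct - centre))
    kept = list(cts)
    kept.remove(outlier)
    return kept


# Hypothetical CT triplicates.
print(filter_replicates([24.10, 24.20, 24.15]))  # low CT-SD
print(filter_replicates([24.1, 24.2, 25.6]))     # one clear outlier
print(filter_replicates([31.0, 31.6, 32.2]))     # high but symmetric spread
```

The third triplicate has a CT-SD well above 0.3, yet no value can be singled out as the outlier, which is the situation where a manual user would have to make a judgement call.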
The Auto-qPCR program does have some limitations, but it also has a number of other potential uses not included in this manuscript. Although the program is able to compute data from independent qPCR plates in single plex (where each plate has a different amplicon), Auto-qPCR has not been adjusted at this stage to manage duplex qPCR (with one endogenous control and one transcript of interest quantified in the same well). Auto-qPCR has also not been equipped to process an inter-plate calibrator, required to cover a sample size of more than one plate, in absolute quantification mode experimental designs. Finally, as most of the primer sets for gene expression are now predesigned and eventually pretested by companies taking into consideration optimal efficiencies of amplification, correction factors for efficiencies have not been added into the Auto-qPCR algorithms. Despite these caveats, we propose that Auto-qPCR could be employed in a variety of molecular biology protocols and many of these features could be added in future iterations. Auto-qPCR is capable of analyzing data from a chromatin immunoprecipitation experiment followed by specific DNA amplification15. The analyses could be performed using either the absolute or the relative quantification models. The absolute quantification method would permit testing primer efficiency through the calibration curve39, and the DNA target amplification would be normalized to an unbound DNA as previously described40,41. Alternatively, the level of DNA/protein interaction can be estimated using the relative quantification models with one or several regions, known to be unbound by a protein of interest, as endogenous control(s) (∆CT mode) and with a biological condition as a calibrator (∆∆CT mode). Auto-qPCR is flexible enough to let the user choose the most appropriate model to use, based on the information available on the DNA regions to amplify and analyze.
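As an illustration of how primer efficiency can be read from a calibration curve, the sketch below fits CT against log10 of the input quantity and applies the commonly used relationship E = 10^(-1/slope) - 1. It is not part of Auto-qPCR, which as noted above does not apply efficiency corrections; the function names and the dilution series are hypothetical.

```python
from math import log10


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(quantities, cts):
    """Return (efficiency, slope, intercept) of the CT vs. log10(quantity) line."""
    slope, intercept = fit_line([log10(q) for q in quantities], cts)
    return 10 ** (-1 / slope) - 1, slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's quantity back from the calibration line."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series (arbitrary units) and measured CTs.
quantities = [1e5, 1e4, 1e3, 1e2, 1e1]
cts = [15.0, 18.4, 21.8, 25.2, 28.6]

efficiency, slope, intercept = amplification_efficiency(quantities, cts)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
print(f"quantity at CT 21.8 = {quantity_from_ct(21.8, slope, intercept):.0f}")
```

A slope near -3.32 corresponds to a doubling at every cycle (100% efficiency); the illustrative series above, with a slope of -3.4, gives an efficiency slightly below that.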
The Auto-qPCR program was conceived to treat, analyze, and display qPCR data generated using either relative or absolute quantification designs, while limiting errors related to manual processing. Data processing tools cannot replace or supplement appropriate experimental design and statistical power. The conditions included in the design and the interpretation of the results still remain in the user’s hands. We have provided a tool that offers easy, reproducible analysis for unlimited samples while avoiding the errors of manual processing. Although we cannot computationally remove the need for replication and controls, analysis time will no longer be a limitation. Auto-qPCR permits researchers to conduct studies with larger experimental designs while minimizing the risk of mistakes during the data analysis.
qPCR: Quantitative polymerase chain reaction
iPSCs: Induced pluripotent stem cells
CNVs: Copy number variants
SNVs: Single nucleotide variants
NPCs: Neural precursor cells
Saiki, R. K. et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350–1354. https://doi.org/10.1126/science.2999980 (1985).
Magnuson, V. L. et al. Substrate nucleotide-determined non-templated addition of adenine by Taq DNA polymerase: Implications for PCR-based genotyping and cloning. Biotechniques 21, 700–709. https://doi.org/10.2144/96214rr03 (1996).
Scharf, S. J., Horn, G. T. & Erlich, H. A. Direct cloning and sequence analysis of enzymatically amplified genomic sequences. Science 233, 1076–1078. https://doi.org/10.1126/science.3461561 (1986).
Beggs, A. H., Koenig, M., Boyce, F. M. & Kunkel, L. M. Detection of 98% of DMD/BMD gene deletions by polymerase chain reaction. Hum. Genet. 86, 45–48 (1990).
Mullis, K. B. & Faloona, F. A. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 155, 335–350 (1987).
Saiki, R. K., Bugawan, T. L., Horn, G. T., Mullis, K. B. & Erlich, H. A. Analysis of enzymatically amplified beta-globin and HLA-DQ alpha DNA with allele-specific oligonucleotide probes. Nature 324, 163–166. https://doi.org/10.1038/324163a0 (1986).
De la Vega, F. M., Lazaruk, K. D., Rhodes, M. D. & Wenz, M. H. Assessment of two flexible and compatible SNP genotyping platforms: TaqMan SNP Genotyping Assays and the SNPlex Genotyping System. Mutat. Res. 573, 111–135. https://doi.org/10.1016/j.mrfmmm.2005.01.008 (2005).
Ye, S., Dhillon, S., Ke, X., Collins, A. R. & Day, I. N. An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acids Res. 29, E88–E88. https://doi.org/10.1093/nar/29.17.e88 (2001).
D’Haene, B., Vandesompele, J. & Hellemans, J. Accurate and objective copy number profiling using real-time quantitative PCR. Methods 50, 262–270. https://doi.org/10.1016/j.ymeth.2009.12.007 (2010).
Charbonnier, F. et al. Detection of exon deletions and duplications of the mismatch repair genes in hereditary nonpolyposis colorectal cancer families using multiplex polymerase chain reaction of short fluorescent fragments. Cancer Res. 60, 2760–2763 (2000).
Wong, M. L. & Medrano, J. F. Real-time PCR for mRNA quantitation. Biotechniques 39, 75–85. https://doi.org/10.2144/05391RV01 (2005).
Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076. https://doi.org/10.1038/nature08975 (2010).
Shi, R. & Chiang, V. L. Facile means for quantifying microRNA expression by real-time PCR. Biotechniques 39, 519–525. https://doi.org/10.2144/000112010 (2005).
Varkonyi-Gasic, E., Wu, R., Wood, M., Walton, E. F. & Hellens, R. P. Protocol: A highly sensitive RT-PCR method for detection and quantification of microRNAs. Plant Methods 3, 12. https://doi.org/10.1186/1746-4811-3-12 (2007).
Mukhopadhyay, A., Deplancke, B., Walhout, A. J. & Tissenbaum, H. A. Chromatin immunoprecipitation (ChIP) coupled to detection by quantitative real-time PCR to study transcription factor binding to DNA in Caenorhabditis elegans. Nat. Protoc. 3, 698–709. https://doi.org/10.1038/nprot.2008.38 (2008).
Dahl, J. A. & Collas, P. Q2ChIP, a quick and quantitative chromatin immunoprecipitation assay, unravels epigenetic dynamics of developmentally regulated genes in human carcinoma cells. Stem Cells 25, 1037–1046. https://doi.org/10.1634/stemcells.2006-0430 (2007).
Milne, T. A., Zhao, K. & Hess, J. L. Chromatin immunoprecipitation (ChIP) for analysis of histone modifications and chromatin-associated proteins. Methods Mol. Biol. 538, 409–423. https://doi.org/10.1007/978-1-59745-418-6_21 (2009).
Pabinger, S., Rodiger, S., Kriegner, A., Vierlinger, K. & Weinhausel, A. A survey of tools for the analysis of quantitative PCR (qPCR) data. Biomol. Detect. Quantif. 1, 23–33. https://doi.org/10.1016/j.bdq.2014.08.002 (2014).
Bustin, S. A. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J. Mol. Endocrinol. 25, 169–193 (2000).
Pfaffl, M. W. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29, e45. https://doi.org/10.1093/nar/29.9.e45 (2001).
Krahenbuhl, S. et al. ELIMU-MDx: A web-based, open-source platform for storage, management and analysis of diagnostic qPCR data. Biotechniques 68, 22–27. https://doi.org/10.2144/btn-2019-0064 (2020).
Rancurel, C., van Tran, T., Elie, C. & Hilliou, F. SATQPCR: Website for statistical analysis of real-time quantitative PCR data. Mol. Cell Probes 46, 101418. https://doi.org/10.1016/j.mcp.2019.07.001 (2019).
Zanardi, N. et al. PIPE-T: A new Galaxy tool for the analysis of RT-qPCR expression data. Sci. Rep. 9, 17550. https://doi.org/10.1038/s41598-019-53155-9 (2019).
Chen, C. X. Q. et al. Standardized quality control workflow to evaluate the reproducibility and differentiation potential of human iPSCs into neurons. Methods Protoc. 4, https://doi.org/10.3390/mps4030050 (2021).
Bell, S. et al. A rapid pipeline to model rare neurodevelopmental disorders with simultaneous CRISPR/Cas9 gene editing. Stem Cells Transl. Med. 6, 886–896. https://doi.org/10.1002/sctm.16-0158 (2017).
Kriks, S. et al. Dopamine neurons derived from human ES cells efficiently engraft in animal models of Parkinson’s disease. Nature 480, 547–551. https://doi.org/10.1038/nature10648 (2011).
Chen, E. S. et al. Induction of Dopaminergic or Cortical neuronal progenitors from iPSCs. Zenodo. https://doi.org/10.5281/zenodo.3364831 (2019).
Chen, E. S., Lauinger, N., Rocha, C., Rao, T. & Durcan, T. M. Generation of dopaminergic or cortical neurons from neuronal progenitors. Zenodo. https://doi.org/10.5281/zenodo.3361005 (2019).
Abeysooriya, M., Soria, M., Kasu, M. S. & Ziemann, M. Gene name errors: Lessons not learned. PLoS Comput. Biol. 17, e1008984. https://doi.org/10.1371/journal.pcbi.1008984 (2021).
Kelai, S. et al. Nrxn3 upregulation in the globus pallidus of mice developing cocaine addiction. NeuroReport 19, 751–755. https://doi.org/10.1097/WNR.0b013e3282fda231 (2008).
Tosca, L. et al. Genomic instability of human embryonic stem cell lines using different passaging culture methods. Mol. Cytogenet. 8, 30. https://doi.org/10.1186/s13039-015-0133-8 (2015).
Yoshihara, M., Hayashizaki, Y. & Murakawa, Y. Genomic instability of iPSCs: Challenges towards their clinical applications. Stem Cell Rev. 13, 7–16. https://doi.org/10.1007/s12015-016-9680-6 (2017).
Ovstebo, R., Haug, K. B., Lande, K. & Kierulf, P. PCR-based calibration curves for studies of quantitative gene expression in human monocytes: Development and evaluation. Clin. Chem. 49, 425–432. https://doi.org/10.1373/49.3.425 (2003).
Pabinger, S. et al. QPCR: Application for real-time PCR data management and analysis. BMC Bioinform. 10, 268. https://doi.org/10.1186/1471-2105-10-268 (2009).
Tournayre, J., Reichstadt, M., Parry, L., Fafournoux, P. & Jousse, C. “Do my qPCR calculation”, a web tool. Bioinformation 15, 369–372. https://doi.org/10.6026/97320630015369 (2019).
Schmittgen, T. D. & Livak, K. J. Analyzing real-time PCR data by the comparative C(T) method. Nat. Protoc. 3, 1101–1108. https://doi.org/10.1038/nprot.2008.73 (2008).
Yuan, J. S., Reed, A., Chen, F. & Stewart, C. N. Jr. Statistical analysis of real-time PCR data. BMC Bioinform. 7, 85. https://doi.org/10.1186/1471-2105-7-85 (2006).
Rao, X., Huang, X., Zhou, Z. & Lin, X. An improvement of the 2(−delta delta CT) method for quantitative real-time polymerase chain reaction data analysis. Biostat. Bioinform. Biomath. 3, 71–85 (2013).
Brankatschk, R., Bodenhausen, N., Zeyer, J. & Burgmann, H. Simple absolute quantification method correcting for quantitative PCR efficiency variations for microbial community samples. Appl. Environ. Microbiol. 78, 4481–4489. https://doi.org/10.1128/AEM.07878-11 (2012).
Mathieu, O., Probst, A. V. & Paszkowski, J. Distinct regulation of histone H3 methylation at lysines 27 and 9 by CpG methylation in Arabidopsis. EMBO J. 24, 2783–2791. https://doi.org/10.1038/sj.emboj.7600743 (2005).
Maussion, G. et al. Investigation of genes important in neurodevelopment disorders in adult human brain. Hum. Genet. 134, 1037–1053. https://doi.org/10.1007/s00439-015-1584-z (2015).
T.M.D. received funding through the McGill Healthy Brains for Healthy Lives (HBHL) initiative, the CQDM FACS program, the Alain and Sandra Bouchard Foundation, the Ellen Foundation and the Mowafaghian Foundation. T.M.D. is supported by a project grant from CIHR (PJT-169095). R.A.T. was funded by a Healthy Brains for Healthy Lives Fellowship. Thanks to Ivan Castanon Niconoff for helping create and set up the virtual machine used to host the Auto-qPCR web app. Thanks to Maria José Castellanos Montiel, Vincent Soubannier and Nguyen-Vi Mohamed for testing the web app.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Maussion, G., Thomas, R.A., Demirova, I. et al. Auto-qPCR; a python-based web app for automated and reproducible analysis of qPCR data. Sci Rep 11, 21293 (2021). https://doi.org/10.1038/s41598-021-99727-6