Introduction

There is no universally accepted “best” method for choosing a set of reference genes (RGs) for normalization. It has been reported1, 2 that the expression stability of RGs may vary across different types of cancer. Housekeeping genes (HKGs) cannot be assumed to serve as internal controls across different tissues3, developmental stages4, or experimental conditions5; doing so could yield conflicting results. The experimental background should therefore be carefully considered when determining the best RG3, 6, 7, 8.

In gene expression studies, RGs are usually selected on the basis of previously published literature. However, these RGs are often used in very different experimental conditions and lack validation. HKGs, such as GAPDH, ACTB, and HPRT1, are the most widely utilized genes in the calibration of gene expression levels6. A growing number of studies have reported that the expression level of RGs may vary in different organisms or experimental conditions. Even the most commonly used HKG cannot be assumed to be a universally applicable RG; its behavior in different cancers, tissues, or cells must be considered9, 10, 11. A set of RGs should be strictly validated before use. Here, we report an RG selection system that could help to address such problems. Nasopharyngeal carcinoma (NPC) was used as an example to test the efficiency of the system.

NPC is a prevalent tumor in Southeast Asia, particularly among the Cantonese population of southern China. Its incidence has remained high for decades12. Compared to other malignant tumors of the upper aerodigestive tract, NPC is a special type of head and neck cancer in its epidemiology, pathology, clinical presentation, and response to treatment. The etiology of NPC involves multiple factors, including genetic susceptibility, exposure to chemical carcinogens, and Epstein-Barr virus (EBV) infection13.

We established a strategy to determine the most suitable RGs for NPC studies. Similar studies in other forms of cancer, including prostate cancer14, human bladder cancer15, hepatocellular carcinoma16, breast carcinoma17, gastrointestinal tumor18, and human renal cell carcinoma19, have previously been carried out.

The goals of our study were (a) to develop an improved strategy for selecting optimal RGs in gene expression studies and (b) to investigate a panel of potential RGs with regard to their expression stability and therefore their suitability for use as RGs in NPC research. Our findings are intended to improve methods of normalization and the accuracy of gene expression data.

Materials and methods

Tissue information, RNA isolation, and reverse transcription

Poorly differentiated squamous NPC tissue samples were obtained from 37 NPC patients with consent, before treatment, at the Institute of Nasopharyngeal Carcinoma, The People's Hospital of Guangxi Zhuang Nationality Autonomous Region, Nanning, China. In addition, 20 nasopharyngeal phlogistic (NP) tissues were obtained from patients without NPC at the same hospital. All specimens were reviewed by an otorhinolaryngologic pathologist. All fresh tissues were snap-frozen in liquid nitrogen and stored at -80 °C until use.

Each specimen was ground into a fine powder in a 10 cm ceramic mortar (RNase-free), and then RNA was isolated following the manufacturer's recommendation (Biostar, Shanghai). RNA was resuspended in Milli-Q H2O. RNA concentrations were determined by spectrophotometry using a GeneQuant II (Pharmacia Biotech). The integrity of the RNA was verified by 1% agarose gel electrophoresis. Following the manufacturer's instructions, total RNA was reverse transcribed with Oligo (dT) primers (Takara) and a Superscript III RNase H-Reverse Transcriptase kit (Invitrogen). The resultant synthesized cDNA was stored at -20 °C.

Gene expression profiles

Fluorescent cDNA probes were prepared through reverse transcription of the isolated total RNA and then purified. The total RNAs from 15 nasopharyngeal phlogistic tissues were mixed and labeled with Cy3-dUTP by reverse transcription. This mixture served as the common control pool in our experiments. RNA samples from 20 NPC patients were separately labeled with Cy5-dUTP. Twenty cDNA microarray hybridizations, each on an array containing 14112 genes, were then performed following the manufacturer's instructions. The microarrays were scanned with a ScanArray 4000 (Packard, Biochip Technologies) to detect emission from Cy5 and Cy3 at two wavelengths, 635 nm and 532 nm, respectively. The acquired images were analyzed using GenePix Pro 3.0 software. The intensities at the two wavelengths for each spot indicate the quantity of bound Cy5-dUTP and Cy3-dUTP. Ratios of Cy5 to Cy3 were computed by GenePix Pro 3.0 using the median ratio method. Overall intensities were normalized using the Lowess method. The 20 cDNA microarray (BioStarH-141s, Biostar, Shanghai) experiments, including probe preparation, microarray hybridization, image detection, and data normalization, were carried out as described in our previous reports20, 21.

Potential reference gene selection in microarray data

An equation based on the coefficient of variation was constructed to evaluate the data resulting from the 20 gene expression profiles. In Equation 1, for two genes x and y, aix and aiy stand for their Cy5/Cy3 ratios from microarray i. The array Axy consists of m elements, the log2-transformed ratios aix/aiy calculated from the m=20 microarrays (Equation 1). We defined the pairwise variation Vxy for the two genes x and y as the standard deviation of the elements of Axy (Equation 2). The gene stability measure Sx for gene x was the arithmetic mean of all pairwise variations Vxy between gene x and the other genes (Equation 3).
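
Written out from the definitions in this paragraph, Equations 1 to 3 can be expressed as follows. This is a sketch: the symbol n for the number of genes being compared and the restriction of the sum to y ≠ x are notational assumptions, not taken from the original equations.

```latex
\begin{align}
A_{xy} &= \left\{ \log_2 \frac{a_{ix}}{a_{iy}} \right\}_{i=1}^{m}, \quad m = 20 \tag{1} \\
V_{xy} &= \operatorname{SD}\left(A_{xy}\right) \tag{2} \\
S_x &= \frac{1}{n-1} \sum_{y \neq x} V_{xy} \tag{3}
\end{align}
```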

Primer design and real-time quantitative PCR

PCR primer sequences were based on the NCBI database (National Center for Biotechnology Information, [http://www.ncbi.nlm.nih.gov/]) and Ensembl database ([http://www.ensembl.org/]) and were designed using Primer Premier 5. BLAST analysis against genomic DNA was performed, using both databases, to test the specificity of the primers. Information about the primers is listed in Table 1. These primers were then used in quantitative real-time PCR, with the ultimate purpose of determining the appropriate RGs. In the real-time PCR experiments, ten housekeeping genes (YARS, EIF3S7, PFDN1, HPRT1, GAPDH, TBP, ACTB, B2M, G6PDH, and HBB) were each assayed in triplicate using Sybr Green Mastermix on the ABI Prism 7900 Sequence Detection System (Applied Biosystems, Foster City, CA). Each reaction contained cDNA corresponding to 20 ng of RNA along with 400 nmol/L primers in a final volume of 10 μL. A reaction mixture without cDNA template was used as a negative control. For PCR, reaction mixtures were initially incubated at 95 °C for 15 min and then subjected to 40 cycles of 95 °C for 15 s and 60 °C for 1 min. Melting curve analysis was performed on each sample to ensure that a single amplicon was amplified in the reaction. For each pair of primers, the efficiency of real-time PCR (E), slope values, and correlation coefficients (R²) were determined using serial 1:5 dilutions of template cDNA. E was calculated with the equation E = (10^(−1/slope) − 1) × 100.
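
As an illustration of the efficiency calculation, the following Python sketch fits a standard curve (Ct against log10 of relative template input) by least squares and applies E = (10^(−1/slope) − 1) × 100. The function name, the dilution series, and the Ct values are hypothetical and are not data from this study; the Ct values are constructed so that each 1:5 dilution adds exactly log2(5) cycles, which corresponds to perfect doubling.

```python
import math

def pcr_efficiency(log10_input, ct_values):
    """Return (slope, r_squared, efficiency_percent) for a standard curve."""
    n = len(log10_input)
    mean_x = sum(log10_input) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_input, ct_values))
    slope = sxy / sxx                      # least-squares slope of Ct vs log10(input)
    r_squared = sxy ** 2 / (sxx * syy)     # squared Pearson correlation
    efficiency = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, r_squared, efficiency

# Hypothetical serial 1:5 dilutions: relative input 1, 1/5, 1/25, 1/125, 1/625
dilutions = [math.log10(5.0 ** -k) for k in range(5)]
# Made-up Ct values for an ideal reaction (one doubling per cycle)
cts = [18.0 + k * math.log2(5.0) for k in range(5)]

slope, r2, eff = pcr_efficiency(dilutions, cts)
print(f"slope={slope:.3f}, R2={r2:.3f}, E={eff:.1f}%")
```

With real data the slope is usually somewhat shallower or steeper than the ideal value of about −3.32, and E deviates from 100% accordingly.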

Table 1 Information on the primers used for real-time PCR.

GeNorm analysis

The software program geNorm, version 3.4, was used to compare the stability of candidate reference genes. It is a Visual Basic application for Microsoft Excel ([http://medgen.ugent.be/jvdesomp/genorm/])22. The pairwise variation between two genes was calculated as the standard deviation of their logarithmically transformed expression ratios23. The expression stability measure (M) of a gene is the average pairwise variation of that gene compared to all the other control genes. Genes with the lowest M values had the most stable expression28. The least stable gene, which is the one with the highest M value, was automatically excluded from the next calculation round. Following previous studies14, 24, 25, 26, 27, the cut-off value of M was set at 1.5.
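
The M calculation and the stepwise exclusion can be sketched in a few lines of Python. This is a minimal re-implementation under stated assumptions, not geNorm's own code: the function names and the example quantities are hypothetical, log2 is used for the transformation, and the sample standard deviation (n − 1 denominator) is assumed.

```python
import math
import statistics

def pairwise_variation(q_a, q_b):
    """SD of the log2 expression ratios of two genes across samples."""
    return statistics.stdev([math.log2(a / b) for a, b in zip(q_a, q_b)])

def m_values(quantities):
    """Map each gene to M, its mean pairwise variation with all other genes."""
    genes = list(quantities)
    return {
        g: statistics.mean(
            [pairwise_variation(quantities[g], quantities[h])
             for h in genes if h != g]
        )
        for g in genes
    }

def rank_by_stability(quantities):
    """Drop the gene with the highest M each round; return least to most stable."""
    remaining = dict(quantities)
    dropped = []
    while len(remaining) > 2:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        dropped.append(worst)
        del remaining[worst]
    # The final two genes share one M value and cannot be ranked further.
    return dropped + sorted(remaining)

# Hypothetical relative quantities for four genes in four samples
example = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.0, 8.0, 16.0],   # constant ratio to geneA
    "geneC": [1.0, 2.0, 4.0, 16.0],   # deviates in one sample
    "geneD": [8.0, 1.0, 4.0, 2.0],    # unrelated pattern
}
ranking = rank_by_stability(example)
print(ranking)
```

Because the last two genes are only ever compared with each other, they end the procedure with the same M value, which is why a stability ranking of this kind reports its two best genes as a pair.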

GeNorm also determined the optimal set of RGs by producing normalization factors (NFn), which are based on the geometric mean of the expression levels of the n most stable reference genes23, 29. For example, the normalization factor NF2 was calculated as the geometric mean of the expression levels of the two most stably expressed genes. The procedure then continued to produce NF3 to NFn by stepwise inclusion of additional RGs. The optimal number of reference genes was chosen by calculating the pairwise variation parameter (V)23, 30, which is defined as the pairwise variation between two sequential normalization factors, NFn and NFn+1. A large V means that the additional gene has a significant effect on the normalization factor and should be included in the RG set14.
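
A minimal Python sketch of the normalization factor and of V follows. The function names, gene labels, and quantities are hypothetical; the sketch assumes that genes are supplied in order from most to least stable and that V is the sample standard deviation of log2(NFn/NFn+1) across samples.

```python
import math
import statistics

def normalization_factors(quantities, ranked_genes, n):
    """Per-sample geometric mean of the n most stable genes."""
    chosen = ranked_genes[:n]
    n_samples = len(quantities[chosen[0]])
    return [
        math.exp(sum(math.log(quantities[g][i]) for g in chosen) / n)
        for i in range(n_samples)
    ]

def sequential_variation(quantities, ranked_genes, n):
    """V(n/n+1): SD of log2(NF_n / NF_{n+1}) across samples."""
    nf_n = normalization_factors(quantities, ranked_genes, n)
    nf_next = normalization_factors(quantities, ranked_genes, n + 1)
    return statistics.stdev(
        [math.log2(a / b) for a, b in zip(nf_n, nf_next)]
    )

# Hypothetical relative quantities, genes listed from most to least stable
quantities = {
    "gene1": [1.0, 2.0, 4.0, 8.0],
    "gene2": [2.0, 4.0, 8.0, 16.0],
    "gene3": [1.0, 2.0, 4.0, 16.0],
}
ranked = ["gene1", "gene2", "gene3"]

nf2 = normalization_factors(quantities, ranked, 2)
v_2_3 = sequential_variation(quantities, ranked, 2)
print(nf2, v_2_3)
```

Computing V for n = 2, 3, and so on and comparing successive values shows whether adding the next gene changes the normalization factor appreciably.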

Results

RNA quality

RNA was extracted separately from each of the 37 NPC and 20 NP tissues. Of these, 20 NPC and 15 NP tissues were used for the cDNA microarray studies. The other 17 NPC and 5 NP tissues were prepared for real-time PCR experiments; their RNA concentrations are listed in Table 2. RNA integrity and purity were assessed using agarose gel electrophoresis and by determining the ratio of absorbance at 260 nm to that at 280 nm (mean±SD, 1.91±0.03).

Table 2 RNA concentrations of NPC and NP samples for real-time PCR.

Potential reference genes selection

An equation (see Materials and Methods) was constructed to evaluate the data resulting from the 20 gene expression profiles. The most stably expressed genes are listed in Table 3. In addition, all HKGs31 in the microarrays were analyzed. The distribution of the HKGs' S scores is shown in Figure 1A. The three HKGs (YARS, EIF3S7, and PFDN1) with the lowest S values were selected as the candidate RGs.

Table 3 The most stably expressed genes from microarray data. √ indicates that the gene is an HKG; × indicates that it is not.
Figure 1

Selecting potential reference genes. (A) According to gene expression levels, all housekeeping genes in the cDNA microarrays were ranked by S score. The distribution of the housekeeping genes' S scores is shown here, including the three genes (YARS, EIF3S7, and PFDN1) with the lowest S scores. (B) In previously published NPC studies, the four genes used most frequently as internal control genes were GAPDH, ACTB, HBB, and HPRT1.

MEDLINE was searched for the Medical Subject Heading (MeSH) terms “nasopharyngeal neoplasm” and “RT-PCR.” We evaluated the entire set of 151 articles resulting from this search and found that 53 articles could be used (Figure 1B). Four genes were selected as candidate RGs: GAPDH (27 times; 50.9%), ACTB (18 times; 34.0%), HBB (3 times; 5.7%), and HPRT1 (2 times; 3.8%).

Expression levels of candidate RGs

Real-time PCR was performed to detect the expression levels of the 10 candidate RGs in 17 NPC tissues and 5 NP tissues. Ct values of candidate RGs were between 16 and 30, which is a wide expression range for real-time PCR experiments (Figure 2).

Figure 2

Expression levels of candidate reference genes in NPC and NP samples. Values are given as real-time PCR cycle threshold numbers (Ct values). Boxes represent the lower and upper quartiles with medians; bars represent the ranges for the data.

GeNorm analysis

Gene expression stability was analyzed by geNorm22, 23, which calculated the measure of gene expression stability (M) based on the average pairwise variation of all studied genes.

The gene with the lowest M value was considered the most stable23, 32. The genes studied, in order from the most stable to the least stable, were YARS and HPRT1, EIF3S7, GAPDH, TBP, PFDN1, ACTB, B2M, G6PDH, and HBB (Figure 3A).

Figure 3

GeNorm analysis of 10 candidate genes. (A) The stability parameter M, which was calculated for each gene in every calculation round, is plotted on the Y axis. The X axis shows the 10 genes ranked according to their expression stability. (B) geNorm calculated NFs from at least 2 genes to determine the optimal number of reference genes. V is defined as the pairwise variation between 2 sequential normalization factors, NFn and NFn+1.

Furthermore, the optimal number of reference genes was evaluated by comparing the pairwise variation between sequential normalization factors (NFs) (Figure 3B). The results show that the inclusion of the fourth gene had about the same effect on the NF (V=0.298) as the inclusion of the third gene (V=0.287). No significant improvement in the normalization factor is therefore gained from using more than three genes (HPRT1, YARS, and EIF3S7).

Discussion

In our strategy (Figure 4), there are two main steps to creating the candidate RG pool. The first step involves screening genes using microarray data. We established a mathematical basis for screening genes based on microarray data and calculated every gene's S score using the relevant equations. The S score for a gene represents the stability of that gene's expression level: the lower the S value, the more stably the gene is expressed. Among the genes with relatively low S values, we preferred HKGs as candidate RGs because they are required for the maintenance of basal cellular function and are constitutively expressed in all cells. The second main step involves screening published literature to identify frequently used control genes. Such a two-step approach to selecting candidate RGs draws on information about more genes and could help identify optimal RGs with greater accuracy. We validated those candidate genes using real-time PCR. GeNorm, a widely available program, was then used to screen the best RGs from the real-time PCR data by determining the most stably expressed candidate genes based on each gene's average expression stability (M). A normalization factor (NF) was then generated by calculating the geometric mean of the most stable RGs23.

Figure 4

The strategy of selecting reference genes.

Potential reference genes selection

The more stable the expression of a gene is, the lower that gene's S score. When n genes were ranked by S score, the most unstable gene with the highest S score was then excluded for the next round because this gene may cause significant bias in the results. Then, a new S score of each of the remaining n-1 genes was calculated in the second round. This procedure was repeated until just two genes remained. Ultimately, every gene in the gene expression profile was ranked according to the stability of its expression. Three HKGs (YARS, EIF3S7, and PFDN1) with the lowest S value were selected as the candidate RGs. The literature does not report previous use of these genes as RGs.

We performed a MEDLINE search to identify RGs used in previously published NPC studies. The results of this search showed that no standard set of RGs for NPC expression studies currently exists. The four most frequently used genes (GAPDH, ACTB, HBB, and HPRT1) were selected as candidate RGs. In addition, we selected another three HKGs (TBP, B2M, and G6PDH) that are frequently used as RGs in other cancer studies14, 15, 33, 34, 35, 36, 37. Together with the three HKGs from the microarray screen, this gave a pool of ten candidate RGs. Combining both sources draws on information about more genes and helps identify optimal RGs.

Expression levels of candidate RGs

Real-time PCR was then performed to find the expression level of the 10 candidate RGs. In gene expression studies, it is better to use RGs with similar expression levels to normalize the target genes. In our study, Ct values of candidate RGs were between 16 and 30, which is a wide expression range for real-time PCR experiments (Figure 2). GAPDH, EIF3S7, and ACTB were highly expressed, with Ct values below 21 cycles. YARS, PFDN1, HPRT1, and TBP were expressed at relatively low levels and had Ct values over 21 cycles. This set of genes could be an effective reference for target genes with a large range of gene expression levels.

GeNorm analysis

GeNorm calculated the measure of gene expression stability (M). Following the procedure specified by geNorm, we converted Ct values to linear quantities by 2^(−ΔCt), using the sample with the highest expression as the calibration sample. M is defined as the average pairwise variation of one gene compared to each of the others. Genes were ranked from the most stable to the least stable as follows: YARS and HPRT1, EIF3S7, GAPDH, TBP, PFDN1, ACTB, B2M, G6PDH, and HBB (Figure 3A). For NPC expression research, we required M values of 1.5 or less14, 24, 25, 26, 27. B2M, G6PDH, and HBB were not suitable for NPC expression research because their M values were above the 1.5 threshold. YARS and HPRT1, which shared the lowest M value of the group, were identified as the two most stably expressed genes.
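
The Ct conversion can be illustrated with a short Python sketch. The function name and the Ct values are hypothetical; the sketch assumes that the calibration sample is the one with the lowest Ct (the highest expression) and that the amplification base is 2, which corresponds to 100% PCR efficiency.

```python
def ct_to_relative_quantities(ct_values, base=2.0):
    """Convert one gene's Ct values to quantities relative to the calibrator."""
    ct_min = min(ct_values)  # lowest Ct = highest expression = calibrator
    # Delta Ct = Ct(sample) - Ct(calibrator); quantity = base^(-Delta Ct)
    return [base ** -(ct - ct_min) for ct in ct_values]

# Made-up Ct values for one gene in three samples
relative = ct_to_relative_quantities([20.0, 21.0, 23.0])
print(relative)  # [1.0, 0.5, 0.125]
```

The calibrator always receives the value 1, and every other sample receives a value between 0 and 1; these linear quantities, rather than raw Ct values, are the input for the stability calculation.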

In our study, HPRT1 and YARS were identified by geNorm23 as the most appropriate internal control genes among the 10 candidate RGs. Furthermore, 5 other genes (EIF3S7, GAPDH, TBP, PFDN1, and ACTB) were also shown to be appropriate RGs for NPC studies (Figure 3A). The three least stable genes (B2M, G6PDH, and HBB) were all beyond the cut-off value. Although these three genes have commonly been used as RGs in other cancers, they should be avoided as internal control genes in NPC. Interestingly, the results showed that ACTB was not as stable as expected: it had the highest M value of the seven qualified RGs. To the best of our knowledge, ACTB was one of the most commonly used RGs in previously published NPC studies. According to the articles evaluated, ACTB was used 18 times (34.0%), more frequently than any other gene except GAPDH, which was used 27 times (50.9%). The relatively high variability of ACTB expression further demonstrates the problems of relying on commonly used RGs. Similar erroneous normalizations have been reported by other researchers14, 23. These examples indicate that the absence of a thorough validation process for RGs could result in imprecise normalization.

Furthermore, NFs were calculated starting from the two most stably expressed genes, with stepwise inclusion of the next most stable RG. The results showed that there is no significant improvement in the normalization factor when using more than three genes (HPRT1, YARS, and EIF3S7). This result indicates that the three-gene set is adequate for normalization in NPC gene expression studies. We strongly recommend that, if conditions permit, this three-gene set be used instead of a single RG for data normalization.

Advantages and limits to the RG-selecting system

The conventional approach to finding RGs involves only screening earlier studies. There are two obvious defects in this approach. The first defect, as noted above, is the lack of any systematic comparison within the particular experimental context; simply adopting conventionally used RGs may bias normalization. The second defect is that if no RGs appear in the prior published research for the cancer or species to be studied, one is left with no method for choosing the right RGs. Our strategy helps to remedy both of these defects by including microarray data analysis as part of the method for choosing RGs. Furthermore, the present study is the first systematic comparison of the effectiveness of a set of potential RGs for nasopharyngeal carcinoma research. To our knowledge, no previous studies have tried to validate optimal RGs in NPC. One possible reason is that NPC and nasopharyngeal phlogistic specimens are very difficult to acquire because (a) NPC is prevalent mainly in Southeast China and (b) the cancer tissue is very small.

When we compared the cDNA microarray screen results to the geNorm results, we found that HPRT1, one of the best control genes, was not identified in the screen of the microarray data. The failure to identify it as a microarray candidate RG may have both biological and methodological reasons. One reason might be that many genes in the microarray, such as ribosomal protein genes, were stably expressed. Although HPRT1 was not selected as a candidate gene, it was still in the pool of stably expressed genes with low S values. Another reason might be that more than 10 000 genes were compared against each other to calculate S values, whereas only a few candidate genes were compared in geNorm. Furthermore, real-time PCR data are generally considered more accurate than cDNA microarray data, and methodological differences between cDNA microarray and real-time PCR may also have contributed to the discrepancy. We conclude from this result that the literature screen is an important complement to the microarray screening strategy.

We also report here three new HKGs (YARS, PFDN1, and EIF3S7), which were all shown to be appropriate RGs for NPC gene expression studies. These genes were identified directly from the 20 microarray data sets. The current study is the first report of these genes as potential RGs in NPC. It also shows that the strategy of screening microarray data for suitable candidate RGs is feasible. The process for finding candidate RGs shown in this study may help researchers who wish to find RGs for a new species, tissue, or cell type that has no record of proven RGs.

Author contribution

Yi GUO and Jia-xin CHEN designed the strategy of the research presented in this article. NPC tissues were collected by Jia-xin CHEN, and biology experiments were carried out by Yi GUO. Yi GUO, Jia-xin CHEN, Shu YANG, and Xu-ping FU performed microarray data and RT-PCR analysis. Yi GUO drafted the manuscript. Zheng ZHANG, Ke-he CHEN, Yan HUANG, Yao LI, Yi XIE, and Yu-min MAO helped to revise the manuscript. Yao LI organized all the results and provided advice in the preparation of the manuscript.