Identification of reference genes for circulating microRNA analysis in colorectal cancer

Quantitative real-time PCR (qPCR) is the most frequently used method for measuring expression levels of microRNAs (miRNAs), which is based on normalization to endogenous references. Although circulating miRNAs have been regarded as potential non-invasive biomarker of disease, no study has been performed so far on reference miRNAs for normalization in colorectal cancer. In this study we tried to identify optimal reference miRNAs for qPCR analysis across colorectal cancer patients and healthy individuals. 485 blood-derived miRNAs were profiled in serum sample pools of both colorectal cancer and healthy control. Seven candidate miRNAs chosen from profiling results as well as three previous reported reference miRNAs were validated using qPCR in 30 colorectal cancer patients and 30 healthy individuals, and thereafter analyzed by statistical algorithms BestKeeper, geNorm and NormFinder. Taken together, hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p were recommended as a set of suitable reference genes. More interestingly, the three miRNAs validated from 485 miRNAs are derived from a single primary transcript, indicting the cluster may be highly conserved in colorectal cancer. However, all three miRNAs differed significantly between healthy individuals and non-small cell lung cancer or breast cancer patients and could not be used as reference genes in the two types of cancer.

In this article, we aim to develop miRNA profiling data normalization approach based on the selection from large scale of candidate miRNAs in colorectal cancer. First, 485 human miRNAs were selected and subsequently quantified with a sensitive qPCR-based assay, the S-Poly(T) Plus assay 21 , of which, 372 miRNAs that could be detected in human serum or plasma were obtained from QIAGEN website (http://www.sabiosciences.com/genetable.php?pcatn= MIHS-3106Z); 113 other blood-derived miRNAs were extracted from literatures with the key words "microRNA/miRNA", "serum/plasma/blood" in combination with "human/hsa-". Then, BestKeeper, geNorm and NormFinder algorithms were applied to identify the most stable reference genes. Finally, the recommended normalizer(s) was (were) validated in colorectal cancer as well as non-small cell lung cancer (NSCLC) and breast cancer.

Results
Selection of candidate reference miRNAs from profiling. To identify suitable reference genes across the healthy individuals and colorectal cancer patients, we have developed miRNA profiling data normalization approach based on the selection from large scale of candidate miRNAs. 485 miRNAs (Supplemental Table 1) were assessed with qPCR approach and were depicted in the volcano plot as in Fig. 1. Seven out of 485 putative reference miRNAs (hsa-miR-320d, hsa-miR-25-3p, hsa-miR-92b-3p, hsa-miR-106b-5p, hsa-miR-10b-5p, hsa-miR-107 and hsa-miR-574-5p) were selected based on the following criteria: (a) the Ct values of all miRNA were < 32; (b) the fold change between the healthy populations and colorectal cancer patients was lower than 1.1; (c) no significant differences were existed between the healthy and diseased groups (P > 0.05). In addition, three candidates (hsa-miR-101-3p, hsa-miR-16-5p, and hsa-miR-93-5p) were also chosen as potential reference miR-NAs according to literatures [22][23][24][25][26] . For the three miRNAs, we also used miRNA profiling data in this study to make sure consistent results. Taking together, ten putative reference miRNAs were subjected to further analyses (Table 1) Table 1. Relative quantities of candidate reference miRNAs in gnome-wild serum profile. Hsa-miR-101-3p, hsa-miR-16-5p and hsa-miR-93-5p have been reported as normalizers in literatures; seven additional putative miRNAs (hsa-miR-320d, hsa-miR-25-3p, hsa-miR-92b-3p, hsa-miR-106b-5p, hsa-miR-10b-5p, hsa-miR-107 and hsa-miR-574-5p) were extracted with standards: fold change < 1.1 and P value > 0.05. Data of ten miRNA was from profiling data set in this study to make sure consistent results. All the Ct value was less than 32. between the healthy and diseased groups, except hsa-miR-574-5p (P < 0.01) (Fig. 2). Therefore, we picked the left nine putative reference miRNAs and reassess their potential contributions as normalizers following the instructions of the BestKeeper, geNorm and NormFinder.
BestKeeper was first used to analyze the expression variability of each miRNA by calculating the Ct set standard deviation (SD). Genes with SD greater than 1 were assumed to be in consistent. This application ranked hsa-miR-320d, hsa-miR-92b-3p, hsa-miR-25-3p and hsa-miR-106b-5p as the least variable genes (SD values are 0.498, 0.748, 0.842 and 0.966, respectively), suggesting these four candidates could be used as reference genes in miRNA analysis of colorectal cancer ( Table 2).   Based on fold change data (see the statistical method in details), geNorm generates a stability value M using the average pair wise variation between all tested genes, accompanied by stepwise exclusion of the least stable gene until the two most stable genes were left (Fig. 3A). Genes with the highest M value have the least stable expression, whereas the genes with the lowest M value have the most sable expression. In this study, hsa-miR-106b-5p, hsa-miR-93-5p and hsa-miR-25-3p showed the lowest M values, indicating with the most stable expression (Table 1). In addition, geNorm calculates a normalization factor (V NF value), which is a criterion for the optimal number reference genes. V NF value of 1.5 was set as a cutoff for proper normalization. When this threshold achieved, it is not necessary to include any additional reference genes 27 . In our data sets, only V 8/9 was less than 0.15, indicating the nine miRNAs combination as the best reference normalization factor (Fig. 3B). It is not so convenient for this combined reference set as a normalizer; therefore, this threshold should not be viewed too strictly.
Similar to geNorm, NormFinder evaluated most stable genes with the lowest M value. NormFinder also showed that hsa-miR-106b-5p was the most stable candidate reference gene with M value 0.228, followed by hsa-miR-25-3p and hsa-miR-93-5p with an M value of 0.269 and 0.283, respectively ( Table 2).
Although displaying differently, the results provided by BestKeeper, geNorm and Normfinder had overlaps. BestKeeper indicated that hsa-miR-320d, hsa-miR-92b-3p, hsa-miR-25-3p and hsa-miR-106b-5p could be used as reference miRNAs; both geNorm and NornFinder recommended hsa-miR-106b-5p, hsa-miR-25-3p and hsa-miR-93-5p as top three references. Overall, hsa-miR-106b-5p and hsa-miR-25-3p were recommended as suitable reference miRNAs by three algorithms. hsa-miR-93-5p ranked second when it was normalized by geNorm and NormFinder, and had been reported as validated reference miRNA [23][24][25] . We therefore included hsa-miR-93-5p as another reference gene (Tables 1 and 2). Comparing with a single reference gene, the combination of more than one normalizer may increase the accuracy of quantification in qPCR study. We therefore propose that the set of miRNAs including hsa-miR-106b-5p, hsa-miR-25-3p and hsa-miR-93-5p could be used as reference genes as combination or separately in colorectal cancer. Then, there is no need for a spike-in reference, because RNA extraction efficiency will not have influence on the miRNA expression pattern when endogenous gene(s) was(were) used as normalizer(s).
To further demonstrate the applicability of the three reference miRNAs, another set of samples from a different hospital (Peking University Shenzhen Hospital) were collected. Sera were obtained from 30 healthy individuals and 30 colorectal cancer patients. The expression levels of hsa-miR-106b-5p, hsa-miR-25-3p and hsa-miR-93-5p still showed no significant difference between the healthy and diseased groups (Supplemental Fig. 1). Besides, the S-Poly(T) Plus method, was revaluated by comparing with commercial TaqMan microRNA assay kit (Applied Biosystems). It turned out that the S-Poly(T) Plus assay showed a 2.6~263-fold increase in sensitivity (Supplemental Fig. 2), indicating that the miRNA measurement method used in this study are reliable.
Revalidated reference miRNAs in non-small cell lung cancer and breast cancer. It was also tested whether one or a set of miRNAs could be used as universal reference genes for other types of cancer. Expression patterns of the top three reference miRNAs of colorectal cancer (hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p) were determined in 30 healthy donors, 30 non-small cell lung cancer patients and 30 breast cancer patients, respectively. miRNA levels were normalized to spiked-in cel-miR-54-5p. The results showed that all three miRNAs differed significantly between healthy controls and non-small cell lung cancer patients or healthy controls and breast cancer patients, indicating that hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p might be colorectal cancer specific as reference genes (Fig. 5). Moreover, it had been proved that other six candidate reference miRNAs (hsa-miR-107, hsa-miR-10b-5p, hsa-miR-92b-3p, hsa-miR-101-3p, hsa-miR-320d, and

Discussion
Since the mature miRNAs were first amplified and quantified by RT-PCR 31 , qPCR has been the most frequently used approach in miRNA expression studies. The accuracy of qPCR depends on suitable normalization of data inasmuch as inappropriate use of reference genes can significantly alter the results of target miRNA quantification. Numerous RNA species, including rRNA, snRNA, and miRNAs had been used as reference genes so far [34][35][36] . However, it has some problem of rRNA and snRNAs in serum or plasma in aspects of the abundance and stability.
In our study, we combined multiple strategies for identifying appropriate reference miRNAs. First, candidate reference genes were selected with qPCR approach from genome-wide serum miRNA expression profile (see the methods). To our knowledge, such assessment and validation of reference miRNAs for colorectal cancer studies has not be reported. Comparing to those reference genes identified by microarray or extracted from pervious literature, the accuracy and comparability of miRNA expression level data have been dramatically improved. Then, non-human miRNA cel-miR-54-5p from Caenorhabditis elegans was used as spiked-in in terms of less technical variation and more accurate identification of biological changes. Excluding technical variation, true reference genes should be stably expressed in the serum mixture pool and a larger cohort of colorectal cancer and healthy individuals. Therefore, after all miRNAs were profiled, putative reference miRNAs were validated in each sample; and candidate reference miRNAs with the least differences were confirmed and analyzed in BestKeeper, geNorm and NormFinder. Finally, hsa-miR-25-3p, hsa-miR-93-5p and hsa-miR-106b-5p were identified from 485 miR-NAs as best set of reference genes in colorectal cancer.
hsa-miR-144-3p has been reported as potential biomarker in colorectal cancer and its forced expression was associated with malignant potential and prognosis 30,37 . In our study, hsa-miR-144-3p was significantly up-regulated in colorectal cancer patients comparing to healthy donors when cel-miR-54-5p, hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p were served as normalizers. It confirms the suitability of hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p to be used as optimal reference genes in the quantification. On the other hand, the difference was more significant when hsa-miR-144-3p were normalized to hsa-miR-106b-5p (Fig. 4). These results indicated that normalization to systematically selected reference genes could detect a miRNA even with smaller differences, and that hsa-miR-106b-5p could be somewhat better than hsa-miR-25-3p and hsa-miR-93-5p as a normalizer.
hsa-miR-223 has been frequently observed to be aberrantly expressed in varieties of tumor 34,38,39 , but there was no consistent results about the expression levels in colorectal cancer. Overexpression of hsa-miR-223 had been reported in colorectal adenocarcinomas 40 ; however, evidences revealed that there was no significant difference between colorectal cancer patients and controls for hsa-miR-223 in plasma 28 or tissue 41 . Regardless of different in samples, these results have drawn particular attention to the effect of references in normalizing the results, and demonstrated the urgent need for identification of suitable reference genes to produce reliable data. In our results, the expression level of hsa-miR-223-3p varied differently when cel-miR-54-5p, hsa-miR-93-5p and hsa-miR-106b-5p were used as normalizer but not when data normalized to hsa-miR-25-3p. Considering that it Figure 5. Expression levels of hsa-miR-93-5p, hsa-miR-25-3p and hsa-miR-106b-5p in non-small cell lung cancer and breast cancer. miRNAs were quantified in the plasma of 30 non-small cell lung cancer patients, 30 breast cancer patients and 30 healthy individuals. miRNA levels were normalized to spiked-in cel-miR-54-5p and represented in scatter plots. Data are shown as means ± SE, **P < 0.01, ***P < 0.001. was consistent when hsa-miR-106b-5p, cel-miR-54-5p and hsa-miR-93-5p were used as normalizers, we therefore believed that it hsa-miR-223-3p was significant down-regulated tendencybetween the colorectal cancer patient and healthy control in our study.
This study was the first systematic investigation to identify appropriate reference miRNAs in colorectal cancer. We have identified hsa-miR-106b-5p, hsa-miR-93-5p and hsa-miR-25-3p as a set of best reference genes for qPCR normalization analysis in colorectal cancer. To our surprise, these three candidate references validated from 485 miRNAs were located in a single primary transcript on chromosome 7, indicating the function of miR-106b-25 (miR-106b/miR-93/miR-25) cluster may be highly conserved in colorectal cancer. Meanwhile, it also proved that the strategies we used in the identification of reference miRNAs in clinical serum sample between controls and diseased group are reliable.
We also evaluated the expression pattern of hsa-miR-25-3p, hsa-miR-93-5p and hsa-miR-106b-5p in non-small cell lung cancer and breast cancer. However, these miRNAs were significantly down-regulated in cancer patients and hence could not be used as references. Whether they could be used as reference genes for other cancers remains to be determined.

Methods
Serum collection. Human sera were collected from donors in Shenzhen People's Hospital (Shenzhen, China) including 86 healthy donors, 66 patients with colorectal cancer, 30 patients with non-small cell lung cancer and 30 patients with breast cancer. Unless stated, samples used in this study were from this hospital. Another set of serum samples were collected from 30 healthy individuals and 30 colorectal cancer patients who came from Peking University Shenzhen Hospital (Shenzhen, China). The information of all donors was detailed in Table 3. The blood samples were kept at room temperature for 1 h and then centrifuged at 3,000 × g for 10 min at 4 °C. The sera were then stored at − 80 °C. All procedures were carried out according to the approved guidelines. Both the two Institutional Ethics Committees at the Shenzhen People's Hospital (Shenzhen, China) and Peking University Shenzhen Hospital (Shenzhen, China) approved the experiments, and written informed consent was obtained from all subjects. RNA extraction. Total RNA was isolated using our developed serum/plasma (S/P) miRsol method as previous described 21 . Of note, 0.1 pM spiked-in Caenorhabditis elegans cel-miR-54-5p was used for normalization and determination of technical variation in RNA extraction.
Polyadenylation, reverse transcription and Real-time PCR. The polyadenylation, reverse transcription and Real-time PCR procedure were conducted exactly according to S-Poly(T) Plus protocol 21 . To profile miRNAs effectively, every 7 out of 485 miRNAs as well as spiked-in cel-miR-54 were grouped together for the one-step reaction of polyadenylation and reverse transcription, simultaneously. miRNAs with identical forward primers or with more than five base-pairings between the forward primer and RT primers have been avoided in the same group. All sequences, primers and probes were listed in the Supplemental Table 1. Besides, the S-Poly(T) Plus assay was evaluated by comparing with TaqMan microRNA assay kit (Applied Biosystems) according to the manufacturer's instructions. No-template control (NTC) and no-reverse transcriptase control (-RT) were conducted simultaneously.
Real-time PCR was performed in 96-well plates by using ABI StepOne Plus thermal cycler. Each PCR reaction was carried out in duplicate. The miRNA expression level was normalized to spiked-in cel-miR-54-5p. miRNAs with cycle threshold (Ct) value less than 35 in the panel were included in data analysis.

Selection and validation of reference miRNAs.
A three-step test was designed to select and validate reference miRNAs for colorectal cancer. In the initial screening, we pooled serum samples of 86 healthy subjects and 66 patients with colorectal cancer respectively. 485 miRNAs from the two pools were detected using  the S-Poly(T) Plus method. Then, miRNAs without significant difference between colorectal cancer group and healthy group (fold change < 1.1 and P > 0.05) were selected for validation in each individual of 30 colorectal cancer patients and 30 healthy controls (randomly chosen from 86 colorectal cancer patients and 66 healthy donors). miRNA(s) stably expressed in the serum mixture pool and individuals was (were) selected as most suitable candidate reference(s) in colorectal cancer. Ultimately, we selected hsa-miR-27a-3p, hsa-miR-144-3p and hsa-miR-223-3p as target miRNAs to test the effect of the candidate reference(s) on the accuracy of qPCR results. Moreover, to find out if the selected reference(s) also stably expressed in other diseases, it (they) was (were) validated in each individual of 30 non-small cell lung cancer and 30 breast cancer patients. Fold change of miR-NAs were calculated using the 2 −ΔΔCt method and spiked-in cel-miR-54-5p was used as a normalization control.
Statistics. Statistical analysis was carried out using the GraphPad Prism 5 using two-tailed Student's test.
Data were present as means ± SE (standard error). P value of less than 0.05 was considered statistically significant. Mean values were obtained from SPSS software, version19.0 (SPSS Inc., Chicago, IL, USA). The stability of the candidate reference genes was evaluated by the computer program BestKeeper 42 , geNorm 27 and NormFinder 43 . In the BestKeeper software, data observations are in the form of raw threshold cycles, while fold change were used as input data in both geNorm and NormFinder.