Systematic evaluation of multiple qPCR platforms, NanoString and miRNA-Seq for microRNA biomarker discovery in human biofluids

Aberrant miRNA expression has been associated with many diseases, and extracellular miRNAs that circulate in the bloodstream are remarkably stable. Recently, there has been growing interest in identifying cell-free circulating miRNAs that can serve as non-invasive biomarkers for early detection of disease or selection of treatment options. However, quantifying miRNA levels in biofluids is technically challenging due to their low abundance. Using reference samples, we performed a cross-platform evaluation in which miRNA profiling was performed on four different qPCR platforms (MiRXES, Qiagen, Applied Biosystems, Exiqon), nCounter technology (NanoString), and miRNA-Seq. Overall, our results suggest that using miRNA-Seq for discovery and targeted qPCR for validation is a rational strategy for miRNA biomarker development in clinical samples that involve limited amounts of biofluids.

miRNA profiling platforms. The four qPCR platforms evaluated in this study have slightly different methods for miRNA profiling (Table 1). For example, the ABI and MiRXES platforms rely on gene-specific primers for reverse transcription, but the Qiagen and Exiqon platforms perform universal A-tailing. The most comprehensive miRNA panels available from each manufacturer at the time of experiment were selected. Qiagen's miRNome panel had the highest number of mature miRNAs (1066 miRNAs) whereas the MIRXES panel was the most focused (560 miRNAs) ( Table 2). The number of overlapping miRNAs across the four qPCR panels is 438, significantly higher than the 196 overlapping miRNAs analyzed in the miRQC study 19 . Expression levels of miRNAs in the Qiagen, Exiqon and ABI panels were determined using Ct values and normalized by spike-in RNA or endogenous small RNA and/or calibrators accordingly to manufacturers' recommendations.
The NanoString assay involves ligating unique oligonucleotide tags onto target miRNAs, followed by detection of targets by hybridization to color-coded probes. Each target-probe pair is individually resolved and counted on the instrument. The nCounter Human v3 miRNA codeset contains probes for measuring 800 miRNAs ( Table 2).
As the relative performance of miRNA-Seq library preparation kits from different manufacturers was unclear, we evaluated the performance of two different kits on our reference samples: the TruSeq Small RNA Library Prep   ). Ultimately, we selected the TruSeq method for miRNA-Seq as our evaluation showed that the TruSeq kit is superior to the NEXTflex kit in overall yield and consistency of library preparation, miRNA detection rate and total hands-on time (detailed report in Supplementary Note 1). The miRNA-Seq assay relies on the presence of 5′-phosphate and 3′-hydroxyl groups on mature miRNAs to ligate specific RNA adapters onto each end of the target molecule. Reverse transcription is used to generate cDNA, which is then amplified by PCR to generate molecules that are compatible for sequencing on an Illumina sequencer. MiRNA expression levels are obtained by counting the number of sequence reads that align to regions of the human genome corresponding to mature miRNA sequences. We observed lower alignment rates in Ref. Serum and plasma samples (~ 9% and ~ 16% respectively) compared to the Brain QC sample (~ 38%), which may be due to a lower overall miRNA content in the biofluid samples (Table 3). In miRNA-Seq, the count data for each sample is normalized to sequencing depth and is represented as the number of reads per million mapped miRNA reads (RPM). In principle, miRNA-Seq can capture the complete range of miRNA species in a sample and has the potential to measure the largest number of miRNAs ( Table 2). As the platform is sequence agnostic, miRNA-Seq can be used to identify and quantitate novel miRNAs, i.e., sequences that are likely to be miRNAs but are not annotated in miRBase.
Reproducibility. On each platform, at least three independent runs were performed on Ref. Serum to obtain miRNA expression profiles. The concordance correlation coefficient (ccc) was calculated between runs to assess the reproducibility of each platform ( Supplementary Fig. S1). We observed that the MiRXES qPCR and miRNA-Seq platforms had almost perfect concordance between runs (ccc = 0.99), the other three qPCR platforms had moderate inter-run concordance (ccc > 0.9), and the NanoString platform had poor inter-run concordance (ccc = 0.82) ( Table 2). However, expression profiles obtained from the Brain QC sample had almost perfect inter-run concordance on both the miRNA-Seq and NanoString platforms (ccc = 0.99), suggesting that the lower reproducibility observed with NanoString may be restricted to sample types with lower miRNA content, such as serum ( Supplementary Fig. S1). The miRNA-Seq method was also highly robust, as miRNA expression levels obtained from libraries with different input volume or PCR cycle number were highly concordant (Supplementary Table S1).
Detection rate and lower limit of quantitation. Detection rate is determined by a combination of platform content, sensitivity, and the platform-specific cutoff used for the limit of detection (LOD). Each platform's lower limit of quantification (LLOQ) is the lowest number of miRNAs that can be quantitatively determined with a defined precision. In this study, LLOQ was determined using a cutoff of 50% coefficient of variation (CV) on all platforms. Using a LOD of five sequence reads, the number of unique miRNAs detected by miRNA-Seq in the Brain QC and Ref. Serum samples did not saturate even at a sequencing depth of 50 million reads (Fig. 2a). At this read depth, greater than 1150 and 650 miRNAs were detected above the LOD in the Brain QC and Ref.
Serum samples, respectively. This suggests that miRNA-Seq can achieve very high sensitivity in biofluid samples by capturing a complex library of miRNA molecules, followed by deep sampling by next-generation sequencing. For miRNA-Seq, the LLOQ in Ref. Serum increases with higher sequencing depth, and was approximately eight  www.nature.com/scientificreports/ reads at a sequencing depth of 20 million reads (Fig. 2b). In the Brain QC sample, the LLOQ was approximately four reads and was relatively stable at different sequencing depths (Fig. 2b). Our subsampling analysis suggests that miRNA-Seq libraries in serum should be sequenced to ~ 20 million reads as the number of miRNAs that can be reliably detected appears to reach saturation at this sequencing depth (Fig. 2a). However, as the number of additional miRNAs detected above LLOQ in Ref.
Serum is relatively small when sequencing depth was increased from 10 to 20 million reads (300 vs 372 miRNAs), sequencing more samples at a lower depth of 10 million reads may be preferable in biomarker projects where budgets are limited. Conversely, the number of miRNAs that were detected above LLOQ in the Brain QC sample did not saturate even at a sequencing depth of 50 million reads (Fig. 2a). For tissue samples with more complex miRNA content, a trade-off will have to be made between increasing miRNA detection with sequencing depth and enhancing statistical power with more samples. A more meaningful performance parameter is the number of miRNAs that can be reliably detected in a given sample. To that end, we calculated the LLOQ for each platform using expression data obtained from at least three independent runs (Online Methods, Supplementary Fig. S2). Overall, there were significant differences in the number of miRNAs detected above the LLOQ in Ref. Serum-NanoString only detected 84 miRNAs, miRNA-Seq detected 372 miRNAs, and the number of miRNAs detected by qPCR was highly variable between platforms ( Table 2). Across all platforms, the MiRXES qPCR platform detected the highest number of miRNAs above the LLOQ. Notably, the LLOQ could not be calculated for the Qiagen qPCR platform as the variability in measurement of miRNA levels was particularly high even at the lower Ct range ( Supplementary Fig. S2) and results from this platform were excluded from further analysis.
For miRNAs detected above the LLOQ, their expression levels were measured with highest precision by qPCR, compared to miRNA-Seq and NanoString (Fig. 2c).
Inter-platform correlation. A total of 626 miRNAs were detected above the LLOQ in Ref. Serum by at least one platform (Fig. 2d). A significant proportion of miRNAs (51%) were only detected by one platform, and only 36 miRNAs (6%) were detected by all five platforms (Fig. 2d). Among the qPCR platforms, the largest inter-platform overlap with miRNA-Seq was observed with MiRXES (243 miRNAs), followed by Exiqon (167 miRNAs) and ABI (142 miRNAs) (Supplementary Table S2). The discrepancy in detection rate between the qPCR platforms could be partially explained by the higher overall sensitivity of the MiRXES ID3EAL assay-a significant number of miRNAs (246 miRNAs) were only detected by MiRXES but not Exiqon or ABI. However, a substantial proportion of the miRNAs detected by Exiqon (21%) and ABI (20%) were not detected above the LLOQ by MiRXES. Among these miRNAs, around 70% were missing from the MiRXES panel, indicating that a reason for the discrepancy is non-overlapping panel content across qPCR platforms. However, each qPCR platform may also be unique in their ability to reliably detect some miRNAs. www.nature.com/scientificreports/ Next, we evaluated inter-platform correlation in miRNA levels measured in Ref. Serum (Fig. 3). Overall, moderate to strong correlation was observed between the qPCR platforms and between qPCR and miRNA-Seq, except for ABI qPCR and miRNA-Seq where inter-platform correlation is weak (r = 0.38). The highest inter-platform correlation was observed between Exiqon qPCR and MiRXES qPCR (r = 0.78). Similar dynamic range in miRNA levels was detected by miRNA-Seq and qPCR (~ 10 6 ). The correlation observed for NanoString with the other platforms were either not statistically significant (miRNA-Seq) or in the weak to moderate range (qPCR)-this could partly be explained by the lower dynamic range in miRNA detection for the NanoString platform (~ 10 3 ). In contrast, substantial overlap and strong inter-platform correlation was observed between miRNA-Seq and NanoString in the Brain QC sample (r = 0.63); however, miRNAs detected by NanoString appear to be skewed towards lower expression levels, i.e., below 100 counts ( Supplementary Fig. S3). These results suggest that lower detection sensitivity for the NanoString platform may be affecting its accuracy in measuring miRNA expression, particularly in samples with low miRNA content. Interestingly, strong correlation was observed between miRNA-Seq and MiRXES qPCR (r = 0.69); it was the second highest correlation observed between any two platforms in this study and this was comparable to that observed in twelve plasma samples on which miRNA profiling was also performed by both platforms (Supplementary Fig. S4).
Since miRNA-Seq and MiRXES qPCR had strong inter-platform correlation and higher assay precision and detection rate compared to the other platforms evaluated in this study, we decided to explore the potential of   Fig. S5) and 517 miRNAs by qPCR alone.
Validation of novel serum miRNAs detected by miRNA-Seq. Using miRNA-Seq as a discovery platform can also enable the identification of novel miRNAs that can be verified subsequently by qPCR. Novel miRNAs that were identified by miRNA-Seq in Ref. Serum were filtered for those that had a mirDeep score cutoff of ≥ 1 in technical replicates that were performed across three independent runs. The hairpin structure, mapped read distribution, and read counts observed in each predicted miRNA precursor suggested that they encoded true miRNAs (Supplementary Note 2). A total of 14 mature miRNA sequences that had average read counts per library ranging from 47 to 342,186 were selected for subsequent validation (Supplementary Table S3). The expression of these 14 miRNAs was measured by qPCR in Ref. Serum. Twelve miRNAs were detected above the LOD of the MiRXES platform (Ct < 40). We observed an inverse correlation between Ct values by qPCR and read counts by miRNA-Seq (Fig. 4a), although the correlation was not statistically significant at p = 0.05 (p = 0.08).  www.nature.com/scientificreports/ Next, we surveyed the intracellular and extracellular expression levels of these 14 predicted miRNAs by qPCR across six human cell lines of diverse origin-kidney (HEK-293), monocytic (THP-1), oesophageal (OE19), stomach (NUGC-3), colorectal (MDST8) and neural (NPC). Except for 31893-3p, which was also not detected by qPCR in Ref. Serum, intracellular expression for the remaining 13 miRNAs were detected at varying levels across the six cell lines (Fig. 4b). Notably, all 13 miRNAs were detected above the LOD in all cell lines. In contrast, extracellular expression was detected above the LOD for nine miRNAs in at least one cell line (Fig. 4c). Interestingly, only six miRNAs had detectable extracellular levels across all cell lines. For instance, 3503-5p was detected at relatively high levels (~ 27 Ct) in the cell culture media from HEK-293, THP-1 and NPC but was undetectable in OE19, MDST8 and NUGC-3, suggesting that this particular miRNA may not be secreted by cells of gastrointestinal origin. Further, we calculated the ratio of miRNA expression in the extracellular and intracellular fraction by normalizing for cell number and volume of culture media (Fig. 4d). This analysis revealed several interesting observations. Several miRNAs such as 41393-3p and 23042-3p were detected at substantially higher levels in the extracellular fraction across all cell lines (ratio > 10), suggesting that they were likely to be actively secreted by these cells. On the other hand, several miRNAs had ratios greater than or lesser than one in different cell lines. For instance, 28164-3p had a ratio of 5.5 and 0.1 in THP-1 and MDST8 cells respectively, which indicates that this miRNA is more likely to be secreted by the former. It should be noted that that intracellular RNA and extracellular RNA were extracted using different reagents, which may introduce bias in measuring absolute expression [34][35][36] .

Discussion
Overall, our results show that miRNA profiling by qPCR, NanoString and miRNA-Seq differed significantly in performance, cost and speed ( Table 2). Using a similar cutoff of 50% CV across all platforms, we found that MiRXES qPCR detected the highest number of miRNAs above the LLOQ in serum. This could be due to the use of a curated panel (optimized for human biofluids) and the high sensitivity and precision that is typically associated with qPCR technology. However, we also observed significant variability in performance in the other qPCR platforms evaluated in this study. Compared to MiRXES qPCR, miRNA-Seq detected slightly fewer miRNAs above the LLOQ in serum, but offers several advantages such as lower cost and faster turnaround time. Furthermore, the miRNA-Seq assay is target agnostic and will not require sample-specific optimization. The NanoString platform detected the lowest number of miRNAs above the LLOQ in serum. This could be due to several reasons: (a) in contrast to qPCR and miRNA-Seq, the NanoString assay is likely to have lower sensitivity as it does not involve target amplification, (b) the NanoString assay only allows a maximum RNA input volume of 3 µl, which is slightly lower than that of miRNA-Seq and qPCR. On the latter, we have found that using higher amounts of RNA input can increase the number of miRNAs detected above the LLOQ on the NanoString platform (data not shown). Therefore, it is likely that the performance of the NanoString platform on biofluid samples can be improved by increasing the overall RNA concentration of the sample, e.g., using vacuum concentration or a starting with higher volume of serum or plasma.
MiRNA biomarker development and successful translation into the clinic will require a robust assay that is capable of performing reliable measurement of miRNA abundance. Therefore, applying an LLOQ cutoff during biomarker discovery would be a reasonable approach to select candidates for subsequent follow up. To that end, we found that the number of miRNAs that can be reliably measured in serum reaches saturation at a sequencing depth of ~ 20 million reads. We suggest that this fit-for-purpose approach to validate the miRNA-Seq assay can maximize cost effectiveness by providing a rational basis for the amount of sequencing required per sample.
The platform-specific biases in miRNA detection that we observed in biofluids are not surprising, given that similar observations were made in other sample types. Nonetheless, we observed a high degree of overlap and strong correlation between miRNA-Seq and qPCR. Even though qPCR platforms are limited to the number of assays available, we showed that using miRNA-Seq to identify candidate miRNA biomarkers for subsequent validation by qPCR is a viable approach. The use of an expanded panel allowed us to perform qPCR-based measurements for the majority (86%) of miRNAs that were detected by miRNA-Seq, which resulted in the co-detection of 320 miRNAs on both platforms. To our knowledge, this is the largest number of miRNAs that can be reliably measured on two independent platforms in a single serum sample. Given the strengths and weaknesses of each platform, we propose that using miRNA-Seq for miRNA biomarker discovery and subsequent validation using qPCR represents a rational combination for miRNA biomarker development. However, our study did not assess the ability of each platform to detect differences between samples (e.g., healthy individuals and patients), so the potential of this approach will have to be evaluated in a subsequent study.
Another advantage of using miRNA-Seq for biomarker discovery is its ability to identify novel miRNAs. Of the 14 mature miRNAs that were selected for further analysis, most of them were also detected by qPCR in Ref.
Serum. The existence of these predicted miRNAs in the serum suggests that they might be actively or selectively secreted 37 . Alternatively, these miRNAs may also be released into the circulation as a by-product of cell death 37 . By measuring the intracellular and extracellular expression levels of these predicted miRNAs across multiple cell lines, we provide experimental validation for many of these novel miRNAs and evidence for their active and/or tissue-specific secretion into the circulation.

Methods
Ethics statement. All patients provided written informed consent according to the Declaration of Helsinki, and the study was approved by the institutional review board of the National University Hospital (Singapore). All methods in this study were performed in accordance with relevant guidelines and regulations. www.nature.com/scientificreports/ RNA samples. Reference Serum is a human serum stock that was created by pooling serum samples from different individuals (Innovative Research, Inc.). RNA extraction was performed on 200 µl aliquots of Reference Serum using the miRNeasy Serum/Plasma Kit (Qiagen) by following the manufacturer's recommendations, with the following modifications: (a) a set of three proprietary spike-in control RNAs (20 nt, MiRXES) with sequences distinct from any of the 2588 annotated mature human miRNAs (miRBase release 21) were spiked into the sample lysis buffer (QIAzol Lysis Reagent, Qiagen) at high, medium and low levels prior to sample RNA isolation; (b) due to the low abundance of total RNA in serum, bacteriophage MS2 RNA was added into the lysis buffer (1 µg per ml of QIAZol) to improve RNA isolation yield; (c) after mixing with chloroform, centrifugation was performed at 18,000×g for 15 min at room temperature; (d) the final elution step was performed with 15 µl of RNase-free water. RNA extracted from multiple aliquots of Reference Serum was pooled for cross-platform evaluation of microRNA profiling. FirstChoice Human Brain Reference RNA (Thermo Fisher Scientific) was used as a QC sample for miRNA-Seq and NanoString. Plasma samples were obtained from patients undergoing bariatric surgery at the National University Hospital (NUH). RNA extraction on plasma samples were performed using the same procedure described above.
Cell lines. The following human cell lines were obtained commercially and cultured using the manufacturer's recommendations-HEK-293 (ATCC), THP-1 (ATCC), OE19 (ECACC), NUGC-3 (JCRB), MDST8 (ECACC). Neural progenitor cells (NPC) were derived from hESCs (Es Cell International Pte Ltd) using a previously published protocol 38 . All cell lines were cultured in T75 flasks with 20 to 30 ml of culture media for up to 48 h. Cells were grown at sub-confluent density to minimize the release of intracellular miRNAs into the culture media due to cell death. To isolate extracellular RNA, 1 ml of cell culture media was transferred to a 1.5 ml tube and subjected to centrifugation at 16,000×g for 15 min to remove cells and cell debris. 200 µl of the supernatant was harvested for RNA extraction using the same procedure described above for serum/plasma. To isolate intracellular RNA, cells were harvested by trypsin treatment (for adherent cell lines), followed by centrifugation at 200×g for 5 min. The resulting pellet was resuspended in 1 ml of TRI Reagent (Sigma Aldrich) and RNA extraction was performed following manufacturer's instructions. The Ct values obtained by qPCR were converted by applying a sample-specific scaling factor to normalize for the number of cells or amount of cell culture media that was measured. The normalized expression level for each miRNA was expressed as Arbitrary Units (AU) per million cells.
MiRXES qPCR. Ten microliters of serum or plasma RNA was reverse transcribed according to manufacturer's instructions (MiRXES) on a Veriti Thermal Cycler (Applied Biosystems). Template-specific RT primers for 560 miRNAs were divided into ten primer pools (50-to 60-plex per pool). Each primer pool was used for a multiplex RT reaction with 1 µl of RNA. A 6-log serial dilution of synthetic templates (10 7 to 10 2 copies) for each miRNA and a non-template control (nuclease-free water spiked with MS2) were reversed-transcribed with the isolated sample RNA. This enabled absolute quantification of sample miRNA expression copy numbers through intrapolation using standards curves generated with synthetic miRNA. Each cDNA sample was pre-amplified by 14 cycles of PCR using Augmentation Primer Pools (MiRXES) on the Veriti Thermal Cycler. Single-plex qPCR reactions were performed on the pre-amplified cDNA sample using miRNA-specific qPCR assays and Xtensa Sybr Green qPCR Master Mix according to manufacturer's instructions (MiRXES). All qPCR reactions were performed with technical duplicates on the ViiA qPCR system (384-well configuration, Applied Biosystems). Raw threshold cycle (Ct) values were calculated using the ViiA 7 RUO software with automatic baseline setting and a threshold of 0.5. After quantification, miRNA levels (in copies per ml of serum or plasma) were determined by intra-polation of the sample miRNA Ct value to the standard curve of its synthetic miRNA template (Supplementary Table S4). Technical variations were normalized by a set of three proprietary spike-in control RNAs that were spiked at high, medium and low concentration prior to RNA isolation. For the 92 new miRNA assays that were added to the expanded panel, specific miRNA RT-qPCR assays were developed and miRNAs were measured using a similar qPCR workflow described above. RT primers for the 92 miRNAs were divided into two primer pools (50-60-plex per pool). Each primer pool was used for a multiplex RT reaction with 1 µl of RNA.
ABI qPCR. . The PCR plates were sealed, briefly centrifuged and run on a ViiA qPCR system (384-well configuration). Raw Ct values were calculated using the ViiA 7 RUO software with automatic baseline setting and a threshold of 0.5. A normalization factor for each sample was calculated from the residuals of the Ct of miRTC and the PPC in comparison to their respective averages over all samples. The normalization factor was then subtracted from the raw Ct for the normalized Ct for each miRNA in that sample (Supplementary Table S7).
NanoString. Three microliters of serum RNA and 100 ng of brain RNA were processed according to the manufacturer's recommendations for the nCounter Human v3 miRNA Expression Assay Kit (NanoString; CSO-MIR3-12). Each sample was scanned for 555 FOV on the nCounter Digital Analyzer. Raw count data from independent runs of Reference Serum were processed together in nSolver (v2.0; NanoString). For each gene, count data was processed as follows: (a) background subtraction using the geometric mean of the negative controls, (b) normalization to the geometric mean of the positive controls, and (c) normalization to the geometric mean of the top 100 most highly expressed microRNAs. For both methods, gel purification of the library construct was performed by ethanol precipitation. Purified libraries were verified on a High Sensitivity D1000 ScreenTape (Agilent Technologies) and quantified by realtime PCR using a Library Quantification Kit (KAPA Biosystems). Pooled libraries from each run were loaded at 1.8 pM for 1 × 40 bp sequencing on a NextSeq 500 using a High Output Kit (Illumina.). Sequence data in fastq format was obtained using bcl2fastq (v2.16).
The CAP-miRSeq pipeline 39 was used to process and analyze miRNA-Seq data. In brief, reads were trimmed for low quality bases and Illumina adapter sequences using cutadapt 40 . For data generated using the NEXTflex method, additional trimming was performed to remove the randomized adapter sequences at the ligation junctions. Trimmed reads were aligned to the hg38 reference genome assembly (http://hgdow nload .cse.ucsc.edu/ golde nPath /hg38/bigZi ps/hg38.fa.gz) using the miRDeep2 package 41 . Expression levels for known microRNAs were quantified by counting reads mapping to microRNA coordinates defined in miRBase release 21 (ftp://mirba se.org/pub/mirba se/21) and novel microRNAs were also predicted by miRDeep2. Based on structure and read signature for each potential miRNA precursor, the mirDeep2 core algorithm assigns a score for each potential miRNA precursor that reflects the likelihood of it being a genuine miRNA. The CAP-miRSeq pipeline was run using default parameters. Within each sample, expression levels for mature microRNAs were normalized to reads per million mapped reads (RPM). Statistical analyses. The lower limit of quantification (LLOQ) was determined using at least three independent runs on each platform. The microRNA expression data was fitted by LOESS (locally estimated scatterplot smoothing), to estimate the trend between mean expression (e.g., normalized counts or Ct values) and variability (i.e., coefficient of variation). LLOQ was determined using a cutoff of 50% CV on all platforms (Supplementary File 1). Lin's concordance correlation coefficient 42 was calculated using the epi.ccc function in the epiR package (https ://cran.r-proje ct.org/web/packa ges/epiR/index .html).