We compared quantitative RT-PCR (qRT-PCR), RNA-seq and capture sequencing (CaptureSeq) in terms of their ability to assemble and quantify long noncoding RNAs and novel coding exons across 20 human tissues. CaptureSeq was superior for the detection and quantification of genes with low expression, showed little technical variation and accurately measured differential expression. This approach expands and refines previous annotations and simultaneously generates an expression atlas.
Clark, M.B. et al. PLoS Biol. 9, e1000625 (2011).
Kapranov, P., Willingham, A.T. & Gingeras, T.R. Nat. Rev. Genet. 8, 413–423 (2007).
Djebali, S. et al. Nature 489, 101–108 (2012).
Jiang, L. et al. Genome Res. 21, 1543–1551 (2011).
Mercer, T.R. et al. Nat. Protoc. 9, 989–1009 (2014).
Mercer, T.R. et al. Nat. Biotechnol. 30, 99–104 (2012).
ERCC Consortium. BMC Genomics 6, 150 (2005).
Hansen, K.D., Brenner, S.E. & Dudoit, S. Nucleic Acids Res. 38, e131 (2010).
Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L. & Pachter, L. Genome Biol. 12, R22 (2011).
Cabili, M.N. et al. Genes Dev. 25, 1915–1927 (2011).
Derrien, T. et al. Genome Res. 22, 1775–1789 (2012).
Harrow, J. et al. Genome Res. 22, 1760–1774 (2012).
Amaral, P.P., Clark, M.B., Gascoigne, D.K., Dinger, M.E. & Mattick, J.S. Nucleic Acids Res. 39, D146–D151 (2011).
Wang, L. et al. Nucleic Acids Res. 41, e74 (2013).
Finn, R.D. et al. Nucleic Acids Res. 42, D222–D230 (2014).
Keren, H., Lev-Maor, G. & Ast, G. Nat. Rev. Genet. 11, 345–355 (2010).
Lindblad-Toh, K. et al. Nature 478, 476–482 (2011).
FANTOM Consortium. Nature 507, 462–470 (2014).
Andersson, R. et al. Nature 507, 455–461 (2014).
Roadmap Epigenomics Consortium. Nature 518, 317–330 (2015).
Mercer, T.R. et al. Genome Res. 25, 290–303 (2015).
Hsu, F. et al. Bioinformatics 22, 1036–1046 (2006).
Pruitt, K.D. et al. Nucleic Acids Res. 42, D756–D763 (2014).
Ning, Z., Cox, A.J. & Mullikin, J.C. Genome Res. 11, 1725–1729 (2001).
Martin, J.A. & Wang, Z. Nat. Rev. Genet. 12, 671–682 (2011).
Kim, D. et al. Genome Biol. 14, R36 (2013).
Langmead, B. & Salzberg, S.L. Nat. Methods 9, 357–359 (2012).
Li, H. et al. Bioinformatics 25, 2078–2079 (2009).
Dobin, A. et al. Bioinformatics 29, 15–21 (2013).
Haas, B.J. et al. Nat. Protoc. 8, 1494–1512 (2013).
Trapnell, C. et al. Nat. Protoc. 7, 562–578 (2012).
Anders, S., Pyl, P.T. & Huber, W. Bioinformatics 31, 166–169 (2015).
Quinlan, A.R. & Hall, I.M. Bioinformatics 26, 841–842 (2010).
Trapnell, C. et al. Nat. Biotechnol. 28, 511–515 (2010).
Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. Genome Res. 14, 1188–1190 (2004).
Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).
Robinson, M.D., McCarthy, D.J. & Smyth, G.K. Bioinformatics 26, 139–140 (2010).
Blanchette, M. et al. Genome Res. 14, 708–715 (2004).
Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R. & Siepel, A. Genome Res. 20, 110–121 (2010).
Sherry, S.T. et al. Nucleic Acids Res. 29, 308–311 (2001).
Forbes, S.A. et al. Nucleic Acids Res. 39, D945–D950 (2011).
Stenson, P.D. et al. Genome Med. 1, 13 (2009).
The authors acknowledge the following funding sources: an Australian National Health and Medical Research Council (NHMRC) Australia Fellowship (631668 to J.S.M. and 631542 to M.E.D.), an NHMRC Early Career Fellowship (APP1072662 to M.B.C.), an EMBO Long Term Fellowship (ALTF 864-2013 to M.B.C.), the Queensland State Government (National and International Research Alliance Program to L.K.N.) and an EMBL Interdisciplinary Postdoc (EIPOD) under Marie Curie Actions (COFUND) (to G.B.). The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of NHMRC. The authors thank the ENCODE consortium for the provision of data; data were employed in strict accordance with the associated data-release policy. The authors also thank Prof. M. Brown (University of Queensland) for contributions to manuscript preparation.
T.R.M. is a recipient of a Roche Discovery Agreement (2014). M.B.C. has received research support from Roche/Nimblegen for an unrelated research project.
Integrated supplementary information
Supplementary Figure 1 Advantages of RNA CaptureSeq for profiling genes with alternative splicing or low expression.
(a) Schematic figure indicating limitations of qRT-PCR for quantifying alternative splicing events. (b,c) Dynamic range of K562 cell transcriptome populations demonstrated by transcript (b) or exon (c) expression. Notably, the top 1% of transcripts comprises 38.4% of the total expressed mRNA population. (d) Calculated maximal fold enrichment achieved by CaptureSeq relative to number of genes (combining all known isoforms) targeted (estimated gene expression based on average gene expression in human K562 cell line). Note that higher enrichments can be maintained by removing highly expressed isoforms and gene loci from CaptureSeq targets.
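The enrichment ceiling described in panel (d) can be reasoned about with a simple bound. The sketch below is a simplification and not necessarily the authors' calculation: it assumes perfectly specific capture, so that every read derives from a targeted gene, in which case enrichment cannot exceed the reciprocal of the fraction of expression that the targets account for. The function name `max_fold_enrichment` and the example values are hypothetical.

```python
def max_fold_enrichment(target_expression, total_expression):
    """Upper bound on fold enrichment under idealized, fully specific capture.

    target_expression: expression values of the targeted genes
    total_expression:  summed expression of the whole transcriptome
    (both in the same units, e.g. FPKM)
    """
    if total_expression <= 0:
        raise ValueError("total_expression must be positive")
    fraction = sum(target_expression) / total_expression
    if not 0 < fraction <= 1:
        raise ValueError("targets must account for a fraction in (0, 1]")
    # If targets make up `fraction` of the RNA, reads can be concentrated
    # onto them by at most 1 / fraction.
    return 1.0 / fraction


# Hypothetical example: two low-abundance targets in a transcriptome of 100 units.
print(max_fold_enrichment([0.25, 0.25], 100.0))
```

Under the same assumption, a panel that targeted the top 1% of transcripts (38.4% of expressed mRNA, panel b) would be capped at roughly 1/0.384, or about 2.6-fold, which is consistent with the note that removing highly expressed loci from the targets preserves higher enrichment.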
Supplementary Figure 2 Comparative analysis of ERCC spike-in quantification using RNA sequencing and CaptureSeq.
(a) Fold enrichment achieved by CaptureSeq for each ERCC standard. High and variable enrichment at low ERCC concentrations results from low and sporadic alignment of RNA-seq reads to ERCC standards. Decreasing enrichment at high ERCC concentrations is due to CaptureSeq saturation. Each technical replicate capture hybridization contained three biological replicate samples. (b) Spearman correlation of ERCC probe abundance measured by CaptureSeq for three biological replicate samples, each captured in technical replicate. (c,d) Average Spearman correlation of measured abundance of ERCC probes for three biological replicates of CaptureSeq (c) and RNA-seq (d). (e) Segmented regression analysis indicates an inflection point in the measured abundance of ERCC probes by CaptureSeq at an ERCC concentration of 2.34 attomol/μl (dotted line). n = 3 biological replicates; error bars are s.d. (f) RNA sequencing exhibits a linear profile across the range of ERCC concentrations it detects. n = 3 biological replicates; error bars are s.d.
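The segmented regression in panel (e) can be illustrated with a minimal sketch. The legend does not state which software or model the authors used, so the code below is an assumption: it fits two independent straight lines on either side of every candidate split and keeps the split with the lowest total squared error. It does not force the two segments to meet at the breakpoint, and in practice the inputs would probably be log-transformed concentration and abundance. The names `fit_line` and `segmented_fit` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def segmented_fit(xs, ys, min_points=3):
    """Brute-force two-segment fit.

    Returns (breakpoint, slope_low, slope_high), where breakpoint is the
    midpoint between the last x of the lower segment and the first x of
    the upper segment.
    """
    pairs = sorted(zip(xs, ys))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        _, slope_low, sse_low = fit_line(xs[:k], ys[:k])
        _, slope_high, sse_high = fit_line(xs[k:], ys[k:])
        total = sse_low + sse_high
        if best is None or total < best[0]:
            best = (total, (xs[k - 1] + xs[k]) / 2, slope_low, slope_high)
    if best is None:
        raise ValueError("need at least 2 * min_points observations")
    return best[1], best[2], best[3]


# Synthetic example: linear response that saturates into a plateau.
x = list(range(10))
y = [0, 1, 2, 3, 4, 6, 6, 6, 6, 6]
print(segmented_fit(x, y))
```

A slope near zero in the upper segment is what would be expected once capture is saturated, whereas the slope of the lower segment describes the quantitative range.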
(a,b) Averaged read coverage for each ERCC probe from RNA-seq and CaptureSeq. n = 3 biological replicates; error bars are s.d. Horizontal dotted line shows eightfold coverage. RepA and RepB are technical replicate capture hybridizations containing three biological replicate samples. (a) Number of ERCC transcripts required for eightfold coverage. (b) Concentration of ERCC transcripts required for eightfold coverage. Vertical dotted line marks concentrations above which CaptureSeq is saturated. The three lowest probe concentrations (<0.00114 attomol/μl) have zero coverage in more than 50% of RNA-seq libraries. (c) Fold difference in variability between RNA-seq measurement of ERCC abundance and CaptureSeq technical replicates (n = 3 biological replicates). Horizontal dotted lines are at 1 and −1 (no difference in variability). Values above 1 show RNA-seq is more variable; values below −1 show CaptureSeq is more variable. Vertical dotted line is the ERCC concentration that allows consistent eightfold coverage by RNA-seq. RNA-seq is more variable at low expression levels. (d) Mean difference between CaptureSeq and RNA-seq accuracy in measuring ERCC abundance. RNA-seq provided less accurate expression measurements at low levels but was more accurate at high levels. n = 3 biological replicates; error bars are s.d.
(a,b) Relationships among ERCC length (a), GC% (b) and CaptureSeq performance on moderately expressed probes compared to RNA-seq (enrichment residuals shown). Spearman correlation shown; line is nonlinear regression fit. RepA and RepB are technical replicate capture hybridizations containing three biological replicate samples. (c) Combined sequence read coverage across all ERCC standards merged (left) or two representative ERCC controls (middle and right) by RNA-seq (blue) and CaptureSeq (red). Difference between read coverage indicated by gray shaded area. (d) Relative nucleotide enrichment for ERCC sequences that exhibit differential coverage between RNA-seq and CaptureSeq. No specific nucleotide bias is observed in regions exhibiting differential coverage. (e) Sequenced read coverage profile of SMPD2 by RNA-seq (blue) and CaptureSeq (red). Only minor variation is observed between the two profiles.
(a) Pearson correlations of measured abundance of ERCC probes for one representative sample versus all others containing the same ERCC mix. Top, ERCC mix 1; bottom, ERCC mix 2. Two multiplexed capture hybridizations were performed containing a mix of ERCC mix 1 and 2 samples. Slightly higher correlations equate to samples present in the same hybridization. (b) Clustering of ERCC read counts following variance stabilizing transformation. ERCC mixes 1 (n = 5) and 2 (n = 4) clearly separate followed by separation by capture hybridization. Samples present in same hybridization shown in red and black, respectively. (c,d) Relationship between ERCC concentration and detected ERCC abundance. Segmental linear regression to determine the ERCC concentration at which saturation occurs (dotted line). Error bars are s.d. Linear slopes from segmental linear regression and the Pearson correlation for non-saturating concentrations are provided. (c) ERCC mix 1 samples; n = 5 biological replicates. Saturation at 1.30 attomol/μl. (d) ERCC mix 2 samples; n = 4 biological replicates. Saturation at 0.976 attomol/μl. (e) Averaged read coverage for each ERCC probe from ERCC mix 1 (n = 5) and ERCC mix 2 (n = 4) pools. Error bars are s.d. Y-axis dotted line shows eightfold coverage. (f) edgeR MA plot of log fold change for each ERCC control between the two mixes against transcript expression in log CPM (counts per million). Differentially expressed (DE) controls colored red; non-DE colored black. Zero fold change between two samples shown by blue line. edgeR performed using TMM normalization.
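The axes of the MA plot in panel (f) can be made concrete with a short sketch. This is not edgeR and applies no TMM scaling; it only illustrates the two quantities plotted, under the assumption of a simple pseudocount (`prior`) to keep zero counts finite. The function names `log2_cpm` and `ma_values` are hypothetical.

```python
import math


def log2_cpm(counts, prior=0.5):
    """log2 counts per million for one library, with a small pseudocount."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("library has no reads")
    return [math.log2((c + prior) * 1e6 / total) for c in counts]


def ma_values(counts_a, counts_b, prior=0.5):
    """Return one (M, A) pair per feature.

    M is the log2 fold change of library B over library A.
    A is the mean log2 CPM of the two libraries.
    """
    log_a = log2_cpm(counts_a, prior)
    log_b = log2_cpm(counts_b, prior)
    return [(b - a, (a + b) / 2) for a, b in zip(log_a, log_b)]


# Hypothetical counts for three spike-in controls in two libraries.
for m, a in ma_values([10, 200, 4000], [40, 210, 3900]):
    print(round(m, 2), round(a, 2))
```

Controls with no true difference between the two mixes should scatter around M = 0 (the blue line in the plot), with the scatter typically widening at low A.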
Supplementary Figure 6 Comparison of CaptureSeq and RNA-seq for differential gene expression analysis.
(a) Quantification of fold changes in ERCC standard abundances between two distinct samples (ERCC 1, n = 5 biological replicates; and ERCC 2, n = 4 biological replicates) for CaptureSeq and RNA-seq (with a matched number of reads). CaptureSeq records values for all ERCC standards (92); expression values were not obtained for 11 standards with RNA-seq. Slopes from nonlinear regression with a straight-line fit. (b) Variability in fold-change measurements for each ERCC fold-change category between CaptureSeq and matched RNA-seq. For each category RNA-seq showed greater variation. (c,d) edgeR MA plot of log fold change for each ERCC control against transcript expression in log CPM (counts per million). Differentially expressed (DE) controls colored red; non-DE colored black. Zero fold change between two samples shown by blue line. Matched RNA-seq (c), RNA-seq all reads, no downsampling (d). (e) Relationship between ERCC expression level (log CPM) and ability of CaptureSeq and RNA-seq to detect DE, given various levels of expression differences between two groups. Left, CaptureSeq; middle, matched RNA-seq; right, RNA-seq all reads, no downsampling. FDR, false discovery rate. 1% FDR shown by dashed line. FDR values limited to minimum value of 10^−37.
Frequency distribution of expression for different gene classes according to biotype (a), gene ontology biological function (b) or annotation in disease database (c) in K562 cells. (d) Frequency distribution of probes relative to fraction of length with overlapping alignments from captured genomic DNA. We found greater than onefold coverage across the entirety of 96.5% of probes, thereby validating the ability to capture gDNA. (e) Plot showing measured relative to known abundance of ERCC standards by CaptureSeq (orange) and RNA-seq (dark blue). We have plotted measured abundance before (orange, light blue) and after (red, dark blue) removing duplicate reads. Although removing duplicate reads may reduce the impact of PCR amplification artifacts, it also causes the abundance of ERCC spike-ins to be underestimated, decreasing the quantitative range of CaptureSeq, and is therefore not recommended. (f) Genome browser view showing read alignment profile and assembled transcripts from RNA-seq (upper) and CaptureSeq (lower) across the Titin-antisense lncRNA locus. CaptureSeq read alignment shows higher specificity for exons, with fewer reads derived from nascent transcription present, resulting in more accurate transcript assembly. By contrast, RNA-seq shows a large amount of nascent transcription, resulting in the misassembly of the transcript locus with ‘retained’ introns.
Measured expression (FPKM) of ERCC standards in each human tissue library analyzed. Pearson’s correlation indicates the quantitative accuracy of libraries following capture. Despite enhanced coverage, some ERCC probes (red) remained undetected, indicating that sequencing had not proceeded to saturation.
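As a rough illustration of the quantities in this legend, the sketch below computes FPKM from a fragment count and a Pearson correlation between known and measured abundance. The FPKM formula is the standard definition; how the authors paired and transformed the values before correlating them is not stated here, so correlating log-transformed values is an assumption, and the spike-in numbers are hypothetical.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical spike-ins: (known concentration, fragments, length in bp).
spikes = [(0.1, 12, 1000), (1.0, 130, 1000), (10.0, 1250, 1000)]
total = 1_000_000
known = [math.log10(c) for c, _, _ in spikes]
measured = [math.log10(fpkm(f, length, total)) for _, f, length in spikes]
print(pearson(known, measured))
```

Spike-ins with zero fragments would need to be excluded or given a pseudocount before the log transform, which is one reason undetected probes are reported separately.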
(a) Proportion of introns with canonical splice junctions in previous coding and lncRNA exons is similar to new introns identified using CaptureSeq. (b) Sequence motif at 3’ intron end shows similar enrichment for poly-pyrimidine tract and splice elements in previously annotated introns and new introns identified by CaptureSeq. (c) Example of multiple previous lncRNA annotations that are merged into single higher-order contiguous lncRNA loci following more complete and accurate assembly with CaptureSeq. (d,e) Frequency distribution of open-reading-frame length and hexamer score indicates distinction between coding and noncoding transcripts analyzed from CaptureSeq assembled transcripts. (f) Box-whisker plot showing that CaptureSeq assembled gene models contained more exons and were more complete than previous annotations (based on GENCODE v19, Cabili et al. (2011), and lncRNAdb) used to design the capture array.
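The comparison in panel (a) amounts to counting introns whose terminal dinucleotides match a canonical splice-site pair. A minimal sketch follows, assuming intron sequences are supplied 5' to 3' on the transcribed strand and treating only GT-AG as canonical by default; the legend does not say whether the authors also counted GC-AG or AT-AC, so the accepted pairs are left as a parameter. The function names are hypothetical.

```python
def is_canonical(intron_seq, pairs=(("GT", "AG"),)):
    """True if the intron starts and ends with an accepted dinucleotide pair."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and (seq[:2], seq[-2:]) in pairs


def canonical_fraction(introns, pairs=(("GT", "AG"),)):
    """Fraction of introns with canonical donor and acceptor dinucleotides."""
    if not introns:
        raise ValueError("no introns supplied")
    return sum(is_canonical(seq, pairs) for seq in introns) / len(introns)


# Hypothetical intron sequences (far shorter than real introns).
introns = ["GTAAGTTTTCAG", "gtccccag", "ATCCCCAC", "CTAAAAAC"]
print(canonical_fraction(introns))
print(canonical_fraction(introns, pairs=(("GT", "AG"), ("AT", "AC"))))
```

Running the same function on previously annotated introns and on newly assembled introns gives the two proportions being compared.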
(a) Cumulative frequency distribution indicating the conservation (according to 100-way MultiZ Alignment) of previously annotated (based on GENCODE v19, Cabili et al. (2011), and lncRNAdb) coding and lncRNA exons relative to novel exons identified using CaptureSeq. (b) Conservation at 3’ exon boundary showing 3-nt periodicity characteristics of previously known coding gene exons (red) relative to new coding gene exons (orange) identified by CaptureSeq and (c) similar conservation of splice elements in previous lncRNA annotations relative to new lncRNA exons identified by CaptureSeq. (d) Comparison of SNP, repeat and predicted RNA secondary structure density between previous gene annotations (based on GENCODE v19, Cabili et al. (2011), and lncRNAdb) and new annotations assembled from CaptureSeq experiments.
(a) Targeting previous lncRNA annotations (blue) integrates them into a single complex locus. (b) CaptureSeq ensnares additional novel exons into the initial annotation, thereby expanding the previous annotation to annotated TSSs. (c) CaptureSeq revises previous lncRNA annotations to identify a 1,021-amino-acid ORF. An assembly gap in GRCh37 (hg19) means the protein N terminus may not be present. A new contig in GRCh38 places MGC50722 6 kb upstream, suggesting the possibility that these two loci form one gene. (d) LncRNAs can be erroneously annotated when only transcript fragments are available, as demonstrated by an example in which a lncRNA locus contains distal coding exons for the downstream NPAS4 gene. Arrows indicate direction of transcription. FANTOM5 TSSs on forward strand (red) and reverse strand (blue).
(a) Hierarchical clustering of lncRNA loci according to expression. (b) Example of brain-specific lncRNA Evf2 correctly assembled and quantified using CaptureSeq.
Examples of novel coding exons within GENCODE genes assembled following CaptureSeq. (a) Identification of a novel transcription start site for the GLIS1 gene, well supported by chromatin marks for transcriptional initiation. The new first exon adds 175 amino acids to the N terminus of the protein. (b) Novel internal coding exons in TNXB. Novel exon(s) are conserved and maintain the TNXB reading frame. (c) Targeting novel exons solely identified by evolutionary conservation enables the identification of novel exons that help assemble multiple HMCN2 annotation fragments into a contiguous gene locus. (d) Putative novel coding locus in bidirectional orientation with ZNF593 contains a 186-amino-acid ORF.
Supplementary Figures 1–13 and Supplementary Results (PDF 2384 kb)
PCR primers utilized in this study. (XLSX 49 kb)
Capture transcripts containing putative novel coding exons plus associated putative novel coding exons and GENCODE genes. (XLSX 413 kb)
Novel coding exon transcript annotations. (XLSX 49 kb)
Human genome coordinates (hg19) of tiled regions used for lncRNA CaptureSeq experiment. (ZIP 1095 kb)
Human genome coordinates (hg19) for all captured and assembled noncoding RNAs. (ZIP 1629 kb)
Human genome coordinates (hg19) for all captured and assembled coding RNAs. (ZIP 497 kb)
Human genome coordinates (hg19) for all novel coding RNAs that join LncRNA and coding gene loci. (ZIP 127 kb)
Human genome coordinates (hg19) for all novel noncoding RNAs. (ZIP 1020 kb)
Transcript Annotation file (.gtf) for all assembled transcripts (comprehensive). (ZIP 4055 kb)
FPKM values for all assembled transcripts (comprehensive). (ZIP 9178 kb)
Human genome coordinates (hg19) for all putative novel coding exons that were expressed. Exon annotation as per Lindblad-Toh et al. (2011). (ZIP 14 kb)
Human genome coordinates (hg19) for all captured and assembled transcripts containing putative novel coding exons. (ZIP 53 kb)
About this article
Cite this article
Clark, M., Mercer, T., Bussotti, G. et al. Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing. Nat Methods 12, 339–342 (2015). https://doi.org/10.1038/nmeth.3321