Abstract
Targeted RNA sequencing (RNA-seq) aims to focus coverage on areas of interest that are inadequately sampled in standard RNA-seq experiments. Here we present multiplexed primer extension sequencing (MPE-seq), an approach for targeted RNA-seq that uses complex pools of reverse-transcription primers to enable sequencing enrichment at user-selected locations across the genome. We targeted hundreds to thousands of pre-mRNA splice junctions and obtained high-precision detection of splice isoforms, including rare pre-mRNA splicing intermediates.
Data availability
All sequencing data are available through NCBI’s Sequence Read Archive (SRA) under accession number SRP148810.
References
Merkin, J., Russell, C., Chen, P. & Burge, C. B. Science 338, 1593–1599 (2012).
Barbosa-Morais, N. L. et al. Science 338, 1587–1593 (2012).
Mercer, T. R. et al. Nat. Biotechnol. 30, 99–104 (2011).
Mercer, T. R. et al. Nat. Protoc. 9, 989–1009 (2014).
Blomquist, T. M. et al. PLoS One 8, e79120 (2013).
Kivioja, T. et al. Nat. Methods 9, 72–74 (2011).
Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Biotechniques 30, 892–897 (2001).
Zheng, W., Chung, L. M. & Zhao, H. BMC Bioinformatics 12, 290 (2011).
Carey, M. F., Peterson, C. L. & Smale, S. T. Cold Spring Harb. Protoc. 2013, 164–173 (2013).
Coombes, C. E. & Boeke, J. D. RNA 11, 323–331 (2005).
Padgett, R. A. et al. Proc. Natl Acad. Sci. USA 82, 8349–8353 (1985).
Booth, G. T., Wang, I. X., Cheung, V. G. & Lis, J. T. Genome Res. 26, 799–811 (2016).
Kim, S. H. & Lin, R. J. Mol. Cell. Biol. 16, 6810–6819 (1996).
Chen, W. et al. Cell 173, 1031–1044 (2018).
Wan, W., Lu, M., Wang, D., Gao, X. & Hong, J. Sci. Rep. 7, 6119 (2017).
Stepankiw, N., Raghavan, M., Fogarty, E. A., Grimson, A. & Pleiss, J. A. Nucleic Acids Res. 43, 8488–8501 (2015).
Nojima, T. et al. Cell 161, 526–540 (2015).
Burke, J. E. et al. Cell 173, 1014–1030 (2018).
Lucks, J. B. et al. Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011).
Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Nature 505, 701–705 (2014).
Hartwell, L. H., McLaughlin, C. S. & Warner, J. R. Mol. Gen. Genet. 109, 42–56 (1970).
Wernersson, R. & Nielsen, H. B. Nucleic Acids Res. 33, W611–W615 (2005).
Collart, M. A. & Oliviero, S. Curr. Protoc. Mol. Biol. 23, 13.12.1–13.12.5 (2001).
Engel, S. R. et al. G3 (Bethesda) 4, 389–398 (2014).
Dobin, A. et al. Bioinformatics 29, 15–21 (2013).
Rhind, N. et al. Science 332, 930–936 (2011).
Quinlan, A. R. & Hall, I. M. Bioinformatics 26, 841–842 (2010).
Mayerle, M. et al. Proc. Natl Acad. Sci. USA 114, 4739–4744 (2017).
Grate, L. & Ares, M. Jr. Methods Enzymol. 350, 380–392 (2002).
Ramírez, F. et al. Nucleic Acids Res. 44, W160–W165 (2016).
Conesa, A. et al. Genome Biol. 17, 13 (2016).
Acknowledgements
We thank members of the J.A.P., H. Kwak, and A. Grimson laboratories, as well as the anonymous reviewers for critical feedback on this work. We thank L. Yao for initial drafts of Fig. 1a and Supplementary Fig. 9a. We thank P. Schweitzer, J. Grenier, and the BRC Genomics Facility at Cornell for outstanding technical support with Illumina sequencing. This work was funded by the American Cancer Society (Research Scholars Grant to J.A.P.) and the NIH (grant R01GM098634 to J.A.P.).
Author information
Contributions
H.X., B.J.F., Z.W.D., M.G., and J.A.P. contributed to research design. H.X., B.J.F., Z.W.D., and M.G. performed research and analyzed data. All authors wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Figure 1 Elevated temperatures in reverse-transcription reactions increase specificity.
The fraction of on-target and off-target reads from replicate MPE-seq libraries generated from reverse-transcription reactions carried out at various temperatures. A small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer); these reads were counted neither as cDNAs derived from RNA targets nor as unmappable reads.
Supplementary Figure 2 Percentage of reads mapping to targeted regions.
The percentage of reads mapped to target and off-target regions is depicted for MPE-seq and conventional RNA-seq. In MPE-seq a small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer), and thus were not categorized as cDNAs derived from RNA targets.
Supplementary Figure 3 Expression measurements as determined by MPE-seq and RNA-seq.
(a) A scatter plot depicts gene expression measurements (RNA-seq) in replicate datasets. Genes containing splice events that were among those chosen for targeted sequencing are depicted in red. These targeted genes range in expression level by orders of magnitude. (b) A scatter plot depicts gene expression measurements in replicate MPE-seq datasets (red). Similar to conventional RNA-seq, expression measurements in MPE-seq are highly reproducible between replicates, even for the small proportion of mis-priming events that map to off-target locations (gray). (c) A scatter plot depicts gene expression measurements in RNA-seq and MPE-seq. The right shift of targeted genes reflects successful enrichment of targets by orders of magnitude. The observation that even highly expressed genes as measured by RNA-seq are proportionally highly expressed in MPE-seq suggests that primers are not limiting during reverse transcription.
Supplementary Figure 4 Splicing measurements as determined by MPE-seq and RNA-seq.
(a) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using wild-type (Prp2) RNA. For calculation of R², n = 252 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (b) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using RNA from a splicing mutant strain (prp2-1). For calculation of R², n = 193 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (c) A scatter plot depicts the fold-change (prp2-1/Prp2) in intron retention as measured by MPE-seq and conventional RNA-seq. For calculation of R², n = 203 intron-retention events that were quantified, requiring that both RNA-seq and MPE-seq be quantifiable in the wild-type (Prp2) dataset.
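The filtering rule in this legend (at least one spliced and one unspliced read in both experiments) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the legend does not define the intron-retention metric, so the sketch assumes it is the unspliced fraction, unspliced / (spliced + unspliced), and it computes R² as the squared Pearson correlation on untransformed values. The function names and the input layout (event ID mapped to a `(spliced, unspliced)` count pair) are hypothetical.

```python
def retention_fraction(spliced, unspliced):
    """Assumed metric: fraction of reads at a splice event that are unspliced."""
    return unspliced / (spliced + unspliced)


def is_quantifiable(counts):
    """An event is kept only if it has >= 1 spliced and >= 1 unspliced read."""
    spliced, unspliced = counts
    return spliced >= 1 and unspliced >= 1


def paired_retention(mpe_counts, rna_counts):
    """Return event IDs and retention fractions quantifiable in BOTH datasets.

    Each argument maps event ID -> (spliced_reads, unspliced_reads).
    """
    shared = sorted(
        event
        for event in mpe_counts.keys() & rna_counts.keys()
        if is_quantifiable(mpe_counts[event]) and is_quantifiable(rna_counts[event])
    )
    mpe = [retention_fraction(*mpe_counts[e]) for e in shared]
    rna = [retention_fraction(*rna_counts[e]) for e in shared]
    return shared, mpe, rna


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy counts for illustration only (not data from the study).
mpe_counts = {"a": (90, 10), "b": (50, 50), "c": (0, 5), "d": (20, 80)}
rna_counts = {"a": (9, 1), "b": (5, 5), "c": (3, 3), "d": (2, 8)}
events, mpe, rna = paired_retention(mpe_counts, rna_counts)
r2 = r_squared(mpe, rna)
```

Event `c` is dropped here because it has no spliced reads in the toy MPE-seq counts, which mirrors how the number of events (n) in each panel depends on which events pass the filter in both experiments.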
Supplementary Figure 5 Schematic for assigning reads to splice intermediate isoforms.
(a) Schematic depicting cDNA products derived from pre-first-step (P), lariat intermediate (L), and spliced mRNA (S) isoforms. (b) To quantify the abundance of P, L, and S isoforms for each targeted splice event, we counted read fragments and categorized them into six classes based on paired-end alignments. Fragments containing a splice junction (C1 and C2) are indicative of S. Fragments that are unspliced and traverse the branch-point region (C3) are classified as P. Fragments that are unspliced but terminate within a window of –3 to +5 bp from the previously determined branch point (C4) are classified as L. Fragments that are unspliced and either terminate downstream of the branch point (C5) or for which the terminus could not be mapped (C6) are ambiguous between P and L. Therefore, for accounting purposes, the counts for these fragments were coerced into P and L classifications based on the ratio of P and L determined by unambiguous mappings (C3 and C4). See Methods for more details.
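The accounting described in panel (b) can be illustrated with a short sketch. It assumes the six fragment classes (C1 to C6) have already been counted for one targeted splice event, and it splits the ambiguous fragments (C5 and C6) between P and L in proportion to the unambiguous counts (C3 and C4). The function name and the handling of events with no unambiguous fragments (returning NaN) are assumptions made for this example; the Methods may treat that case differently.

```python
import math


def allocate_isoform_counts(c1, c2, c3, c4, c5, c6):
    """Estimate (P, L, S) counts for one splice event from fragment classes.

    C1, C2: fragments containing a splice junction      -> spliced mRNA (S)
    C3:     unspliced, traverses the branch-point region -> pre-first-step (P)
    C4:     unspliced, terminates at the branch point    -> lariat intermediate (L)
    C5, C6: unspliced but ambiguous between P and L
    """
    spliced = float(c1 + c2)
    ambiguous = c5 + c6
    unambiguous = c3 + c4
    if unambiguous == 0:
        # Assumption for this sketch: without unambiguous fragments the P:L
        # ratio is undefined, so P and L are reported as NaN.
        return math.nan, math.nan, spliced
    fraction_p = c3 / unambiguous
    pre_first_step = c3 + ambiguous * fraction_p
    lariat = c4 + ambiguous * (1.0 - fraction_p)
    return pre_first_step, lariat, spliced


# Toy counts for illustration only (not data from the study).
p, l, s = allocate_isoform_counts(c1=10, c2=5, c3=30, c4=10, c5=8, c6=4)
```

With these toy counts the unambiguous P:L ratio is 3:1, so the 12 ambiguous fragments are split 9 to P and 3 to L, and the total number of unspliced fragments is preserved.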
Supplementary Figure 6 Transcription start site profiling by MPE-seq.
Metagene profile of 3ʹ ends mapped by MPE-seq, centered on transcription start sites (TSSs) as determined by PRO-cap, an orthogonal method for mapping TSSs. The high abundance of read ends that pile up at TSSs indicates that MPE-seq can be used to profile cDNA termini.
Supplementary Figure 7 Lariat-intermediate-derived cDNAs contain a unique signature of mismatches.
(a) Genome browser screenshot of the 3ʹ ends of reads from paired-end sequenced fragments illustrates the unique signature of non-templated base incorporation by reverse transcriptase at a branched adenosine versus the 5ʹ RNA terminus. (b) Genome-wide quantification of the mismatch frequencies at 3ʹ termini of cDNAs near the TSS (left) versus at the annotated branch point (right).
Supplementary Figure 8 Transcript features that correlate with the abundance of lariat intermediates.
(a) The abundance of pre-first-step RNA and lariat intermediate RNA is significantly correlated with the classification of introns into those that are in ribosomal protein genes (RPG) and non-RPG. However, the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing, does not correlate. Horizontal lines in box plots represent the 25th, 50th, and 75th percentiles. Whiskers end at the 0th and 100th percentiles. A two-sided Mann–Whitney U-test was used to test differences between the distributions of these metrics. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read, of which 64 were RPG and 77 were non-RPG. (b) Spearman correlations of various features to the abundance of pre-first-step RNA, lariat intermediate, or the abundance of lariat intermediate relative to pre-first-step RNA. None of these features significantly correlate with the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing. Error bars indicate 95% confidence intervals as estimated by Fisher transformation of Spearman’s correlation coefficient. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read.
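The confidence intervals in panel (b) are described as estimated by Fisher transformation of Spearman's correlation coefficient. A minimal sketch of that calculation is shown below. The legend does not state which standard error was used, so this example assumes the common approximation 1/sqrt(n − 3) on the transformed scale; some treatments of Spearman's coefficient use a slightly larger standard error. The function name is hypothetical.

```python
import math


def fisher_confidence_interval(rho, n, z_crit=1.959963984540054):
    """Approximate confidence interval for a correlation coefficient.

    rho:    correlation coefficient, strictly between -1 and 1
    n:      number of observations (must exceed 3)
    z_crit: standard normal quantile; the default gives a 95% interval
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = math.atanh(rho)                   # Fisher z-transformation
    std_err = 1.0 / math.sqrt(n - 3)      # assumed standard error on the z scale
    lower = math.tanh(z - z_crit * std_err)
    upper = math.tanh(z + z_crit * std_err)
    return lower, upper


# n = 141 introns, as in this legend; rho = 0 is a placeholder value.
lower, upper = fisher_confidence_interval(0.0, 141)
```

Under these assumptions, with n = 141 an interval centered on zero extends to roughly ±0.17, which gives a sense of how large a correlation must be before its interval excludes zero.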
Supplementary Figure 9 Array-based oligonucleotide synthesis can be used to generate primer pools for use in MPE-seq.
(a) Obtaining adequate amounts of primer pools for MPE-seq from cost-effective array-based oligonucleotide synthesis can be achieved in four steps. (1) PCR amplification of the oligonucleotide synthesis pool using a 5ʹ blocked sense primer and a biotinylated antisense primer. (2) Restriction digestion to cleave off the PCR primer handle. (3) Lambda exonuclease digestion of free 5ʹ ends. (4) Streptavidin purification of biotinylated PCR handle. The unbound fraction is the desired primer pool product. (b) Steps during the amplification and purification of array-synthesized primer pools are monitored via native gel electrophoresis. The control lane represents a pool of individually synthesized MPE-seq primers which did not require amplification and purification. Lanes refer to products of each individual step in the protocol. The unbound fraction is the desired primer pool product. Similar results have been consistently obtained in >3 independent experiments. (c) The percentage of reads mapped to target and off-target regions is depicted for MPE-seq using array-synthesized primers. (d) A scatter plot compares the fraction of unspliced mRNAs measured by MPE-seq libraries which used individually synthesized primer pools versus array-based synthesis of primer pools. For calculation of Pearson’s correlation coefficient, n = 140 intron-retention events which were quantified, requiring at least one spliced read and one unspliced read in both experiments. Sashimi plot of a targeted region within the ats1 gene locus demonstrates the capacity of MPE-seq to reveal complex alternative splicing patterns with higher sensitivity than RNA-seq, despite having lower total sequencing depth.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9
Supplementary Table 1
Description of the sequencing libraries in this study
Supplementary Table 2
Counts of unannotated splice sites at target introns for S. cerevisiae. Data based on MPE-seq (samples WT_A and WT_B).
Supplementary Table 3
Counts of pre-first-step RNA (P), lariat intermediate (L), and spliced (S) for all targeted introns for S. cerevisiae. Data based on samples WT_A and WT_B.
Supplementary Table 4
Counts of exon–exon junctions at target introns for S. pombe. Data based on MPE-seq (WT_A) and RNA-seq (Rhind et al., ref. 26).
Supplementary Table 5
Reverse-transcription primer pools for S. cerevisiae
Supplementary Table 6
Reverse-transcription primer pools for S. pombe
Supplementary Table 7
Other primers used in this study
Supplementary Table 8
Branch-point annotations used (BED file containing coordinates of regions from branch point to 3ʹ splice site)
About this article
Cite this article
Xu, H., Fair, B.J., Dwyer, Z.W. et al. Detection of splice isoforms and rare intermediates using multiplexed primer extension sequencing. Nat Methods 16, 55–58 (2019). https://doi.org/10.1038/s41592-018-0258-x
DOI: https://doi.org/10.1038/s41592-018-0258-x