  • Brief Communication

Detection of splice isoforms and rare intermediates using multiplexed primer extension sequencing

Abstract

Targeted RNA sequencing (RNA-seq) aims to focus coverage on areas of interest that are inadequately sampled in standard RNA-seq experiments. Here we present multiplexed primer extension sequencing (MPE-seq), an approach for targeted RNA-seq that uses complex pools of reverse-transcription primers to enable sequencing enrichment at user-selected locations across the genome. We targeted hundreds to thousands of pre-mRNA splice junctions and obtained high-precision detection of splice isoforms, including rare pre-mRNA splicing intermediates.

Fig. 1: MPE-seq uses complex pools of reverse-transcription primers to target sequencing to regions of interest.
Fig. 2: MPE-seq enrichment allows high-precision measurements of splicing.
Fig. 3: MPE-seq allows genome-wide profiling of lariat intermediates.

Data availability

All sequencing data are available through NCBI’s Sequence Read Archive (SRA) under accession number SRP148810.

References

  1. Merkin, J., Russell, C., Chen, P. & Burge, C. B. Science 338, 1593–1599 (2012).

  2. Barbosa-Morais, N. L. et al. Science 338, 1587–1593 (2012).

  3. Mercer, T. R. et al. Nat. Biotechnol. 30, 99–104 (2011).

  4. Mercer, T. R. et al. Nat. Protoc. 9, 989–1009 (2014).

  5. Blomquist, T. M. et al. PLoS One 8, e79120 (2013).

  6. Kivioja, T. et al. Nat. Methods 9, 72–74 (2011).

  7. Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Biotechniques 30, 892–897 (2001).

  8. Zheng, W., Chung, L. M. & Zhao, H. BMC Bioinformatics 12, 290 (2011).

  9. Carey, M. F., Peterson, C. L. & Smale, S. T. Cold Spring Harb. Protoc. 2013, 164–173 (2013).

  10. Coombes, C. E. & Boeke, J. D. RNA 11, 323–331 (2005).

  11. Padgett, R. A. et al. Proc. Natl Acad. Sci. USA 82, 8349–8353 (1985).

  12. Booth, G. T., Wang, I. X., Cheung, V. G. & Lis, J. T. Genome Res. 26, 799–811 (2016).

  13. Kim, S. H. & Lin, R. J. Mol. Cell. Biol. 16, 6810–6819 (1996).

  14. Chen, W. et al. Cell 173, 1031–1044 (2018).

  15. Wan, W., Lu, M., Wang, D., Gao, X. & Hong, J. Sci. Rep. 7, 6119 (2017).

  16. Stepankiw, N., Raghavan, M., Fogarty, E. A., Grimson, A. & Pleiss, J. A. Nucleic Acids Res. 43, 8488–8501 (2015).

  17. Nojima, T. et al. Cell 161, 526–540 (2015).

  18. Burke, J. E. et al. Cell 173, 1014–1030 (2018).

  19. Lucks, J. B. et al. Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011).

  20. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Nature 505, 701–705 (2014).

  21. Hartwell, L. H., McLaughlin, C. S. & Warner, J. R. Mol. Gen. Genet. 109, 42–56 (1970).

  22. Wernersson, R. & Nielsen, H. B. Nucleic Acids Res. 33, W611–W615 (2005).

  23. Collart, M. A. & Oliviero, S. Curr. Protoc. Mol. Biol. 23, 13.12.1–13.12.5 (2001).

  24. Engel, S. R. et al. G3 (Bethesda) 4, 389–398 (2014).

  25. Dobin, A. et al. Bioinformatics 29, 15–21 (2013).

  26. Rhind, N. et al. Science 332, 930–936 (2011).

  27. Quinlan, A. R. & Hall, I. M. Bioinformatics 26, 841–842 (2010).

  28. Mayerle, M. et al. Proc. Natl Acad. Sci. USA 114, 4739–4744 (2017).

  29. Grate, L. & Ares, M. Jr. Methods Enzymol. 350, 380–392 (2002).

  30. Ramírez, F. et al. Nucleic Acids Res. 44, W160–W165 (2016).

  31. Conesa, A. et al. Genome Biol. 17, 13 (2016).

Acknowledgements

We thank members of the J.A.P., H. Kwak, and A. Grimson laboratories, as well as the anonymous reviewers, for critical feedback on this work. We thank L. Yao for initial drafts of Fig. 1a and Supplementary Fig. 9a. We thank P. Schweitzer, J. Grenier, and the BRC Genomics Facility at Cornell for outstanding technical support with Illumina sequencing. This work was funded by the American Cancer Society (Research Scholars Grant to J.A.P.) and the NIH (grant R01GM098634 to J.A.P.).

Author information

Contributions

H.X., B.J.F., Z.W.D., M.G., and J.A.P. contributed to research design. H.X., B.J.F., Z.W.D., and M.G. performed research and analyzed data. All authors wrote the paper.

Corresponding author

Correspondence to Jeffrey A. Pleiss.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Elevated temperatures in reverse-transcription reactions increase specificity.

The fraction of on-target and off-target reads from replicate MPE-seq libraries generated from reverse-transcription reactions carried out at various temperatures. A small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer), and thus were categorized as neither cDNAs derived from RNA targets nor unmappable.
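The read categories described in this legend can be illustrated with a minimal sketch. It assumes each read has already been reduced to three hypothetical fields (extension length past the primer, whether it mapped, and whether it mapped to a targeted region); the field names, the order in which the checks are applied, and the toy reads are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

# From the legend: products extended 0-5 bases past the primer are "Unextended primer".
MAX_UNEXTENDED = 5


def classify_read(extension_len, mapped, on_target):
    """Assign one read to a category (hypothetical inputs, see lead-in)."""
    if extension_len <= MAX_UNEXTENDED:
        return "unextended_primer"
    if not mapped:
        return "unmappable"
    return "on_target" if on_target else "off_target"


def category_fractions(reads):
    """Return the fraction of reads in each category."""
    counts = Counter(classify_read(*read) for read in reads)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Toy reads: (extension_len, mapped, on_target)
reads = [(40, True, True), (3, True, True), (25, True, False), (60, False, False)]
fractions = category_fractions(reads)
```

Checking the extension length first reflects the legend's statement that unextended primers are counted as neither target-derived cDNAs nor unmappable reads.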

Supplementary Figure 2 Percentage of reads mapping to targeted regions.

The percentage of reads mapped to target and off-target regions is depicted for MPE-seq and conventional RNA-seq. In MPE-seq a small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer), and thus were not categorized as cDNAs derived from RNA targets.

Supplementary Figure 3 Expression measurements as determined by MPE-seq and RNA-seq.

(a) A scatter plot depicts gene expression measurements (RNA-seq) in replicate datasets. Genes containing splice events that were among those chosen for targeted sequencing are depicted in red. These targeted genes range in expression level by orders of magnitude. (b) A scatter plot depicts gene expression measurements in replicate MPE-seq datasets (red). Similar to conventional RNA-seq, expression measurements in MPE-seq are highly reproducible between replicates, even for the small proportion of mis-priming events that map to off-target locations (gray). (c) A scatter plot depicts gene expression measurements in RNA-seq and MPE-seq. The right shift of targeted genes reflects successful enrichment of targets by orders of magnitude. The observation that even highly expressed genes as measured by RNA-seq are proportionally highly expressed in MPE-seq suggests that primers are not limiting during reverse transcription.

Supplementary Figure 4 Splicing measurements as determined by MPE-seq and RNA-seq.

(a) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using wild-type (Prp2) RNA. For calculation of R², n = 252 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (b) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using RNA from a splicing mutant strain (prp2-1). For calculation of R², n = 193 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (c) A scatter plot depicts the fold-change (prp2-1/Prp2) in intron retention as measured by MPE-seq and conventional RNA-seq. For calculation of R², n = 203 intron-retention events that were quantified, requiring that both RNA-seq and MPE-seq be quantifiable in the wild-type (Prp2) dataset.
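The filtering rule in this legend (at least one spliced and one unspliced read in both experiments) can be sketched as follows. The sketch assumes intron retention is summarized as the fraction of unspliced reads and that the coefficient of determination is the squared Pearson correlation on that scale; both choices, the dictionary layout, and the toy counts are assumptions, since the legend does not specify them.

```python
def fraction_unspliced(spliced, unspliced):
    """Fraction of reads at an intron that are unspliced (assumed retention metric)."""
    return unspliced / (spliced + unspliced)


def comparable_events(mpe, rna):
    """Keep events with >= 1 spliced and >= 1 unspliced read in both experiments.

    mpe and rna map an event ID to a (spliced, unspliced) count pair.
    """
    shared = sorted(mpe.keys() & rna.keys())
    return [ev for ev in shared if all(c >= 1 for c in (*mpe[ev], *rna[ev]))]


def pearson_r2(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Toy counts: event "d" has no unspliced MPE-seq reads and is excluded.
mpe = {"a": (90, 10), "b": (50, 50), "c": (10, 90), "d": (100, 0)}
rna = {"a": (9, 1), "b": (5, 5), "c": (1, 9), "d": (10, 2)}

events = comparable_events(mpe, rna)
r2 = pearson_r2(
    [fraction_unspliced(*mpe[ev]) for ev in events],
    [fraction_unspliced(*rna[ev]) for ev in events],
)
```

Because the filter is applied per comparison, the number of quantifiable events can differ between panels, as it does in the legend.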

Supplementary Figure 5 Schematic for assigning reads to splice intermediate isoforms.

(a) Schematic depicting cDNA products derived from pre-first-step (P), lariat intermediate (L), and spliced mRNA (S) isoforms. (b) To quantify the abundance of P, L, and S isoforms for each targeted splice event, we counted read fragments and categorized them into six classes based on paired-end alignments. Fragments containing a splice junction (C1 and C2) are indicative of S. Fragments that are unspliced and traverse the branch-point region (C3) are classified as P. Fragments that are unspliced but terminate within a window of –3 to +5 bp from the previously determined branch point (C4) are classified as L. Fragments that are unspliced and either terminate downstream of the branch point (C5) or for which the terminus could not be mapped (C6) are ambiguous between P and L. Therefore, for accounting purposes, the counts for these fragments were coerced into P and L classifications based on the ratio of P and L determined by unambiguous mappings (C3 and C4). See Methods for more details.
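The accounting described in (b) can be sketched as below: ambiguous fragments (C5 and C6) are split between P and L in proportion to the unambiguous counts (C3 and C4). The function names, the handling of introns with no unambiguous fragments, and the sign convention for the branch-point offset are assumptions for illustration; the Methods describe the actual procedure.

```python
def is_lariat_terminus(offset_from_branch_point):
    """True if a cDNA terminus lies within -3 to +5 bp of the branch point.

    The sign convention of the offset is an assumption of this sketch.
    """
    return -3 <= offset_from_branch_point <= 5


def assign_isoform_counts(c1, c2, c3, c4, c5, c6):
    """Return (P, L, S) fragment counts from the six alignment classes."""
    spliced = float(c1 + c2)   # C1 and C2 contain a splice junction
    ambiguous = c5 + c6        # unspliced, but P versus L cannot be told apart
    unambiguous = c3 + c4
    if unambiguous == 0:
        # Assumption: with no unambiguous fragments there is no ratio to apply,
        # so the ambiguous fragments are left unassigned.
        return float(c3), float(c4), spliced
    p_share = c3 / unambiguous
    pre_first_step = c3 + ambiguous * p_share
    lariat = c4 + ambiguous * (1 - p_share)
    return pre_first_step, lariat, spliced


# Toy counts: C3:C4 is 3:1, so the 20 ambiguous fragments are split 15:5.
p, l, s = assign_isoform_counts(c1=80, c2=20, c3=30, c4=10, c5=15, c6=5)
```

With these toy counts the result is P = 45, L = 15, and S = 100, and the total fragment count is preserved.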

Supplementary Figure 6 Transcription start site profiling by MPE-seq.

Metagene profile of 3ʹ ends mapped by MPE-seq, centered on transcription start sites (TSSs) as determined by PRO-cap, an orthogonal method for mapping TSSs. The high abundance of read ends that pile up at TSSs indicates that MPE-seq can be used to profile cDNA termini.
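A metagene profile of this kind can be sketched as a count of cDNA 3ʹ ends at each offset from the nearest annotated TSS, oriented by transcript strand. The tuple layout, the flank size, and the toy coordinates are assumptions for illustration; real data would typically be handled with interval tools rather than nested loops.

```python
from collections import Counter


def metagene_profile(tss_sites, read_ends, flank=50):
    """Count read ends by strand-aware offset from each TSS.

    tss_sites and read_ends are lists of (chrom, strand, position) tuples,
    where strand is the transcript strand (an assumption of this sketch).
    """
    ends_by_key = {}
    for chrom, strand, pos in read_ends:
        ends_by_key.setdefault((chrom, strand), []).append(pos)

    profile = Counter()
    for chrom, strand, tss in tss_sites:
        sign = 1 if strand == "+" else -1
        for pos in ends_by_key.get((chrom, strand), []):
            offset = sign * (pos - tss)  # positive means downstream of the TSS
            if -flank <= offset <= flank:
                profile[offset] += 1
    return profile


# Toy data: two TSSs on opposite strands and five read ends.
tss_sites = [("I", "+", 100), ("II", "-", 500)]
read_ends = [("I", "+", 100), ("I", "+", 102), ("II", "-", 500),
             ("II", "-", 498), ("I", "+", 400)]
profile = metagene_profile(tss_sites, read_ends)
```

A pile-up of counts at offset 0 corresponds to the accumulation of read ends at TSSs described in the legend.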

Supplementary Figure 7 Lariat-intermediate-derived cDNAs contain a unique signature of mismatches.

(a) Genome browser screenshot of the 3ʹ ends of reads from paired-end sequenced fragments illustrates the unique signature of non-templated base incorporation by reverse transcriptase at a branched adenosine versus the 5ʹ RNA terminus. (b) Genome-wide quantification of the mismatch frequencies at 3ʹ termini of cDNAs near the TSS (left) versus at the annotated branch point (right).

Supplementary Figure 8 Transcript features that correlate with the abundance of lariat intermediates.

(a) The abundance of pre-first-step RNA and lariat intermediate RNA is significantly correlated with the classification of introns into those that are in ribosomal protein genes (RPG) and non-RPG. However, the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing, does not correlate. Horizontal lines in box plots represent the 25th, 50th, and 75th percentiles. Whiskers end at the 0th and 100th percentiles. A two-sided Mann–Whitney U-test was used to test differences between the distributions of these metrics. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read, of which 64 were RPG and 77 were non-RPG. (b) Spearman correlations of various features to the abundance of pre-first-step RNA, lariat intermediate, or the abundance of lariat intermediate relative to pre-first-step RNA. None of these features significantly correlate with the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing. Error bars indicate 95% confidence intervals as estimated by Fisher transformation of Spearman’s correlation coefficient. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read.
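The confidence intervals in (b) are described as coming from a Fisher transformation of Spearman's correlation coefficient. A minimal sketch of that calculation follows, assuming the common standard error of 1/sqrt(n − 3) on the transformed scale and a normal critical value for a 95% interval; the exact variance formula used by the authors is not stated, and the example coefficient of 0.3 is a hypothetical value.

```python
from math import atanh, sqrt, tanh


def fisher_ci(rho, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation coefficient.

    Uses the Fisher z-transformation with standard error 1/sqrt(n - 3),
    which is an assumption of this sketch.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = atanh(rho)               # transform to the z scale
    se = 1.0 / sqrt(n - 3)       # assumed standard error
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Hypothetical coefficient with the n = 141 introns given in the legend.
low, high = fisher_ci(0.3, 141)
```

The interval is asymmetric around the coefficient on the original scale, because the back-transformation compresses values near ±1.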

Supplementary Figure 9 Array-based oligonucleotide synthesis can be used to generate primer pools for use in MPE-seq.

(a) Obtaining adequate amounts of primer pools for MPE-seq from cost-effective array-based oligonucleotide synthesis can be achieved in four steps. (1) PCR amplification of the oligonucleotide synthesis pool using a 5ʹ blocked sense primer and a biotinylated antisense primer. (2) Restriction digestion to cleave off the PCR primer handle. (3) Lambda exonuclease digestion of free 5ʹ ends. (4) Streptavidin purification of biotinylated PCR handle. The unbound fraction is the desired primer pool product. (b) Steps during the amplification and purification of array-synthesized primer pools are monitored via native gel electrophoresis. The control lane represents a pool of individually synthesized MPE-seq primers which did not require amplification and purification. Lanes refer to products of each individual step in the protocol. The unbound fraction is the desired primer pool product. Similar results have been consistently obtained in >3 independent experiments. (c) The percentage of reads mapped to target and off-target regions is depicted for MPE-seq using array-synthesized primers. (d) A scatter plot compares the fraction of unspliced mRNAs measured by MPE-seq libraries which used individually synthesized primer pools versus array-based synthesis of primer pools. For calculation of Pearson’s correlation coefficient, n = 140 intron-retention events which were quantified, requiring at least one spliced read and one unspliced read in both experiments. Sashimi plot of a targeted region within the ats1 gene locus demonstrates the capacity of MPE-seq to reveal complex alternative splicing patterns with higher sensitivity than RNA-seq, despite having lower total sequencing depth.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9

Reporting Summary

Supplementary Table 1

Description of the sequencing libraries in this study

Supplementary Table 2

Counts of unannotated splice sites at target introns for S. cerevisiae. Data based on MPE-seq (samples WT_A and WT_B).

Supplementary Table 3

Counts of pre-first-step RNA (P), lariat intermediate (L), and spliced (S) for all targeted introns for S. cerevisiae. Data based on samples WT_A and WT_B.

Supplementary Table 4

Counts of exon–exon junctions at target introns for S. pombe. Data based on MPE-seq (WT_A) and RNA-seq (Rhind et al., ref. 26).

Supplementary Table 5

Reverse-transcription primer pools for S. cerevisiae

Supplementary Table 6

Reverse-transcription primer pools for S. pombe

Supplementary Table 7

Other primers used in this study

Supplementary Table 8

Branch-point annotations used (BED file containing coordinates of regions from branch point to 3ʹ splice site)

About this article

Cite this article

Xu, H., Fair, B.J., Dwyer, Z.W. et al. Detection of splice isoforms and rare intermediates using multiplexed primer extension sequencing. Nat Methods 16, 55–58 (2019). https://doi.org/10.1038/s41592-018-0258-x

  • DOI: https://doi.org/10.1038/s41592-018-0258-x
