Detection of splice isoforms and rare intermediates using multiplexed primer extension sequencing

Xu, Hansen; Fair, Benjamin J.; Dwyer, Zachary W.; Gildea, Michael; Pleiss, Jeffrey A.

doi:10.1038/s41592-018-0258-x

Brief Communication
Published: 20 December 2018

Hansen Xu¹, Benjamin J. Fair¹, Zachary W. Dwyer¹, Michael Gildea¹ & Jeffrey A. Pleiss¹ (ORCID: orcid.org/0000-0002-2145-4007)

Nature Methods volume 16, pages 55–58 (2019)

Abstract

Targeted RNA sequencing (RNA-seq) aims to focus coverage on areas of interest that are inadequately sampled in standard RNA-seq experiments. Here we present multiplexed primer extension sequencing (MPE-seq), an approach for targeted RNA-seq that uses complex pools of reverse-transcription primers to enable sequencing enrichment at user-selected locations across the genome. We targeted hundreds to thousands of pre-mRNA splice junctions and obtained high-precision detection of splice isoforms, including rare pre-mRNA splicing intermediates.

**Fig. 1: MPE-seq uses complex pools of reverse-transcription primers to target sequencing to regions of interest.**

**Fig. 2: MPE-seq enrichment allows high-precision measurements of splicing.**

**Fig. 3: MPE-seq allows genome-wide profiling of lariat intermediates.**

Data availability

All sequencing data are available through NCBI’s Sequence Read Archive (SRA) under accession number SRP148810.

References

1. Merkin, J., Russell, C., Chen, P. & Burge, C. B. Science 338, 1593–1599 (2012).
2. Barbosa-Morais, N. L. et al. Science 338, 1587–1593 (2012).
3. Mercer, T. R. et al. Nat. Biotechnol. 30, 99–104 (2011).
4. Mercer, T. R. et al. Nat. Protoc. 9, 989–1009 (2014).
5. Blomquist, T. M. et al. PLoS One 8, e79120 (2013).
6. Kivioja, T. et al. Nat. Methods 9, 72–74 (2011).
7. Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R. & Siebert, P. D. Biotechniques 30, 892–897 (2001).
8. Zheng, W., Chung, L. M. & Zhao, H. BMC Bioinformatics 12, 290 (2011).
9. Carey, M. F., Peterson, C. L. & Smale, S. T. Cold Spring Harb. Protoc. 2013, 164–173 (2013).
10. Coombes, C. E. & Boeke, J. D. RNA 11, 323–331 (2005).
11. Padgett, R. A. et al. Proc. Natl Acad. Sci. USA 82, 8349–8353 (1985).
12. Booth, G. T., Wang, I. X., Cheung, V. G. & Lis, J. T. Genome Res. 26, 799–811 (2016).
13. Kim, S. H. & Lin, R. J. Mol. Cell. Biol. 16, 6810–6819 (1996).
14. Chen, W. et al. Cell 173, 1031–1044 (2018).
15. Wan, W., Lu, M., Wang, D., Gao, X. & Hong, J. Sci. Rep. 7, 6119 (2017).
16. Stepankiw, N., Raghavan, M., Fogarty, E. A., Grimson, A. & Pleiss, J. A. Nucleic Acids Res. 43, 8488–8501 (2015).
17. Nojima, T. et al. Cell 161, 526–540 (2015).
18. Burke, J. E. et al. Cell 173, 1014–1030 (2018).
19. Lucks, J. B. et al. Proc. Natl Acad. Sci. USA 108, 11063–11068 (2011).
20. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Nature 505, 701–705 (2014).
21. Hartwell, L. H., McLaughlin, C. S. & Warner, J. R. Mol. Gen. Genet. 109, 42–56 (1970).
22. Wernersson, R. & Nielsen, H. B. Nucleic Acids Res. 33, W611–W615 (2005).
23. Collart, M. A. & Oliviero, S. Curr. Protoc. Mol. Biol. 23, 13.12.1–13.12.5 (2001).
24. Engel, S. R. et al. G3 (Bethesda) 4, 389–398 (2014).
25. Dobin, A. et al. Bioinformatics 29, 15–21 (2013).
26. Rhind, N. et al. Science 332, 930–936 (2011).
27. Quinlan, A. R. & Hall, I. M. Bioinformatics 26, 841–842 (2010).
28. Mayerle, M. et al. Proc. Natl Acad. Sci. USA 114, 4739–4744 (2017).
29. Grate, L. & Ares, M. Jr. Methods Enzymol. 350, 380–392 (2002).
30. Ramírez, F. et al. Nucleic Acids Res. 44, W160–W165 (2016).
31. Conesa, A. et al. Genome Biol. 17, 13 (2016).

Acknowledgements

We thank members of the J.A.P., H. Kwak, and A. Grimson laboratories, as well as the anonymous reviewers for critical feedback on this work. We thank L. Yao for initial drafts of Fig. 1a and Supplementary Fig. 9a. We thank P. Schweitzer, J. Grenier, and the BRC Genomics Facility at Cornell for outstanding technical support with Illumina sequencing. This work was funded by the American Cancer Society (Research Scholars Grant to J.A.P.) and the NIH (grant R01GM098634 to J.A.P.).

Author information

These authors contributed equally: Hansen Xu, Benjamin J. Fair.

Authors and Affiliations

Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
Hansen Xu, Benjamin J. Fair, Zachary W. Dwyer, Michael Gildea & Jeffrey A. Pleiss

Contributions

H.X., B.J.F., Z.W.D., M.G., and J.A.P. contributed to research design. H.X., B.J.F., Z.W.D., and M.G. performed research and analyzed data. All authors wrote the paper.

Corresponding author

Correspondence to Jeffrey A. Pleiss.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Elevated temperatures in reverse-transcription reactions increase specificity.

The fraction of on-target and off-target reads from replicate MPE-seq libraries generated from reverse-transcription reactions carried out at various temperatures. A small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer); these reads were therefore categorized neither as cDNAs derived from RNA targets nor as unmappable.

Supplementary Figure 2 Percentage of reads mapping to targeted regions.

The percentage of reads mapped to target and off-target regions is depicted for MPE-seq and conventional RNA-seq. In MPE-seq, a small fraction of reads were categorized as “Unextended primer,” which corresponds to short primer extension products (0–5 bases extended past the primer); these reads were therefore not categorized as cDNAs derived from RNA targets.

Supplementary Figure 3 Expression measurements as determined by MPE-seq and RNA-seq.

(a) A scatter plot depicts gene expression measurements (RNA-seq) in replicate datasets. Genes containing splice events that were among those chosen for targeted sequencing are depicted in red. These targeted genes range in expression level by orders of magnitude. (b) A scatter plot depicts gene expression measurements in replicate MPE-seq datasets (red). Similar to conventional RNA-seq, expression measurements in MPE-seq are highly reproducible between replicates, even for the small proportion of mis-priming events that map to off-target locations (gray). (c) A scatter plot depicts gene expression measurements in RNA-seq and MPE-seq. The right shift of targeted genes reflects successful enrichment of targets by orders of magnitude. The observation that even highly expressed genes as measured by RNA-seq are proportionally highly expressed in MPE-seq suggests that primers are not limiting during reverse transcription.

Supplementary Figure 4 Splicing measurements as determined by MPE-seq and RNA-seq.

(a) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using wild-type (Prp2) RNA. For calculation of R², n = 252 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (b) A scatter plot depicts intron-retention measurements in MPE-seq and conventional RNA-seq using RNA from a splicing mutant strain (prp2-1). For calculation of R², n = 193 intron-retention events that were quantified, requiring at least one spliced read and one unspliced read in both experiments. (c) A scatter plot depicts the fold-change (prp2-1/Prp2) in intron retention as measured by MPE-seq and conventional RNA-seq. For calculation of R², n = 203 intron-retention events that were quantified, requiring that both RNA-seq and MPE-seq be quantifiable in the wild-type (Prp2) dataset.
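The event filter and R² comparison described in this legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes intron retention is expressed as the fraction of unspliced reads, unspliced / (spliced + unspliced), and that R² is the squared Pearson correlation of those fractions. The function names and the input layout (dictionaries mapping an intron ID to a `(spliced, unspliced)` read-count pair) are hypothetical.

```python
def fraction_unspliced(spliced, unspliced):
    """Assumed intron-retention metric: unspliced reads over all reads at the event."""
    return unspliced / (spliced + unspliced)


def quantifiable_events(mpe_counts, rna_counts):
    """Keep events with >= 1 spliced and >= 1 unspliced read in BOTH experiments."""
    shared = sorted(set(mpe_counts) & set(rna_counts))
    return [
        event for event in shared
        if min(mpe_counts[event]) >= 1 and min(rna_counts[event]) >= 1
    ]


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return sxy ** 2 / (sxx * syy)


def compare_intron_retention(mpe_counts, rna_counts):
    """Return the retained events and the R^2 between the two experiments."""
    events = quantifiable_events(mpe_counts, rna_counts)
    mpe = [fraction_unspliced(*mpe_counts[e]) for e in events]
    rna = [fraction_unspliced(*rna_counts[e]) for e in events]
    return events, r_squared(mpe, rna)
```

With real data, the length of the returned event list corresponds to the n reported for each panel; whether the published analysis transformed the values (for example, to a log scale) before computing R² is not stated in this legend.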

Supplementary Figure 5 Schematic for assigning reads to splice intermediate isoforms.

(a) Schematic depicting cDNA products derived from pre-first-step (P), lariat intermediate (L), and spliced mRNA (S) isoforms. (b) To quantify the abundance of P, L, and S isoforms for each targeted splice event, we counted read fragments and categorized them into six classes based on paired-end alignments. Fragments containing a splice junction (C₁ and C₂) are indicative of S. Fragments that are unspliced and traverse the branch-point region (C₃) are classified as P. Fragments that are unspliced but terminate within a window of –3 to +5 bp from the previously determined branch point (C₄) are classified as L. Fragments that are unspliced and either terminate downstream of the branch point (C₅) or for which the terminus could not be mapped (C₆) are ambiguous between P and L. Therefore, for accounting purposes, the counts for these fragments were coerced into P and L classifications based on the ratio of P and L determined by unambiguous mappings (C₃ and C₄). See Methods for more details.
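The accounting in panel (b) can be sketched as below. This is an illustrative reading of the legend rather than the authors' code (the Methods are not reproduced here). It assumes that the terminus offset is measured in transcript orientation, with negative values upstream of the branch point, that "traversing" the branch-point region means terminating upstream of the -3 to +5 window, and that ambiguous fragments are split in proportion to the unambiguous C₃:C₄ ratio. Function names, arguments, and the handling of the case with no unambiguous fragments are hypothetical.

```python
def classify_unspliced_terminus(terminus, branch_point, strand="+", window=(-3, 5)):
    """Assign an unspliced fragment to class C3, C4, C5, or C6.

    terminus: genomic coordinate of the cDNA 3' end, or None if unmapped.
    """
    if terminus is None:
        return "C6"  # terminus could not be mapped: ambiguous between P and L
    # Offset in transcript orientation: negative = upstream of the branch point.
    offset = terminus - branch_point if strand == "+" else branch_point - terminus
    if offset < window[0]:
        return "C3"  # read through the branch-point region: pre-first-step (P)
    if offset <= window[1]:
        return "C4"  # stops at the branch point: lariat intermediate (L)
    return "C5"      # stops downstream of the branch point: ambiguous


def allocate_isoform_counts(c1, c2, c3, c4, c5, c6):
    """Convert the six fragment-class counts into P, L, and S estimates."""
    spliced = float(c1 + c2)
    unambiguous = c3 + c4
    ambiguous = c5 + c6
    if unambiguous == 0:
        # Assumption: without any C3/C4 fragments the P:L ratio is unknown,
        # so the ambiguous fragments are reported separately.
        return {"P": 0.0, "L": 0.0, "S": spliced, "unassigned": float(ambiguous)}
    lariat_fraction = c4 / unambiguous
    return {
        "P": c3 + ambiguous * (1 - lariat_fraction),
        "L": c4 + ambiguous * lariat_fraction,
        "S": spliced,
        "unassigned": 0.0,
    }
```

For example, with C₃ = 30 and C₄ = 10, a quarter of the C₅ and C₆ fragments would be assigned to L and the remainder to P.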

Supplementary Figure 6 Transcription start site profiling by MPE-seq.

Metagene profile of 3ʹ ends mapped by MPE-seq, centered on transcription start sites (TSSs) as determined by PRO-cap, an orthogonal method for mapping TSSs. The high abundance of read ends that pile up at TSSs indicates that MPE-seq can be used to profile cDNA termini.
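A metagene profile of this kind can be computed by pooling read-end positions relative to each TSS. The sketch below is a generic illustration and not the tool used in the study; the function name, the tuple layout `(chromosome, position, strand)`, and the default window size are assumptions.

```python
from collections import Counter


def metagene_profile(tss_sites, read_ends, window=50):
    """Count cDNA 3' ends at each offset from a TSS, pooled over all TSSs.

    tss_sites, read_ends: iterables of (chromosome, position, strand) tuples.
    Offsets are in transcript orientation (positive = downstream of the TSS).
    """
    # Group read ends by chromosome and strand so that each TSS is compared
    # only with ends on the same strand of the same chromosome.
    ends_by_key = {}
    for chrom, pos, strand in read_ends:
        ends_by_key.setdefault((chrom, strand), []).append(pos)

    profile = Counter()
    for chrom, tss, strand in tss_sites:
        for pos in ends_by_key.get((chrom, strand), []):
            offset = pos - tss if strand == "+" else tss - pos
            if -window <= offset <= window:
                profile[offset] += 1
    return profile
```

A peak at offset 0 in the resulting counts is the pile-up of read ends at TSSs that the legend describes.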

Supplementary Figure 7 Lariat-intermediate-derived cDNAs contain a unique signature of mismatches.

(a) Genome browser screenshot of the 3ʹ ends of reads from paired-end sequenced fragments illustrates the unique signature of non-templated base incorporation by reverse transcriptase at a branched adenosine versus the 5ʹ RNA terminus. (b) Genome-wide quantification of the mismatch frequencies at 3ʹ termini of cDNAs near the TSS (left) versus at the annotated branch point (right).

Supplementary Figure 8 Transcript features that correlate with the abundance of lariat intermediates.

(a) The abundance of pre-first-step RNA and lariat intermediate RNA is significantly correlated with the classification of introns into those that are in ribosomal protein genes (RPG) and non-RPG. However, the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing, does not correlate. Horizontal lines in box plots represent the 25th, 50th, and 75th percentiles. Whiskers end at the 0th and 100th percentiles. A two-sided Mann–Whitney U-test was used to test differences between the distributions of these metrics. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read, of which 64 were RPG and 77 were non-RPG. (b) Spearman correlations of various features to the abundance of pre-first-step RNA, lariat intermediate, or the abundance of lariat intermediate relative to pre-first-step RNA. None of these features significantly correlate with the abundance of lariat intermediate relative to pre-first-step RNA, a metric of the efficiency of the second step of splicing. Error bars indicate 95% confidence intervals as estimated by Fisher transformation of Spearman’s correlation coefficient. For each metric (pre-first-step, lariat intermediate, and lariat intermediate of unspliced), n = 141 introns for which we attempted lariat quantification and found at least one spliced read.
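The confidence intervals in panel (b) rely on the Fisher transformation. A minimal sketch of that calculation follows; it assumes the textbook standard error 1/√(n − 3) on the transformed scale, whereas some treatments apply a slightly larger value for Spearman's coefficient, and the legend does not say which was used. The function name is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_confidence_interval(rho, n, confidence=0.95):
    """Approximate confidence interval for a correlation coefficient.

    rho: observed correlation (strictly between -1 and 1).
    n:   number of observations (must exceed 3).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = atanh(rho)                     # Fisher z-transformation
    standard_error = 1 / sqrt(n - 3)   # assumed standard error on the z scale
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = tanh(z - critical * standard_error)
    upper = tanh(z + critical * standard_error)
    return lower, upper
```

With n = 141 introns, as in this figure, a coefficient of zero yields an interval of roughly ±0.165 under these assumptions.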

Supplementary Figure 9 Array-based oligonucleotide synthesis can be used to generate primer pools for use in MPE-seq.

(a) Obtaining adequate amounts of primer pools for MPE-seq from cost-effective array-based oligonucleotide synthesis can be achieved in four steps. (1) PCR amplification of the oligonucleotide synthesis pool using a 5ʹ blocked sense primer and a biotinylated antisense primer. (2) Restriction digestion to cleave off the PCR primer handle. (3) Lambda exonuclease digestion of free 5ʹ ends. (4) Streptavidin purification of biotinylated PCR handle. The unbound fraction is the desired primer pool product. (b) Steps during the amplification and purification of array-synthesized primer pools are monitored via native gel electrophoresis. The control lane represents a pool of individually synthesized MPE-seq primers which did not require amplification and purification. Lanes refer to products of each individual step in the protocol. The unbound fraction is the desired primer pool product. Similar results have been consistently obtained in >3 independent experiments. (c) The percentage of reads mapped to target and off-target regions is depicted for MPE-seq using array-synthesized primers. (d) A scatter plot compares the fraction of unspliced mRNAs measured by MPE-seq libraries which used individually synthesized primer pools versus array-based synthesis of primer pools. For calculation of Pearson’s correlation coefficient, n = 140 intron-retention events which were quantified, requiring at least one spliced read and one unspliced read in both experiments. Sashimi plot of a targeted region within the ats1 gene locus demonstrates the capacity of MPE-seq to reveal complex alternative splicing patterns with higher sensitivity than RNA-seq, despite having lower total sequencing depth.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9

Reporting Summary

Supplementary Table 1

Description of the sequencing libraries in this study

Supplementary Table 2

Counts of unannotated splice sites at target introns for S. cerevisiae. Data based on MPE-seq (samples WT_A and WT_B).

Supplementary Table 3

Counts of pre-first-step RNA (P), lariat intermediate (L), and spliced (S) for all targeted introns for S. cerevisiae. Data based on samples WT_A and WT_B.

Supplementary Table 4

Counts of exon–exon junctions at target introns for S. pombe. Data based on MPE-seq (WT_A) and RNA-seq (Rhind et al.²⁶).

Supplementary Table 5

Reverse-transcription primer pools for S. cerevisiae

Supplementary Table 6

Reverse-transcription primer pools for S. pombe

Supplementary Table 7

Other primers used in this study

Supplementary Table 8

Branch-point annotations used (BED file containing coordinates of regions from the branch point to the 3ʹ splice site)

About this article

Cite this article

Xu, H., Fair, B.J., Dwyer, Z.W. et al. Detection of splice isoforms and rare intermediates using multiplexed primer extension sequencing. Nat Methods 16, 55–58 (2019). https://doi.org/10.1038/s41592-018-0258-x

Received: 14 July 2018
Accepted: 08 November 2018
Published: 20 December 2018
Issue Date: January 2019
DOI: https://doi.org/10.1038/s41592-018-0258-x