  • Brief Communication

High-throughput RNA isoform sequencing using programmed cDNA concatenation

Abstract

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
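The concatenation idea can be illustrated with a toy sketch: if several cDNAs are joined into one long molecule with a known adapter in front of each, a single long read can be split back into its constituent cDNAs. The adapter sequences, cDNA sequences and the function name `split_array_read` below are hypothetical, invented for illustration, and the exact-match search is a simplification; it is not the annotation model implemented in Longbow.

```python
from typing import List

# Hypothetical adapter sequences, invented for illustration only;
# they are NOT the adapters used in MAS-ISO-seq.
ADAPTERS = ["ACGTACGTAC", "TTGGCCAATT", "GGATCCGGAT"]


def split_array_read(read: str, adapters: List[str]) -> List[str]:
    """Split a concatenated read at exact, in-order adapter matches.

    Assumes the layout adapter0 + cDNA1 + adapter1 + cDNA2 + ...
    Real reads contain sequencing errors, so exact matching is a
    simplification made for this sketch.
    """
    segments = []
    start = 0
    for adapter in adapters:
        pos = read.find(adapter, start)
        if pos == -1:
            break  # expected adapter not found: stop segmenting here
        if pos > start:
            segments.append(read[start:pos])  # sequence before this adapter
        start = pos + len(adapter)
    if start < len(read):
        segments.append(read[start:])  # sequence after the last adapter found
    return segments


# Toy cDNAs (also invented) joined into one array-like read.
cdnas = ["AAAAACCCCC", "GGGGGTTTTT", "CCCCCAAAAA"]
read = "".join(adapter + cdna for adapter, cdna in zip(ADAPTERS, cdnas))
print(split_array_read(read, ADAPTERS))
```

In this toy case one read yields three cDNA segments, which is the sense in which concatenation multiplies the number of cDNA reads obtained per sequenced molecule.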

Fig. 1: MAS-ISO-seq workflow and experimental validation using synthetic RNA isoforms.
Fig. 2: Single-cell isoform-resolved sequencing of primary human CD8+ T cells with MAS-ISO-seq.

Data availability

Links to the datasets used in this study can be found at https://github.com/broadinstitute/mas-seq-paper-data.

Human tumor-infiltrating CD8+ T cells single-cell RNA-sequencing data are available from dbGAP with accession number phs003200.v1.p1.

Code availability

An online repository of code for the Longbow tool used in this study can be found at https://github.com/broadinstitute/longbow.

Acknowledgements

We thank W. Kretzschmar for the helpful discussions. This work was supported by Broad Institute SPARC awards 800353 (to A.M.A., K.V.G., N.H. and P.C.B.) and 800307 (to K.V.G.); National Institutes of Health grant (U19 AI082630 to N.H.), Adelson Medical Research Foundation (to N.H.), National Human Genome Research Institute grant (RM1HG006193 to N.H. and P.C.B.), with additional support from the Center for Cell Circuits at the Broad Institute (HG006193). M.A.S. is a Cancer Research Institute Irvington Fellow supported by the Cancer Research Institute (CRI award 4071).

Author information

Contributions

A.M.A. conceived and developed the molecular workflow and designed and performed the experiments. K.V.G. developed the statistical annotation software with contributions from J.S. J.S. developed the data processing pipeline with contributions from V.P. and M.G. and performed bioinformatic analyses. M.B. performed Smart-seq3 and single-cell RNA-seq data analysis and statistical modeling and devised the isoform identification algorithm with contributions from J.S. V.P. developed the UMI and CBC error correction algorithms and conducted the bioinformatic analysis with contributions from A.M.A. S.S. aided through discussions and analysis. G.M.B., E.M.B. and M.S.F. consented patients, collected samples and processed and generated the 10× Genomics scRNA-seq data. M.A.S. assisted with T-cell data analysis. M.C., A.D., T.B. and S.G. aided in the data generation and helped troubleshoot early iterations of the protocol. A.A.P. and E.B. provided access to cloud computing and other resources to facilitate data processing and analysis. A.M.A., K.V.G., J.S., M.B., V.P., P.C.B. and N.H. cowrote the manuscript.

Corresponding authors

Correspondence to Aziz M. Al’Khafaji, Kiran V. Garimella, Mehrtash Babadi, Victoria Popic, Paul C. Blainey or Nir Hacohen.

Ethics declarations

Competing interests

The funding that contributed to the subject matter of this manuscript is described as follows: Broad Institute SPARC award, National Institutes of Health grant (U19 AI082630), Adelson Medical Research Foundation, National Human Genome Research Institute grant (RM1HG006193), support from the Center for Cell Circuits at the Broad Institute (HG006193) and Cancer Research Institute award (4071). A.M.A., K.V.G., J.S., M.B., P.C.B. and N.H. are inventors on a licensed, pending international patent application, having serial number PCT/US2021/037226, filed by Broad Institute of MIT and Harvard, Massachusetts General Hospital and Massachusetts Institute of Technology, directed to certain subject matter related to the MAS-seq method described in this manuscript. Broad Institute of MIT and Harvard and Pacific Biosciences of California entered into a collaboration agreement relating to this research subsequent to the submission of this manuscript. A.A.P. is a venture partner and employee of GV. He has received funding from Verily, Microsoft, Illumina, Bayer, Pfizer, Biogen, Abbvie, Intel and IBM. M.S.F. receives funding from Bristol-Myers Squibb. G.M.B. has served on SAB and on the steering committee for Nektar Therapeutics. She has SRAs with Olink Proteomics and Palleon Pharmaceuticals. She served on SAB and as a speaker for Novartis. N.H. holds equity in BioNTech and is a founder and equity holder of Danger Bio. P.C.B. is a consultant to and/or holds equity in companies that develop or apply genomic or genome editing technologies: 10× Genomics, General Automation Lab Technologies/Isolation Bio, Celsius Therapeutics, Next Gen Diagnostics LLC, Cache DNA, Concerto Biosciences, Stately Bio, Ramona Optics, Bifrost Biosystems and Amber Bio. P.C.B.’s group receives research funding from industry for unrelated work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Omid Faridani and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21, Supplementary Table 1 and Supplementary Note.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Al’Khafaji, A.M., Smith, J.T., Garimella, K.V. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 42, 582–586 (2024). https://doi.org/10.1038/s41587-023-01815-7

  • DOI: https://doi.org/10.1038/s41587-023-01815-7
