High-throughput RNA isoform sequencing using programmed cDNA concatenation

Al’Khafaji, Aziz M.; Smith, Jonathan T.; Garimella, Kiran V.; Babadi, Mehrtash; Popic, Victoria; Sade-Feldman, Moshe; Gatzen, Michael; Sarkizova, Siranush; Schwartz, Marc A.; Blaum, Emily M.; Day, Allyson; Costello, Maura; Bowers, Tera; Gabriel, Stacey; Banks, Eric; Philippakis, Anthony A.; Boland, Genevieve M.; Blainey, Paul C.; Hacohen, Nir

doi:10.1038/s41587-023-01815-7

Brief Communication
Published: 08 June 2023

Nature Biotechnology volume 42, pages 582–586 (2024)

Abstract

Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
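The core idea in the abstract is that one long concatenated read carries several cDNAs, which must be split apart again before analysis. The sketch below illustrates this under simplifying assumptions. It assumes an idealized array in which known adapter sequences separate consecutive cDNAs in a fixed, known order. The adapter strings and the function name `split_array_read` are hypothetical: they are not the MAS-ISO-seq adapters and not part of the Longbow tool used in this study. Exact string matching stands in for the published method. Real long reads contain mismatches and indels, so a practical segmenter must tolerate errors, and this sketch does not.

```python
from typing import List

# Hypothetical adapter sequences (NOT the real MAS-ISO-seq adapters).
# Each boundary between array elements is assumed to carry a distinct adapter,
# so the expected order of adapters along the read is known in advance.
HYPOTHETICAL_ADAPTERS = ["AAAA", "CCCC", "GGGG", "TTTT"]


def split_array_read(read: str, adapters: List[str]) -> List[str]:
    """Return the segments found between consecutive adapters, in order.

    Illustrative only: uses exact, in-order matching with str.find, so any
    sequencing error inside an adapter causes the parse to stop early.
    """
    segments: List[str] = []
    pos = read.find(adapters[0])
    if pos == -1:
        return segments  # first adapter absent: nothing can be segmented
    pos += len(adapters[0])
    for adapter in adapters[1:]:
        nxt = read.find(adapter, pos)
        if nxt == -1:
            break  # truncated array or adapter not matched exactly
        segments.append(read[pos:nxt])  # one cDNA between two adapters
        pos = nxt + len(adapter)
    return segments
```

Each returned segment corresponds to one cDNA. A truncated array simply yields fewer segments. A fixed adapter order helps here because it tells the parser which boundary to expect next.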

**Fig. 1: MAS-ISO-seq workflow and experimental validation using synthetic RNA isoforms.**

**Fig. 2: Single-cell isoform-resolved sequencing of primary human CD8⁺ T cells with MAS-ISO-seq.**

Data availability

Links to the datasets used in this study can be found at https://github.com/broadinstitute/mas-seq-paper-data.

Single-cell RNA-sequencing data from human tumor-infiltrating CD8⁺ T cells are available from dbGaP under accession number phs003200.v1.p1.

Code availability

An online repository of code for the Longbow tool used in this study can be found at https://github.com/broadinstitute/longbow.

Acknowledgements

We thank W. Kretzschmar for the helpful discussions. This work was supported by Broad Institute SPARC awards 800353 (to A.M.A., K.V.G., N.H. and P.C.B.) and 800307 (to K.V.G.); National Institutes of Health grant (U19 AI082630 to N.H.), Adelson Medical Research Foundation (to N.H.), National Human Genome Research Institute grant (RM1HG006193 to N.H. and P.C.B.), with additional support from the Center for Cell Circuits at the Broad Institute (HG006193). M.A.S. is a Cancer Research Institute Irvington Fellow supported by the Cancer Research Institute (CRI award 4071).

Author information

These authors contributed equally: Aziz M. Al’Khafaji, Jonathan T. Smith, Kiran V. Garimella, Mehrtash Babadi, Victoria Popic.

Authors and Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, USA
Aziz M. Al’Khafaji, Jonathan T. Smith, Kiran V. Garimella, Mehrtash Babadi, Victoria Popic, Moshe Sade-Feldman, Michael Gatzen, Siranush Sarkizova, Marc A. Schwartz, Emily M. Blaum, Allyson Day, Maura Costello, Tera Bowers, Stacey Gabriel, Eric Banks, Anthony A. Philippakis, Paul C. Blainey & Nir Hacohen
Department of Medicine, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
Moshe Sade-Feldman, Emily M. Blaum & Nir Hacohen
Department of Pediatrics, Harvard Medical School, Boston, MA, USA
Marc A. Schwartz
Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA, USA
Marc A. Schwartz
Department of Pediatric Oncology, Dana Farber Cancer Institute, Boston, MA, USA
Marc A. Schwartz
Division of Surgical Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Genevieve M. Boland
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Paul C. Blainey
Koch Institute for Integrative Cancer Research at the Massachusetts Institute of Technology, Cambridge, MA, USA
Paul C. Blainey
Harvard Medical School, Boston, MA, USA
Nir Hacohen
Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Charlestown, MA, USA
Nir Hacohen

Contributions

A.M.A. conceived and developed the molecular workflow and designed and performed the experiments. K.V.G. developed the statistical annotation software with contributions from J.S. J.S. developed the data processing pipeline with contributions from V.P. and M.G. and performed bioinformatic analyses. M.B. performed Smart-seq3 and single-cell RNA-seq data analysis and statistical modeling and devised the isoform identification algorithm with contributions from J.S. V.P. developed the UMI and CBC error correction algorithms and conducted the bioinformatic analysis with contributions from A.M.A. S.S. aided through discussions and analysis. G.M.B., E.M.B. and M.S.F. consented patients, collected samples and processed and generated the 10× Genomics scRNA-seq data. M.A.S. assisted with T-cell data analysis. M.C., A.D., T.B. and S.G. aided in the data generation and helped troubleshoot early iterations of the protocol. A.A.P. and E.B. provided access to cloud computing and other resources to facilitate data processing and analysis. A.M.A., K.V.G., J.S., M.B., V.P., P.C.B. and N.H. cowrote the manuscript.

Corresponding authors

Correspondence to Aziz M. Al’Khafaji, Kiran V. Garimella, Mehrtash Babadi, Victoria Popic, Paul C. Blainey or Nir Hacohen.

Ethics declarations

Competing interests

The funding that contributed to the subject matter of this manuscript is described as follows: Broad Institute SPARC award, National Institutes of Health grant (U19 AI082630), Adelson Medical Research Foundation, National Human Genome Research Institute grant (RM1HG006193), support from the Center for Cell Circuits at the Broad Institute (HG006193) and Cancer Research Institute award (4071). A.M.A., K.V.G., J.S., M.B., P.C.B. and N.H. are inventors on a licensed, pending international patent application, having serial number PCT/US2021/037226, filed by Broad Institute of MIT and Harvard, Massachusetts General Hospital and Massachusetts Institute of Technology, directed to certain subject matter related to the MAS-seq method described in this manuscript. Broad Institute of MIT and Harvard and Pacific Biosciences of California entered into a collaboration agreement relating to this research subsequent to the submission of this manuscript. A.A.P. is a venture partner and employee of GV. He has received funding from Verily, Microsoft, Illumina, Bayer, Pfizer, Biogen, Abbvie, Intel and IBM. M.S.F. receives funding from Bristol-Myers Squibb. G.M.B. has served on SAB and on the steering committee for Nektar Therapeutics. She has SRAs with Olink Proteomics and Palleon Pharmaceuticals. She served on SAB and as a speaker for Novartis. N.H. holds equity in BioNTech and is a founder and equity holder of Danger Bio. P.C.B. is a consultant to and/or holds equity in companies that develop or apply genomic or genome editing technologies: 10× Genomics, General Automation Lab Technologies/Isolation Bio, Celsius Therapeutics, Next Gen Diagnostics LLC, Cache DNA, Concerto Biosciences, Stately Bio, Ramona Optics, Bifrost Biosystems and Amber Bio. P.C.B.’s group receives research funding from industry for unrelated work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Omid Faridani and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

Supplementary Information

Supplementary Figs. 1–21, Supplementary Table 1 and Supplementary Note.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Al’Khafaji, A.M., Smith, J.T., Garimella, K.V. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol 42, 582–586 (2024). https://doi.org/10.1038/s41587-023-01815-7

Received: 01 October 2021
Accepted: 02 May 2023
Published: 08 June 2023
Issue Date: April 2024
DOI: https://doi.org/10.1038/s41587-023-01815-7
