Abstract
Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
Data availability
Links to the datasets used in this study can be found at https://github.com/broadinstitute/mas-seq-paper-data.
Single-cell RNA-sequencing data from human tumor-infiltrating CD8+ T cells are available from dbGaP under accession number phs003200.v1.p1.
Code availability
An online repository of code for the Longbow tool used in this study can be found at https://github.com/broadinstitute/longbow.
Acknowledgements
We thank W. Kretzschmar for helpful discussions. This work was supported by Broad Institute SPARC awards 800353 (to A.M.A., K.V.G., N.H. and P.C.B.) and 800307 (to K.V.G.); National Institutes of Health grant U19 AI082630 (to N.H.); the Adelson Medical Research Foundation (to N.H.); and National Human Genome Research Institute grant RM1HG006193 (to N.H. and P.C.B.), with additional support from the Center for Cell Circuits at the Broad Institute (HG006193). M.A.S. is a Cancer Research Institute Irvington Fellow supported by the Cancer Research Institute (CRI award 4071).
Author information
Contributions
A.M.A. conceived and developed the molecular workflow and designed and performed the experiments. K.V.G. developed the statistical annotation software with contributions from J.S. J.S. developed the data processing pipeline with contributions from V.P. and M.G. and performed bioinformatic analyses. M.B. performed Smart-seq3 and single-cell RNA-seq data analysis and statistical modeling and devised the isoform identification algorithm with contributions from J.S. V.P. developed the UMI and CBC error correction algorithms and conducted the bioinformatic analysis with contributions from A.M.A. S.S. aided through discussions and analysis. G.M.B., E.M.B. and M.S.F. consented patients, collected samples and processed and generated the 10× Genomics scRNA-seq data. M.A.S. assisted with T-cell data analysis. M.C., A.D., T.B. and S.G. aided in the data generation and helped troubleshoot early iterations of the protocol. A.A.P. and E.B. provided access to cloud computing and other resources to facilitate data processing and analysis. A.M.A., K.V.G., J.S., M.B., V.P., P.C.B. and N.H. cowrote the manuscript.
Ethics declarations
Competing interests
The funding that contributed to the subject matter of this manuscript is described as follows: Broad Institute SPARC award, National Institutes of Health grant (U19 AI082630), Adelson Medical Research Foundation, National Human Genome Research Institute grant (RM1HG006193), support from the Center for Cell Circuits at the Broad Institute (HG006193) and Cancer Research Institute award (4071). A.M.A., K.V.G., J.S., M.B., P.C.B. and N.H. are inventors on a licensed, pending international patent application, having serial number PCT/US2021/037226, filed by Broad Institute of MIT and Harvard, Massachusetts General Hospital and Massachusetts Institute of Technology, directed to certain subject matter related to the MAS-seq method described in this manuscript. Broad Institute of MIT and Harvard and Pacific Biosciences of California entered into a collaboration agreement relating to this research subsequent to the submission of this manuscript. A.A.P. is a venture partner and employee of GV. He has received funding from Verily, Microsoft, Illumina, Bayer, Pfizer, Biogen, Abbvie, Intel and IBM. M.S.F. receives funding from Bristol-Myers Squibb. G.M.B. has served on SAB and on the steering committee for Nektar Therapeutics. She has SRAs with Olink Proteomics and Palleon Pharmaceuticals. She served on SAB and as a speaker for Novartis. N.H. holds equity in BioNTech and is a founder and equity holder of Danger Bio. P.C.B. is a consultant to and/or holds equity in companies that develop or apply genomic or genome editing technologies: 10× Genomics, General Automation Lab Technologies/Isolation Bio, Celsius Therapeutics, Next Gen Diagnostics LLC, Cache DNA, Concerto Biosciences, Stately Bio, Ramona Optics, Bifrost Biosystems and Amber Bio. P.C.B.’s group receives research funding from industry for unrelated work. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Omid Faridani and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–21, Supplementary Table 1 and Supplementary Note.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Al’Khafaji, A.M., Smith, J.T., Garimella, K.V. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01815-7
DOI: https://doi.org/10.1038/s41587-023-01815-7
This article is cited by
- Dynamic thresholding and tissue dissociation optimization for CITE-seq identifies differential surface protein abundance in metastatic melanoma. Communications Biology (2023)
- Molecular evidence of anteroposterior patterning in adult echinoderms. Nature (2023)
- Single-cell multi-omics of mitochondrial DNA disorders reveals dynamics of purifying selection across human immune cells. Nature Genetics (2023)
- Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer. Nature Communications (2023)