Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data

Abstract

The rapid growth of high-throughput single-cell and single-nucleus RNA-sequencing (scRNA-seq and snRNA-seq) technologies has produced a wealth of data over the past few years. The size, volume and distinctive characteristics of these data necessitate the development of new computational methods to accurately and efficiently quantify sc/snRNA-seq data into count matrices that constitute the input to downstream analyses. We introduce the alevin-fry framework for quantifying sc/snRNA-seq data. In addition to being faster and more memory frugal than other accurate quantification approaches, alevin-fry ameliorates the memory scalability and false-positive expression issues that are exhibited by other lightweight tools. We demonstrate how alevin-fry can be effectively used to quantify sc/snRNA-seq data, and also how the spliced and unspliced molecule quantification required as input for RNA velocity analyses can be seamlessly extracted from the same preprocessed data used to generate normal gene expression count matrices.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Overview of the alevin-fry pipeline (operating in USA quantification mode).
Fig. 2: Comprehensive analysis of the performance pf alevin-fry on real and simulated datasets.

Similar content being viewed by others

Data availability

All data analyzed in this paper are publicly available. The mouse pancreas dataset, the mouse placenta dataset and the zebrafish pineal dataset analyzed during the current study are available on NCBI Gene Expression Omnibus, under accession number GSM3852755, GSM4609872 and GSM3511193, respectively. The PBMC5k and PBMC10k dataset are available from 10x Genomics at https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_v3 and https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3, respectively. The description of the references that were used for mapping the sequencing reads from each dataset can be found in Supplementary Table 1.

Code availability

Alevin-fry is written in Rust (https://www.rust-lang.org), and is available under the BSD 3-Clause license, as a free and open-source tool at https://github.com/COMBINE-lab/alevin-fry. The specific version used in this work40 has been uploaded to zenodo at https://doi.org/10.5281/zenodo.5799568. The generation of RAD files is implemented as part of the alevin command of the salmon tool, available at https://github.com/COMBINE-lab/salmon. Both tools are also available through bioconda41,42. The roe R package has been developed for the construction of splici reference sequences, and it is available at https://github.com/COMBINE-lab/roe as free and open-source software under the BSD 3-Clause license. Useful scripts and functions for simplifying reference preparation and quantification as well as utilities for reading alevin-fry output in Python and R are available at https://github.com/COMBINE-lab/usefulaf. Support for reading alevin-fry output (including USA mode output) has been integrated into the fishpond package available at https://github.com/mikelove/fishpond as well as through Bioconductor43. The scripts used to perform the analyses in this paper are available at https://github.com/COMBINE-lab/alevin-fry-paper-scripts.

References

  1. Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database 2020, baaa073 (2020).

  2. Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A. & Dewey, C. N. RNA-seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).

    Article  Google Scholar 

  3. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

    Article  CAS  Google Scholar 

  4. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).

    Article  CAS  Google Scholar 

  5. Srivastava, A., Malik, L., Smith, T., Sudbery, I. & Patro, R. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biol. 20, 65 (2019).

    Article  Google Scholar 

  6. Niebler, S., Müller, A., Hankeln, T. & Schmidt, B. RainDrop: rapid activation matrix computation for droplet-based single-cell RNA-seq reads. BMC Bioinformatics 21, 274 (2020).

    Article  CAS  Google Scholar 

  7. Melsted, P. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat. Biotechnol. 39, 813–818 (2021).

    Article  CAS  Google Scholar 

  8. Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics 35, 4472–4473 (2019).

    Article  CAS  Google Scholar 

  9. Kaminow, B., Yunusov, D. & Dobin. A. STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/2021.05.05.442755 (2021).

  10. Shainer, I. et al. Agouti-related protein 2 is a new player in the teleost stress response system. Curr. Biol. 29, 2009–2019.e7 (2019).

    Article  CAS  Google Scholar 

  11. Shainer, I. & Stemmer, M. Choice of preprocessing pipeline influences clustering quality of scRNA-seq datasets. BMC Genomics 22, 661 (2021).

    Article  CAS  Google Scholar 

  12. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

    Article  CAS  Google Scholar 

  13. Cau, E., Ronsin, B., Bessière, L. & Blader, P. A notch-mediated, temporal asymmetry in BMP pathway activation promotes photoreceptor subtype diversification. PLoS Biol. 17, e2006250 (2019).

    Article  Google Scholar 

  14. Lun, A. T. L. et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 20, 63 (2019).

    Article  Google Scholar 

  15. Crespo, C., Soroldoni, D. & Knust, E. A novel transgenic zebrafish line for red opsin expression in outer segments of photoreceptor cells. Dev. Dyn. 247, 951–959 (2018).

    Article  CAS  Google Scholar 

  16. Wada, S. et al. Color opponency with a single kind of bistable opsin in the zebrafish pineal organ. Proc. Natl Acad. Sci. USA 115, 11310–11315 (2018).

    Article  CAS  Google Scholar 

  17. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

    Article  CAS  Google Scholar 

  18. Brüning, R. S., Tombor, L., Schulz, M. H., Dimmeler, S. & John, D. Comparative analysis of common alignment tools for single-cell RNA sequencing. GigaScience 11, giac001 (2022).

    Article  Google Scholar 

  19. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

    Article  Google Scholar 

  20. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

    Article  CAS  Google Scholar 

  21. Soneson, C., Srivastava, A., Patro, R. & Stadler, M. B. Preprocessing choices affect RNA velocity results for droplet scRNA-seq data. PLoS Comput. Biol. 17, e1008585 (2021).

    Article  CAS  Google Scholar 

  22. Marsh, B. & Blelloch, R. Single nuclei RNA-seq of mouse placental labyrinth development. eLife https://doi.org/10.7554/elife.60266 (2020).

  23. Woods, L., Perez-Garcia, V. & Hemberger, M. Regulation of placental development and its impact on fetal growth—new insights from mouse models. Front. Endocrinol. https://doi.org/10.3389/fendo.2018.00570 (2018).

  24. 10k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v3 Chemistry) (10x Genomics, 2018); https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_v3

  25. Srivastava, A. et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 21, 239 (2020).

    Article  CAS  Google Scholar 

  26. You, Y. et al. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol. 22, 339 (2021).

    Article  CAS  Google Scholar 

  27. Sarkar, H., Srivastava, A. & Patro, R. Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level. Bioinformatics 35, i136–i144 (2019).

    Article  CAS  Google Scholar 

  28. Almodaresi, F., Sarkar, H., Srivastava, A. & Patro, R. A space and time-efficient index for the compacted colored de Bruijn graph. Bioinformatics 34, i169–i177 (2018).

    Article  CAS  Google Scholar 

  29. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

    Article  CAS  Google Scholar 

  30. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

    Article  CAS  Google Scholar 

  31. Srivastava, A., Sarkar, H., Gupta, N. & Patro, R. RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics 32, i192–i200 (2016).

    Article  CAS  Google Scholar 

  32. Li. H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  33. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).

    Article  CAS  Google Scholar 

  34. Zhu, A., Srivastava, A., Ibrahim, J. G., Patro, R. & Love, M. I. Non-parametric expression analysis using inferential replicate counts. Nucleic Acids Res. 47, e105–e105 (2019).

    Article  CAS  Google Scholar 

  35. 5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v3 Chemistry) (10x Genomics, 2019): https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_v3

  36. Bastidas-Ponce, A. et al. Massive single-cell mRNA profiling reveals a detailed roadmap for pancreatic endocrinogenesis. Development https://doi.org/10.1242/dev.173849 (2019).

  37. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

    Article  CAS  Google Scholar 

  38. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

    Google Scholar 

  39. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).

    Article  Google Scholar 

  40. He, D. et al. Alevin-fry v0.4.0 for manuscript "Alevin-fry unlocks rapid, accurate, and memory-frugal quantification of single-cell RNA-seq data". Zenodo https://doi.org/10.5281/zenodo.5806834 (2021).

  41. Grüning, B. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat. Methods 15, 475–476 (2018).

    Article  Google Scholar 

  42. He, D. et al. Additional data for manuscript "Alevin-fry unlocks rapid, accurate, and memory-frugal quantification of single-cell RNA-seq data" [Data set]. Zenodo https://doi.org/10.5281/zenodo.5799568 (2021).

  43. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome biology 5(10), 1–16 (2004).

    Article  Google Scholar 

Download references

Acknowledgements

This work is supported by the National Institute of Health under grant award numbers R01HG009937 to R.P. and K99CA267677 to A.S., and the National Science Foundation under grant award numbers CCF-1750472 to R.P. and CNS-1763680 to R.P. Also, this project has been made possible in part by grant number CZIF2020-004893 from the Chan Zuckerberg Initiative Foundation to R.P. The funders had no role in the design of the method, data analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

All authors conceptualized the method. D.H., A.S., R.P., M.Z. and H.S. implemented the software. M.Z. and R.P. benchmarked the tools. D.H., R.P. and C.S. analyzed the results. All authors wrote and approved the manuscript.

Corresponding author

Correspondence to Rob Patro.

Ethics declarations

Competing interests

R.P. is a cofounder of Ocean Genomics, Inc. The other authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Davis McCarthy, David van Dijk and Matthew Ritchie for their contribution to the peer review of this work. Lin Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary analyses and figures.

Reporting Summary

Peer Review Information

Supplementary Table 1

Details of the datasets used in this work.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

He, D., Zakeri, M., Sarkar, H. et al. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods 19, 316–322 (2022). https://doi.org/10.1038/s41592-022-01408-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41592-022-01408-3

This article is cited by

Search

Quick links

Nature Briefing AI and Robotics

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing: AI and Robotics