Jointly defining cell types from multiple single-cell datasets using LIGER

Abstract

High-throughput single-cell sequencing technologies hold tremendous potential for defining cell types in an unbiased fashion using gene expression and epigenomic state. A key challenge in realizing this potential is integrating single-cell datasets from multiple protocols, biological contexts, and data modalities into a joint definition of cellular identity. We previously developed an approach, called linked inference of genomic experimental relationships (LIGER), that uses integrative nonnegative matrix factorization to address this challenge. Here, we provide a step-by-step protocol for using LIGER to jointly define cell types from multiple single-cell datasets. The main stages of the protocol are data preprocessing and normalization, joint factorization, quantile normalization and joint clustering, and visualization. We describe how to jointly define cell types from single-cell RNA-seq (scRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, but similar steps apply across a wide range of other settings and data types, including cross-species analysis, single-nucleus DNA methylation, and spatial transcriptomics. Our protocol contains examples of expected results, describes common pitfalls, and relies only on our freely available, open-source R implementation of LIGER. We also provide R Markdown tutorials showing the outputs from each individual code segment. The analysis process can be performed in 1–4 h, depending on dataset size, and assumes no specialized bioinformatics training.
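For orientation, the integrative nonnegative matrix factorization objective behind the joint factorization stage can be written as below. This page does not state the formula; the notation is an assumption that follows the formulation in Welch et al. (ref. 2), so treat it as a sketch rather than a specification of the software.

```latex
% Integrative NMF objective (notation assumed, following Welch et al. 2019).
% E_i : n_i x m normalized, scaled data matrix for dataset i (cells x genes)
% H_i : n_i x k cell factor loadings for dataset i
% W   : k x m shared metagenes
% V_i : k x m dataset-specific metagenes
% k   : number of factors;  \lambda : tuning parameter (cf. Fig. 4)
\[
  \min_{W,\; V_i,\; H_i \,\ge\, 0}\;
  \sum_{i=1}^{d} \bigl\lVert E_i - H_i \,(W + V_i) \bigr\rVert_F^{2}
  \;+\;
  \lambda \sum_{i=1}^{d} \bigl\lVert H_i V_i \bigr\rVert_F^{2}
\]
```

In this form, a larger λ penalizes the dataset-specific term more heavily and so pushes the factorization toward the shared metagenes W, while k sets how many factors are learned. These are the two parameters whose selection is illustrated in Fig. 4.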

Fig. 1: Diagram of high-level protocol stages.
Fig. 2: Visualizing LIGER results using UMAP and t-SNE.
Fig. 3: LIGER enables metagene- and dataset-specific analysis of PBMC data.
Fig. 4: Parameter selection of the number of factors k and the tuning parameter λ.
Fig. 5: Plots of raw and normalized loading of factor 21.
Fig. 6: Spurious alignment between datasets decreases after removing mitochondrial artifact factors.
Fig. 7: Distinct cell types show poor alignment compared to alignment of control and stimulated PBMC datasets.
Fig. 8: Diagram of differential expression analysis strategies to find shared cluster markers and cluster-specific dataset differences.
Fig. 9: Marker gene identified by LIGER shows consistent cell-type-specific expression across datasets.
Fig. 10: Marker genes identified by LIGER show expression differences across datasets.
Fig. 11: LIGER enables joint clustering of BMMC data across modalities.
Fig. 12: Expression and chromatin accessibility of marker genes selected by LIGER show consistency across modalities.
Fig. 13: Metagenes and metagene expression levels for BMMC data.
Fig. 14: Genes showing expression and accessibility differences.
Fig. 15: UCSC Genome Browser view showing the correlations between three candidate chromatin-accessible regions and the target gene S100A9.
Fig. 16: Expression and correlated accessibility for S100A9 and a nearby intergenic peak.
Fig. 17: Runtime and peak memory usage for joint factorization of scRNA-seq datasets using LIGER.

Data availability

The datasets used in this paper are all previously published and publicly available:

• scRNA-seq and snATAC-seq data from human BMMCs, from Granja et al. (ref. 24), GEO accession code GSE139369.

• scRNA-seq data composed of two datasets of interneurons and oligodendrocytes from the mouse frontal cortex, from Saunders et al. (ref. 1). Data available at http://dropviz.org/.

• scRNA-seq data from control and interferon-stimulated PBMCs, from Kang et al. (ref. 18), GEO accession code GSE96583.

Code availability

The code is freely available at https://github.com/MacoskoLab/liger. The code is also available through an assigned DOI at https://doi.org/10.5281/zenodo.3765403.

References

  1. Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030.e16 (2018).

  2. Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).

  3. Yang, Z. & Michailidis, G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics 32, 1–8 (2016).

  4. Zhang, A. W. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16, 1007–1015 (2019).

  5. Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).

  6. Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).

  7. Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9, 781 (2018).

  8. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, aat5691 (2018).

  9. Yao, Z. et al. An integrated transcriptomic and epigenomic atlas of mouse primary motor cortex cell types. Preprint at bioRxiv https://doi.org/10.1101/2020.02.29.970558 (2020).

  10. Tran, N. M. et al. Single-cell profiles of retinal ganglion cells differing in resilience to injury reveal neuroprotective genes. Neuron 104, 1039–1055.e12 (2019).

  11. Krienen, F. M. et al. Innovations in primate interneuron repertoire. Preprint at bioRxiv https://doi.org/10.1101/709501 (2019).

  12. Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).

  13. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).

  14. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).

  15. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  16. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).

  17. Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).

  18. Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).

  19. Svensson, V., da Veiga Beltrame, E. & Pachter, L. Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. Preprint at bioRxiv https://doi.org/10.1101/762773 (2019).

  20. Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324 (2018).

  21. Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).

  22. Welch, J. D., Hu, Y. & Prins, J. F. Robust detection of alternative splicing in a population of single cells. Nucleic Acids Res. 44, e73 (2016).

  23. Gao, C. et al. Iterative refinement of cellular identity from single-cell data using online learning. Preprint at bioRxiv https://doi.org/10.1101/2020.01.16.909861 (2020).

  24. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).

Acknowledgements

This work was supported by NIH grants R01 AI149669 and R01 HG010883 (J.D.W.) and U19 1U19MH114821 (E.Z.M.).

Author information

Contributions

J.L., C.G., J.S., and J.D.W. performed the data analysis. J.L., C.G., J.S., and J.D.W. wrote the paper, with input from E.Z.M. and V.K. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Joshua D. Welch.

Ethics declarations

Competing interests

A patent application on LIGER has been submitted by The Broad Institute, Inc., and The General Hospital Corporation with E.Z.M., J.D.W. and V.K. as inventors.

Additional information

Peer review information Nature Protocols thanks Andrew Adey, Jinmiao Chen and Sarah Teichmann for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Welch, J. D. et al. Cell 177, 1873–1887.e17 (2019): https://doi.org/10.1016/j.cell.2019.05.006

Tran, N. M. et al. Neuron 104, 1039–1055.e12 (2019): https://doi.org/10.1016/j.neuron.2019.11.006

Yao, Z. et al. Preprint at bioRxiv (2020): https://doi.org/10.1101/2020.02.29.970558

Krienen, F. M. et al. Preprint at bioRxiv (2019): https://doi.org/10.1101/709501

Cite this article

Liu, J., Gao, C., Sodicoff, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 15, 3632–3662 (2020). https://doi.org/10.1038/s41596-020-0391-8
