High-throughput single-cell sequencing technologies hold tremendous potential for defining cell types in an unbiased fashion using gene expression and epigenomic state. A key challenge in realizing this potential is integrating single-cell datasets from multiple protocols, biological contexts, and data modalities into a joint definition of cellular identity. We previously developed an approach, called linked inference of genomic experimental relationships (LIGER), that uses integrative nonnegative matrix factorization to address this challenge. Here, we provide a step-by-step protocol for using LIGER to jointly define cell types from multiple single-cell datasets. The main stages of the protocol are data preprocessing and normalization, joint factorization, quantile normalization and joint clustering, and visualization. We describe how to jointly define cell types from single-cell RNA-seq (scRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, but similar steps apply across a wide range of other settings and data types, including cross-species analysis, single-nucleus DNA methylation, and spatial transcriptomics. Our protocol contains examples of expected results, describes common pitfalls, and relies only on our freely available, open-source R implementation of LIGER. We also provide R Markdown tutorials showing the outputs from each individual code segment. The analysis process can be performed in 1–4 h, depending on dataset size, and assumes no specialized bioinformatics training.
Subscribe to Journal
Get full journal access for 1 year
only $8.25 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
The datasets used in this paper are all previously published and publicly available:
• scRNA-seq and snATAC-seq data from human BMMCs, from Granja et al.24, GEO accession code GSE139369.
• scRNA-seq data from control and interferon-stimulated PBMCs, from Kang et al.18, GEO accession code GSE96583.
Saunders, A. et al. Molecular diversity and specializations among the cells of the adult mouse brain. Cell 174, 1015–1030.e16 (2018).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887.e17 (2019).
Yang, Z. & Michailidis, G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics 32, 1–8 (2016).
Zhang, A. W. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16, 1007–1015 (2019).
Cao, J. et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385 (2018).
Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).
Clark, S. J. et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat. Commun. 9, 781 (2018).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, aat5691 (2018).
Yao, Z. et al. An integrated transcriptomic and epigenomic atlas of mouse primary motor cortex cell types. Preprint at bioRxiv https://doi.org/10.1101/2020.02.29.970558 (2020).
Tran, N. M. et al. Single-cell profiles of retinal ganglion cells differing in resilience to injury reveal neuroprotective genes. Neuron 104, 1039–1055.e12 (2019).
Krienen, F. M. et al. Innovations in primate interneuron repertoire. Preprint at bioRxiv https://doi.org/10.1101/709501 (2019).
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
Büttner, M., Miao, Z., Wolf, F. A., Teichmann, S. A. & Theis, F. J. A test metric for assessing single-cell RNA-seq batch correction. Nat. Methods 16, 43–49 (2019).
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
Svensson, V., da Veiga Beltrame, E. & Pachter, L. Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. Preprint at bioRxiv https://doi.org/10.1101/762773 (2019).
Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature 560, 319–324 (2018).
Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature 560, 377–381 (2018).
Welch, J. D., Hu, Y. & Prins, J. F. Robust detection of alternative splicing in a population of single cells. Nucleic Acids Res 44, e73 (2016).
Gao, C. et al. Iterative refinement of cellular identity from single-cell data using online learning. Preprint at bioRxiv https://doi.org/10.1101/2020.01.16.909861 (2020).
Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
This work was supported by NIH grants R01 AI149669 and R01 HG010883 (J.D.W.) and U19 1U19MH114821 (E.Z.M.).
A patent application on LIGER has been submitted by The Broad Institute, Inc., and The General Hospital Corporation with E.Z.M., J.D.W. and V.K. as inventors.
Peer review information Nature Protocols thanks Andrew Adey, Jinmiao Chen and Sarah Teichmann for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Welch, J. D. et al. Cell 177, 1873–1887.e17 (2019): https://doi.org/10.1016/j.cell.2019.05.006
Tran, N. M. et al. Neuron 104, 1039–1055.e12 (2019): https://doi.org/10.1016/j.neuron.2019.11.006
Yao, Z. et al. Preprint at bioRxiv (2020): https://doi.org/10.1101/2020.02.29.970558
Krienen, F. M. et al. Preprint at bioRxiv (2019): https://doi.org/10.1101/709501
About this article
Cite this article
Liu, J., Gao, C., Sodicoff, J. et al. Jointly defining cell types from multiple single-cell datasets using LIGER. Nat Protoc 15, 3632–3662 (2020). https://doi.org/10.1038/s41596-020-0391-8
Communications Biology (2021)
Nature Neuroscience (2021)
Integration of Transformative Platforms for the Discovery of Causative Genes in Cardiovascular Diseases
Cardiovascular Drugs and Therapy (2021)