An explosion in single-cell technologies has revealed a previously underappreciated heterogeneity of cell types and novel cell-state associations with sex, disease, development and other processes. Starting with transcriptome analyses, single-cell techniques have extended to multi-omics approaches and now enable the simultaneous measurement of data modalities and spatial cellular context. Data are now available for millions of cells, for whole-genome measurements and for multiple modalities. Although analyses of such multimodal datasets have the potential to provide new insights into biological processes that cannot be inferred with a single mode of assay, the integration of very large, complex, multimodal data into biological models and mechanisms represents a considerable challenge. An understanding of the principles of data integration and visualization methods is required to determine what methods are best applied to a particular single-cell dataset. Each class of method has advantages and pitfalls in terms of its ability to achieve various biological goals, including cell-type classification, regulatory network modelling and biological process inference. In choosing a data integration strategy, consideration must be given to whether the multi-omics data are matched (that is, measured on the same cell) or unmatched (that is, measured on different cells) and, more importantly, the overall modelling and visualization goals of the integrated analysis.
With the development of single-cell multi-omics techniques, tools and models for data integration are critically important.
Integration problems in single-cell biology can be divided into those associated with the integration of matched and unmatched data.
Strategies for integrating matched data include joint latent space inference, consensus of individual inferences and biological causal modelling.
Strategies for integrating unmatched data include annotated group matching, matching with common features and aligning spaces.
Visualization methods for integrated multimodal single-cell data are still underdeveloped.
Future challenges include accounting for specific noise related to each modality, overcoming the need for computing efficiency and developing biologically interpretable integration strategies.
This work was supported in part by UC2DK126024 grant to J.K., B.D.H. and A.P.M. as well as by a Health Research Formula Fund of the Commonwealth of Pennsylvania, which did not have a direct role in the work.
A.P.M. is a scientific adviser to Novartis, eGENESIS, TRESTLE Therapeutics and IVIVA Medical. The other authors declare no competing interests.
Peer review information
Nature Reviews Nephrology thanks B. J. Aronow, Q. Nie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
GenitoUrinary Development Molecular Anatomy Project: https://www.gudmap.org/
HuBMAP portal: https://portal.hubmapconsortium.org/
ReBuilding a Kidney: https://www.rebuildingakidney.org/
- Assay for transposase-accessible chromatin using sequencing
(ATAC-seq). A technique that profiles the accessibility of DNA elements based on the principle that the Tn5 transposase can insert a transposon only at accessible parts of the chromosome. The insertion location is identified through DNA sequencing.
- Cis-regulatory elements
DNA elements proximal to a gene that are required for controlling gene expression. Such elements usually include promoters and enhancers, and often contain transcription factor-binding sites.
- Molecule recovery efficiency
Single-cell assays capture molecules, such as mRNAs or transposon-interrupted DNA fragments, and amplify them for readout. Different protocols recover a given pool of molecules with different efficiencies; for example, a single podocyte might have 300,000 mRNA molecules and an RNA sequencing protocol with a 10% recovery efficiency would recover ~30,000 of these.
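The podocyte arithmetic above can be checked with a short simulation. This is only a sketch: it assumes each molecule is captured independently with the same probability, which real protocols do not guarantee, and the function names are illustrative.

```python
import random

def simulate_capture(n_molecules: int, efficiency: float, seed: int = 0) -> int:
    """Count captured molecules, ASSUMING each one is recovered
    independently with probability `efficiency` (a binomial draw)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_molecules) if rng.random() < efficiency)

def prob_gene_missed(copies: int, efficiency: float) -> float:
    """Probability that none of `copies` transcripts of a gene is captured,
    under the same independence assumption."""
    return (1.0 - efficiency) ** copies

# Glossary example: 300,000 mRNA molecules, 10% recovery efficiency.
expected = 300_000 * 0.10
captured = simulate_capture(300_000, 0.10)
print(f"expected ~{expected:.0f}, simulated {captured}")

# A lowly expressed gene (5 copies, a hypothetical number) is often missed entirely.
p_missed = prob_gene_missed(5, 0.10)
print(f"P(gene with 5 copies is not detected) = {p_missed:.3f}")
```

Under this simple model, low recovery efficiency also explains why transcripts that are present in a cell can go undetected.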
- Joint snRNA-seq and snATAC-seq
Single-cell RNA sequencing (scRNA-seq) attempts to recover RNA from the whole cell, whereas single-nucleus RNA sequencing (snRNA-seq) only isolates the nuclear fraction of the RNA; the two transcriptomes are related but different. Multi-omics methods involving assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA-seq typically isolate the nucleus first, resulting in snRNA-seq and snATAC-seq.
- Feature space
In machine learning, measured variables are often called features, and the set of features constitutes a feature space.
- Sequential fluorescence in situ hybridization
(seqFISH). A technique that measures mRNA quantity through sequential fluorescent probes that have combinatorially encoded information for each targeted mRNA. For example, a sequence signal, probe A then B, might encode gene X, whereas the sequence probe A then C might encode gene Y.
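The combinatorial encoding described above can be illustrated with a toy decoder. The codebook below simply restates the glossary example (probe A then B for gene X, probe A then C for gene Y); the spot data and function names are hypothetical, and real decoding pipelines also have to handle imaging noise and error correction.

```python
from collections import Counter

# Toy codebook taken from the glossary example: ordered probe sequence -> gene.
CODEBOOK = {
    ("A", "B"): "gene X",
    ("A", "C"): "gene Y",
}

def decode_spot(probe_sequence):
    """Map the ordered probes seen at one spot to a gene, or 'unassigned'
    if the sequence is not in the codebook."""
    return CODEBOOK.get(tuple(probe_sequence), "unassigned")

# Hypothetical readout: each inner list is the probe seen in round 1, round 2.
spots = [["A", "B"], ["A", "C"], ["A", "B"], ["C", "A"]]
counts = Counter(decode_spot(s) for s in spots)
print(counts)

# With F distinguishable probes and N rounds, at most F**N ordered codes exist.
n_probes, n_rounds = 3, 2
max_codes = n_probes ** n_rounds
```

Because the order of probes matters, the sequence C then A is not the same code as A then C and is left unassigned here.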
- Read depth
A quantity that measures the number of times that sequencing reads cover a given genomic region. The region of interest may be a base pair or an entire transcribed region.
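As a minimal sketch of this definition, per-base read depth can be computed by counting how many reads overlap each position. The half-open `(start, end)` interval convention and the toy reads are assumptions made for illustration.

```python
def per_base_depth(reads, region_start, region_end):
    """Return the depth at each base of [region_start, region_end).

    `reads` is a list of (start, end) half-open intervals on the same
    coordinate system (an assumed convention for this sketch).
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        lo = max(start, region_start)   # clip the read to the region
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth

# Three toy reads over a 10-base region.
reads = [(0, 5), (3, 8), (4, 10)]
depth = per_base_depth(reads, 0, 10)
mean_depth = sum(depth) / len(depth)
print(depth, mean_depth)
```

The same function covers both cases mentioned in the definition: a single base pair is one element of `depth`, whereas a transcribed region can be summarized by the mean over its positions.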
- Canonical correlation analysis
A multivariate statistical technique that computes the correlation between two sets of variables, say X and Y. Canonical correlation analysis finds the linear combination of X and the linear combination of Y that maximizes correlation.
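The definition above can be made concrete with a small NumPy sketch that finds the first pair of canonical weights by whitening each block of variables and taking a singular value decomposition. The simulated data, the ridge term `reg` and the function name are assumptions for illustration, not part of any method discussed in the article.

```python
import numpy as np

def first_canonical_pair(X, Y, reg=1e-8):
    """Return weights a, b and the correlation rho such that X @ a and
    Y @ b are maximally correlated (first canonical pair)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances; a tiny ridge (assumed) keeps Cholesky stable.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    Lx = np.linalg.cholesky(Sxx)          # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)          # Syy = Ly @ Ly.T
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])    # map back to original variables
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]

# Simulated data: one shared signal z hidden in each set of variables.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.3 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([rng.normal(size=500), z + 0.3 * rng.normal(size=500)])
a, b, rho = first_canonical_pair(X, Y)
print("first canonical correlation:", rho)
```

In this toy case, the weights should concentrate on the first column of X and the second column of Y, which are the only variables that carry the shared signal.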
- Non-negative matrix factorization
A group of algorithms that decompose one matrix into a product of two (or more) matrices, such that the elements in each matrix are non-negative. Typically, each matrix has a model interpretation; for example, a gene-by-cell data matrix can be factorized into one matrix relating genes to latent space features and another relating latent space features to cells.
- Metagene
A metagene is some (mathematical) function of a group of genes (for example, a linear combination), often reflecting some shared property. For example, methods like non-negative matrix factorization approximate the data matrix as the product of a gene-by-metagene matrix and a metagene-by-cell matrix.
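The gene-by-metagene and metagene-by-cell factorization can be sketched with the classic multiplicative update rules for the squared-error objective. This is one common way to fit a non-negative matrix factorization and is not claimed to be the algorithm used by any tool cited in the article; the simulated data and parameter values are assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factorize a non-negative genes-by-cells matrix V as W @ H, with
    W (genes x metagenes) and H (metagenes x cells) both non-negative."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Simulated data: 20 genes in two disjoint metagenes, 30 cells.
rng = np.random.default_rng(1)
W_true = np.zeros((20, 2))
W_true[:10, 0] = rng.random(10) + 0.5
W_true[10:, 1] = rng.random(10) + 0.5
H_true = rng.random((2, 30))
V = W_true @ H_true

W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

Each column of `W` can be read as a metagene (a weighted group of genes) and each column of `H` as the metagene activities of one cell.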
- Dimension reduction
A data transformation method that reduces the number of dimensions in the original feature space to a lower-dimensional space (usually much lower than the original one) while certain properties (for example, the distance measures between observations) of the original data are preserved.
- Pseudotime
In contrast to real time, pseudotime represents computationally inferred temporal stages of a collection of cells.
- Principal component analysis
A common dimension reduction method that aims to project the original data to a fixed smaller dimension while minimizing the squared error during data reduction. This approach can be viewed as maximizing the variance in the projected data.
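The equivalence between minimizing squared error and maximizing projected variance can be checked numerically. The sketch below computes principal components from a singular value decomposition of the centred data; the simulated matrix and the choice of two components are arbitrary assumptions.

```python
import numpy as np

def pca(X, k):
    """Project observations (rows of X) onto the top-k principal components.

    Returns the scores, the component directions and the variance
    explained by each component.
    """
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                           # k x n_features
    scores = Xc @ components.T                    # n_obs x k
    explained = s[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained

# Simulated data: 200 observations with 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, components, explained = pca(X, k=2)

# Squared reconstruction error equals the variance that was NOT retained.
Xc = X - X.mean(axis=0)
recon = scores @ components
recon_error = np.sum((Xc - recon) ** 2) / (X.shape[0] - 1)
total_var = Xc.var(axis=0, ddof=1).sum()
print(recon_error, total_var - explained.sum())
```

The two printed numbers agree up to floating-point error, so any projection that retains more variance necessarily has a smaller squared reconstruction error.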
- Embedding
In mathematics, embedding is a map from one set X to another set Y, where some characteristic of X is preserved. In single-cell studies, the term embedding has been used for methods that ‘place’ cells in a new feature space, possibly of a lower dimension, such that notions of cell-to-cell distances are approximately preserved.
- Dropouts
In single-cell biology, dropouts are usually the transcripts that were present in the cell but were not captured during sequencing.
- Ambient RNA
In droplet-based single-cell RNA sequencing approaches, the measured mRNA molecules can be contaminated by mRNAs from other cells present in the suspension, for example, owing to cell rupture. These contaminating mRNAs are termed ambient RNA.
- Doublets
During high-throughput single-cell (or single-nucleus) isolation in droplets or similar vessels, two or more cells might be captured together, creating a mixture of molecules. Computational methods have been developed to detect and remove such unwanted observations from the dataset.
About this article
Cite this article
Miao, Z., Humphreys, B.D., McMahon, A.P. et al. Multi-omics integration in the age of million single-cell data. Nat Rev Nephrol 17, 710–724 (2021). https://doi.org/10.1038/s41581-021-00463-x