  • Review Article

Computational principles and challenges in single-cell data integration

Abstract

The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.

Fig. 1: Alternative choices of anchors for data integration.
Fig. 2: Cell-type-specific eQTL mapping as an example of local vertical integration.
Fig. 3: Mosaic integration.
Fig. 4: Mapping time-resolved single-cell genomics experiments across species.
Fig. 5: Data integration of spatially resolved transcriptomics.
Fig. 6: Exploiting molecular variation at single-cell resolution to construct population-level maps of human phenotypic variation.

Acknowledgements

R.A. and A.S.E.C. are supported by a PhD fellowship from the EMBL International PhD Programme. O.S. is supported by core funding from EMBL and the DKFZ, as well as the BMBF, the Volkswagen Foundation and the European Union (810296). J.C.M. acknowledges core funding from EMBL and core support from Cancer Research UK (C9545/A29580).

Author information

Corresponding authors

Correspondence to Ricard Argelaguet, Anna S. E. Cuomo, Oliver Stegle or John C. Marioni.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Carl Herrmann and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Argelaguet, R., Cuomo, A.S.E., Stegle, O. et al. Computational principles and challenges in single-cell data integration. Nat Biotechnol 39, 1202–1215 (2021). https://doi.org/10.1038/s41587-021-00895-7

  • DOI: https://doi.org/10.1038/s41587-021-00895-7
