Computational principles and challenges in single-cell data integration

Nature Biotechnology (2021)

Abstract

The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.

Fig. 1: Alternative choices of anchors for data integration.
Fig. 2: Cell-type-specific eQTL mapping as an example of local vertical integration.
Fig. 3: Mosaic integration.
Fig. 4: Mapping time-resolved single-cell genomics experiments across species.
Fig. 5: Data integration of spatially resolved transcriptomics.
Fig. 6: Exploiting molecular variation at single-cell resolution to construct population-level maps of human phenotypic variation.

