Abstract
Cellular identity in complex multicellular organisms is determined in part by the physical organization of cells. However, large-scale investigation of the cellular interactome remains technically challenging. Here we develop cell interaction by multiplet sequencing (CIM-seq), an unsupervised and high-throughput method to analyze direct physical cell–cell interactions between cell types present in a tissue. CIM-seq is based on RNA sequencing of incompletely dissociated cells, followed by computational deconvolution into constituent cell types. CIM-seq estimates parameters such as number of cells and cell types in each multiplet directly from sequencing data, making it compatible with high-throughput droplet-based methods. When applied to gut epithelium or whole dissociated lung and spleen, CIM-seq correctly identifies known interactions, including those between different cell lineages and immune cells. In the colon, CIM-seq identifies a previously unrecognized goblet cell subtype expressing the wound-healing marker Plet1, which is directly adjacent to colonic stem cells. Our results demonstrate that CIM-seq is broadly applicable to unsupervised profiling of cell-type interactions in different tissue types.
Code availability
The CIM-seq R package is available under the LGPL-3 license on GitHub: https://github.com/EngeLab/CIMseq (ref. 40).
References
Sato, T. et al. Paneth cells constitute the niche for Lgr5 stem cells in intestinal crypts. Nature 469, 415–418 (2011).
Morrison, S. J. & Scadden, D. T. The bone marrow niche for haematopoietic stem cells. Nature 505, 327–334 (2014).
Regev, A. et al. Science forum: the Human Cell Atlas. eLife https://doi.org/10.7554/eLife.27041 (2017).
Crosetto, N., Bienko, M. & van Oudenaarden, A. Spatially resolved transcriptomics and beyond. Nat. Rev. Genet. 16, 57–66 (2015).
Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019).
Giladi, A. et al. Dissecting cellular crosstalk by sequencing physically interacting cells. Nat. Biotechnol. 38, 629–637 (2020).
Halpern, K. B. et al. Paired-cell sequencing enables spatial gene expression mapping of liver endothelial cells. Nat. Biotechnol. 36, 962–970 (2018).
Boisset, J.-C. et al. Mapping the physical network of cellular interactions. Nat. Methods 15, 547–553 (2018).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).
Poli, R., Kennedy, J. & Blackwell, T. Particle swarm optimization: an overview. Swarm Intell. 1, 33–57 (2007).
Lee, H., Pine, P. S., McDaniel, J., Salit, M. & Oliver, B. External RNA controls Consortium beta version update. J. Genomics 4, 19–22 (2016).
Gehart, H. et al. Identification of enteroendocrine regulators by real-time single-cell differentiation mapping. Cell 176, 1158–1173.e16 (2019).
Beumer, J. et al. Enteroendocrine cells switch hormone expression along the crypt-to-villus BMP signalling gradient. Nat. Cell Biol. 20, 909–916 (2018).
Tetteh, P. W. et al. Replacement of lost Lgr5-positive stem cells through plasticity of their enterocyte-lineage daughters. Cell Stem Cell 18, 203–213 (2016).
Grün, D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).
Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).
Rothenberg, M. E. et al. Identification of a cKit+ colonic crypt base secretory cell that supports Lgr5+ stem cells in mice. Gastroenterology 142, 1195–1205.e6 (2012).
Sasaki, N. et al. Reg4+ deep crypt secretory cells function as epithelial niche for Lgr5+ stem cells in colon. Proc. Natl Acad. Sci. USA 113, E5399–E5407 (2016).
Specian, R. D. & Oliver, M. G. Functional biology of intestinal goblet cells. Am. J. Physiol. 260, C183–C193 (1991).
Shoshkes-Carmel, M. et al. Subepithelial telocytes are an important source of Wnts that supports intestinal crypts. Nature 557, 242–246 (2018).
Valenta, T. et al. Wnt ligands secreted by subepithelial mesenchymal cells are essential for the survival of intestinal stem cells and gut homeostasis. Cell Rep. 15, 911–918 (2016).
Degirmenci, B., Valenta, T., Dimitrieva, S., Hausmann, G. & Basler, K. GLI1-expressing mesenchymal cells form the essential Wnt-secreting niche for colon stem cells. Nature 558, 449–453 (2018).
Zepp, J. A. et al. IL-17A-induced PLET1 expression contributes to tissue repair and colon tumorigenesis. J. Immunol. 199, 3849–3857 (2017).
Raymond, K. et al. Expression of the orphan protein Plet-1 during trichilemmal differentiation of anagen hair follicles. J. Investigative Dermatol. 130, 1500–1513 (2010).
Karrich, J. J. et al. Expression of Plet1 controls interstitial migration of murine small intestinal dendritic cells. Eur. J. Immunol. 49, 290–301 (2019).
Wang, X. et al. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019).
Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. Methods 17, 159–162 (2020).
Cabello-Aguilar, S. et al. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res. 48, 10 (2020).
van den Brink, S. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14, 935–936 (2017).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
McGinnis, C. S., Murrow, L. M. & Gartner, Z. J. DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors. Cell Syst. 8, 329–337.e4 (2019).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014); https://www.R-project.org/
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Ab Wahab, M. N., Nefti-Meziani, S. & Atyabi, A. A comprehensive review of swarm optimization algorithms. PLoS ONE 10, e0122827 (2015).
Serviss, J. T. & Enge, M. Cell interaction by multiplet sequencing, v.1.0 (GitHub, 2021); https://doi.org/10.5281/zenodo.4729935
Acknowledgements
We thank R. Toftgård, J. Taipale and members of the Enge and Gerling laboratories for scientific suggestions and for critically reading the paper. We also thank S. Quake, S. Darmanis and members of the Quake laboratory, who contributed valuable ideas in the early development of the CIM-seq method. We are grateful to X.L. Wang for help with animal breeding. The Enge laboratory is supported by SFO StratRegen, The Swedish Cancer Society, The Swedish Childhood Cancer Fund, Radiumhemmets forskningsfonder, The Swedish Research Council (2020-02940) and Cancer Research KI. The Gerling laboratory is supported by The Swedish Society for Medical Research, The Swedish Research Council (2018-02023), Åke Wiberg’s Foundation, Jeansson’s Foundation and The Swedish Society of Medicine. Microscopy was performed at the LCI Facility/Nikon Center of Excellence, Karolinska Institutet (supported by the Wallenberg Foundation, KI and KTH).
Author information
Contributions
M.E., M.G., N.A. and J.T.S. planned the experiments. N.A. performed the experiments, with the help of N.G., A.B.A., E.D., I.Š. and R.H. N.B. and M.E. conceived and designed CIM-seq. J.T.S. and M.E. implemented the computational method. N.B. performed the droplet sequencing, and participated in analysis of the data. M.E. and M.G. wrote the paper with the help of J.T.S. and N.A.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Methods thanks Moshe Biton and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 CIM-seq benchmarking in a controlled setting.
a, Flow cytometry analysis of re-association rate. HCT116 singlets and multiplets were sorted separately and singlets were re-analyzed after 0, 30, and 120 minutes. b, Probability calculation based on synthetic multiplets modeled as individual Poisson processes. Red line denotes the sequenced gene expression value in the multiplet. Black tick marks indicate the gene expression values of individual synthetic multiplets, and gray lines indicate the individual Poisson distributions for each of those values (as probability density, scale given on left y-axis). The probability of observing a given value is the average probability density across all Poisson processes (-log10(p) shown as blue line, scale given on the right y-axis). This is the per-gene cost that is used in the maximum likelihood estimation to determine multiplet composition. Three genes that follow common patterns are shown for illustration: a broad range of expression values (e.g., relatively noisy gene expression), a narrow range of expression values, and predominantly zero. c, Distribution of cell numbers per multiplet as determined by phase contrast microscopy. In total, 99 multiplets were examined from 9 independent experiments. d, Fractional content of spike-in RNA (ERCC) can be used to estimate the number of cells in a multiplet. %ERCC (right-hand x-axis) is translated into number of cells (left-hand axis). e, ERCC-based cell number estimation correlates well with real cell number in multiplets with known cell composition. Centre of box plots indicates median value, with bounds marking upper and lower limits of interquartile range. Whiskers indicate observations outside of interquartile range, and data points falling outside of whiskers are outliers (< Q1−1.5 * IQR or > Q3 + 1.5 * IQR). f, Representative bright-field images used to quantify the number of cells per multiplet in c.
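The per-gene cost described in panel b can be illustrated with a short sketch. It is not the CIM-seq implementation (which is an R package); the function names are hypothetical, and summing per-gene costs into one total is an assumption made here for illustration. Each synthetic multiplet's expression value is used as the mean of a Poisson distribution, the probability of the observed count is averaged across those distributions, and the cost is the negative log10 of that average.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing count k given mean lam."""
    if lam <= 0:
        # A zero-mean process can only produce a zero count.
        return 1.0 if k == 0 else 0.0
    # Computed in log space to avoid overflow for large counts.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def per_gene_cost(observed: int, synthetic_values: list[float]) -> float:
    """-log10 of the mean Poisson probability over all synthetic multiplets."""
    mean_p = sum(poisson_pmf(observed, lam) for lam in synthetic_values)
    mean_p /= len(synthetic_values)
    # The floor guards against log10(0) when no synthetic value explains the count.
    return -math.log10(max(mean_p, 1e-300))


def total_cost(observed: list[int], synthetic: list[list[float]]) -> float:
    """Sum of per-gene costs (assumption: genes are treated as independent).

    synthetic[g] holds the values of gene g across the synthetic multiplets.
    """
    return sum(per_gene_cost(o, s) for o, s in zip(observed, synthetic))
```

A candidate composition whose synthetic multiplets lie close to the observed count gets a lower cost than one whose synthetic values lie far away, for example `per_gene_cost(5, [4.0, 6.0])` versus `per_gene_cost(5, [40.0, 60.0])`.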
Extended Data Fig. 2 Analysis and deconvolution of artificial multiplets.
a, Dimensionality reduction (UMAP) and unsupervised classification of the sorted cell line singlets. b, Analysis of A375 (CD74) and HOS (ACTG2) specific marker expression in singlets and multiplets shows that co-expression is only observed in multiplets. c, Summary of error rates of multiplets with known cell composition. Mean true positive rate (TPR), mean true negative rate (TNR) and mean misclassification rate (MCR) are shown for each multiplet cell number. d, Specific error rates. Results for cell line-based multiplets of known composition, showing the number of samples (n), true positive rate (TPR), true negative rate (TNR), and misclassification rate (MCR) for each multiplet composition. e, Numbers of expected (sorted) and detected (deconvoluted) cell types in cell line-based multiplets of a known composition.
Extended Data Fig. 3 Droplet-based CIM-seq performs similarly to plate-based CIM-seq.
a, Number of genes detected in singlets, in multiplets with 1–4 distinct cell types identified through multiplet deconvolution, and in matched simulated multiplets. As expected, the number of genes expressed in multiplets increases in a logarithmic fashion with the number of cell types detected. This is because the fraction of shared genes between cell types increases with the number of unique cell types. A multiplet was simulated by randomly picking gene reads from singlets of the predicted constituent classes to match the number of reads in the multiplet. n is the number of occurrences in the droplet small intestinal dataset, from one mouse; Singlets: 5279, 1: 1537, 2: 1205, 3: 541, 4: 153. Centre of box plots indicates median value, with bounds marking upper and lower limits of interquartile range. Whiskers indicate observations outside of interquartile range, and data points falling outside of whiskers are outliers (< Q1−1.5 * IQR or > Q3 + 1.5 * IQR). b, Estimated number of cells against number of unique cell types detected per multiplet in plate- and droplet-based CIM-seq. Box plots as in a. n is the number of occurrences in the small intestinal plate and droplet datasets. Plate (three mice); 0: 19, 1: 80, 2: 220, 3: 96, 4: 20. Droplet (one mouse); 0: 227, 1: 1537, 2: 1205, 3: 541, 4: 153. c, Top: Red lines indicate significant depletion of interaction in multiplets. Bottom: Green lines indicate relatively stronger enrichment at short distances (doublets) and blue lines at long distances (quadruplets). Line thickness indicates the strength of enrichment or depletion.
Note that Paneth/stem cell co-occurrence is enriched at short distances, whereas the less ordered enterocyte/late progenitors are approximately equally enriched. d, Cell numbers in multiplets sequenced by droplet-based scRNA-seq (top), estimated by UMI counts, and plate-based scRNA-seq (bottom), estimated by spike-in mRNA. e, Comparison of the most highly enriched connections found with both methods. Connections are significant (p < 1E-3 for plate, p < 1E-10 for droplet) unless indicated by ‘-’. Connections indicated by ‘*’ involve the class ‘Progenitor late-2’, exclusive to the droplet-based method.
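The matched simulated multiplets in panel a are built by randomly picking gene reads from singlets until the read depth of the real multiplet is reached. A minimal sketch of that idea follows. The function names, the dictionary-based count format, and the choice to sample without replacement are assumptions for illustration, not details taken from the CIM-seq package.

```python
import random
from collections import Counter


def simulate_multiplet(singlet_counts, target_reads, seed=0):
    """Draw target_reads reads at random from the pooled reads of the singlets.

    singlet_counts: list of {gene: read_count} dicts, one per constituent singlet.
    """
    # Expand every singlet's gene counts into a pool of individual reads.
    pool = [
        gene
        for counts in singlet_counts
        for gene, n in counts.items()
        for _ in range(n)
    ]
    if target_reads > len(pool):
        raise ValueError("not enough singlet reads to match the multiplet depth")
    rng = random.Random(seed)
    # Sampling without replacement is an assumption of this sketch.
    return Counter(rng.sample(pool, target_reads))


def genes_detected(counts):
    """Number of genes with at least one read."""
    return sum(1 for n in counts.values() if n > 0)
```

Because genes shared between cell types (a housekeeping gene, say) are counted only once, `genes_detected` of a simulated multiplet grows more slowly than the sum of the genes detected in its constituent singlets, which is the saturation effect noted in the caption.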
Extended Data Fig. 4 In situ hybridization (ISH) for marker genes of the small intestinal epithelium.
a, RNA ISH of the small intestine showing that Lgr5 and Lyz1 transcripts are exclusively located at the crypt base, and not seen in villi. b, Expression of Alpi is exclusive to small intestinal villi. c, Expression of Slc26a3 at the top of the crypt and lower end of the villi. C: crypt, Ap: apical lumen, St: stroma. Transcript colors and scale bars indicated in figures. Nuclear counterstain with DAPI. Brightness of channels was adjusted in order to improve clarity.
Extended Data Fig. 5 Characterization of cell types and states in the crypts of the colonic epithelium.
a, Differentially expressed genes between the three stem cell clusters of the colonic epithelium. b, Expression levels of canonical stem cell markers in colonic crypts among subsets. c, UMAP plots showing classification of goblet cells (left) and expression levels of marker genes of goblet cell subsets (right). Previously described markers of deep crypt secretory cells such as Kit and Cd24a show low specificity. d, Violin plots displaying Plet1 expression as well as previously described markers of deep crypt secretory cells.
Extended Data Fig. 6 Characteristics of Plet1-expressing cells of the colonic epithelium.
a, Plet1 differentially expressed genes, log2(TPM + 1) normalized to [0, 1], overlaid on the colonic epithelium UMAP. b, The probability of observing Lgr5 expression in Muc2+ goblet cells dependent on Plet1 expression.
Extended Data Fig. 7 Microscopy of marker genes in healthy and regenerating colonic epithelium.
a, Consecutive sections from histologically unaffected regions of colon from a mouse treated with dextran sulfate sodium (DSS). H&E (top), RNA ISH of Lgr5 and Plet1 (middle), and immunohistochemistry of Mki67 (bottom). b, RNA ISH of Lgr5 and Plet1 at the bottom of crypts in regenerating colonic epithelium showing a similar pattern to that seen in normal healthy colon crypts (compare also Supplementary Figure 9); dashed line indicates epithelial/stromal interface. c, RNA ISH of Lgr5 and Slc26a3 of an unaffected colon showing both markers in exclusive anatomic regions, with Slc26a3 marking differentiated cells close to the lumen. d, Left panel: multiplex RNA ISH for Lgr5, Plet1, and Best2. Both Plet1 and Best2 mark goblet cells, but only Plet1+ cells are located in the crypt base, while Best2+ cells are found more apically. Right panel: UMAP of scRNA-seq results for Plet1 and Best2 showing mutual exclusivity of expression in the goblet cell cluster. Transcript colors and scale bars indicated in figures. Nuclear counterstain with DAPI, except for d, where TO-PRO-3 was used as nuclear counterstain. Representative images (from a total of n = 7 mice) of colonic tissue after treatment with DSS. e, Upper panel: Fluorescent RNA ISH for Lgr5 (green) and Plet1 (red) six days after start of DSS (first day of regeneration). Lower panel: Hematoxylin/eosin staining of a consecutive section. Note elongated, regenerative crypts left from a damaged area (arrow); Plet1 expression can be seen apically, while Lgr5 is seen exclusively in the crypt bottom. Compare with the expression of both Plet1 and Lgr5 exclusively in the crypt base in an area without severe histological changes (arrowhead). f, Stainings as in a, eight days after start of DSS (third day of regeneration). A large area of damaged epithelium is seen (left half of the images, arrow), in which Plet1 is expressed by most epithelial cells lining the damaged region.
Compare to the right half of the image, in which epithelial damage is less severe and where Lgr5 and Plet1 are largely located next to each other in the crypt as in normal tissue. C: crypt, Ap: apical lumen, St: stroma. Transcript colors indicated in figures. Scale bars indicate 100 μm. Nuclear counterstain with DAPI. For all fluorescently labelled images, brightness was adjusted in order to improve clarity of each channel.
Extended Data Fig. 8 Cell number estimation and alternative deconvolution strategies using CIM-seq.
a, Raw deconvolution result for plate-based data from small intestine, before adjustments for cell number and cell size. b, Histogram of the fractions in a. Note the multimodal distribution, i.e., fractions are predominantly either zero, one, or in the 0.2–0.6 range. This is compatible with doublets and triplets being dominant as observed empirically, indicating that overfitting is not a major problem. c, Fractions from a deconvolution based on weighted least squares regression (‘MuSiC’). Note that the multimodal distribution from b is replaced with a predominance of small fractions. d,e, Comparison between empirical Poisson and weighted least squares deconvolution methods. d, Number of edges for each class is highly correlated, but empirical Poisson has overall fewer edges for all cell types except immune cells, which are an outlier in that they have very low mRNA content. e, CIM-seq results based on MuSiC deconvolution. Results are highly similar to using the empirical Poisson method, although a connection between immune cells and stem cells is added. Note that since immune cells were excluded from the analysis in the FACS sort stage, this connection is known to be false. Circos diagram shows each cell type as a colored block, and co-occurrence of cell types in a multiplet as a thin line between blocks so that many co-occurrences form a wide band connecting them. Purple lines indicate that the connected cell types interact in a specific fashion, with the color strength indicating the fold enrichment of connections compared to that expected by random chance. Nonsignificant (p > 1E-3, hypergeometric test with FDR correction, see Methods for details) interactions are shown as grey lines. The fractional contribution of a cell type’s transcriptional profile to the multiplets it occurs in is indicated as a blue-green color scale (‘Fractions’). Shown is the average fraction for each combination of cell types.
f, Deconvolution results on the small intestinal droplet data set, when omitting cell number estimation and assuming two cells per multiplet. Note that while some strong interactions are still visible (e.g., enterocyte:late progenitor), performance is strongly degraded compared to Fig. 3c, with many interactions being demonstrably false (e.g., Paneth:Enterocyte, Paneth:Progenitor early). Circos as in e. g, Deconvolution of the small intestinal plate-based data set, based on Random Forests implemented in ProximID (Boisset et al.); compare with Fig. 2. Blue lines indicate depletion, red lines enrichment. As expected, using the ProximID algorithm, which assumes two cells per multiplet, does not produce sensible data. The widespread depletion might be due to the high fraction of singlets in the multiplet data, which will be interpreted as self-interaction.
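The significance calls in panel e rest on a hypergeometric test with FDR correction. The sketch below shows one way such a test could be set up using only the Python standard library. The framing of the test (population = all detected connections, successes = connections involving one cell type, draws = connections involving the other) and the use of the Benjamini-Hochberg procedure are assumptions made for illustration; the caption refers to the Methods for the actual definition, which is not reproduced here.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(K, n)
    total = math.comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under these assumptions, a connection between two cell types would be called enriched when its adjusted p-value falls below the threshold quoted in the caption (1E-3).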
Extended Data Fig. 9 Identification of cell types by CIM-seq is highly precise.
a, Covariate analyses in mouse gut dataset showing a lack of correlation between covariates and classification. b, Deconvolution of mouse gut singlets shows a high precision for all classes, indicating the validity of the classification and the sufficiency of the provided features to allow discrimination between the different cell types. c, Deconvolution of the entire mouse gut dataset indicates a lack of enriched cross-tissue connections and thus implies a low false positive connection rate. To avoid interference from potential batch effects, deconvolution of each multiplet was done only to singlets from other mice (that is, the singlets originating from the same mouse as the multiplet being analyzed were removed from the data). Note that due to tissue-specific interactions, some previously nonsignificant connections (for example goblet cells in small intestine) are now called as significant. Circos and statistical test as in Extended Data Fig. 8e, with purple lines connecting significant interactions (p < 1E-3, hypergeometric test with FDR correction, see Methods for details).
Extended Data Fig. 10 Re-analysis of the data from Giladi et al. using CIM-seq.
a, UMAP of the single-cell data from Giladi et al. b, Unsupervised Louvain classification of the data in a. Class labels are inferred from expression of T-cell and B-cell markers. c, Histogram of the number of cells in the multiplet (‘PIC’) data from Giladi et al. Cell numbers were estimated in the same way as for the droplet scRNA-seq data (using UMI counts). As expected, doublets are more prevalent in these data compared to our CIM-seq experiments. d, Results of CIM-seq deconvolution. Shown for completeness; statistics are not valid since the FACS sorting step has already selected heterotypic doublets. Circos and statistical test as in Extended Data Fig. 8e, with purple lines connecting significant interactions (p < 1E-3, hypergeometric test with FDR correction, see Methods for details). e, Analysis of observed UMI counts in the multiplet data, as a function of expected UMI counts based on deconvolution into singlets. Labels indicate the genes highlighted in Giladi et al. for being more highly expressed than expected. Variation is generally lower than in Giladi et al., but the relative effect of the highlighted genes is similar.
Supplementary information
Supplementary Information
Supplementary Figs. 1–4.
About this article
Cite this article
Andrews, N., Serviss, J.T., Geyer, N. et al. An unsupervised method for physical cell interaction profiling of complex tissues. Nat Methods 18, 912–920 (2021). https://doi.org/10.1038/s41592-021-01196-2
DOI: https://doi.org/10.1038/s41592-021-01196-2
This article is cited by
- Evaluation of cell-cell interaction methods by integrating single-cell RNA sequencing data with spatial information. Genome Biology (2022)
- Articulating the “stem cell niche” paradigm through the lens of non-model aquatic invertebrates. BMC Biology (2022)
- Cell interaction by multiplet sequencing. Nature Reviews Genetics (2021)
- Triangulating spatial relationships from single-cell interaction maps. Nature Methods (2021)