NIPMAP: niche-phenotype mapping of multiplex histology data by community ecology

El Marrahi, Anissa; Lipreri, Fabio; Kang, Ziqi; Gsell, Louise; Eroglu, Alper; Alber, David; Hausser, Jean

doi:10.1038/s41467-023-42878-z


Article
Open access
Published: 07 November 2023


Nature Communications volume 14, Article number: 7182 (2023)

Abstract

Advances in multiplex histology allow surveying millions of cells, dozens of cell types, and up to thousands of phenotypes within the spatial context of tissue sections. This leads to a combinatorial challenge in (a) summarizing the cellular and phenotypic architecture of tissues and (b) identifying phenotypes with interesting spatial architecture. To address this, we combine ideas from community ecology and machine learning into niche-phenotype mapping (NIPMAP). NIPMAP takes advantage of geometric constraints on local cellular composition imposed by the niche structure of tissues in order to automatically segment tissue sections into niches and their interfaces. Projecting phenotypes on niches and their interfaces identifies previously-reported and previously-unreported spatially-driven phenotypes, concisely summarizes the phenotypic architecture of tissues, and reveals fundamental properties of tissue architecture. NIPMAP is applicable to both protein and RNA multiplex histology of healthy and diseased tissue. An open-source R/Python package implements NIPMAP.

Introduction

The function of healthy tissues and their disruption in disease depends on the cooperation between cells of different types: hepatocytes in the liver, neurons in the nervous system, immune cells, endothelial cells, fibroblasts, and more¹.

To carry out their functions, cells adopt different phenotypes such as activated or quiescent, adhesive or motile, proliferative or senescent². According to the histological principle of functional zonation, cell types and their phenotypes organize spatially to facilitate tissue function³. For example, in the liver, hepatocytes perform different functions depending on their position along an artery-vein axis³. In the lymph node, B cells need to relocalize from the B-cell zone to the T-cell zone to potentiate antibody-mediated immunity⁴.

The disruption of this organization can directly contribute to disease progression and guide clinical decisions. For example, disruption of pancreatic islet architecture through influx of T cells correlates with the onset of type 1 diabetes⁵. In cancer, tumors can be stratified by the density and distribution of cytotoxic T cells⁶: tumors with dense and uniform T cell infiltration respond best to immune checkpoint inhibitors whereas tumors in which T cells are segregated away from cancer cells show poorer response⁶. Thus, tissue biology and medicine can benefit from characterizing the spatial organization of cell types and their phenotypes in tissues.

In revealing the spatial organization of cells and their phenotypes in tissues, classical techniques such as histology and immunofluorescence imaging are limited to a handful of molecular markers. But in recent years, advances in mass spectrometry such as Multiplexed Ion Beam Imaging (MIBI)⁷ and Imaging CyTOF^8,9 have allowed the quantification of dozens of protein and non-protein markers with single-cell resolution. Doing so has also become feasible by multiplex immunofluorescence microscopy thanks to protein-based methods such as t-CycIF¹⁰, 4i¹¹, Codex¹² as well as RNA-based methods such as MERFISH¹³ and in situ sequencing^14,15. After image processing and cell segmentation, multiplex histology produces rich data in the form of a cell-by-feature table, where features represent the cells’ 2D position in the sample, their types, and quantification of dozens to thousands of molecular markers (Supplementary Fig. 1). Markers are typically chosen to identify the type of cells (hepatocytes, neurons, immune cells, endothelial cells) and their phenotype.

The recent increase in the number of cell types and phenotypes that can be surveyed in tissue sections leads to a combinatorial challenge in interpreting the data. For example, visualizing the spatial architecture of cellular phenotypes in a multiplex histology dataset of 15 cell types with 15 phenotypic markers in 40 samples requires surveying 9000 images (15 cell types × 15 markers × 40 samples), each with 10,000 to 1,000,000 cells depending on the imaging technology.

Identifying spatial phenotypic interactions (for example, which cancer phenotypes associate with local suppression of anti-cancer immune activity) is even more daunting: co-visualizing all possible pairs of 15 phenotypes for 15 cell types produces about 50,000 images ((15 cell types × 15 markers)² = 50,625) from a single tissue section. An additional layer of complexity is that phenotypes may only interact in specific tissue regions, or at the interface between specific histological niches. To address these combinatorial challenges, a systematic approach is needed to summarize the cellular and phenotypic architecture of tissues and identify its most salient features.
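The two image counts quoted above follow from simple multiplication. The short sketch below reproduces them; the variable names are chosen here for illustration only.

```python
# Counts from the text: 15 cell types, 15 phenotypic markers, 40 samples.
n_cell_types = 15
n_markers = 15
n_samples = 40

# One image per (cell type, marker, sample) combination.
single_marker_images = n_cell_types * n_markers * n_samples

# One image per pair of (cell type, marker) combinations, for one tissue section.
n_phenotypes = n_cell_types * n_markers
pairwise_images_per_section = n_phenotypes ** 2

print(single_marker_images)         # 9000
print(pairwise_images_per_section)  # 50625
```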

Tissues are structured into histological niches¹⁶. Within each niche, each cell type has a specific density, defined as the abundance of cells of that type per surface area of the niche. The niche recurs over the tissue section so that a limited number of niches is sufficient to capture the tissue’s cellular architecture, by piecing niches together. In this view, interpreting the tissue architecture from multiplex histology data consists in (a) identifying histological niches and (b) segmenting the image into these niches (Fig. 1a).

**Fig. 1: Ideas from community ecology can complement clustering-based approaches in interpreting multiplex histology data.**

To automatically identify histological niches, one can determine the local cellular composition at numerous sampling sites—defined as cells found within tissue areas of a given size—across a tissue section. Alternatively, sampling sites can be groups of contiguous cells, identified by graph-based community methods¹⁷. The histological niches of the tissue are then revealed as clusters of sites with similar cellular composition^17,18.

While the clustering approach has found numerous successful applications in interpreting multiplex histology data so far^17,18, interpreting the data can benefit from ideas from the field of community ecology, a field with a long history of uncovering spatial patterns^19,20.

Community ecology studies how different species—the ecological analog to our cell types—co-habit in different spatial niches—our histological niches. Within each niche, sites are selected and fieldwork is performed to quantify the species composition at these sites, similar to how multiplex histology surveys cellular composition across a tissue section.

Up until the 1950s, sites and species were then clustered to reveal the organization of species in the different niches (clusters of sites on Fig. 1b). But since the work of Goodall¹⁹, sites are scattered on axes that represent cellular composition using mathematical procedures such as principal components analysis (PCA), so that the proximity of sites reflects the similarity of their cellular composition. Positioning sites on axes of cellular composition allows examining if clustering sites is justified. In the eventuality that sites do not form clusters, the revealed structure of sites can suggest interpretations that better suit the data than clusters (Fig. 1c).

Applying the community ecology approach to multiplex histology data reveals a caveat of clustering sites to identify niches. Sites will form clusters of cellular composition if histological niches occupy distinct areas of the tissue with few interfaces, so that sites belong to only one niche (Fig. 1b). But when niches colocalize and form larger interfaces, many sites lie at the interface of niches. Because the cellular composition of these sites is a mix (a weighted average) of the corresponding niches, no clear clusters can be distinguished by scattering the cellular composition of sites (Fig. 1c).

Instead, in the case of a two-niche tissue, sites describe a segment in cellular composition space. At each extremity of the segment, we find sites located in the core of the corresponding niche (Fig. 1c). Sites located at the interface between two niches fall in the middle of the segment.

The notion that the local cellular composition of tissues does not necessarily form clusters of cellular composition can help interpret multiplex histology data in two ways. First, when sites do not form clusters, many clusters can be needed to describe tissue architecture, potentially leading to an inflation of clusters of unclear histological significance that over-complicate our view of tissue architecture with little scientific benefit. Interpreting local cellular composition as a continuum defined by a parsimonious number of niches can help address this.

Second, interfaces between niches are of interest to interpret tissue dynamics, for example, how tumor progression associates with the biology of the cancer immune interface⁶. Yet, existing approaches to finding interface regions require parameter tuning to specify their cellular composition or the local image properties of interfaces, a time-consuming and potentially subjective process^21,22,23. Recognizing that sites that fall in the middle of the segment represent interface regions can identify interfaces automatically.

Here, we implement the community ecology approach for niche-phenotype mapping (NIPMAP, https://github.com/jhausserlab/NIPMAP) of multiplex histology data (Fig. 1d). Applying NIPMAP to protein and RNA multiplex histology of healthy and pathological tissues reveals unexpected geometry in the cellular composition of sites: sites do not form clusters of cellular composition but instead fall on simplexes, the geometric generalization of triangles or triangular pyramids to arbitrary dimensions. These simplexes are automatically identified using algorithms from satellite image analysis to explain spatial variation in the cellular composition of tissues in terms of histological niches and their interfaces. Projecting cellular phenotypes onto niches and their interfaces reveals known and novel spatial phenotypes, and concisely summarizes how these phenotypes associate with niches and their interfaces (Fig. 1d). Finally, analyzing the niche-interface architecture of tissues uncovered by NIPMAP reveals that (a) spatial context is a stronger determinant of phenotype than cell-autonomous effects, and (b) both niches and their interfaces structure the cellular and phenotypic architecture of tissues (Fig. 1d).

Results

Community ecology niches offer a concise and accurate framework to interpret the cellular architecture of tissues

We illustrate the community ecology approach in a multiplex histology dataset of 17 cell types in 40 triple-negative breast tumor samples (Fig. 2a, Supplementary Fig. 1, Methods) from Keren et al.²¹.

**Fig. 2: Community ecology-inspired niches offer a quantitative framework to interpret cellular tissue architecture from multiplex histology data.**

We determined the cellular composition, defined as the number of cells of each type per unit area (Methods), of 4000 sites: 100 sites in each of our 40 tumor sections.
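As a minimal sketch of what computing the cellular composition of one site could look like, the function below counts cells of each type inside a circular site and divides by the site area. The function name, the input layout (x, y, cell type) and the toy cells are assumptions made for illustration; the authors' actual procedure is described in their Methods.

```python
import math
from collections import Counter

def site_composition(cells, center, radius, cell_types):
    """Density (cells per unit area) of each cell type within a circular site.

    cells: iterable of (x, y, cell_type) tuples, coordinates in micrometers.
    """
    cx, cy = center
    counts = Counter(
        cell_type
        for x, y, cell_type in cells
        if math.hypot(x - cx, y - cy) <= radius
    )
    area = math.pi * radius ** 2
    return [counts[t] / area for t in cell_types]

# Toy data (hypothetical): three cells near the origin, one far away.
cells = [(0, 0, "tumor"), (10, 5, "tumor"), (-8, 3, "CD8-T"), (200, 200, "B")]
cell_types = ["tumor", "CD8-T", "B"]
density = site_composition(cells, center=(0, 0), radius=25, cell_types=cell_types)

# Convert densities back to counts to check the result by eye.
counts_in_site = [round(d * math.pi * 25 ** 2) for d in density]
print(counts_in_site)  # [2, 1, 0]
```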

Sites are points positioned on axes that represent cellular composition. Because there are 17 cell types, 17 axes (dimensions) are needed in principle. This creates a representation challenge as human intuition is limited to 3 dimensions. However, two principles decrease the number of axes required to interpret tissue architecture. First, the abundance of certain cell types varies little across sites and thus contributes little to tissue architecture so that the corresponding axes can be neglected. Second, the abundance of cell types can correlate across sites—for example, because the cells cooperate in performing a tissue function—so that these cell types can be grouped into a single axis. The axes that optimally capture site cellular composition can be determined automatically by PCA, following the community ecology approach.
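The second principle, that correlated cell types can share one axis, can be illustrated with two cell types only. In that case the eigenvalues of the 2×2 covariance matrix have a closed form, so no linear algebra library is needed. This is a sketch on made-up densities, not the PCA pipeline used in the paper.

```python
import math

def first_pc_variance_fraction(x, y):
    """Fraction of total variance captured by the first principal component
    of two variables, from the closed-form eigenvalues of their 2x2 covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    largest_eigenvalue = half_trace + spread
    return largest_eigenvalue / (var_x + var_y)

# Hypothetical densities of cell types across five sites.
b_cells = [1.0, 2.0, 3.0, 4.0, 5.0]
cd4_t_cells = [2.1, 3.9, 6.2, 7.8, 10.1]  # tracks b_cells closely
unrelated = [3.0, 1.0, 4.0, 1.0, 5.0]     # only weakly related to b_cells

correlated_fraction = first_pc_variance_fraction(b_cells, cd4_t_cells)
uncorrelated_fraction = first_pc_variance_fraction(b_cells, unrelated)
print(round(correlated_fraction, 3), round(uncorrelated_fraction, 3))
```

When the two densities co-vary, a single axis captures almost all of the variance; when they do not, a second axis carries a substantial share.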

To interpret tissue architecture, it is important to set the radius of sampling sites to an appropriate size. Sites need to be large enough to capture local coordination in cellular composition and small enough to avoid blurring this coordination across different niches. To determine an appropriate radius, local coordination was quantified by the number of axes—principal components (PCs)—needed to capture spatial variation in cellular composition. When sampling sites are too small, they cover only one cell at a time: there is little covariance in the cellular composition of sites, and many axes (PCs) are thus needed to capture the cellular composition of sites. Increasing the radius of sites to include neighboring cells reveals covariance structures so that a smaller number of axes (PCs) is sufficient in capturing site cellular composition.

We found that 8 or more PCs are required to capture site cellular composition when the site radius is smaller than 10 μm (~1 cell, Fig. 2b). When sites have a 25 μm radius, three PCs are enough to capture 82% of the variance in site cellular composition (Fig. 2b, c). A site radius of 25 μm implies that cellular coordination emerges at a length scale of 2–4 cells. Increasing the site radius beyond 25 μm uncovered little novel covariance. We thus set the sampling site radius to 25 μm.

Scattering sites on three PCs revealed no clear clusters (Fig. 2c). Instead, sites described a continuum with the shape of a 3D simplex: a pyramid with a triangular base. This observation has significance for interpreting tissue architecture. Any point within a simplex can be described as a weighted average of the endpoints that define the simplex (Fig. 2d). Thus, observing that sites are constrained by the geometry of a 3D simplex implies that local cellular composition of the tissue is a mix (weighted average) of four histological niches, the endpoints of the 3D simplex (Fig. 2d). Sites close to endpoints represent cores within the niches, whereas sites halfway between endpoints localize at the interface between two niches (Fig. 2d). This interpretation generalizes the two-niches-and-interface interpretation of continua of site composition introduced in Fig. 1c to more than two niches.
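The weighted-average reading of the simplex can be made concrete with a few lines of code. The sketch below mixes four niche compositions with non-negative weights that sum to one; the niche densities are invented for illustration and `mix_niches` is a hypothetical helper, not part of the NIPMAP package.

```python
def mix_niches(weights, niche_compositions):
    """Site composition as a weighted average of niche compositions.

    weights: one non-negative weight per niche, summing to 1.
    niche_compositions: one list of cell-type densities per niche.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_types = len(niche_compositions[0])
    return [
        sum(w * niche[t] for w, niche in zip(weights, niche_compositions))
        for t in range(n_types)
    ]

# Hypothetical densities of 3 cell types in 4 niches (the simplex endpoints).
niches = [
    [8.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
]
core_site = mix_niches([1.0, 0.0, 0.0, 0.0], niches)       # sits on an endpoint
interface_site = mix_niches([0.5, 0.5, 0.0, 0.0], niches)  # halfway between two endpoints
print(core_site)       # [8.0, 0.0, 0.0]
print(interface_site)  # [4.0, 2.0, 0.0]
```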

Coloring sites according to their position in the simplex—the contribution of the four niches to site cellular composition—segments tissue sections into histological niches (Fig. 2e). In this niche view of tissue architecture, spatial variation in cellular composition is explained by a locally varying mix of the four niches (Fig. 2e, Supplementary Fig. 2).

The simplex endpoints—and thus the niches—can be identified automatically using hyperspectral unmixing algorithms from the field of satellite imaging²⁴ or by archetype analysis from machine learning^25,26. We used the latter algorithms in the present analysis (Methods). The statistical significance of fitting a simplex to sites was quantified using the t-ratio test^27,28 (here p < 0.001, n = 4000 sampling sites, one-sided, t-ratio = 1.045 in original data, t-ratio 95% confidence interval in shuffled data [1.21−1.56]).

To automatically delineate interface regions between any pair of niches, we note that sites located at the interface between two niches have high weights for both niches. Thus, multiplying the weights of these niches specifically produces high scores for sites located at the interface between the two niches (Fig. 2f). Sorting samples by increasing prevalence of tumor-immune interfaces (Methods) recovered the mixed vs compartmentalized sample classification proposed by Keren et al. as well as previously reported differences in the immuno-signaling environment of mixed vs compartmentalized samples (Supplementary Fig. 3a-b).
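The interface score described above is a product of two niche weights. A minimal sketch, with invented weights and a hypothetical helper name, shows why the product singles out interface sites: for weights that sum to at most one, the product peaks at 0.25 when both weights equal 0.5 and falls toward zero in niche cores.

```python
from itertools import combinations

def interface_scores(site_weights):
    """Product of niche weights for every pair of niches at one site."""
    return {
        (a, b): site_weights[a] * site_weights[b]
        for a, b in combinations(sorted(site_weights), 2)
    }

# Hypothetical niche weights for two sites.
core_site = {"cancer": 0.9, "inflammatory": 0.1, "fibrotic": 0.0, "TLS": 0.0}
interface_site = {"cancer": 0.5, "inflammatory": 0.5, "fibrotic": 0.0, "TLS": 0.0}

core = interface_scores(core_site)[("cancer", "inflammatory")]
interface = interface_scores(interface_site)[("cancer", "inflammatory")]
print(core < interface)  # True
```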

Examining the niches’ cellular composition allows us to interpret the biology of each niche (Fig. 2g). The light blue niche is characterized by a high density of cancer cells (Supplementary Fig. 4a). The black niche features mostly macrophages and mesenchymal-like cells at low density (Fig. 2g) and could thus represent the fibrotic, necrotic niche. In the red niche, we find a mix of CD68⁺ and MHCII⁺ macrophages (Supplementary Fig. 4b), CD8 and CD4-T cells, and natural killer (NK) cells (Fig. 2g). A regulatory T cell phenotype can be excluded for the CD4-T cells in this niche as T regulatory cells were assigned their own cell type based on co-expression of CD4 and FOXP3²¹ (Supplementary Fig. 4c). This combination of cell types suggests a type 1 inflammatory region whose function is to trigger anti-cancer immunity⁴. The pink niche may represent the tertiary lymphoid structure (TLS): we find MHCII⁺ and CD45RO⁺ B cells and CD4-T cells (Fig. 2g, Supplementary Fig. 4d, e). MHCII and CD45RO are both expressed by B cells activated by antigen recognition^29,30.

We note that niches were determined by collecting 100 sites per sample, so that the total area covered by sites represents 30% of the image area. Such a sampling intensity is sufficient to accurately identify niches while speeding up computations (Supplementary Fig. 5a, b, Methods).
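The 30% figure can be checked with a back-of-the-envelope calculation. The site radius and number of sites come from the text; the image size is an assumption (a square field of view of 800 μm per side, which is not stated in this section), and overlap between sites is ignored, so the result is an upper bound on the covered area.

```python
import math

site_radius_um = 25     # from the text
sites_per_sample = 100  # from the text
image_side_um = 800     # ASSUMPTION: square field of view, not stated in this section

site_area = math.pi * site_radius_um ** 2
covered_fraction = sites_per_sample * site_area / image_side_um ** 2
print(f"{covered_fraction:.0%}")  # 31%
```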

While the four niches correspond to known histological areas from breast pathology³¹, clustering-based approaches often find a dozen histological niches^8,9,18. More clusters potentially allow a more accurate description of tissue architecture, at the cost of increased complexity. To determine if increasing the number of niches improves the accuracy of tissue description, we quantified how accurately different numbers of clustering- and community ecology-based niches captured site cellular composition (Methods). We find that 4 community ecology-based niches capture 82% of the inter-site variance in cellular composition while 4 clustering-based niches capture 58% of the variance (Fig. 2h). To describe tissue architecture as accurately as 4 community ecology-based niches, >15 clustering-based niches are needed (Fig. 2h). Thus, community ecology-inspired histological niches provide an accurate yet concise description of tissue architecture.
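One common way to quantify "variance captured" is one minus the ratio of residual to total sum of squares. The sketch below uses that definition as an assumption; the exact metric used by the authors is given in their Methods and may differ. Here `reconstructed` stands for the site compositions predicted from niches (a weighted mix of niches, or the centroid of the assigned cluster).

```python
def variance_captured(observed, reconstructed):
    """1 - (residual sum of squares / total sum of squares around the mean site)."""
    n_sites, n_types = len(observed), len(observed[0])
    mean = [sum(site[t] for site in observed) / n_sites for t in range(n_types)]
    total = sum(
        (site[t] - mean[t]) ** 2 for site in observed for t in range(n_types)
    )
    residual = sum(
        (obs[t] - rec[t]) ** 2
        for obs, rec in zip(observed, reconstructed)
        for t in range(n_types)
    )
    return 1.0 - residual / total

# Toy example: three sites, two cell types.
observed = [[4.0, 0.0], [2.0, 2.0], [0.0, 4.0]]
perfect = [[4.0, 0.0], [2.0, 2.0], [0.0, 4.0]]  # exact reconstruction
mean_only = [[2.0, 2.0]] * 3                    # every site predicted as the mean

print(variance_captured(observed, perfect))    # 1.0
print(variance_captured(observed, mean_only))  # 0.0
```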

The community ecology approach can also address artifacts of clustering. For example, sites with similar cellular composition can be assigned to different clusters (Fig. 2i), and a given cluster can contain sites both in the niche core and at its interface (Fig. 2i).

Decreasing or increasing the number of niches from 4 niches down to 2 niches or up to 7 niches causes niches to merge into more coarse-grained niches or split into increasingly fine-grained sub-niches (Supplementary Fig. 6a, b). While we used four niches here to balance accuracy and conciseness, this balance can be tuned by adjusting the number of the niches up or down to zoom in or out on the complexity of tissue architecture.

The number of identifiable niches depends on their prevalence and the amount of available tissue data. A power analysis based on tissue simulations suggests that a single MIBI image is sufficient to capture a niche that covers 3% of the tissue area or more (Methods, Supplementary Fig. 7a, b). The probability of capturing a rare niche scales as the product of niche prevalence and data size, so that increasing the amount of data allows identifying rarer niches. For example, the tissue area of 6 MIBI images allows the identification of niches that occupy less than 1% of the tissue area (Supplementary Fig. 7b).
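The scaling with prevalence times data size can be illustrated by counting how many sampling sites are expected to land in a niche. This is not the authors' simulation-based power analysis; it assumes sites are placed uniformly at random, uses 100 sites per image as in the text, and picks a prevalence of 0.5% as an arbitrary example of "less than 1%".

```python
def expected_sites_in_niche(prevalence, n_images, sites_per_image=100):
    """Expected number of sampling sites that land in a niche of given prevalence,
    assuming sites are placed uniformly at random over the tissue."""
    return prevalence * n_images * sites_per_image

one_image = expected_sites_in_niche(prevalence=0.03, n_images=1)
six_images = expected_sites_in_niche(prevalence=0.005, n_images=6)
print(round(one_image, 6), round(six_images, 6))  # 3.0 3.0
```

Under these assumptions, a 3% niche in one image and a 0.5% niche in six images yield the same expected number of informative sites.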

Niches identified by NIPMAP generalize across patients of different breast cancer types and connect the microscopic and macroscopic levels of tumor architecture

The four niches are shared across tissue sections. Certain tissue sections make use of all niches (for example patient 35 in Fig. 2j) while others use only a few niches (patient 4 for example, Fig. 2j).

The observation that tissue sections from different cancer patients are composed of the same niches allows for connecting the microscopic cellular architecture of tumors—revealed by multiplex histology—with their macroscopic cellular architecture—revealed by non-spatial methods such as flow cytometry or single-cell mass cytometry. If tumors from different patients are made of the same niches, inter-patient variation in the macroscopic cellular composition of tumors is expected to fall on a simplex whose endpoints represent the four niches (Methods, Supplementary Note 1).
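The prediction can be motivated by a short derivation, written here under the assumption that every site composition is an exact convex combination of K niche compositions. The symbols (x_s for the composition of site s, n_k for niche k, w_sk for the niche weights, S for the number of sites in a tumor) are notation introduced for this sketch; the full argument is in Supplementary Note 1.

```latex
\begin{aligned}
x_s &= \sum_{k=1}^{K} w_{sk}\, n_k, \qquad w_{sk} \ge 0, \quad \sum_{k=1}^{K} w_{sk} = 1, \\
\bar{x} &= \frac{1}{S} \sum_{s=1}^{S} x_s
         = \sum_{k=1}^{K} \Big( \frac{1}{S} \sum_{s=1}^{S} w_{sk} \Big) n_k
         = \sum_{k=1}^{K} \bar{w}_k\, n_k, \\
\bar{w}_k &\ge 0, \qquad \sum_{k=1}^{K} \bar{w}_k = 1 .
\end{aligned}
```

The tumor-level average is therefore itself a convex combination of the same niches, so it lies in the same simplex as the individual sites.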

To test this prediction, we compared the macroscopic cellular composition of 128 breast tumors measured by CyTOF³² with the microscopic niches found in the multiplex histology data of 40 triple-negative tumors from Keren et al.²¹. The macroscopic data of Wagner et al.³² represents dissociated samples with a volume of 100 mm³, 7 orders of magnitude larger than the microscopic sampling sites we employed in analyzing the data from Keren et al.²¹ ((25 μm)³ = 1.6 × 10⁻⁵ mm³). The data from Wagner et al.³² also originates from different breast cancer types—ER+, PR+, Her2+, and triple-negative: this allows testing if microscopic niches identified in the triple-negative breast tumors of Keren et al.²¹ generalize across breast cancer types.
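The volume comparison can be verified directly. The sketch below converts (25 μm)³ to mm³ and compares it with the 100 mm³ of the dissociated samples; all numbers come from the text.

```python
import math

site_side_um = 25
um3_per_mm3 = 1_000 ** 3  # 1 mm = 1000 um

site_volume_mm3 = site_side_um ** 3 / um3_per_mm3  # about 1.6e-05 mm^3
macro_volume_mm3 = 100

ratio = macro_volume_mm3 / site_volume_mm3
orders_of_magnitude = math.log10(ratio)
print(f"{site_volume_mm3:.2e} mm^3, ratio {ratio:.1e}, log10 = {orders_of_magnitude:.1f}")
```

The ratio is about 6.4 million, that is, close to 7 orders of magnitude.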

As predicted, inter-patient variation in macroscopic cellular composition falls on a low-dimensional simplex bounded by the four microscopic niches (Fig. 2k). Cancer and fibrotic niches occupy their own corner of the simplex, as expected. Unexpectedly, the TLS and inflammatory niches share the same corner despite having different cellular compositions (Fig. 2g). This is because the prevalence of the TLS and inflammatory niches is macroscopically coupled in tumors (Supplementary Note 2). The four niches also explain why certain combinations of cell types are found in tumors and why others can never be observed (Supplementary Note 3).

In summary, community ecology-based histological niches emerge from the local cellular composition of tissues at a length scale of 2–4 cells. Niches can be identified automatically by algorithms from satellite image analysis and machine learning. They have a clear histo-pathological interpretation and provide a concise yet accurate description of tumor architecture that generalizes across patients, tumor types, and the microscopic-macroscopic levels of tumor architecture. This suggests that community ecology-based niches can provide an objective foundation to interpret tissue architecture.

Niche-phenotype mapping identifies spatial phenotypes and summarizes the phenotypic architecture of tissues

Having identified histological niches and segmented tissue sections accordingly, we determined how niches and their interfaces are associated with cellular phenotypes.

To do so, we took advantage of single-cell, spatial measurements of 19 phenotypic markers also profiled by Keren et al.²¹ alongside the 17 lineage markers used to determine cell types (Supplementary Fig. 8a). We looked for phenotypic markers whose intensity associates with the position of cells in a given niche or at a given interface. The position of cells in a niche was quantified as the weight of that niche. Similarly, the position of cells at an interface between two niches was quantified as the product of the weights of these two niches. We then correlated the niche/interface weight with the intensity of phenotypic markers (Spearman’s rank order correlation coefficient ρ, p values: two-sided t test, false discovery rates from Strimmer³³, Methods).
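The correlation step can be sketched in plain Python. The code below computes Spearman's ρ as the Pearson correlation of ranks (ties share their average rank) and applies it to a niche weight and to an interface weight. The cell-level values are invented for illustration, and the p value and false discovery rate steps are left out.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical cells of one type: two niche weights and one marker intensity
# that peaks where both weights are intermediate.
w_cancer = [0.9, 0.6, 0.5, 0.4, 0.1]
w_inflammatory = [0.1, 0.4, 0.5, 0.6, 0.9]
marker = [0.2, 1.5, 2.0, 1.4, 0.1]

rho_niche = spearman(w_inflammatory, marker)
interface_weight = [a * b for a, b in zip(w_cancer, w_inflammatory)]
rho_interface = spearman(interface_weight, marker)
print(round(rho_niche, 2), round(rho_interface, 2))
```

In this toy example the marker correlates weakly with either niche weight but strongly with their product, which is the signature of an interface phenotype.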

Statistically significant niche-phenotype associations (fdr < 1%, ρ > 0.3) were ordered by cell types and visualized as a heatmap (Fig. 3a). To explore these associations in a phenotype-centered rather than in a cell type-centered manner, niche-phenotype associations can also be sorted by phenotypes (Supplementary Fig. 8b). Out of the 3040 possible niche-phenotype associations in this dataset (16 cell types × 19 phenotypic markers × 10 niches and interfaces), significant associations were reported as a table which concisely summarizes the phenotypic architecture of the tissue (Table 1).
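The count of 3040 possible associations can be reproduced by enumerating the four niches and their pairwise interfaces. The niche labels follow the names used in the text; the rest is plain arithmetic.

```python
from itertools import combinations

niches = ["cancer", "inflammatory", "fibrotic", "TLS"]
interfaces = list(combinations(niches, 2))   # 6 unordered pairs of niches
n_locations = len(niches) + len(interfaces)  # 10 niches and interfaces

n_cell_types = 16
n_markers = 19
n_associations = n_cell_types * n_markers * n_locations
print(n_associations)  # 3040
```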

**Fig. 3: Niche-phenotype mapping identifies spatial phenotypes and summarizes the phenotypic architecture of breast tumors.**

Table 1 NIPMAP concisely summarizes the cellular and phenotypic architecture of tissue samples as a table of niches/interfaces and associated cellular phenotypes

Niche-phenotype mapping recovered expected spatial phenotypes. For example, among B cells, the HLA-DR (MHCII) phenotype is associated with the TLS niche while HLA-DR negative cells—presumably plasma cells—localize in other niches (Fig. 3b)⁴. Neutrophils and tumor cells with an HLA-DR (MHCII) phenotype localized in the inflammatory region (Fig. 3c, Table 1). While MHCII expression is normally restricted to antigen-presenting cells of the immune system⁴—dendritic cells, macrophages, B cells—MHCII+ neutrophils are emerging as actors in anti-tumor immunity³⁴. MHCII expression in tumor cells has also been reported previously and associates with positive prognosis³⁵.

In keratin-positive tumor cells, the MHCI marker associated with the interface of cancer and inflammation niches (Fig. 3d, Supplementary Fig. 10). This suggests that MHCI expression in tumor cells could determine the position of the cancer-inflammation interface. Alternatively, the proximity of the inflammatory niche could induce MHCI in neighboring cancer cells or secrete MHCI as a soluble form³⁶ (Supplementary Fig. 10).

Niche-phenotype mapping also highlighted unexpected spatial phenotypes. CD45RO+ macrophages and dendritic cells localized in the inflammatory niche (Fig. 3e, Supplementary Fig. 9). CD45, a commonly used marker of bone marrow-derived immune cells, has several splicing isoforms. The CD45RO isoform is a marker of activated and memory T cells as well as activated B cells with highly mutated B-cell receptors (BCR)^29,30. In the context of macrophages, CD45RO was previously reported to be the dominant CD45 isoform and to function as a cell adhesion receptor that inhibits pro-inflammatory macrophages³⁷. The literature suggests that the CD45RO signal we analyze here is specific to the CD45RO isoform (see Discussion).

Keratin 6, a highly abundant protein that forms intermediate filaments in the cytoskeleton of epithelial cells³⁸, was found in dendritic cells of the inflammatory niche (Fig. 3f), perhaps because DCs first migrate to the cancer niche where they take up Keratin 6, before re-localizing to the inflammatory niche.

Neutrophils located at the interface of cancer and inflammatory niches are positive for the immunosuppressive markers IDO and PD-L1 (Fig. 3a, g, Table 1). This suggests a potential role of neutrophils in facilitating the immune escape of cancer cells.

Both niches and their interfaces structure spatial variation in cellular phenotypes

NIPMAP can be used to explore fundamental questions regarding the cellular and phenotypic architecture of tissues.

We illustrate this by examining the origin of phenotypic heterogeneity of the tumor microenvironment. This heterogeneity could originate from the spatial context of cells, with local signaling cues determining cellular phenotypes. Phenotypic heterogeneity could also stem from cell-autonomous phenomena such as catching cells at different points of differentiation trajectories or from stochasticity in adopting different phenotypes. Cell-autonomous and spatial contexts can both contribute to phenotypic heterogeneity^39,40,41,42.

Understanding the relative contribution of cell-autonomous phenomena vs spatial context to phenotypic heterogeneity has practical implications for spatial and single-cell omics data analysis. If spatial context drives phenotypes, phenotypes relevant to spatial tissue architecture are expected to emerge in spatially-agnostic analyses such as phenotypic clustering¹⁷. But spatially-agnostic methods are expected to struggle at identifying spatial phenotypes if cell-autonomous factors dominate phenotypic heterogeneity: in this case, spatial approaches to spatial phenotype identification such as NIPMAP are needed.

We find that niche-phenotype mapping and (spatially-agnostic) phenotypic clusters identify shared as well as different phenotypic markers (Fig. 4a for dendritic cells, other cell types in Supplementary Fig. 8c–e). For example, in DCs, the CD45RO and Keratin 6 markers associate with the inflammatory niche and define phenotypic clusters 4, 6 and 1. Yet, niche-phenotype mapping identifies an additional marker, HLA class 1, not highlighted by phenotypic clustering. This suggests that clustering can fail to highlight spatial markers, perhaps because phenotypic clusters are not just driven by space but also by cell-autonomous effects independent of spatial context. In support of this hypothesis, phenotypic clustering highlights CD138 and HLA-DR, both of which are poorly associated with space in DCs (Fig. 4a).

**Fig. 4: Niche-phenotype mapping reveals two fundamental properties of tissue phenotypic architecture.**

This raises the question of the relative contribution of cell-autonomous phenomena and spatial context to the phenotypic heterogeneity of the tumor microenvironment.

If cell-autonomous effects dominate phenotypic heterogeneity, phenotypic clusters are expected to show poor association to space: clustering cells by phenotypic markers without regard to spatial context will produce clusters that poorly predict a cell’s niche (Fig. 4b). Conversely, phenotypic clusters will predict the niche if spatial context determines phenotypic heterogeneity (Fig. 4c).

To test this, we examined how tightly phenotypic clusters associate with spatial context. As an upper bound for how precisely the marker panel and marker quantification can position cells in space, we use a linear predictor of a cell’s niche from phenotypic marker intensities (area under the curve = 0.89 in predicting which DCs are located in the inflammatory niche, Fig. 4d).
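The comparison between a continuous predictor and a phenotypic cluster can both be expressed as an area under the ROC curve. The sketch below computes the AUC as the probability that a cell inside the niche scores higher than a cell outside it, with ties counted as one half. The labels, scores and cluster memberships are invented; this illustrates the metric only, not the authors' linear model.

```python
def auc(scores, labels):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative, counting ties as one half."""
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical DCs: 1 if the cell sits in the inflammatory niche, else 0.
in_niche = [1, 1, 1, 0, 0, 0]
linear_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # continuous predictor output
in_cluster = [1, 1, 0, 0, 0, 0]                # membership in one phenotypic cluster

auc_linear = auc(linear_score, in_niche)
auc_cluster = auc(in_cluster, in_niche)
print(round(auc_linear, 3), round(auc_cluster, 3))  # 0.889 0.833
```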

In dendritic cells, we find that one phenotypic cluster predicts the inflammatory niche as accurately as the linear model (Fig. 4d). Other DC clusters predict the location of DCs in other niches (Supplementary Fig. 8c). These observations generalize to other cell types (Supplementary Fig. 8d, e).

This suggests that the phenotypic heterogeneity of the tumor microenvironment is driven both by the spatial context—niche or interface—in which cells find themselves and by cell-autonomous effects, with spatial context playing a bigger role than cell-autonomous effects.

Another fundamental question of tissue architecture is whether niches contribute more to structuring phenotypic heterogeneity than interfaces.

To find out, we quantified how much a given niche or interface associates with phenotypic heterogeneity by summing the squared correlations of all phenotypes of all cell types for that niche or interface (the rows of the matrix in Fig. 3a). If niches structure phenotypic heterogeneity more than interfaces, the 4 niches are expected to have a larger sum of squared correlations compared to the six interfaces (Fig. 4e). Conversely, if interfaces structure phenotypic heterogeneity more than niches, the interfaces are expected to have a larger sum of squared correlations (Fig. 4e).
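The score described above is a row-wise sum of squared correlations. A minimal sketch on an invented correlation matrix (rows are niches or interfaces, columns are cell type and marker pairs) shows how niches and interfaces can then be ranked by their contribution.

```python
def sum_squared_correlations(row):
    """Sum of squared correlation coefficients for one niche or interface."""
    return sum(r ** 2 for r in row)

# Hypothetical correlations: one row per niche or interface,
# one column per (cell type, phenotypic marker) pair.
correlations = {
    "inflammatory": [0.5, -0.4, 0.1],
    "cancer": [0.1, 0.0, -0.1],
    "cancer x inflammatory": [0.3, 0.4, 0.0],
}
contribution = {
    name: sum_squared_correlations(row) for name, row in correlations.items()
}
ranked = sorted(contribution, key=contribution.get, reverse=True)
print(ranked)  # ['inflammatory', 'cancer x inflammatory', 'cancer']
```

Squaring makes positive and negative associations count equally, so a niche that represses many markers scores as high as one that induces them.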

We find that both niches and interfaces can have a large sum of squared correlations (Fig. 4e). The inflammatory and fibrotic niches as well as the inflammatory-cancer and inflammatory-fibrotic interfaces contribute most to phenotypic heterogeneity in the context of the present phenotypic marker panel. This suggests that phenotypic heterogeneity is structured by both niches and interfaces. Interfaces thus represent histological areas in which cells adopt specific phenotypes, different from the niches that meet at the interface. For example, phospho-S6+ Tregs and phospho-S6+ macrophages are specific to the cancer-inflammatory interface while CD45RO+ B cells are specifically located at the TLS-inflammatory interface.

NIPMAP identifies the cellular and phenotypic architecture of developing lung profiled by in situ RNA sequencing

So far, we applied NIPMAP to spatial profiling of tumor tissues at the protein level. However, healthy tissues and RNA profiling data can also be interpreted with NIPMAP. We illustrate this by applying NIPMAP to single-cell, spatial RNA profiling of healthy embryonic human lungs by In Situ Sequencing (ISS, Fig. 5a)^43,44.

**Fig. 5: NIPMAP generalizes to RNA-based spatial profiling of healthy tissue.**

Similar to tumors, we find that covariance structure in local cellular composition emerges in sites with a 25 μm radius (Supplementary Fig. 11a). Four PCs are sufficient to capture 85% of the spatial variation in the cellular composition of the tissue.

Spatial variation in the cellular composition of the developing lung fits a simplex with five endpoints (Supplementary Fig. 11b), suggesting 5 niches (p = 0.001, n = 20,000 sites, one-sided t-ratio test, t-ratio statistic = 1.496 in original data, t-ratio 95% confidence interval in shuffled data [1.59, 2.07]). A simplex with 5 endpoints is a four-dimensional geometrical object, which makes it difficult to visualize. To address this, we projected sites on the faces of the simplex (Methods). Examining the distribution of projected sites on the faces of the simplex, we observed sites close to all 5 endpoints, supporting the existence of all 5 niches (Supplementary Fig. 11b).

Quantifying the cellular composition of each niche suggested (1) epithelial, (2) parenchymal, (3) smooth muscle, and (4) vessel niches, as well as (5) ductal and alveolar (liquid-filled) space (Fig. 5b, c). In well-formed ducts, we observed that the epithelium separates the ductal space from the smooth muscle niche (Fig. 5b, d), as expected. The vessel niche does not associate with the alveolar space nor the epithelial niche (Fig. 5b, Supplementary Fig. 11b), as expected at this stage of development (week 13)⁴⁴.

As in tumors, we find that niches structure phenotypic architecture (Fig. 5e, Supplementary Fig. 11c). For example, localization of arterial cells and pericytes in the vessel niche correlated with expression of JAG1 (Fig. 5f), whose functions in endothelial development were previously reported⁴⁵. The platelet-derived growth factor A (PDGFA) ligand and its receptor are associated with the vascular and parenchymal context of pericytes, respectively (Fig. 5e, Supplementary Fig. 11d, e).

To test the robustness of the identified niches with respect to cell type granularity, we repeated niche identification using the 73 cell types proposed by Sountoulidis et al.⁴⁴ instead of the 32 coarser-grained cell types of Fig. 5. Increasing the number of cell types, we find five niches with cellular composition and spatial distribution similar to those found with 32 cell types (Supplementary Fig. 12). This suggests that niches show a degree of robustness to cell type granularity.

Thus NIPMAP generalizes to spatial RNA profiling data and healthy tissues.

Discussion

Multiplex histology produces rich datasets in the form of the location of 10,000–1,000,000 cells, dozens of cell types, and dozens to thousands of phenotypic markers. 100,000+ images of phenotype interactions can be produced from a single sample, which leads to a combinatorial challenge in visualizing and interpreting the data. To address this, we introduce NIPMAP, adapting methods from community ecology and satellite image analysis to multiplex histology data in order to (a) identify the histological niches underlying spatial tissue architecture and (b) summarize how histological niches and their interfaces structure cellular phenotypes.

Applying NIPMAP to multiplex histology data from healthy and disease samples reveals that the local cellular composition of tissues has the low-dimensional geometric structure of a simplex. The endpoints and halfway points of the simplex represent histological niches and their interfaces. Niches match known histo-pathological areas and provide a concise yet accurate summary of tissue architecture.

In the context of breast tumors, these niches generalize across patients and tumor types and connect the microscopic and macroscopic levels of cellular architecture. Individual phenotypic markers are mapped on these niches to identify spatial phenotypes and summarize how phenotypes integrate into histological niches and their interfaces. Analyzing how phenotypes associate with niches and their interfaces suggests that spatial context and cell-autonomous effects both determine phenotypes, with the former having a larger influence than the latter. Phenotypic heterogeneity is structured both by niches and their interfaces, with interfaces being home to specific cellular phenotypes.

Errors in cellular segmentation and lateral signal spill-over can lead to mis-assigning cell types and phenotypes, potentially leading to false positives or false negatives during niche-phenotype mapping⁴⁶ (see the p53 marker in Supplementary Fig. 10 for example). Even in perfectly segmented tissues, marker signal can be mis-attributed: a marker can associate with cells of a given type within a given niche not because cells of that type express the marker but instead because the marker is systematically present in that niche, perhaps as a soluble form or as a constituent of the extra-cellular matrix (see the CD138 marker in Supplementary Fig. 9 for example). NIPMAP is designed as the final layer of the multiplex histology data processing stack. It does not attempt to correct cellular segmentation, cell type assignment, or signal attribution errors: these issues need to be addressed in the corresponding layers. These issues are recognized in the multiplex histology field and ongoing methodological research is seeking to address them^{46,47,48,49,50}. The statistical methodology employed by NIPMAP provides a degree of robustness to segmentation, cell type and signal attribution errors because niche-phenotype associations are only captured if they occur systematically across cells. Despite that, and until a definitive methodological solution to cellular segmentation, cell type assignment or signal attribution errors is established, spatial phenotypes highlighted by NIPMAP need to be confirmed by overlaying cellular segmentation with the spatial signal distribution of the corresponding phenotypic marker (Supplementary Figs. 9, 10).

NIPMAP identifies spatial phenotypes that are novel to the best of our knowledge, such as an association between CD45RO and the localization of macrophages in the inflammatory niche of triple-negative breast tumors. CD45 is a hematopoietic marker that can be expressed as different isoforms CD45RA, CD45RB, and CD45RC depending on alternative splicing of its three exons A, B, and C. The CD45RO isoform lacks the A, B, and C exons. The different isoforms are associated with specific phenotypes, at least in the context of lymphocytes^29,30. Thus, quantifying the CD45 isoforms specifically is a prerequisite to interpret the observation that CD45RO+ macrophages localize in the inflammatory niche of triple-negative tumors. The existing literature suggests that the CD45RO antibody (UCHL1 clone, Biolegend) used in the MIBI study we re-analyzed here²¹ is specific to the CD45RO isoform of CD45. The UCHL1 clone has been used to specifically quantify CD45RO since the 1980s^51,52,53,54. An early flow cytometry study in T cells found that the abundances of CD45RA and of CD45RO, the latter quantified by UCHL1, correlate negatively, consistent with the specificity of UCHL1 to the CD45RO isoform⁵². Profiling CD45RO by means of the UCHL1 clone alongside CD45RA (by means of the 2H4, F8-11-13, or HI100 clones) is routinely used to discriminate between memory and naive T cells^52,54. In addition, the risk of non-specific quantification of CD45 isoforms in macrophages is mitigated by western blot, flow cytometry and full-length RNA-seq observations that bone marrow-derived macrophages from the spleen, white adipose tissue, liver, and the peritoneal cavity lack the CD45RA, RB and RC isoforms and thus specifically express the CD45RO isoform³⁷. These observations suggest that macrophages specifically upregulate the CD45RO isoform in the inflammatory niche of triple-negative breast tumors.

When performing single-cell analyses, a decision needs to be made regarding the granularity of cell types. For example, T cells could be lumped together with other immune cells or be assigned a more granular type such as Th17 CD4+ T cell. Alternatively, cell typing could have intermediate granularity (such as CD4 T cells) and be complemented by phenotypes (such as Th17). This raises the question of how cell type granularity impacts niche identification. In general, one expects optimal niche identification when analyzing cell types at a range of intermediate granularities and poor niche identification when the cell types are insufficiently granular or too granular. This is because insufficient granularity can prevent observing the cell types that characterize a given niche: for example, a vascular niche characterized by pericytes and endothelial cells cannot be identified if cells are coarsely grouped into epithelial vs non-epithelial. Conversely, cell types that are too granular (for example, if each cell has its own type) prevent identifying the recurrent patterns in local cell type composition that reveal the niches of a tissue. In the context of the ISS data of Sountoulidis et al.⁴⁴, our findings suggest that the niches identified are robust to the exact number of cell types used.

To interpret spatial phenotypes, markers need to be separated into (1) markers of cell type and (2) markers of phenotypes, for exclusive use during niche identification and niche-phenotype mapping respectively. This is because using a given marker both for cell typing and phenotyping would necessarily identify the phenotype in the niches where the cell type is present, thus biasing niche-phenotype mapping. Both datasets used here comply with this separation. The MIBI data from Keren et al. used 17 lineage markers to define cell types and another 18 functional markers to identify phenotypes²¹. The ISS data of Sountoulidis et al.⁴⁴ used 72 lineage markers to identify cell types and profiled a distinct set of 75 genes from the WNT, SHH, NOTCH, and RTK pathways⁴⁴ which we used here in niche-phenotype mapping.

NIPMAP can complement existing methods to analyze multiplex histology data. For example, cellular spatial enrichment analysis aims at identifying spatial interactions by looking for pairs of cell types that colocalize more often than expected by random chance, defined by shuffling cell positions within a given niche²¹. By providing an approach to identify niches automatically, NIPMAP could facilitate cellular spatial enrichment analysis. Other spatial analyses such as proximity analysis and nearest neighbor analysis could benefit from NIPMAP’s niche identification in the same way.

To cluster-based methods aimed at identifying discrete cellular structures in multiplex histology data such as community detection based on spatial cellular networks and cellular neighborhood analysis^17,18, NIPMAP adds two principles. The first is that the number and identity of histological niches can be determined by exploiting the simplex geometry of local cellular composition. The second is that the simplex geometry provides a criterion to distinguish niches from interfaces and automatically identify interfaces without parameter tuning. Identifying interface regions is of high interest to tissue biology. In cancer, for example, understanding why immune cells from the inflammatory niche fail to penetrate the cancer niche can suggest therapeutic interventions to remove blocks to anti-tumor immunity⁶.

Here we illustrated NIPMAP on spatial data from MIBI⁷, a protein-based approach, and ISS⁴³, an RNA-based method. Beyond these two technologies, NIPMAP is designed to be applicable to other spatial methods with single-cell resolution such as Imaging CyTOF, Codex, CycIF, 4i, MERFISH, and more^10,11,12,13.

Similar to how community ecology defines ecological niches based on local species covariance, local covariance between cell types is exploited by NIPMAP to identify histological niches. This approach requires assigning a type to each cell based on marker intensities and prior knowledge, with two potential downsides. First, niches identifiable in this way could be limited by previously known cell types. Second, assigning types to individual cells from marker intensities is time-consuming and not guaranteed to be error-free due to segmentation errors, signal misattribution, non-specific antibody binding, auto-fluorescence, molecular exchanges between cells, and more. To address this, it would be desirable for niche identification to be based not on the types of cells but instead on marker intensities of local tissue regions prior to segmentation into cells. Preliminary exploration of this question in the context of the MIBI data of Keren et al.²¹ suggests that niches can potentially be identified without assigning predefined types to cells, resulting in similar niches as cell type-based niche identification (Supplementary Note 4). Future research can further develop this methodology.

In the future, the cellular and phenotypic architecture identified by NIPMAP could support efforts aimed at understanding how healthy tissues maintain function despite the need for constant cellular turn-over⁵⁵ and interpreting the spatial dynamics of niches during histological processes such as development, tissue repair or disease progression^44,56,57.

Methods

Ethics statement

MIBI data of TNBC of Keren et al.²¹: all the MIBI samples came from archival tissue blocks housed in the Stanford Pathology tissue bank that were sourced from primary surgical resections. Since no material was acquired prospectively for the study, acquiring MIBI data on these samples was not deemed human subjects research, and requirements for an ethical permit were waived by the institutional review board.

Fetal lung ISS data of Sountoulidis et al.⁴⁴: ethical permit was obtained by the authors of the initial study from the Swedish National Board of Health and Welfare. The analysis was approved by the Swedish Ethical Review Authority (2018/769-31). The clinical staff of the initial study acquired informed written consent from the donor.

CyTOF data of Wagner et al.³²: tissues were collected after obtaining written informed consent from patients at the University Hospital Basel (Switzerland), the University Hospital Zurich (Switzerland), and in collaboration with the Patient’s Tumor Bank of Hope (PATH, Germany) at the breast cancer centers at St. Johannes Hospital Dortmund and Institute of Pathology at Josefshaus (Germany) and the University Hospital Giessen and Marburg, Marburg site (Germany). The collection was approved by the Ethics Committee Northwest/Central Switzerland (#2016-00067), the Ethics Committee Zurich (#2016-00215), and the Faculty of Medicine Ethics Committee at Friedrich-Wilhelms-University Bonn (#255/06).

We did not carry out sex/gender analysis for two reasons. First, the experimental unit of our study is cells and groups of cells, not human individuals. Second, the aim of our study is to introduce a computational method that detects spatial patterns of cellular organization in tissue. Such patterns of cellular organization are a hallmark of multi-cellular organisms, including humans, across biological (sex, age) and social groups (genders, ethnicity). Examining if and how specific details in these spatial patterns vary across biological and social groups is beyond the scope of the present study.

NIPMAP methodology overview

Sites are sampled from the tissue and their composition in terms of (predetermined) cell types is estimated with a Gaussian kernel. The main covariance axes of cellular composition are identified by principal component analysis (PCA). Archetype analysis is used to fit a simplex to site cellular composition and thereby identify histological niches. From these niches, the original tissue is spatially segmented into niches and interface regions. Niches and interfaces are associated with phenotypic markers by correlation analysis to (1) summarize tissue phenotypic architecture and (2) identify salient spatial phenotypes.

Processing the MIBI data from Keren et al

From the website of the Angelo lab, we obtained processed MIBI data for 36 protein markers from 41 TNBC patient samples: intensity values, segmented images, and patient data. The 41 samples represented patients aged 26-91 (mean 54.2 years). While gender information was not included, samples are expected to be female because 99% of breast tumor patients are female.

The segmented data (cellData.csv) contained (x, y) coordinates of each cell and its type (out of 17 cell types) as determined by the authors of the study. Following the authors of Keren et al.²¹, patient 30 was excluded from the analysis.

Quantifying cell type density in sites with a Gaussian kernel

Each tissue slide is a 2-dimensional space with cells of a determined cell type as points of coordinates (x, y). We positioned 100 sampling sites randomly on each slide, by drawing the centers from a uniform distribution. In contrast to common practice, sites are not positioned on cells but uniformly across the tissue section. Doing so has the advantage that sites are representative of tissue architecture and unbiased by spatial variation in cellular density.

We generated 4000 sites, 100 sites for each of the 40 slides.

To quantify the abundance of cells of different types at each site, we used Gaussian kernel density estimation rather than counting cells within a circle of radius r, in order to decrease counting noise. Under the counting strategy, a cell located slightly outside the circle of radius r is not counted, so counts are sensitive to slight changes in the position of the center of the circle. Gaussian kernel density estimation addresses this by weighting cells by their distance to the center in a smooth fashion. The weight g of a cell at position x (with x a vector) relative to the site of center s_k is defined as

$$g(\boldsymbol{x},\boldsymbol{s}_{k})=\frac{1}{2\pi r}e^{-\frac{1}{2}\left(\frac{\parallel \boldsymbol{s}_{k}-\boldsymbol{x}\parallel}{r}\right)^{2}}$$

(1)

We summed up the density values for each cell type.
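The kernel weighting of Equation (1) and the per-type summation can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the code of the NIPMAP package: the function names `sample_sites` and `site_abundance`, the integer encoding of cell types, and the rectangular slide of size `width` × `height` are hypothetical. Site centers are drawn uniformly while staying at least r away from the image edge.

```python
import numpy as np

def sample_sites(n_sites, width, height, r=25.0, seed=None):
    """Draw site centers uniformly, at least r away from the image edge."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(r, width - r, n_sites)
    y = rng.uniform(r, height - r, n_sites)
    return np.column_stack([x, y])

def site_abundance(cell_xy, cell_type, n_types, site_xy, r=25.0):
    """Kernel-weighted abundance of each cell type at each site (Eq. 1)."""
    cell_xy = np.asarray(cell_xy, dtype=float)    # (n_cells, 2), micrometers
    site_xy = np.asarray(site_xy, dtype=float)    # (n_sites, 2)
    cell_type = np.asarray(cell_type, dtype=int)  # values in 0..n_types-1
    # Pairwise site-to-cell distances, shape (n_sites, n_cells).
    dist = np.linalg.norm(site_xy[:, None, :] - cell_xy[None, :, :], axis=2)
    # Kernel weight, keeping the 1/(2*pi*r) prefactor as written in Eq. (1).
    weights = np.exp(-0.5 * (dist / r) ** 2) / (2.0 * np.pi * r)
    # One-hot encode cell types, then sum the weights per type.
    one_hot = np.eye(n_types)[cell_type]          # (n_cells, n_types)
    return weights @ one_hot                      # (n_sites, n_types)

# Toy example: one cell of type 0 at the site center, one of type 1 at distance r.
cells = [[100.0, 100.0], [125.0, 100.0]]
abundance = site_abundance(cells, [0, 1], n_types=2, site_xy=[[100.0, 100.0]])
```

Because the weight decays smoothly with distance, moving a site center by a few micrometers changes the abundances only slightly, unlike a hard count within a circle.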

We performed centered, unscaled PCA using the ade4 package of the data analysis software R²⁰.
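For readers working in Python, a centered and unscaled PCA can be sketched with a singular value decomposition. This is a stand-in for illustration only: the analysis above used the R package ade4, and the function name `centered_pca` and its return values are hypothetical.

```python
import numpy as np

def centered_pca(X, n_components):
    """PCA of a sites x cell types table: centered, not scaled to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                # center each column only
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # site coordinates on PCs
    loadings = Vt[:n_components].T                    # cell type loadings
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, mean, explained
```

Leaving the columns unscaled means that abundant cell types contribute more variance to the principal components than rare ones.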

In positioning sites, we excluded areas of the slides located within distance r from the image edge, in order to decrease edge effects.

We explored a broad range of kernel widths r to examine the robustness of tumor architecture (Fig. 2b). We found that r = 25 μm is the smallest radius that captures cellular architecture (Fig. 2b). This suggests that tumor micro-architecture emerges on a scale of 2-4 cells.

Archetype analysis

Archetype analysis²⁵ aims to fit a d-dimensional simplex as tightly as possible to n data points x. The simplex has p endpoints $\boldsymbol{b}_{k}\in \mathbb{R}^{d},k=1,\ldots,p$, also known as archetypes²⁶. By definition, each point x within the simplex can be written as a weighted average of the endpoints

$$\boldsymbol{x}=\sum_{k=1}^{p}\alpha_{k}\boldsymbol{b}_{k}$$

(2)

with the weights α constrained by 0 ≤ α_k ≤ 1 and $\sum_{k=1}^{p}\alpha_{k}=1$. We used the tumor samples projected onto the first 3 PCs as input for archetype analysis. We used the Archetypal Analysis python package⁵⁸, with the parameters: n_archetypes = 4, tolerance = 0.001, max_iter = 200, random_state = 0, C = 0.0001, initialize = 'random', redundancy_try = 30. The output of this algorithm contains a dataset of α_k weights for each tumor sample and the coordinates of the endpoints b_k in the reduced space of 3 PCs.

We set the number of endpoints using an elbow criterion on the fraction of variance in the local cellular composition explained by different numbers of endpoints. When varying the number of endpoints p, the number d of PCs used for fitting the simplex was always d = p − 1 because p − 1 dimensions are generally needed to describe a simplex with p endpoints.
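To make Equation (2) concrete, the sketch below recovers the weights α of a point from known endpoints when p = d + 1. It is a hedged illustration and not the fitting procedure of the Archetypal Analysis package: the function name `barycentric_weights` is hypothetical, the endpoints are assumed to be given and affinely independent, and the linear solve does not enforce 0 ≤ α_k ≤ 1, so points outside the simplex receive negative weights.

```python
import numpy as np

def barycentric_weights(x, endpoints):
    """Weights alpha with x = sum_k alpha_k b_k and sum_k alpha_k = 1."""
    B = np.asarray(endpoints, dtype=float)   # (p, d), one endpoint per row
    x = np.asarray(x, dtype=float)           # (d,)
    p, d = B.shape
    if p != d + 1:
        raise ValueError("expected p = d + 1 endpoints")
    # Stack the d coordinate equations with the sum-to-one constraint.
    M = np.vstack([B.T, np.ones(p)])         # (d + 1, p), square
    rhs = np.append(x, 1.0)
    return np.linalg.solve(M, rhs)

# Triangle in 2 dimensions (p = 3 endpoints, d = 2).
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alpha = barycentric_weights([0.25, 0.25], triangle)
```

The system is square precisely because d = p − 1: the d coordinates plus the sum-to-one constraint give p equations for the p weights.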

Assessing the robustness of niches to sampling intensity

In order to robustly identify niches while optimizing computation time, we performed an error analysis as a function of the sampling intensity—defined as the ratio of the total area of sites to the tissue area—to test how deeply tissues need to be sampled so as to control for niche cellular composition error.

To minimize the sampling error, we first over-sampled the tissue by collecting a number of sites such that the total area covered 1000% of the tissue area. Over-sampling the tissue minimizes the sampling error because, even when sampling at 100% intensity, random positioning of sites may leave certain tissue areas uncovered by any site. Sites sampled at 1000% intensity were used to determine reference niches for the MIBI dataset of Keren et al.²¹.

We then sampled sites such that the total area covered 300%, 100%, 30%, 10%, 3%, and 1% of the tissue area. At each sampling intensity, sites were sampled 100 times and niches were computed, producing 100 sets of four niches per sampling intensity. The niche estimation error was computed as the RMSE to the reference niches in terms of cellular composition. We plotted the root mean squared error averaged over the 100 repeats at each sampling intensity (Supplementary Fig. 5a).

A 30% sampling intensity—which we used in analyzing the MIBI data of Keren et al.²¹—identified niches with small enough an error to robustly characterize the biology of each niche (Supplementary Fig. 5b) while speeding up computations. If computation time is not an issue, we recommend sampling a number of sites equivalent to 100% of the tissue area or more, as the error is slightly smaller compared to sampling 30%.
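The relation between sampling intensity and the number of sites can be sketched as follows, assuming each site covers a disk of area πr² and that the site count is rounded up (the rounding rule is an assumption). The function name `n_sites_for_intensity` is hypothetical.

```python
import math

def n_sites_for_intensity(intensity, tissue_area, r=25.0):
    """Number of sites whose total area equals intensity x tissue area."""
    site_area = math.pi * r ** 2             # area of one site, in square micrometers
    return math.ceil(intensity * tissue_area / site_area)

# One 800 um x 800 um image sampled at 30% and at 100% intensity.
n_30 = n_sites_for_intensity(0.30, 800.0 * 800.0)
n_100 = n_sites_for_intensity(1.00, 800.0 * 800.0)
```

For an 800 μm × 800 μm image and r = 25 μm, a 30% intensity corresponds to roughly 100 sites per image.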

Classifying tumor samples into mixed vs compartmentalized using niche weights

To associate NIPMAP’s niche segmentation with the previously proposed mixed vs compartmentalized classification of tumor architecture of Keren et al.²¹, we sorted samples according to the contribution of tumor-immune interfaces relative to the total prevalence of immune niches, following the methodology described by Keren et al.²¹. More specifically, for each sample, the NIPMAP mixing score m was computed as

$$m=\frac{\langle \alpha_{3}(\alpha_{1}+\alpha_{2})\rangle}{\langle \alpha_{1}\rangle+\langle \alpha_{2}\rangle},$$

(3)

where α₁, α₂, α₃ represent the weight of the TLS, inflammatory, and cancer niches, respectively, at a given site, and averaging is performed over sites. The NIPMAP mixing score matched the mixed vs compartmentalized classification of Keren et al. for 37 out of the 40 samples (Supplementary Fig. 3a). We then tested whether the findings of Keren et al. on the association between mixed vs compartmentalized samples (reported in Fig. 5B, E, H of the original study) and the immuno-signaling environment could be reproduced using the NIPMAP mixing score. All three associations reported by Keren et al. could be reproduced using the NIPMAP mixing score (Supplementary Fig. 3b). Cold samples were excluded from the analysis, following the exclusion criteria of Keren et al.
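Equation (3) translates directly into a few lines of NumPy. In this sketch, the function name `mixing_score` and the argument names are hypothetical. Each argument holds the weight of one niche across the sites of a single sample.

```python
import numpy as np

def mixing_score(alpha_tls, alpha_inflammatory, alpha_cancer):
    """NIPMAP mixing score of Eq. (3), averaging over the sites of one sample."""
    a1 = np.asarray(alpha_tls, dtype=float)
    a2 = np.asarray(alpha_inflammatory, dtype=float)
    a3 = np.asarray(alpha_cancer, dtype=float)
    numerator = np.mean(a3 * (a1 + a2))      # immune-cancer co-occurrence
    denominator = np.mean(a1) + np.mean(a2)  # total immune prevalence
    return numerator / denominator

# Compartmentalized toy sample: sites are either purely immune or purely cancer.
m_compartmentalized = mixing_score([1, 0], [0, 0], [0, 1])
# Mixed toy sample: every site is half immune, half cancer.
m_mixed = mixing_score([0.25, 0.25], [0.25, 0.25], [0.5, 0.5])
```

The denominator vanishes when a sample has no immune niche weight at any site, which is one practical reason to exclude cold samples before computing the score.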

Comparing the spatial variation in cellular composition captured by different numbers of community ecology- and clustering-based niches

We compared how NIPMAP and clustering captured spatial variation in the composition of m cell types across 4000 sites sampled from 40 triple-negative breast tumors analyzed by MIBI.

Iterating over the number of niches (p = 2, …, 17 niches), NIPMAP was performed using p − 1 PCs $U\in \mathbb{R}^{m,p-1}$ to find p niches $\boldsymbol{b}_{j}\in \mathbb{R}^{p-1},j=1,\ldots,p$. We collect all niches b_j into a matrix $B\in \mathbb{R}^{p-1,p}$.

The percentage of variance in cellular composition explained by NIPMAP niches was computed as follows. For each site k, we compute the niche weights $\boldsymbol{\alpha}_{k}\in \mathbb{R}^{p}$ that best fit the site’s cellular composition c_k − c₀ ≃ UBα_k, where c₀ is the average cellular composition of sites (we performed centered PCA). The difference between the observed site composition c_k and the site’s best-fitted cellular composition UBα_k + c₀ is defined as the error ϵ_k = c_k − c₀ − UBα_k. The fraction of variance explained by the niches is 100% minus the ratio of the sum of squared errors to the total sum of squares of site cellular composition, $100\%-\sum_{i,k}\epsilon_{ik}^{2}/\sum_{i,k}(c_{ik}-c_{i0})^{2},$ where i = 1, …, m represents the cell type.

To determine how k-means clustering captures the spatial variation in cellular composition, we perform clustering to find p niches $\boldsymbol{b}_{j}\in \mathbb{R}^{m}$ in the cellular abundance of our 4000 sites. The fraction of variance explained by the p niches (the clusters) was computed as 100% minus the ratio of the within-cluster sum of squares to the total sum of squares of the site composition data, $100\%-\sum_{i,k}(c_{ik}-b_{i\nu(k)})^{2}/\sum_{i,k}(c_{ik}-c_{i0})^{2},$ where ν(k) is the cluster j to which site k belongs.
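Both explained-variance formulas share the same structure: a residual sum of squares divided by the total sum of squares around the mean composition. The sketch below illustrates this with hypothetical function names (`explained_variance_pct`, `cluster_fit`). It assumes the site compositions are stored as a sites × cell types array and that cluster labels have already been obtained, for instance from any k-means implementation.

```python
import numpy as np

def explained_variance_pct(C, C_fit):
    """100% minus residual sum of squares over total sum of squares."""
    C = np.asarray(C, dtype=float)           # (n_sites, n_cell_types), observed
    C_fit = np.asarray(C_fit, dtype=float)   # same shape, fitted compositions
    sse = np.sum((C - C_fit) ** 2)
    sst = np.sum((C - C.mean(axis=0)) ** 2)  # around the mean composition c_0
    return 100.0 * (1.0 - sse / sst)

def cluster_fit(C, labels):
    """Replace each site by the centroid of the cluster it belongs to."""
    C = np.asarray(C, dtype=float)
    labels = np.asarray(labels)
    C_fit = np.empty_like(C)
    for j in np.unique(labels):
        C_fit[labels == j] = C[labels == j].mean(axis=0)
    return C_fit

# Toy data: 4 sites, 2 cell types, 2 clusters.
C = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
pct = explained_variance_pct(C, cluster_fit(C, [0, 0, 1, 1]))
```

For the simplex model, the same `explained_variance_pct` function would be applied with the fitted compositions UBα_k + c₀ in place of the cluster centroids.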

We note that NIPMAP’s simplex model requires slightly more free parameters than k-means clustering, which helps increase the percentage of explained variance. However, the goal of this analysis is not to find the best trade-off between the number of parameters and goodness of fit, but rather to identify a data structure that summarizes phenotypic spatial architecture in a concise fashion, using a small number of niches that fits within human cognitive limitations.

Power analysis of the probability to capture a rare niche

To study the power of NIPMAP to capture a rare niche, we simulated tissue data in which the prevalence of one niche varied while the prevalence of the remaining niches was set to be equal to each other. As a rare niche is expected to be more difficult to identify with little tissue data, we also varied the amount of tissue data in the simulation.

An existing approach to simulate tissue data⁵⁹ requires spatial co-occurrence statistics of cells of different types as an input. Tuning co-occurrence statistics so as to (a) specify the number and cellular composition of niches and (b) vary the abundance of a specific niche while keeping the other niches constant is not trivial. To address this, we designed a tissue simulation approach that can accommodate these two requirements, as follows.

We first simulated the spatial distribution of niche weights α(x, y), where α_i is the weight of niche i at position (x, y) of the tissue, and ∑_iα_i = 1. Simulated tissues should show continuous regions in which a given niche dominates, with smooth spatial transition into the contiguous niche. Drawing upon classical mathematical models of spatial patterns⁶⁰, we reasoned that a reaction-diffusion system in which niches compete locally with each other and diffusion enforces smoothness of niche weights in space could simulate realistic spatial distributions of niche weights.

Experimenting with different reaction-diffusion systems led to the following equation:

$$\frac{d{\alpha }_{i}(x,y)}{dt}=\beta {\alpha }_{i}\left(\frac{{\alpha }_{i}^{4}}{{\alpha }_{i}^{5}+{K}^{5}}-\frac{1}{2K}\right)+D\left(\frac{{\partial }^{2}{\alpha }_{i}}{\partial {x}^{2}}+\frac{{\partial }^{2}{\alpha }_{i}}{\partial {y}^{2}}\right)$$

(4)

where we set β = 1/day.

The positive term in the first bracket represents cooperative logistic growth. When the weight α_i of niche i is close to 0, there is near-zero niche growth. There is a step-like increase in growth as α_i approaches K, which rapidly saturates around 1, due to the high exponent of α_i (power of 5). By setting K = 1/n where n is the number of niches, we can thus establish growth dynamics in which only one niche wins at each location (x, y).

The negative term in the first bracket of the equation prevents niche weights from growing to infinity, by ensuring that there are two stable fixed points in the absence of diffusion (that is when D ≔ 0): α_i = 0—niche i is absent at (x, y)—and α_i ≃ 2K—niche i is present at (x, y).
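The fixed points of the reaction term can be checked numerically. Writing u = α_i/K, the bracket of Equation (4) vanishes when u⁵ − 2u⁴ + 1 = 0, in addition to the fixed point α_i = 0 contributed by the prefactor α_i. The sketch below, with the hypothetical function name `reaction`, finds the positive real roots and inspects the sign of the growth term around them.

```python
import numpy as np

def reaction(alpha, K, beta=1.0):
    """Reaction term of Eq. (4), without diffusion."""
    return beta * alpha * (alpha ** 4 / (alpha ** 5 + K ** 5) - 1.0 / (2.0 * K))

# Roots of u**5 - 2*u**4 + 1 = 0, with u = alpha / K.
roots = np.roots([1.0, -2.0, 0.0, 0.0, 0.0, 1.0])
is_positive_real = (np.abs(roots.imag) < 1e-9) & (roots.real > 0)
positive_real = np.sort(roots[is_positive_real].real)

K = 0.25  # K = 1/n with n = 4 niches
growth_below_K = reaction(0.5 * K, K)   # negative: decay towards 0
growth_above_K = reaction(1.5 * K, K)   # positive: growth towards the upper fixed point
growth_beyond = reaction(2.5 * K, K)    # negative: decay back to the upper fixed point
```

The two positive roots lie at u = 1 and slightly below u = 2. The sign pattern of the growth term indicates that α_i = K acts as a threshold separating the absent state α_i = 0 from the present state α_i ≃ 2K.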

The second bracket adds diffusion, to enforce smooth variation in niche weights with respect to space. We set the spatial domain in both x and y to [0, L] with L = 800 μm, the size of a MIBI image, using periodic boundary conditions.

We simulated this system numerically on a 50 × 50 lattice (that is, dx = dy = L/50 = 16 μm) until convergence, using R’s ode.2D solver from the deSolve library. At convergence, the αs were normalized to sum up to 1 at each position (x, y). We used n = 4 niches due to practical relevance to our re-analysis of the MIBI data of Keren et al.²¹. Setting D = 40 μm²/day produced niches whose spatial architecture resembles that of MIBI data (Supplementary Fig. 7a).

We define the prevalence of niche i as L⁻²∫α_i(x, y)dxdy. To simulate tissues in which one niche is more rare—less prevalent—than the others, we altered the initial condition α⁰(x, y). We randomly initialized α⁰(x, y) so that, at each (x, y), the weight of one niche was 1 and the weight of all other niches was 0. To simulate tissues in which niche 1 was less prevalent than other niches, we varied the probability f that α₁(x, y) was 1. The probability that α_i(x, y) = 1 for the other niches i ≠ 1 was set to be equal. This resulted in initial conditions α⁰(x, y) in which the prevalence of niche 1 was f and the prevalence of niche 2, 3 and 4 was (1 − f)/(n − 1).

As expected, tissues represented by the initial condition α⁰(x, y) were unrealistic and unstructured: niches showed no spatial contiguity as niche weights varied abruptly from 0 to 1 from one location of the lattice to the next. Simulating the reaction-diffusion dynamics of α(x, y) defined by Equation (4) to convergence caused niches to compete locally and laterally in space and thereby to establish contiguous areas in which a given niche dominated (Supplementary Fig. 7a).
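The simulation can be approximated in Python as sketched below. This is a simplified stand-in rather than the procedure used here: it replaces R’s ode.2D solver with an explicit Euler scheme, runs a fixed number of steps instead of testing for convergence, and guards the final normalization against lattice points where all weights have vanished. The function name `simulate_niche_weights`, the time step `dt`, and the step count `n_steps` are assumptions.

```python
import numpy as np

def simulate_niche_weights(f, n=4, size=50, L=800.0, D=40.0, beta=1.0,
                           dt=0.1, n_steps=2000, seed=0):
    """Explicit-Euler sketch of Eq. (4) on a periodic size x size lattice."""
    rng = np.random.default_rng(seed)
    K = 1.0 / n
    dx = L / size
    # Initial condition: exactly one niche has weight 1 at each lattice point.
    # Niche 0 (the rare niche) wins with probability f, the others equally often.
    probs = np.full(n, (1.0 - f) / (n - 1))
    probs[0] = f
    winner = rng.choice(n, size=(size, size), p=probs)
    alpha = np.eye(n)[winner]                         # (size, size, n)
    for _ in range(n_steps):
        # Five-point Laplacian with periodic boundary conditions.
        lap = (np.roll(alpha, 1, axis=0) + np.roll(alpha, -1, axis=0)
               + np.roll(alpha, 1, axis=1) + np.roll(alpha, -1, axis=1)
               - 4.0 * alpha) / dx ** 2
        growth = beta * alpha * (alpha ** 4 / (alpha ** 5 + K ** 5)
                                 - 1.0 / (2.0 * K))
        alpha = alpha + dt * (growth + D * lap)
    # Normalize weights to sum to 1; points where every niche vanished stay at 0.
    total = alpha.sum(axis=2, keepdims=True)
    return np.divide(alpha, total, out=np.zeros_like(alpha), where=total > 0)

alpha = simulate_niche_weights(f=0.1, n_steps=200)
prevalence = alpha.mean(axis=(0, 1))   # lattice average of each niche weight
```

With dx = 16 μm and D = 40 μm²/day, the chosen time step keeps D·dt/dx² well below 0.25, the usual stability bound for an explicit scheme on a two-dimensional lattice.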

Varying f in the initial condition α⁰(x, y) from 0.04 to 0.25 in 9 logarithmic steps and simulating the system to convergence generated 9 tissues α(x, y) in the form of 50 × 50 × n arrays in which the prevalence of niche 1 ranged from 0.24% to 30.4% (Supplementary Fig. 7a).

From these nine simulated tissues α(x, y), we simulated multiplex histology data. Multiplex histology data has the form of a table X with n_s rows representing sites and columns representing the local abundance of the different cell types.

To simulate this table X for a given niche prevalence and a given number of MIBI images n_i (amount of tissue data), we first computed the number of sites n_s needed to cover the area of the n_i images, n_s = n_iL²/(πr²), where r = 25 μm is the radius of the site. In the present scenario, in which different tissue images are made of the same niches, simulating sites from more tissue images is equivalent to collecting more sites from a given tissue image.

We positioned n_s sites in the tissue by sampling their position (x, y) from a uniform distribution between 0 and L. For each site, we determined the local niche weights α(x, y) by linear interpolation, to generate a matrix A of sites (rows) × niches (columns). We simulated local cellular densities as X = ABᵀ, where B is the matrix of niche cellular composition, whose rows represent cell types and columns represent niches, so that the rows of X represent sites and its columns represent cell types. For realism, the cellular composition of the niches $B=\left(\boldsymbol{b}_{1},\boldsymbol{b}_{2},\boldsymbol{b}_{3},\boldsymbol{b}_{4}\right)$ was set to the four niches and 17 cell types of the MIBI data of Keren et al.²¹.
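The site simulation can be sketched as follows. Several details are assumptions made for illustration: the function name `simulate_sites`, the placement of lattice point (i, j) at position (i·dx, j·dx), the bilinear form of the linear interpolation with periodic wrap-around, and the rounding of n_s to the nearest integer. Because A has sites as rows and B has cell types as rows, the product is written A Bᵀ in the code so that the rows of X are sites.

```python
import math
import numpy as np

def simulate_sites(alpha, B, n_images, L=800.0, r=25.0, seed=0):
    """Sample sites, interpolate niche weights, and mix niche compositions."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)        # (size, size, n_niches)
    B = np.asarray(B, dtype=float)                # (n_cell_types, n_niches)
    size = alpha.shape[0]
    dx = L / size
    n_s = int(round(n_images * L ** 2 / (math.pi * r ** 2)))
    xy = rng.uniform(0.0, L, size=(n_s, 2))
    # Bilinear interpolation of the weights on the periodic lattice.
    g = xy / dx
    i0 = np.floor(g).astype(int)
    t = g - i0                                    # fractional offsets in [0, 1)
    ix0, iy0 = i0[:, 0] % size, i0[:, 1] % size
    ix1, iy1 = (ix0 + 1) % size, (iy0 + 1) % size
    tx, ty = t[:, 0:1], t[:, 1:2]
    A = ((1 - tx) * (1 - ty) * alpha[ix0, iy0]
         + tx * (1 - ty) * alpha[ix1, iy0]
         + (1 - tx) * ty * alpha[ix0, iy1]
         + tx * ty * alpha[ix1, iy1])             # (n_s, n_niches)
    X = A @ B.T                                   # (n_s, n_cell_types)
    return X, A

# Toy example: 2 niches with equal weight everywhere, 3 cell types.
toy_alpha = np.full((5, 5, 2), 0.5)
toy_B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
X, A = simulate_sites(toy_alpha, toy_B, n_images=1)
```

Since each row of A sums to 1 whenever the lattice weights do, every simulated site is a weighted average of the niche compositions.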

We passed X to NIPMAP to estimate 4 niches $\hat{\boldsymbol{b}}_{i},\ i=1,\ldots,4$. We computed the RMSE ϵ on the estimated cellular composition of niche 1 from

$$\epsilon^{2}=\frac{1}{17}\min_{i}\left\|\hat{\boldsymbol{b}}_{i}-\boldsymbol{b}_{1}\right\|^{2}$$

(5)

where 17 is the number of cell types. The min operator ensures that the estimated niche closest to niche 1 is used to compute the error. This is necessary because niche indices are arbitrary in NIPMAP, so that b₁ does not necessarily match $\hat{\boldsymbol{b}}_{1}$.
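The error of Equation (5) can be computed as in the following sketch. The function name `rare_niche_rmse` and the three-cell-type toy values are hypothetical; the estimated niches are assumed to be stored as the columns of `B_hat`.

```python
import numpy as np


def rare_niche_rmse(B_hat, b1):
    """RMSE between the true niche b1 and the closest estimated niche (columns of B_hat)."""
    # One squared Euclidean distance per estimated niche.
    sq_dist = np.sum((B_hat - b1[:, None]) ** 2, axis=0)
    # The min picks the estimated niche closest to b1; dividing by the number of cell types gives the MSE.
    return float(np.sqrt(sq_dist.min() / b1.size))


b1 = np.array([0.2, 0.5, 0.3])          # toy "true" composition of niche 1
B_hat = np.array([[0.9, 0.2],
                  [0.1, 0.5],
                  [0.0, 0.4]])           # two toy estimated niches
eps = rare_niche_rmse(B_hat, b1)
```

Here the second estimated niche is the closest one, so the error does not depend on the order in which the niches are returned.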

We repeated this procedure 100 times for each niche prevalence and number of images n_i. Inspecting the estimated niche composition $\hat{B}$ as a function of the estimation error ϵ suggested that a threshold ϵ < 4.5 × 10⁻⁴ distinguishes simulations in which niche 1 was accurately captured from simulations in which niche 1 failed to be captured. Thus, we estimated the probability of identifying the rare niche 1 as the fraction of these 100 simulations for which ϵ < 4.5 × 10⁻⁴ (Supplementary Fig. 7b).

Identifying interface regions

Sites located at interface regions have high weights for more than one niche. Thus, to find sites at the interface of two niches, we compute the product of the two niches’ weights and look for sites where this product is high (Fig. 2f).

Interface regions can be defined in two ways. Under one definition, interfaces are found at the contact of two niches. To find these interfaces, we multiply the weights of pairs of niches. In this definition, interfaces are influenced by local cellular density. For example, a low concentration of immune cells next to cancer cells would not qualify as an immune-cancer interface because a low concentration denotes fibrotic regions (Supplementary Fig. 4f).

To identify interfaces between immune and cancer cells based only on cellular composition and independently of cellular density, we can exclude the contribution of the niche of low cellular density (here the fibrotic niche) by setting its weight to 0 at all sites and re-normalizing the weights of the other niches to sum up to 1. The benefit of this definition is that the interface regions identified through this process fit better with a visual impression, as the visual impression is guided more by cell types (colors) than cellular density (Fig. 2a, f).
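Both definitions can be sketched with a single helper. This is an illustrative NumPy sketch under the assumption that niche weights are stored as a sites × niches array; the function name `interface_weight` and the `exclude` argument are hypothetical.

```python
import numpy as np


def interface_weight(alpha, i, j, exclude=None):
    """Product of the weights of niches i and j at each site.

    alpha: sites x niches. If `exclude` is given, that niche is set to 0 and
    the remaining weights are renormalized to sum to 1 before taking the product.
    """
    w = np.array(alpha, dtype=float)
    if exclude is not None:
        w[:, exclude] = 0.0
        total = w.sum(axis=1, keepdims=True)
        # Sites made up entirely of the excluded niche keep zero weights.
        w = np.divide(w, total, out=np.zeros_like(w), where=total > 0)
    return w[:, i] * w[:, j]


# Two toy sites; niche 2 plays the role of the low-density (fibrotic) niche.
alpha = np.array([[0.25, 0.25, 0.50],
                  [0.50, 0.50, 0.00]])
by_contact = interface_weight(alpha, 0, 1)                 # density-dependent definition
by_composition = interface_weight(alpha, 0, 1, exclude=2)  # composition-only definition
```

In this toy example the first site scores low under the first definition but reaches the same score as the second site once the low-density niche is excluded.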

Processing and analysis of CyTOF data from Wagner et al.

We downloaded the summarized version of the CyTOF experiments of Wagner et al.³². The data table contains cellular proportions of cell types identified by 73 markers in 144 breast tumor samples. The cohort included 144 female patients, aged 29-93 (mean 62.8 years) and 1 male patient. The sample from the male patient was excluded from the analysis.

Cellular composition was organized in hierarchies, for example, the proportion of live cells among all cells, the proportion of cells of the M (myeloid) cluster among live cells, the proportion of M1 cells among the M cluster, and so on.

Wagner et al.³² assigned cell types—tumor-associated macrophages, CD4+ T regulatory cells—to cell clusters (leaf nodes)—M01, T01, and so on. We took over these cellular assignments from Fig. 2D-L of Wagner et al.³² in order to compute the relative composition of each tumor in terms of 12 cell types, chosen to be as similar as possible to the cell types profiled by Keren et al.²¹: cancer cells, fibroblasts, endothelial cells, CD4-T cells, CD8T cells, NK cells, dendritic cells, macrophages, B cells, plasma B cells, healthy tissue, other immune cells. One sample contained less than 50% live cells and was thus removed, keeping 143 samples for the analysis.

We performed centered, unscaled PCA using the ade4 package of the data analysis software R²⁰. Unscaled PCA was used because all features have the same units (fractional abundance is unit-less). We explored other transformations such as scaling by the standard deviation and log-transformation. Scaling by the standard deviation destroyed much of the covariance structure expected from breast tumor biology, presumably by amplifying sampling noise in low-abundance cell types. Log-transformation resulted in niches similar to the ones presented in the present article but produced curved simplexes, which would require developing new algorithms in order to fit the simplex to the data.

Showing that inter-patient variation in macroscopic cellular architecture of tumors is constrained by a simplex whose endpoints are the microscopic niches

If inter-patient variation in the macroscopic cellular architecture of tumors is explained by patient-specific usage of universal niches, inter-patient variation in tumor cellular composition must be constrained by a simplex whose endpoints are the microscopic niches.

To see why, let α_j(x) be the local weight of niche j at location x of the tumor. All the weights can be collected into a vector α, with Σ_jα_j = 1 and α_j ≥ 0. We collect the cellular composition of each niche into a matrix B whose entries b_ij indicate the density of cell type i in niche j, in units of inverse volume (1/μm³). With this notation, the local cellular composition c(x) at location x of the tumor is

$$\boldsymbol{c}(\boldsymbol{x})=B\boldsymbol{\alpha}(\boldsymbol{x})$$

(6)

The macroscopic cellular composition of the tumor is then obtained by integrating the microscopic cellular composition c(x) over the tumor volume V

$$\boldsymbol{C}=\frac{1}{V}\int_{V}B\boldsymbol{\alpha}(\boldsymbol{x})\,d\boldsymbol{x}=B\frac{1}{V}\int_{V}\boldsymbol{\alpha}(\boldsymbol{x})\,d\boldsymbol{x}=B\boldsymbol{\theta}$$

(7)

Here, one can show that all θ_j are non-negative and sum up to one. First, since α_i ≥ 0,

$$\theta_{i}=\frac{1}{V}\int_{V}\alpha_{i}(\boldsymbol{x})\,d\boldsymbol{x}\ge 0.$$

(8)

Second,

$$\sum_{i}\theta_{i}=\sum_{i}\frac{1}{V}\int_{V}\alpha_{i}(\boldsymbol{x})\,d\boldsymbol{x}=\frac{1}{V}\int_{V}\sum_{i}\alpha_{i}(\boldsymbol{x})\,d\boldsymbol{x}=\frac{1}{V}\int_{V}1\,d\boldsymbol{x}=1.$$

(9)

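Therefore, the macroscopic cellular composition C of a tumor is a weighted average of the niches (the columns of B). Because the weights θ are non-negative and sum up to one, C is a convex combination of the niches, so that macroscopic tumor composition must be bounded by the simplex whose endpoints are the niches.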
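The argument can be checked numerically on a discretized toy tumor. In the sketch below, which is only an illustration, the integral over V is replaced by an average over equal-volume locations, and the niche compositions and local weights are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
n_types, n_niches, n_locations = 5, 3, 1000
B = rng.random((n_types, n_niches))                           # niche compositions (columns)
alpha = rng.dirichlet(np.ones(n_niches), size=n_locations)    # local weights, rows sum to 1

# Macroscopic composition as the average of the local compositions c(x) = B alpha(x).
C_direct = (alpha @ B.T).mean(axis=0)
# Same quantity obtained from the averaged niche weights theta.
theta = alpha.mean(axis=0)
C_from_theta = B @ theta
```

Both routes give the same composition, and `theta` is itself a point of the simplex (non-negative entries summing to one).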

In silico dissection of healthy tissue

Direct comparison of microscopic niches and macroscopic cellular composition data is not possible because the tumor samples of Wagner et al.³² partially include healthy tissue from the tumor margin whereas healthy tissue was not imaged in the samples of Keren et al.²¹. To enable a comparison of the two datasets, we mathematically dissect healthy tissue out of the tumor samples.

After projection onto n−1 PCs v_i, a CyTOF tumor sample C (vector of proportions of 12 shared cell types) can be written as

$$\boldsymbol{C}-\boldsymbol{\mu}=\sum_{i=1}^{3}u_{i}\boldsymbol{v}_{i}+\sum_{i=4}^{n-1}u_{i}\boldsymbol{v}_{i}$$

(10)

where μ is the vector of average proportions of each cell type and u_i represents the contribution of PC i to sample C. We then rewrite the first term as a weighted average of four endpoints (cancer, immunity, healthy, fibrotic) computed by archetype analysis in the space of the first 3 PCs to obtain

$$\boldsymbol{C}-\boldsymbol{\mu}=\sum_{j=1}^{4}\theta_{j}\boldsymbol{b}_{j}+\sum_{i=4}^{n-1}u_{i}\boldsymbol{v}_{i}$$

(11)

where b_j is the jth endpoint. To dissect the healthy endpoint (endpoint 3), we remove its contribution from the weighted averages:

$$\boldsymbol{\gamma}=\left(\frac{\theta_{1}}{\theta_{1}+\theta_{2}+\theta_{4}},\ \frac{\theta_{2}}{\theta_{1}+\theta_{2}+\theta_{4}},\ \frac{\theta_{4}}{\theta_{1}+\theta_{2}+\theta_{4}}\right)$$

(12)

Finally, we compute cellular proportions $\boldsymbol{C}_{d}$ after dissecting the healthy endpoint, with γ_j the renormalized weight of endpoint j, as

$$\boldsymbol{C}_{d}=\sum_{j\in\{1,2,4\}}\gamma_{j}\boldsymbol{b}_{j}+\sum_{i=4}^{n-1}u_{i}\boldsymbol{v}_{i}+\boldsymbol{\mu}$$

(13)

For 15 out of the 143 CyTOF samples, the weight of the healthy endpoint was >50%, suggesting that healthy tissue dominated the cellular composition of these samples. These samples were discarded from further analysis because the lower weight of the non-healthy endpoints risked increasing the dissection error.
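The dissection of Equations (12) and (13) can be sketched as below. This is a toy illustration: the helper `dissect_endpoint` is hypothetical, the endpoints are an identity matrix rather than fitted archetypes, and the residual PC contribution and the mean μ are set to zero.

```python
import numpy as np


def dissect_endpoint(theta, endpoints, residual, mu, drop):
    """Remove one endpoint from a sample and renormalize the remaining weights.

    theta: endpoint weights of the sample (sum to 1).
    endpoints: cell types x endpoints, expressed relative to mu as in Eq. (11).
    residual: contribution of the remaining PCs (second sum of Eq. (11)).
    drop: index of the endpoint to dissect out.
    """
    keep = [j for j in range(len(theta)) if j != drop]
    gamma = theta[keep] / theta[keep].sum()               # Eq. (12)
    c_d = endpoints[:, keep] @ gamma + residual + mu      # Eq. (13)
    return gamma, c_d


# Toy sample with four endpoints; index 2 plays the role of the healthy endpoint.
theta = np.array([0.3, 0.3, 0.2, 0.2])
endpoints = np.eye(4)      # hypothetical endpoints, one cell type each
residual = np.zeros(4)
mu = np.zeros(4)
gamma, c_d = dissect_endpoint(theta, endpoints, residual, mu, drop=2)
dominated_by_healthy = theta[2] > 0.5   # such samples were discarded in the text
```

The renormalization divides by the combined weight of the kept endpoints, which is why samples dominated by the healthy endpoint amplify any error in the remaining weights.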

Mapping cell types across two datasets

The cell types profiled in the CyTOF data of Wagner et al.³² and MIBI data of Keren et al.²¹ overlap only partially: while some cell types are common to both datasets—CD8T cells for example—other cell types were either only profiled in one dataset, or profiled at a different level of specificity—all B cells vs distinguishing B and plasma cells, all CD4 cells vs distinguishing CD4 cells and Tregs. This creates a challenge in comparing the two datasets. To address this, we mapped cell types across the two datasets to compute cellular composition based on cell types common to both datasets.

To do so, we created an incidence matrix G of dimensions m × n with m cell types from MIBI data as rows and n cell types from CyTOF as columns. The entries of the G matrix are set to 0 if the corresponding cell types from the two datasets are different and to 1 if they are identical. A column or a row can have more than one 1 if the MIBI and CyTOF datasets differ in their granularity for the corresponding cell type. This incidence matrix can be represented as a bipartite graph (Supplementary Fig. 7c).

From G, we then derive two matrices G_k, G_w that allow projecting cell proportions from the initial MIBI data X_k and CyTOF data X_w (respectively) onto the shared sets of cell types Y_k = X_kG_k and Y_w = X_wG_w by matrix multiplication.

To compute the projection matrix G_k for the MIBI data, we initialize G_k ≔ G. We then compute the sum of each row of G. Rows where the sum is larger than 1 represent MIBI cell types that map to multiple, more granular CyTOF cell types. For these rows, we keep all columns with 0s and the first column with 1 in order to keep only the least granular cell type of the two datasets. The kept column is then named according to the row. To compute the projection matrix G_w, we perform the same procedure, reversing rows and columns.
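One possible reading of this procedure is sketched below on a toy incidence matrix. The cell-type mapping, the helper name `projection_matrix`, and the sample values are hypothetical; the sketch drops, for every row with several 1s, all matching columns except the first, and does not handle the renaming of the kept column.

```python
import numpy as np


def projection_matrix(G):
    """For each row that maps to several columns, keep only the first matching column."""
    G = np.asarray(G)
    drop = set()
    for row in G:
        ones = np.flatnonzero(row == 1)
        if ones.size > 1:
            drop.update(ones[1:].tolist())      # discard the extra, more granular columns
    keep = [c for c in range(G.shape[1]) if c not in drop]
    return G[:, keep]


# Toy mapping. Rows: MIBI types (B, CD4, Treg, CD8); columns: CyTOF types (B, plasma, CD4, CD8).
G = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
G_k = projection_matrix(G)       # MIBI types -> shared types
G_w = projection_matrix(G.T)     # CyTOF types -> shared types (rows and columns reversed)

X_k = np.array([[0.10, 0.20, 0.10, 0.60]])   # one toy MIBI sample: B, CD4, Treg, CD8
X_w = np.array([[0.05, 0.05, 0.30, 0.60]])   # one toy CyTOF sample: B, plasma, CD4, CD8
Y_k = X_k @ G_k                  # proportions on the shared cell types
Y_w = X_w @ G_w
```

In this example both projections end up with three shared cell types (B, CD4, CD8), with the more granular types summed into their coarser counterpart.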

Applying this procedure associates cell types of the MIBI and CyTOF datasets as indicated in Table 2.

Table 2 Cell types from the MIBI data of Keren et al.²¹ and CyTOF data of Wagner et al.³² were aligned to compare the two datasets


Hierarchical clustering of cell phenotypes and spatial specificity of phenotypes

To compare phenotypic clustering to the spatial phenotypes identified by NIPMAP, hierarchical clustering was performed on the intensity of 18 phenotypic markers previously classified as functional markers as opposed to lineage markers⁶¹.

Marker intensities were Z-scored within each cell type to facilitate the visualization of phenotypic clusters and assess marker significance. Hierarchical clustering was performed on Z-scored intensities of all 18 phenotypic markers, in 3 cell types (dendritic cells, NK cells, and neutrophils) using Euclidean distance and Ward linkage. To serve as a well-controlled comparison to the 10 niches and interfaces found by NIPMAP, 10 phenotypic clusters were determined for each cell type by cutting the hierarchical clustering dendrogram at the height needed to split the dendrogram into 10 groups using R’s cutree function (dendextend package).

To quantify how phenotypic heterogeneity associates with space, we tested how accurately each phenotypic cluster predicted the niche of a given cell. We considered that a given cell was located in a given niche if the weight of that niche was at least 0.5. By tabulating how often cluster membership matched niche location, we computed the sensitivity and specificity of each cluster in predicting the different niches.
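The tabulation can be illustrated with a few toy cells. The sketch below is a minimal pure-Python example with made-up niche weights and cluster memberships; the function name is hypothetical, and it assumes that at least one cell lies inside and one outside the niche so that both ratios are defined.

```python
def sensitivity_specificity(in_cluster, in_niche):
    """Both arguments are equal-length sequences of booleans, one entry per cell."""
    pairs = list(zip(in_cluster, in_niche))
    tp = sum(c and n for c, n in pairs)                  # in cluster, in niche
    fn = sum((not c) and n for c, n in pairs)            # not in cluster, in niche
    tn = sum((not c) and (not n) for c, n in pairs)      # not in cluster, not in niche
    fp = sum(c and (not n) for c, n in pairs)            # in cluster, not in niche
    return tp / (tp + fn), tn / (tn + fp)


niche_weight = [0.9, 0.7, 0.2, 0.1, 0.6, 0.4]            # toy weights of one niche
in_niche = [w >= 0.5 for w in niche_weight]              # cell counted in the niche if weight >= 0.5
in_cluster = [True, True, True, False, False, False]     # toy membership in one phenotypic cluster
sens, spec = sensitivity_specificity(in_cluster, in_niche)
```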

Phenotypic clusters are identified without regard to the niche location of cells. We thus asked whether a combined analysis of niche location and phenotypic markers could identify better predictors of the niche location of cells. To do so, we trained linear predictors of the niche weight of each cell based on the intensity of all 18 phenotypic markers. Changing the cut-off on the predicted niche weight beyond which a cell was considered to localize in that niche, we computed how different niche weight cut-offs achieved different sensitivities and specificities (ROC curves).

Quantifying the niche weights of individual cells

To associate cell phenotypes and niches (see next section), the niche weights of each cell need to be determined.

The niche and interface weights are computed for all cells of the dataset by centering a site on each cell. The cellular composition c of each site is determined, and the contributions α of the different niches to each site are computed as described above (Eqn. 6) by solving the matrix equation c = Bα, where B is the matrix of the cellular composition of the different niches.

We solve for α under the constraint ∑α_i = 1 by quadratic programming using the qpsolvers Python library with the default solver “quadprog”. Cells labeled as Unidentified were discarded from the analysis.
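The text solves this problem with qpsolvers and quadprog. To stay free of extra dependencies, the sketch below swaps in a plain projected-gradient solver written in NumPy; it is not the NIPMAP code. It minimizes ||Bα − c||² and, as an assumption consistent with the niche weights defined earlier, enforces α_i ≥ 0 in addition to ∑α_i = 1. The function names and the toy matrix B are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.flatnonzero(u - css / k > 0)[-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def niche_weights(B, c, n_iter=5000):
    """Minimize ||B a - c||^2 over the simplex by projected gradient descent."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2        # 1 / largest eigenvalue of B^T B
    a = np.full(B.shape[1], 1.0 / B.shape[1])     # start from equal weights
    for _ in range(n_iter):
        a = project_to_simplex(a - step * B.T @ (B @ a - c))
    return a


# Toy niches (columns) over three cell types, and a site built from known weights.
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.1, 0.2, 0.7]])
alpha_true = np.array([0.6, 0.3, 0.1])
alpha = niche_weights(B, B @ alpha_true)
```

On this well-conditioned toy problem the recovered weights match the ones used to build the site; a dedicated QP solver is the better choice for real data.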

Identifying niche-phenotype associations

To identify associations between phenotypic markers and niches in a given cell type, we iterate over markers, niches, and cell types.

For each marker-niche-cell type triplet, we compute Spearman’s rank correlation ρ between marker intensity and niche weight in individual cells of the dataset. Here, a niche can be an individual niche i (quantified as α_i) or an interface region between two niches i and j (quantified as α_iα_j). We only consider combinations of cell types and niches for which there was at least one example of a cell of that type located mostly in this niche (α_i > 1/2). For interfaces, the maximal weight of each niche is 1/2 (not 1 as in niches), so that the product α_iα_j is at most (1/2)²: thus we only consider combinations of cell types and interfaces i, j for which there was at least one example of a cell of that type with α_iα_j > 1/2 × (1/2)² = 1/8. In the heatmap visualization of niche-phenotype associations, the ρ correlation of the marker-niche-cell type triplets that do not meet these criteria is set to 0.
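For a single marker, cell type, and pair of niches, the computation can be sketched as follows. The cells, weights, and marker intensities are made up, and the helper functions are hypothetical; Spearman's ρ is computed here as the Pearson correlation of average ranks rather than with a statistics library.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy cells of one type: weights of niches i and j, and one marker intensity.
alpha_i = [0.9, 0.6, 0.5, 0.3, 0.1]
alpha_j = [0.1, 0.4, 0.5, 0.6, 0.2]
marker = [0.2, 1.1, 1.5, 0.9, 0.1]

interface = [a * b for a, b in zip(alpha_i, alpha_j)]   # interface weight per cell
niche_ok = max(alpha_i) > 1 / 2          # at least one cell located mostly in niche i
interface_ok = max(interface) > 1 / 8    # at least one cell clearly at the i-j interface
rho_niche = spearman(marker, alpha_i)
rho_interface = spearman(marker, interface)
```

In this toy example the marker tracks the interface weight much more closely than the weight of niche i.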

Statistical significance is quantified as a p-value using a two-sided asymptotic t test approximation. The calculation is repeated for all combinations of phenotypic markers of all cell types, and all niches/interfaces.

To correct for multiple testing and focus analysis on the clearest niche-phenotype associations, we compute the false discovery rate³³ and keep only associations with q-values smaller than 0.01. We also require the Spearman correlation to be at least 0.3, a threshold beyond which visual intuition confirms niche-phenotype associations.

We considered higher-order interfaces between niches (three niches and more) but chose not to report them here because niche-phenotype associations were weaker compared to niches and their pairwise interfaces.

In the analysis of the MIBI data of Keren et al.²¹, to prevent false positives in niche-phenotype associations due to spatial signal bleed-over of cancer cell markers to neighboring cells, we filtered out the Keratin 6 and beta-catenin markers in niches dominated by cancer cells, that is the cancer niche and interfaces with the cancer niche. We also filtered out the p53 marker in Tregs of the cancer niche specifically because overlaying the cell segmentation mask and the p53 signal suggested spatial spill-over in Tregs of the cancer niche (Supplementary Fig. 10). We kept the p53 marker in the analysis so as to potentially capture the p53 phenotype in other cell types and other niches, as p53 localized away from cell membranes in most cells (Supplementary Fig. 10), consistent with its known pattern of nuclear accumulation upon DNA damage and other stresses⁶². We were left with 55 cell type and phenotype markers with significant spatial associations.

To summarize the heatmap of niche-phenotype associations into a tissue architecture table (Table 1, Fig. 5e), we iterate over all niches and interfaces. For each of these, we collect cells with phenotypic associations with q < 1% and ρ > 0.3.

To reduce redundancy in reporting phenotypic associations and clarify the niche vs interface specificity of phenotypic markers, we remove phenotypic markers from a niche if that niche has an interface with a larger ρ. Conversely, we remove phenotypic markers from an interface if that interface borders a niche with a larger ρ.

In each niche/interface, we also report cell types without specific niche-phenotype associations that are robustly present in the niche. To identify these cell types, we consider the 1% of sites with the highest weight for that niche. For each cell type, we then compute the mean and standard deviation in the abundance of that cell type. Cell types whose mean abundance is at least one standard deviation away from 0 abundance are reported in that niche.
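This selection can be sketched as follows. The function name `robustly_present`, the use of the sample standard deviation (ddof=1), and the toy abundances are assumptions made for the illustration.

```python
import numpy as np


def robustly_present(X, niche_weight, cell_types, top_fraction=0.01):
    """X: sites x cell types abundances; niche_weight: weight of one niche at each site."""
    n_top = max(1, int(round(top_fraction * len(niche_weight))))
    top = np.argsort(niche_weight)[-n_top:]          # sites with the highest weight
    mean = X[top].mean(axis=0)
    sd = X[top].std(axis=0, ddof=1) if n_top > 1 else np.zeros(X.shape[1])
    # Keep cell types whose mean abundance is at least one standard deviation above 0.
    return [name for name, m, s in zip(cell_types, mean, sd) if m > 0 and m >= s]


# Toy data: 200 sites, two cell types; the top 1% corresponds to the last two sites.
niche_weight = np.linspace(0.0, 1.0, 200)
X = np.zeros((200, 2))
X[198] = [10.0, 0.0]
X[199] = [12.0, 5.0]
present = robustly_present(X, niche_weight, ["type A", "type B"])
```

Here type A is consistently abundant in the top sites and is reported, whereas type B appears in only one of them and is not.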

Processing and analysis of in situ sequencing data from Sountoulidis et al.

The pcw13 embryonic human lung dataset from Sountoulidis et al.⁴⁴ was communicated to us by the authors in the form of a table where rows represent the cells of the dataset and columns represent the x- and y-position of the cells, the cells’ type, and RNA quantification (molecular counts) of 89 phenotypic markers. The data can be downloaded from the dedicated GitHub repository (see Data and Code Availability). The data can be visualized interactively on TissUUmaps⁶³.

The authors clustered single-cell gene expression profiles to determine the type of individual cells in the dataset. The 73 initial cell types were simplified into 32 cell types to facilitate interpreting the spatial architecture of the lung tissue (Supplementary Data 1).

We performed niche identification on the 100,006 cells from the dataset, using the same approach used for the 40 tumor samples of Keren et al.²¹ above. To accommodate the larger tissue size (6500 μm × 6500 μm) compared to the MIBI data (800 μm × 800 μm), 20,000 sites were generated, so that the total area of sites represented 100% of the tissue area. Computation time was not a limiting factor in this dataset because we performed fewer follow-up analyses compared to the MIBI data of Keren et al.²¹. Niche-phenotype mapping was then performed on an 800 μm × 800 μm region of the tissue illustrated in Fig. 5a.

Visualizing projection of sites on the faces of high-dimensional simplexes

To visualize how a simplex fits the cellular composition of sites when more than 3 dimensions are needed to capture site cellular composition, we visualize the distribution of sites onto the faces of the simplex.

To do so, we iterate through all the faces of the simplex, defined by all combinations of three endpoints i, j, k.

We exclude sites located farthest from the face, as the projection of sites located far from a face provides little insight regarding how well the simplex’s face fits these sites. To do so, we only keep sites for which the combined weight of the endpoints that define the face is at least 50%.

From the coordinates of these endpoints in PC-space, we determine an orthonormal basis for the plane spanned by the three endpoints of the face and position the endpoints on that basis: endpoint i at (0, 0), endpoint j at (x_j, y_j) and endpoint k at (x_k, y_k). Then, the position p of sites on the face is determined from niche weights α as p = α_i(0, 0) + α_j(x_j, y_j) + α_k(x_k, y_k).
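The projection can be sketched as below. The text does not state how the orthonormal basis is built; this illustration uses a Gram-Schmidt step, which places endpoint j on the first axis, and the function names and toy endpoints are hypothetical.

```python
import numpy as np


def face_coordinates(b_i, b_j, b_k):
    """2D coordinates of endpoints j and k in an orthonormal basis of the face, with i at the origin."""
    e1 = (b_j - b_i) / np.linalg.norm(b_j - b_i)
    v = (b_k - b_i) - np.dot(b_k - b_i, e1) * e1     # Gram-Schmidt step
    e2 = v / np.linalg.norm(v)
    p_j = np.array([np.dot(b_j - b_i, e1), 0.0])
    p_k = np.array([np.dot(b_k - b_i, e1), np.dot(b_k - b_i, e2)])
    return p_j, p_k


def project_sites(alpha, face, p_j, p_k, min_weight=0.5):
    """alpha: sites x niches; face: endpoint indices (i, j, k). Returns the kept-site mask and 2D positions."""
    i, j, k = face
    keep = alpha[:, [i, j, k]].sum(axis=1) >= min_weight     # drop sites far from the face
    pos = alpha[keep][:, [j]] * p_j + alpha[keep][:, [k]] * p_k   # endpoint i sits at (0, 0)
    return keep, pos


# Toy endpoints in a 3D PC-space and two sites described by four niche weights.
b_i, b_j, b_k = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]), np.array([0.0, 3.0, 0.0])
p_j, p_k = face_coordinates(b_i, b_j, b_k)
alpha = np.array([[0.50, 0.25, 0.25, 0.00],
                  [0.10, 0.10, 0.10, 0.70]])
keep, pos = project_sites(alpha, (0, 1, 2), p_j, p_k)
```

The second toy site is dominated by a niche outside the face and is therefore excluded from the plot.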

Statistics and reproducibility

Statistics are reported with p value, sample size, and bilaterality in the context of their use in the article. For reproducibility, full R/Python code, including random seed, input data files, and documentation, can be found on the companion GitHub repository of this manuscript⁶⁴.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data to reproduce the analyses can be downloaded at https://github.com/jhausserlab/NIPMAP⁶⁴. Source data are provided with this paper.

Code availability

The NIPMAP software package and the code to reproduce the analyses can be downloaded at https://github.com/jhausserlab/NIPMAP⁶⁴.

References

Okabe, Y. & Medzhitov, R. Tissue biology perspective on macrophages. Nat. Immunol. 17, 9–17 (2016).
Alberts, B. et al. Molecular biology of the cell. (Garland Science, New York, NY, 2007).
Gebhardt, R. & Matz-Soja, M. Liver zonation: novel aspects of its regulation and its impact on homeostasis. World J. Gastroenterol. 20, 8491 (2014).
Murphy, K. & Weaver, C. Janeway’s Immunobiology. (Garland Science, New York & London, 2016).
Willcox, A., Richardson, S. J., Bone, A. J., Foulis, A. K. & Morgan, N. G. Analysis of islet inflammation in human type 1 diabetes. Clin. Exp. Immunol. 155, 173–181 (2009).
Galon, J. & Bruni, D. Approaches to treat immune hot, altered and cold tumours with combination immunotherapies. Nat. Rev. Drug Discov. 18, 197–218 (2019).
Angelo, M. et al. Multiplexed ion beam imaging of human breast tumors. Nat. Med. 20, 436–442 (2014).
Jackson, H. W. et al. The single-cell pathology landscape of breast cancer. Nature 578, 615–620 (2020).
Ali, H. R. et al. Imaging mass cytometry and multiplatform genomics define the phenogenomic landscape of breast cancer. Nat. Cancer 1, 163–175 (2020).
Lin, J. R. et al. Highly multiplexed immunofluorescence imaging of human tissues and tumors using t-CyCIF and conventional optical microscopes. eLife 7, e31657 (2018).
Gut, G., Herrmann, M. D. & Pelkmans, L. Multiplexed protein maps link subcellular organization to cellular states. Science 361, eaar7042 (2018).
Kennedy Darling, J. et al. Highly multiplexed tissue imaging using repeated oligonucleotide exchange reaction. Eur. J. Immunol. 51, eji.202048891 (2021).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
Maino, N. et al. A microfluidic platform towards automated multiplexed in situ sequencing. Sci. Rep. 9, 3542 (2019).
Gartner, L. Textbook of Histology. (Elsevier, Philadelphia, PA, 2021).
Danenberg, E. et al. Breast tumor microenvironment structures are associated with genomic features and clinical outcome. Nat. Genet. 54, 660–669 (2022).
Schürch, C. M. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 182, 1341–1359.e19 (2020).
Goodall, D. W. Objective methods for the classification of vegetation. III. An essay in the use of factor analysis. Aust. J. Bot. 2, 304–324 (1954).
Thioulouse, J. et al. Multivariate analysis of ecological data with ade4. (Springer, New York, NY, 2018).
Keren, L. et al. A structured tumor-immune microenvironment in triple negative breast cancer revealed by multiplexed ion beam imaging. Cell 174, 1373–1387.e19 (2018).
Bhate, S. S., Barlow, G. L., Schürch, C. M. & Nolan, G. P. Tissue schematics map the specialization of immune tissue motifs and their appropriation by tumors. Cell Syst. 13, 109–130.e6 (2022).
Hoch, T. et al. Multiplexed imaging mass cytometry of the chemokine milieus in melanoma characterizes features of the response to immunotherapy. Sci. Immunol. 7, eabk1692 (2022).
Bioucas-Dias, J. M. et al. Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 5, 354–379 (2012).
Cutler, A. A. & Breiman, L. Archetypal analysis. Technometrics 36, 338–347 (1994).
Mørup, M. & Hansen, L. K. Archetypal analysis for machine learning and data mining. Neurocomputing 80, 54–63 (2012).
Shoval, O. et al. Evolutionary trade-offs, pareto optimality, and the geometry of phenotype space. Science 336, 1157–1160 (2012).
Hart, Y. et al. Inferring biological tasks using Pareto analysis of high-dimensional data. Nat. Methods 12, 233–235 (2015).
Jackson, S. M. et al. CD45RO enriches for activated, highly mutated human germinal center B cells. Blood 110, 3917–3925 (2007).
Saunders, A. E. & Johnson, P. Modulation of immune cell signalling by the leukocyte common tyrosine phosphatase, CD45. Cell. Signal. 22, 339–348 (2010).
Dabbs, D. J. Breast pathology. (Elsevier Health Sciences, Philadelphia, PA, 2016).
Wagner, J. et al. A single-cell atlas of the tumor and immune ecosystem of human breast cancer. Cell 177, 1330–1345.e18 (2019).
Strimmer, K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics 24, 1461–1462 (2008).
Mysore, V. et al. FcgR engagement reprograms neutrophils into antigen cross-presenting cells that elicit acquired anti-tumor immunity. Nat. Commun. 12, 4791 (2021).
Axelrod, M. L., Cook, R. S., Johnson, D. B. & Balko, J. M. Biological consequences of MHC-II expression by tumor cells in cancer. Clin. Cancer Res. 25, 2392–2402 (2019).
Kessler, A. L., Bruno, M. J. & Buschow, S. I. The potential of soluble human leukocyte antigen molecules for early cancer detection and therapeutic vaccine design. Vaccines 8, 775 (2020).
Embgenbroich, M. et al. Soluble mannose receptor induces proinflammatory macrophage activation and metaflammation. Proc. Natl. Acad. Sci. USA 118, e2103304118 (2021).
Jacob, J. T., Coulombe, P. A., Kwan, R. & Omary, M. B. Types I and II keratin intermediate filaments. Cold Spring Harb. Perspect. Biol. 10, a018275 (2018).
Barron, M. & Li, J. Identifying and removing the cell-cycle effect from single-cell RNA-sequencing data. Sci. Rep. 6, 33892 (2016).
Shaffer, S. M. et al. Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546, 431–435 (2017).
Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript variability in single mammalian cells. Cell 163, 1596–1610 (2015).
Carmona-Fontaine, C. et al. Metabolic origins of spatial organization in the tumor microenvironment. Proc. Natl. Acad. Sci. USA 114, 2934–2939 (2017).
Ke, R. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857–860 (2013).
Sountoulidis, A. et al. A topographic atlas defines developmental origins of cell heterogeneity in the human embryonic lung. Nat. Cell Biol. 25, 351–365 (2023).
Kerr, B. A. et al. Stability and function of adult vasculature is sustained by Akt/Jagged1 signalling axis in endothelium. Nat. Commun. 7, 10960 (2016).
Bai, Y. et al. Adjacent cell marker lateral spillover compensation and reinforcement for multiplexed images. Front. Immunol. 12, 652631 (2021).
Schmidt, U., Weigert, M., Broaddus, C., & Myers, G. Cell detection with star-convex polygons. In: Lecture Notes in Computer Science, volume 11071, 265–273. (Springer, 2018).
Greenwald, N. F. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 40, 555–565 (2022).
Brbić, M. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat. Methods 19, 1411–1418 (2022).
Zhang, W. et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat. Methods 19, 759–769 (2022).
Smith, S. H., Brown, M. H., Rowe, D., Callard, R. E. & Beverley, P. C. Functional subsets of human helper-inducer cells defined by a new monoclonal antibody, UCHL1. Immunology 58, 63–70 (1986).
Richards, S. J., Jones, R. A., Roberts, B. E. & Scott, C. S. T-cell membrane CD45RA (2H4) and CD45RO (UCHL1) determinants: I, diverse patterns of expression in mature (Post-Thymic) T-cell proliferations. Leuk. Lymphoma 4, 27–37 (1991).
Torimoto, Y. et al. Coassociation of CD26 (dipeptidyl peptidase IV) with CD45 on the surface of human T lymphocytes. J. Immunol. 147, 2514–2517 (1991).
Ponsford, M. et al. Differential responses of CD45+ve T-cell subsets to MBP in multiple sclerosis. Clin. Exp. Immunol. 124, 315–322 (2001).
Zhou, X. et al. Circuit design features of a stable two-cell system. Cell 172, 744–757.e17 (2018).
Ben-Moshe, S. et al. The spatiotemporal program of zonal liver regeneration following acute injury. Cell Stem Cell 29, 973–989.e10 (2022).
Davidson, S. et al. Single-cell RNA sequencing reveals a dynamic stromal niche that supports tumor growth. Cell Rep. 31, 107628 (2020).
Motevalli Soumehsaraei, B. & Barnard, A. Archetypal analysis package. CSIRO https://researchdata.edu.au/archetypal-analysis-package/1424520 (2019).
Baker, E. A. G., Schapiro, D., Dumitrascu, B., Vickovic, S. & Regev, A. In silico tissue generation and power analysis for spatial omics. Nat. Methods 20, 424–431 (2023).
Turing, A. The chemical basis of morphogenesis. Philos. Trans. R. Soc. 237, 37–72 (1952).
Patwa, A. et al. Multiplexed imaging analysis of the tumor-immune microenvironment reveals predictors of outcome in triple-negative breast cancer. Commun. Biol. 4, 852 (2021).
Maki, C. G. p53 localization. In: Molecular Biology Intelligence Unit, 117–126. (Springer, Boston, MA, 2010).
Solorzano, L., Partel, G. & Wählby, C. TissUUmaps: interactive visualization of large-scale spatial gene expression and tissue morphology data. Bioinformatics 36, 4363–4365 (2020).
El Marrahi, A., Kang, Z., Eroğlu, A., Mohseni, R. & Hausser, J. NIPMAP: niche-phenotype mapping of multiplex histology data by community ecology. GitHub https://doi.org/10.5281/zenodo.10075737 (2023).

Acknowledgements

The authors thank members of the Hausser lab for critical discussion, in particular Petter Säterskog, Antony Cougnoux, and Guilhem Panneau. We also thank Leeat Keren, Sergio Salas, and Benjamin Towbin for discussions and input on this project. The authors acknowledge support from the Swedish Cancer Fund (21 1731 Pj), the Swedish Research Council (2018-02530), SciLifeLab, and Karolinska Institutet (all to J.H.).

Funding

Open access funding provided by Karolinska Institute.

Author information

Authors and Affiliations

Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, 171 77, Sweden
Anissa El Marrahi, Fabio Lipreri, Ziqi Kang, Louise Gsell, Alper Eroglu, David Alber & Jean Hausser
SciLifeLab, Solna, Stockholm, 171 65, Sweden
Anissa El Marrahi, Fabio Lipreri, Ziqi Kang, Louise Gsell, Alper Eroglu, David Alber & Jean Hausser

Authors

Anissa El Marrahi
Fabio Lipreri
Ziqi Kang
Louise Gsell
Alper Eroglu
David Alber
Jean Hausser

Contributions

J.H. conceived and supervised the research. F.L. prototyped the NIPMAP approach on MIBI data with input from D.A. A.E.M. performed the research, developed the NIPMAP code, and performed all MIBI and CyTOF analyses of the article. Z.K. analyzed the ISS data. L.G. and A.E. performed complementary analyses. J.H. wrote the manuscript with input from A.E.M., Z.K., L.G., and A.E. J.H. acquired funding. All authors edited or commented on the manuscript.

Corresponding author

Correspondence to Jean Hausser.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nat. Commun. thanks Daniel Schulz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

El Marrahi, A., Lipreri, F., Kang, Z. et al. NIPMAP: niche-phenotype mapping of multiplex histology data by community ecology. Nat Commun 14, 7182 (2023). https://doi.org/10.1038/s41467-023-42878-z

Received: 02 March 2023
Accepted: 24 October 2023
Published: 07 November 2023
DOI: https://doi.org/10.1038/s41467-023-42878-z
