NIPMAP: niche-phenotype mapping of multiplex histology data by community ecology

Advances in multiplex histology allow surveying millions of cells, dozens of cell types, and up to thousands of phenotypes within the spatial context of tissue sections. This leads to a combinatorial challenge in (a) summarizing the cellular and phenotypic architecture of tissues and (b) identifying phenotypes with interesting spatial architecture. To address this, we combine ideas from community ecology and machine learning into niche-phenotype mapping (NIPMAP). NIPMAP takes advantage of geometric constraints on local cellular composition imposed by the niche structure of tissues in order to automatically segment tissue sections into niches and their interfaces. Projecting phenotypes on niches and their interfaces identifies previously-reported and previously-unreported spatially-driven phenotypes, concisely summarizes the phenotypic architecture of tissues, and reveals fundamental properties of tissue architecture. NIPMAP is applicable to both protein and RNA multiplex histology of healthy and diseased tissue. An open-source R/Python package implements NIPMAP.


Introduction
The function of healthy tissues and their disruption in disease depends on the cooperation between cells of different types: hepatocytes in the liver, neurons in the nervous system, immune cells, endothelial cells, fibroblasts, and more [1].
To carry out their functions, cells adopt different phenotypes such as activated or quiescent, adhesive or motile, proliferative or senescent [2].According to the histological principle of functional zonation, cell types and their phenotypes organize spatially to facilitate tissue function [3].For example, in the liver, hepatocytes perform different functions depending on their position along an artery-vein axis [3].In the lymph node, B cells need to relocalize from the B-cell zone to T cell zone to potentiate antibody-mediated immunity [4].
The disruption of this organization can directly contribute to disease progression and guide clinical decisions.For example, disruption of pancreatic islet architecture through influx of T cells correlates with the onset of type 1 diabetes [5].In cancer, tumors can be stratified by the density and distribution of cytotoxic T cells [6]: tumors with dense and uniform T cell infiltration respond best to immune checkpoint inhibitors whereas tumors in which T cells are segregated away from cancer cells show poorer response [6].Thus, tissue biology and medicine can benefit from characterizing the spatial organization of cell types and their phenotypes in tissues.
In revealing the spatial organization of cells and their phenotypes in tissues, classical techniques such as histology and immunofluorescence imaging are limited to a handful of molecular markers.But in recent years, advances in mass spectrometry such as Multiplexed Ion Beam Imaging (MIBI) [7] and Imaging CyTOF [8,9] have allowed quantifying dozens of protein and non-protein markers with single-cell resolution.Doing so has also become feasible by multiplex immuno-fluorescence microscopy thanks to protein-based methods such as t-CycIF [10], 4i [11], Codex [12] as well as RNA-based methods such as MERFISH [13] and in situ sequencing [14,15].After image processing and cell segmentation, multiplex histology produces rich data in the form of a cell-by-feature table, where features represent the cells' 2D position in the sample, their types and quantification of dozens to thousands of molecular markers (Fig. S1A).Markers are typically chosen to identify the type of cells -hepatocytes, neurons, immune, endothelial cells, . . .-and their phenotype.
The recent increase in the number of cell types and phenotypes that can be surveyed in tissue sections leads to a combinatorial challenge in interpreting the data.For example, visualizing the spatial architecture of cellular phenotypes in a multiplex histology dataset of 15 cell types with 15 phenotypic markers in 40 samples requires surveying 9,000 images (15 cell types x 15 markers x 40 samples), each with 10,000 to 1,000,000 cells depending on the imaging technology.
Identifying spatial phenotypic interactions -for example what cancer phenotypes associate with local suppression of anti-cancer immune activity -is even more daunting: co-visualizing all possible pairs of 15 phenotypes for 15 cell types produces 50,000 images (15 cell types x 15 markers to the square) from a single tissue section.An additional layer of complexity is that phenotypes may only interact in specific tissue regions, or at the interface between specific histological niches.To address these combinatorial challenges, a systematic approach is needed to summarize the cellular and phenotypic architecture of tissues and identify its most salient features.
Tissues are structured into histological niches [16].Within each niche, each cell type has a specific density, defined as the abundance of cells of that type per surface area of the niche.The niche recurs over the tissue section so that a limited number of niches is sufficient to capture the tissue's cellular architecture, by piecing niches together.In this view, interpreting the tissue architecture from multiplex histology data consists in (a) identifying histological niches and (b) segmenting the image into these niches (Fig. 1A).
To automatically identify histological niches, one can determine the local cellular composition at numerous sampling sites -defined as cells found within tissue areas of a given size -across a tissue section.Alternatively, sampling sites can be groups of contiguous cells, identified by graph-based community methods [17].The histological niches of the tissue are then revealed as clusters of sites with similar cellular composition [18,17].
While the clustering approach has found numerous successful applications in interpreting multiplex histology data so far [18,17], interpreting the data can benefit from ideas from the field of community ecology, a field with a long history of uncovering spatial patterns [19,20].
Community ecology studies how different species -the ecological analog to our cell types -co- When niches cover broad tissue areas with little interface, sites mainly fall within a certain niche and their cellular composition thus clusters by niche.C. When niches feature large interface regions, sites often cover more than a single niche so that site cellular composition is a mix of the two niches.Consequently, in the case of two niches, the cellular composition of sites describes a linear segment.Sites from the core region of a given niche are found at the extremities of the segment while sites in the middle of the segment represent interface regions.
habit in different spatial niches -our histological niches.Within each niche, sites are selected and fieldwork is performed to quantify the species composition at these sites, similar to how multiplex histology surveys cellular composition across a tissue section.
Up until the 1950ies, sites and species were then clustered to reveal the organization of species in the different niches (clusters of sites on Fig. 1B).But since the work of Goodall [19], sites are scattered on axes that represent cellular composition using mathematical procedures such as principal components analysis (PCA), so that the proximity of sites reflects the similarity of their cellular composition.
Positioning sites on axes of cellular composition allows examining if clustering sites is justified.In the eventuality that sites do not form clusters, the revealed structure of sites can suggest interpretations that better suit the data than clusters (Fig. 1C).
Applying the community ecology approach to multiplex histology data reveals a caveat of clustering sites to identify niches.Sites will form clusters of cellular composition if histological niches occupy distinct areas of the tissue with few interfaces, so that sites belong to only one niche (Fig. 1B).But, when niches co-localize and form larger interfaces, many sites lie at the interface of niches.Because the cellular composition of these sites is a mix (a weighted average) of the corresponding niches, no clear clusters can be distinguished by scattering the cellular composition of sites (Fig. 1C).
Instead, in the case of a two-niche tissue, sites describe a segment in cellular composition space.At each extremity of the segment, we find sites located in the core of the corresponding niche (Fig. 1C).
Sites located at the interface between two niches fall in the middle of the segment.
The notion that the local cellular composition of tissues does not necessarily form clusters of cellular composition can help interpret multiplex histology data in two ways.First, when sites do not form clusters, many clusters can be needed to describe tissue architecture, potentially leading to an inflation of clusters of unclear histological significance that over-complicate our view of tissue architecture with little scientific benefit.Interpreting local cellular composition as a continuum defined by a parsimonious number of niches can help address this.
Second, interfaces between niches are of interest to interpret tissue dynamics, for example, how tumor progression associates with the biology of the cancer-immune interface [6].Yet, existing approaches to finding interface regions require parameter tuning to specify their cellular composition or the local image properties of interfaces, a time-consuming and potentially subjective process [21,22,23].
Recognizing that sites that fall in the middle of the segment represent interface regions can identify interfaces automatically.
Here, we implement the community ecology approach for niche-phenotype mapping (NIPMAP) of multiplex histology data.Applying NIPMAP to protein and RNA multiplex histology of healthy and pathological tissues reveals unexpected geometry in the cellular composition of sites: sites do not form clusters of cellular composition but instead fall on simplexes, the geometric generalization of triangles or triangular pyramids to arbitrary dimensions.These simplexes are automatically identified using algorithms from satellite image analysis to explain spatial variation in the cellular composition of tissues in terms of histological niches and their interfaces.Projecting cellular phenotypes onto niches and their interfaces reveals known and novel spatial phenotypes, and concisely summarizes how these phenotypes associate with niches and their interfaces.Finally, analyzing the niche-interface architecture of tissues uncovered by NIPMAP reveals that (a) spatial context is a stronger determinant of phenotype than cell-autonomous effects, and (b) both niches and their interfaces structure the cellular and phenotypic architecture of tissues.

Results
2.1 Community ecology niches offer a concise and accurate framework to interpret the cellular architecture of tissues We illustrate the community ecology approach in a multiplex histology dataset of 17 cell types in 40 triple-negative breast tumor samples (Fig. 2A, Fig. S1A, Methods 4.1) from Keren et al. [21].
We determined the cellular composition -the number of cells of each type per unit area (Methods 4.2) -of 4000 sites: 100 sites per tumor sample in each of our 40 tumor sections.
Sites are points positioned on axes that represent cellular composition.Because there are 17 cell types, 17 axes (dimensions) are needed in principle.This creates a representation challenge as human intuition is limited to 3 dimensions.However, two notions decrease the number of axes required to interpret tissue architecture.First, the abundance of certain cell types varies little across sites and thus contributes little to tissue architecture so that the corresponding axes can be neglected.Second, the abundance of cell types can correlate across sites -for example because the cells cooperate in performing a tissue function -so that these cell types can be grouped into a single axis.The axes that optimally capture site cellular composition can be determined automatically by PCA, following the community ecology approach.
To interpret tissue architecture, it is important to set the radius of sampling sites to an appropriate size.Sites need to be large enough to capture local coordination in cellular composition and small enough to avoid blurring this coordination across different niches.To determine an appropriate radius, local coordination was quantified by the number of axes -principal components (PCs) -needed to capture spatial variation in cellular composition.When sampling sites are too small, they cover only one cell at a time: there is little covariance in the cellular composition of sites, and many axes (PCs) are thus needed to capture the cellular composition of sites.Increasing the radius of sites to include neighboring cells reveals covariance structures so that a smaller number of axes (PCs) is sufficient in capturing site cellular composition.
We found that 8 and more PCs are required to capture site cellular composition when the site radius is smaller than 10µm (∽ 1 cell, Fig. 2B).When sites have a 25µm radius, three PCs are enough to capture 82% of the variance in site cellular composition (Fig. 2B-C).A site radius of 25µm implies that cellular coordination emerges at a length-scale of 2-4 cells.Increasing site radius beyond 25µm uncovered little novel covariance.We thus set the sampling site radius to 25µm.
Scattering sites on three PCs revealed no clear clusters (Fig. 2C).Instead, sites described a continuum with the shape of a 3D simplex: a pyramid with a triangular basis.
This observation has significance for interpreting tissue architecture.Any point within a simplex can be described as a weighted average of the end-points that define the simplex (Fig. 2D).Thus, observing that sites are constrained by the geometry of a 3D simplex implies that local cellular composition of the tissue is a mix (weighted average) of four histological niches, the end-points of the 3D simplex (Fig. 2D).Sites close to end-points represent cores within the niches, whereas sites halfway between end-points localize at the interface between two niches (Fig. 2D).This interpretation generalizes the  [24] was projected on its first two PCs after removing the contribution of healthy tissue.The four niches computed from the MIBI data of Keren et al. [21] were then projected on the same space.
two-niche-and-interface interpretation of continua of site composition introduced in Fig. 1 to more than two niches.
Coloring sites according to their position in the simplex -the contribution of the four niches to site cellular composition -segments tissue sections into histological niches (Fig. 2E).In this niche view of tissue architecture, spatial variation in cellular composition is explained by a locally varying mix of the four niches (Fig. 2E, Fig. S1B).
The simplex end-points -and thus the niches -can be identified automatically using hyperspectral unmixing algorithms from the field of satellite imaging [25] or by archetype analysis from machine-learning [26,27].We used the latter algorithms in the present analysis (Methods 4.3).The statistical significance of fitting a simplex to sites (here p < 0.001, n = 4000 sampling sites, one-sided) was quantified using the t-ratio test [28,29].
To automatically delineate interface regions between any pair of niches, we note that sites located at the interface between two niches have high weights for both niches.Thus, multiplying the weights of these niches specifically produces high scores for sites located at the interface between the two niches (Fig. 2F).Sorting samples by increasing prevalence of tumor-immune interfaces (Methods 4.5) recovered the mixed vs compartmentalized sample classification proposed by Keren et al. (Fig. S1C) as well as previously reported differences in the immuno-signaling environment of mixed vs compartmentalized samples (Fig. S1D).
Examining the niches' cellular composition allows interpreting the biology of each niche (Fig. 2G).
The light blue niche is characterized by a high density of cancer cells (Fig. S1E).The black niche features mostly macrophages and mesenchymal-like cells at low density (Fig. 2G) and could thus represent the fibrotic, necrotic niche.In the red niche, we find a mix of CD68 + and MHC-II + macrophages (Fig. S1F), CD8 and CD4 T cells, and natural killer (NK) cells (Fig. 2G).A regulatory T cell phenotype can be excluded for the CD4 T cells in this niche as T regulatory cells were assigned their own cell type based on co-expression of CD4 and FOXP3 [21] (Fig. S1G).This combination of cell types suggests a type 1 inflammatory region whose function is to trigger anti-cancer immunity [4].The pink niche may represent the tertiary lymphoid structure (TLS): we find MHC-II + and CD45RO + B cells and CD4 T cells (Fig. 2G), Fig. S1H-I).MHC-II and CD45RO are both expressed by B cells activated by antigen recognition [30,31].
We note that niches were determined by collecting 100 sites per sample, so that the total area covered by sites represents 30% of the image area.Such a sampling intensity is sufficient to accurately identify niches while speeding up computations (Fig. S1K-L, Methods 4.4).
While the four niches correspond to known histological areas from breast pathology [32], clusteringbased niches often find a dozen of histological niches [18,9,8].More clusters potentially allow a more accurate description of tissue architecture, at the cost of increased complexity.To determine if increasing the number of niches improves the accuracy of tissue description, we quantified how accurately different numbers of clustering-and community ecology-based niches captured site cellular composition (Methods 4.6).We find that 4 community ecology-based niches capture 82% of the inter-site variance in cellular composition while 4 clustering-based niches capture 58% of the variance (Fig. 2H).To describe tissue architecture as accurately as 4 community ecology-based niches, more than 15 clustering-based niches are needed (Fig. 2H).Thus, community-ecology-inspired histological niches provide an accurate yet concise description of tissue architecture.
The community ecology approach can also address artifacts of clustering.For example, sites with similar cellular composition can be assigned to different clusters (Fig. 2I), and a given cluster can contain sites both in the niche core and at its interface (Fig. 2I).
Decreasing or increasing the number of niches from 4 niches down to 2 niches or up to 7 niches causes niches to merge into more coarse-grained niches or split into increasingly fine-grained sub-niches (Fig. S1M-N).While we used four niches here to balance accuracy and conciseness, this balance can be tuned by adjusting the number of the niches up or down to zoom in or out on the complexity of tissue architecture.
The number of identifiable niches depends on their prevalence and on the amount of available tissue data.A power analysis based on tissue simulations suggests that a single MIBI image is sufficient to capture a niche that covers 3% of the tissue area or more (Methods 4.7, Fig. S1O-P).The probability to capture a rare niche scales as the product of niche prevalence times data size, so that increasing the amount of data allows identifying rarer niches.For example, the tissue area of 6 MIBI images allows identification of niches that represent a fraction of percent of the tissue area (Fig. S1P).

Niches identified by NIPMAP generalize across patients of different breast cancer types and connect the microscopic and macroscopic levels of tumor architecture
The four niches are shared across tissue sections.Certain tissue sections make use of all niches (for example patient 35 in Fig. 2J) while others use only a few niches (patient 4 for example, Fig. 2J).
The observation that tissue sections from different cancer patients are composed of the same niches allows connecting the microscopic cellular architecture of tumors -revealed by multiplex histologywith their macroscopic cellular architecture -revealed by non-spatial methods such as flow cytometry or single-cell mass cytometry.If tumors from different patients are made of the same niches, interpatient variation in the macroscopic cellular composition of tumors is expected to fall on a simplex whose endpoints represent the four niches (Methods 4.10, Supplementary Note 1).
To test this prediction, we compared the macroscopic cellular composition of 128 breast tumors measured by CyTOF [24] with the microscopic niches found in the multiplex histology data of 40 triplenegative tumors from Keren et al. [21].The macroscopic data of Wagner et al. [24] represents dissociated samples with a volume of 100mm 3 , 7 orders of magnitude larger than the microscopic sampling sites we employed in analyzing the data from Keren et al. [21] ((25µm) 3 = 1.6 × 10 −5 mm 3 ).The data from Wagner et al. [24] also originates from different breast cancer types -ER+, PR+, Her2+, and triple negative: this allows testing if microscopic niches identified in the triple-negative breast tumors of Keren et al. [21] generalize across breast cancer types.
As predicted, inter-patient variation in macroscopic cellular composition falls on a low-dimensional simplex bound by the four microscopic niches (Fig. 2K).Cancer and fibrotic niches occupy their own corner of the simplex, as expected.Unexpectedly, the TLS and inflammatory niches share the same corner despite having different cellular compositions (Fig. 2G).This is because the prevalence of the TLS and inflammatory niches is macroscopically coupled in tumors (Supplementary Note 2).The four niches also explain why certain combinations of cell types are found in tumors and why others can never be observed (Supplementary Note 3).
In summary, community-ecology-based histological niches emerge from the local cellular composition of tissues at a length scale of 2-4 cells.Niches can be identified automatically by algorithms from satellite image analysis and machine learning.They have a clear histo-pathological interpretation and provide a concise yet accurate description of tumor architecture that generalizes across patients, tumor types and the microscopic-macroscopic levels of tumor architecture.This suggests that communityecology-based niches can provide an objective foundation to interpret tissue architecture.

Niche-phenotype mapping identifies spatial phenotypes and summarizes the phenotypic architecture of tissues
Having identified histological niches and segmented tissue sections accordingly, we determined how niches and their interfaces associated with cellular phenotypes.
To do so, we took advantage of single-cell, spatial measurements of 18 phenotypic markers also profiled by Keren et al. [21] alongside the 17 lineage markers used to determine cell types (Fig. S2A).
We looked for phenotypic markers whose intensity associates with the position of cells in a given niche or at a given interface.The position of cells in a niche was quantified as the weight of that niche.
Similarly, the position of cells at an interface between two niches was quantified as the product of the weights of these two niches.We then correlated the niche/interface weight with the intensity of phenotypic markers (Spearman's rank order correlation coefficient ρ, P-values: two-sided t test, false discovery rates from Strimmer [33], Methods 4.13).
Statistically significant niche-phenotypes associations (fdr< 1%, ρ > 0.3) were ordered by cell types and visualized as a heatmap (Fig. 3A).To explore these associations in a phenotype-centered rather than in a cell-type-centered manner, niche-phenotypes associations can also be sorted by phenotypes (Fig. S2B).Out of the 3,040 possible niche-phenotypes associations in this dataset (16 cell types x 19 phenotypic markers x 10 niches and interfaces), significant associations were reported as a table which concisely summarizes the phenotypic architecture of the tissue (Table 1).
Niche-phenotype mapping recovered expected spatial phenotypes.For example, among B cells, the HLA-DR (MHCII) phenotype associated with the TLS niche while HLA-DR negative cells -presumably plasma cells -localize in other niches (Fig. 3B) [4].Neutrophils and tumor cells with an HLA-DR (MHCII) phenotype localized in the inflammatory region (Fig. 3C, Table 1).While MHCII expression is normally restricted to antigen-presenting cells of the immune system [4] -dendritic cells, macrophages, B cells -MHCII+ neutrophils are emerging as novel actors in anti-tumor immunity [34].MHCII expression in tumor cells has also been reported previously and associates with positive prognosis [35].
In keratin-positive tumor cells, the MHCI marker associated with the interface of cancer and inflammation niches (Fig. 3D, Fig. S3).This suggests that MHCI expression in tumor cells could determine the position of the cancer-inflammation interface.Alternatively, the proximity of the inflammatory niche could induce MHCI in neighboring cancer cells or secrete MHCI as a soluble form [36] (Fig. S3).
Niche-phenotype mapping also highlighted unexpected spatial phenotypes.CD45RO+ macrophages and dendritic cells localized in the inflammatory niche (Fig. 3E, Fig. S3).CD45, a commonly-used marker of bone marrow-derived immune cells, has several splicing isoforms.The CD45RO isoform is a marker of activated and memory T cells as well as activated B cells with highly mutated B cell  receptors (BCR) [30,31].In the context of macrophages, CD45RO was previously reported to be the dominant CD45 isoform and to function as a cell adhesion receptor that inhibits pro-inflammatory macrophages [37].The literature suggests that the CD45RO signal we analyze here is specific to the CD45RO isoform (see Discussion).
Keratin 6, a highly abundant protein that forms intermediate filaments in the cytoskeleton of epithelial cells [38] was found in dendritic cells of the inflammatory niche (Fig. 3F), perhaps because DCs first migrate to the cancer niche where they take up Keratin 6, before re-localizing to the inflammatory niche.
Neutrophils located at the interface of cancer and inflammatory niches are positive for the immunosuppressive markers IDO and PD-L1 (Fig. 3G).This suggests a potential role of neutrophils in facilitating the immune escape of cancer cells.
2.4 Both niches and their interfaces structure spatial variation in cellular phenotypes.
NIPMAP can find use in exploring fundamental questions regarding the cellular and phenotypic architecture of tissues.
We illustrate this by examining the origin of phenotypic heterogeneity of the tumor micro-environment.
This heterogeneity could originate from the spatial context of cells, with local signaling cues determining cellular phenotypes.Phenotypic heterogeneity could also stem from cell-autonomous phenomena such as catching cells at different points of differentiation trajectories or from stochasticity in adopting different phenotypes.Cell-autonomous and spatial contexts can both contribute to phenotypic heterogeneity [39,40,41,42].
Understanding the relative contribution of cell-autonomous phenomena vs spatial context to phenotypic heterogeneity has practical implications for spatial and single-cell omics data analysis.If spatial context drives phenotypes, phenotypes relevant to spatial tissue architecture are expected to emerge in spatially-agnostic analyses such as phenotypic clustering [17].But spatially-agnostic methods are expected to struggle at identifying spatial phenotypes if cell-autonomous factors dominate phenotypic heterogeneity: in this case, spatial approaches to spatial phenotype identification such as NIPMAP are needed.
We find that niche-phenotype mapping and (spatially-agnostic) phenotypic clusters identify shared as well as different phenotypic markers (Fig. 4A for dendritic cells, other cell types in Fig. S2C-E).For example, in DCs, the CD45RO and Keratin6 markers associate with the inflammatory niche and define phenotypic clusters 6 and 1.Yet, niche-phenotype mapping identifies an additional marker, HLA class 1, not highlighted by phenotypic clustering.This suggests that clustering can fail to highlight spatial markers, perhaps because phenotypic clusters are not just driven by space by also by cell-autonomous effects independent of spatial context.In support of this hypothesis, phenotypic clustering highlights CD138 and HLA-DR, both of which poorly associate with space in DCs (Fig. 4A).
This raises the question of the relative contribution of cell-autonomous phenomena and spatial context to the phenotypic heterogeneity of the tumor micro-environment.
If cell-autonomous effects dominate phenotypic heterogeneity, phenotypic clusters are expected to show poor association to space: clustering cells by phenotypic markers without regard to spatial context will produce clusters that poorly predict a cell's niche (Fig. 4B) Conversely, phenotypic clusters will predict the niche if spatial context determines phenotypic heterogeneity (Fig. 4C).
To test this, we examined how tightly phenotypic clusters associate with spatial context.As an upper bound for how precisely the marker panel and marker quantification can position cells in space, we use a linear predictor of a cell's niche from phenotypic marker intensities (area under the curve = 0.89 in predicting which DCs locate in the inflammatory niche, Fig. 4D).
In dendritic cells, we find that one phenotypic cluster predicts the inflammatory niche as accurately as the linear model (Fig. 4D).Other DC clusters predict the location of DCs in other niches (Fig. S2C).
These observations generalize to other cell types (Fig. S2D-E).
This suggests that the phenotypic heterogeneity of the tumor micro-environment is driven both by the spatial context -niche or interface -in which cells find themselves and by cell-autonomous effects, with spatial context playing a bigger role than cell-autonomous effects.
Another fundamental question of tissue architecture is whether niches contribute more to structuring phenotypic heterogeneity than interfaces.
To find out, we quantified how much a given niche or interface associates with phenotypic heterogeneity by summing the squared correlations of all phenotypes of all cell types for that niche or interface (the rows of the matrix in Fig. 3A).If niches structure phenotypic heterogeneity more than interfaces, the 4 niches are expected to have a larger sum of squared correlations compared to the 6 interfaces (Fig. 4E).Conversely, if interfaces structure phenotypic heterogeneity more than niches, the interfaces are expected to have a larger sum of squared correlations (Fig. 4E).
We find that both niches and interfaces can have a large sum of squared correlations (Fig. 4E).
The inflammatory and fibrotic niches as well as the inflammatory-cancer and inflammatory-fibrotic interfaces contribute most to phenotypic heterogeneity in the context of the present phenotypic marker panel.This suggests that phenotypic heterogeneity is structured by both niches and interfaces.Interfaces thus represent histological areas in which cells adopt specific phenotypes, different from the niches that meet at the interface.For example, phosphoS6+ Tregs and phosphoS6+ macrophages are specific to the cancer-inflammatory interface while CD45RO+ B cells specifically locate at the TLS-inflammatory interface.The sensitivity and specificity of phenotypic clusters is computed using cluster membership as a predictor of niche membership.
C. If the spatial niche context of a cell is the main determinant of phenotypes, phenotypes will strongly associate with niches.Thus, the niche of a cell can be predicted from a cell's phenotypic cluster.D. The data supports the hypothesis that the spatial context of cells is a stronger determinant of phenotype than cell-autonomous effects.E. Both niches and their interface structure the spatial architecture of phenotypes.The contribution of a niche or interface to phenotypic architecture is quantified by summing the squared correlations of all phenotypes with that niche (rows of the matrix shown in Fig. 3A).Squared correlations are expected to be higher for niches and lower for interfaces if niches structure phenotypic architecture.Conversely, if phenotypic architecture is structured by interfaces, squared correlations will be higher for interfaces and lower for niches.The data suggest that both niches and interfaces contribute to phenotypic architecture.
2.5 NIPMAP identifies the cellular and phenotypic architecture of developing lung profiled by in situ RNA sequencing So far, we applied NIPMAP to spatial profiling of tumor tissues at the protein level.But healthy tissues and RNA profiling data can also be interpreted with NIPMAP.We illustrate this by applying NIPMAP on single-cell, spatial RNA profiling of healthy embryonic human lungs by In Situ Sequencing (ISS, Fig. 5A) [44,43].
Similar to tumors, we find that covariance structure in local cellular composition emerges in sites with a 25µm radius (Fig. S4A).Four PCs are sufficient to capture 85% of the spatial variation in the cellular composition of the tissue.
Spatial variation in the cellular composition of the developing lung fits a simplex with five end-points (Fig. S4B), suggesting 5 niches (p = 0.013, n = 20, 000 sites, t-ratio test one-sided).A 5-end-point simplex is a four-dimensional geometrical object which makes it difficult to visualize.To address this, we projected sites on the faces of the simplex (Methods 4.15).Examining the distribution of projected sites on the faces of the simplex, we observed sites close to all 5 end-points, supporting the existence of all 5 niches (Fig. S4B).
Quantifying the cellular composition of each niche suggested (1) epithelial, (2) parenchymal, (3) smooth muscle, and (4) vessel niches, as well as (5) ductal and alveolar (liquid-filled) space (Fig. 5B-C).In well-formed ducts, we observed that the epithelium separates the ductal space from the smooth muscle niche (Fig. 5B,D), as expected.The vessel niche does not associate with the alveolar space nor the epithelial niche (Fig. 5B, S4B), as expected at this stage of development (week 13) [43].
As in tumors, we find that niches structure phenotypic architecture (Fig. 5E, Fig. S4C).For example, localization in the vessel niche of arterial cells and pericytes correlated with expression of JAG1 (Fig. 5F) whose functions in endothelial development were previously reported [45].The plateletderived growth factor A (PDGFA) ligand and receptor respectively associated with the vascular vs parenchymal context of pericytes (Fig. 5E, Fig. S4D-E).
To test the robustness of the identified niches with respect to cell type granularity, we repeated niche identification using the 73 cell types proposed by Sountoulidis et al. [43] instead of the 32 coarser-grained cell types of Fig. 5. Increasing the number of cell types, we find five niches with cellular composition and spatial distribution similar to those found with 32 cell types (Fig. S4F-G).This suggests that niches show a degree of robustness to cell type granularity.
Thus NIPMAP generalizes to spatial RNA profiling data and healthy tissues.

Discussion
Multiplex histology produces rich datasets in the form of the location of 10,000 -1,000,000 cells, dozens of cell types, and dozens to thousands of phenotypic markers.100,000+ images of phenotype interactions can be produced from a single sample, which leads to a combinatorial challenge in visualizing and interpreting the data.To address this, we introduce NIPMAP, adapting methods from community ecology and satellite image analysis to multiplex histology data in order to (a) identify the histological niches underlying spatial tissue architecture and (b) summarize how histological niches and their interfaces structure cellular phenotypes.
Applying NIPMAP to multiplex histology data from healthy and disease samples reveals that the local cellular composition of tissues has the low-dimensional geometric structure of a simplex.The endpoints and halfway points of the simplex represent histological niches and their interfaces.Niches match known histo-pathological areas and provide a concise yet accurate summary of tissue architecture.
In the context of breast tumors, these niches generalize across patients and tumor types and connect the microscopic and macroscopic levels of cellular architecture.Individual phenotypic markers are mapped on these niches to identify spatial phenotypes and summarize how phenotypes integrate into histological niches and their interfaces.Analyzing how phenotypes associate with niches and their interfaces suggests that spatial context and cell-autonomous effects both determine phenotypes, with the former having a larger influence than the latter.Phenotypic heterogeneity is structured both by niches and their interfaces, with interfaces being home to specific cellular phenotypes.
Errors in cellular segmentation and lateral signal spill-over can lead to mis-assigning cell types and phenotypes, potentially leading to false-positives or false-negatives during niche-phenotype mapping [46] (see the p53 marker in Fig. S3 for example).Even in perfectly segmented tissues, marker signal can be mis-attributed: a marker can associate with cells of a given type within a given niche not because cells of that type express the marker but instead because the marker is systematically present in that niche, perhaps as a soluble form or as a constituent of the extra-cellular matrix (see the CD138 marker in Fig. S3 for example).NIPMAP is designed as the final layer of the multiplex histology data processing stack.It does not attempt to correct cellular segmentation, cell type assignment or signal attribution errors: these issues need to be addressed in the corresponding layers.These issues are recognized in the multiplex histology field and ongoing methodological research is seeking to address them [47,48,46,49,50].The statistical methodology employed by NIPMAP provides a degree of robustness to segmentation, cell type and signal attribution errors because niche-phenotype associations are only captured if they occur systematically across cells.Despite that, and until a definitive methodological solution to cellular segmentation, cell type assignment or signal attribution errors is established, spatial phenotypes highlighted by NIPMAP need to be confirmed by overlaying cellular segmentation with the spatial signal distribution of the corresponding phenotypic marker (Fig. S3).
NIPMAP identifies novel spatial phenotypes, such as an association between CD45RO and the localization of macrophages in the inflammatory niche of triple-negative breast tumors.CD45 is a hematopoietic marker which can be expressed as different isoforms CD45RA, CD45RB and CD45RC depending on alternative splicing of its three exons A, B, and C. The CD45RO isoform lacks the A, B, C exons.The different isoforms are associated with specific phenotypes, at least in the context of lymphocytes cells [30,31].Thus, quantifying the CD45 isoforms specifically is a prerequisite to interpret the observation that CD45RO+ macrophages localize in the inflammatory niche of triple-negative tumors.The existing literature suggests that the CD45RO antibody (UCHL1 clone, Biolegend) used in the MIBI study we re-analyzed here [21] is specific to the CD45RO isoform of CD45.The UCHL1 clone has been used to specifically quantify CD45RO since the 1980ies [51,52,53,54].An early flow cytometry study in T cells found that the abundance of CD45RA and CD45RO abundance -as quantified by UCHL1 -correlate negatively, consistent with the specificity of UCHL1 to the CD45RO isoform [52].
Profiling CD45RO by means of the UCHL1 clone alongside CD45RA (by means of the 2H4, F8-11-13 or HI100 clones) is routinely used to discriminate between memory and naive T cells [52,54].In addition, the risk of non-specific quantification of CD45 isoforms in macrophages is mitigated by western blot, flow cytometry and full-length RNA-seq observations that bone marrow-derived macrophages from the spleen, white adipose tissue, liver and the peritoneal cavity lack the CD45RA, RB and RC isoforms and thus specifically express the CD45RO isoform [37].These observations suggest that macrophages specifically upregulate the CD45RO isoform in the inflammatory niche of triple-negative breast tumors.
When performing single-cell analyses, a decision needs to be made regarding the granularity of cell types.For example, T cells could be lumped together with other immune cells or be assigned a more granular type such as Th17 CD4+ T cell.Alternatively, cell typing could have intermediate granularity -such as CD4 T cells -and complemented by phenotypes -such as Th17.This raises the question of how cell type granularity impacts niche identification.In general, one expects optimal niche identification when analyzing cell types at a range of intermediate granularities and poor niche identification when the cell types are insufficiently granular or too granular.This is because insufficient granularity can prevent observing the cell types that characterize a given niche: for example, a vascular niche characterized by pericytes and endothelial cells cannot be identified if cells are coarsely grouped into epithelial vs non-epithelial.Conversely, too granular cell types -for example if each cell has its own type -prevent identifying recurrent patterns in the local cell type composition of tissues to reveal its niches.In the context of the ISS data of Sountoulidis et al. [43], our findings suggest that the niches identified are robust to the exact number of cell types used.
To interpret spatial phenotypes, markers need to separated into (1) markers of cell type and (2) markers of phenotypes, for exclusive use during niche identification and niche-phenotype mapping respectively.This is because using a given marker both for cell typing and phenotyping would necessarily identify the phenotype in the niches where the cell type is present, thus biasing niche-phenotype mapping.Both datasets used here comply with this separation.The MIBI data from Keren et al. used 17 lineage markers to define cell types and another 18 functional markers to identify phenotypes [21].
The ISS data of Sountoulidis et al. [43] used 72 lineage markers to identify cell types and profiled a distinct set of 75 genes from the WNT, SHH, NOTCH and RTK pathways [43] which we used here in niche-phenotyping mapping.
NIPMAP can complement existing methods to analyze multiplex histology data.For example, cellular spatial enrichment analysis aims at identifying spatial interactions by looking for pairs of cell types that colocalize more often than expected by random chance, defined by shuffling cell positions within a given niche [21].By providing an approach to identify niches automatically, NIPMAP could facilitate cellular spatial enrichment analysis.Other spatial analyses such as proximity analysis and nearest neighbor analysis could benefit from NIPMAP's niche identification in the same way.
To cluster-based methods aimed at identifying discrete cellular structures in multiplex histology data such as community detection based on spatial cellular networks and cellular neighborhood analysis [17,18], NIPMAP adds a principle to determine the number and identity of histological niches by exploiting the simplex geometry of local cellular composition.NIPMAP also adds a geometric principle to distinguish niches from interfaces and to automatically identify interfaces without parameter tuning.Identifying interface regions is of high interest to tissue biology.In cancer, for example, understanding why immune cells from the inflammatory niche fail to penetrate the cancer niche can suggest therapeutic interventions to remove blocks to anti-tumor immunity [6].
Here we illustrated NIPMAP on spatial data from MIBI [7], a protein-based approach, and ISS [44], an RNA-based method.Beyond these two technologies, NIPMAP is designed to be applicable to other spatial methods with single-cell resolution such as Imaging CyTOF, Codex, CycIF, 4i, MER-FISH and more [10,11,12,13].
Similar to how community ecology defines ecological niches based on local species covariance, local covariance between cell types is exploited by NIPMAP to identify histological niches.This approach requires assigning a type to each cell based on marker intensities and prior knowledge, with two potential downsides.First, niches identifiable in this way could be limited by previously known cell types.Second, assigning types to individual cells from marker intensities is time-consuming and not guaranteed to be error-free due to segmentation errors, signal mis-attribution, non-specific antibody binding, auto-fluorescence, molecular exchanges between cells and more.To address this, it would be desirable for niche identification to be based not on the types of cell but instead on marker intensities of local tissue regions prior to segmentation into cells.Preliminary exploration of this question in the context of the MIBI data of Keren et al. [21] suggests that niches can potentially be identified without assigning predefined types to cells, resulting in similar niches as cell type-based niche identification (Supplementary Note 4).Future research can further develop this methodology.
In the future, the cellular and phenotypic architecture identified by NIPMAP could support efforts aimed at understanding how healthy tissues maintain function despite the need for constant cellular turn-over [55] and interpreting the spatial dynamics of niches during histological processes such as development, tissue repair or disease progression [43,56,57].
4 Online methods 4.1 Processing the MIBI data from Keren et al. [21].
From the website of the Angelo lab, we obtained processed MIBI data for 36 protein markers from 41 TNBC patients samples: intensity values, segmented images, and patient data.In particular, the segmented data (cellData.csv)contained (x, y) coordinates of each cell and its type (out of 17 cell types) as determined by the authors of the study.Following the authors of Keren et al. [21], patient 30 was excluded from the analysis.

Quantifying cell type density in sites with a Gaussian kernel
Each tissue slide is a 2-dimensional space with cells of a determined cell type as points of coordinates (x, y).We positioned 100 sampling sites randomly on each slide, by drawing the centers from a uniform distribution.In contrast to common practice, sites are not positioned on cells but uniformly across the tissue section.Doing so has the advantage that sites are representative of tissue architecture and unbiased by spatial variation in cellular density.
We generated 4000 sites, 100 sites for each of the 40 slides.
To quantify the abundance of cells of different types at each site, rather than counting cells with a circle of radius r, we used a Gaussian kernel density estimation to decrease counting noise: if a cell is slightly outside the circle of radius r, it is not counting with the first strategy.Counts are thus sensitive to slight changes in the position of the center of the circle.Gaussian kernel density estimation addresses this by weighting cell counts by their distance to the center in a smooth fashion.The weight g of a cell of position ⃗ x to the site of center ⃗ s k is defined as We summed up the density values for each cell type.
We performed PCA, centered and unscaled using the ade4 package of the data analysis software R [20].
In positioning sites, we excluded areas of the slides located within distance r from the image edge, in order to decrease edge effects.
We explored a broad range of width r values to examine the robustness of tumor architecture (Fig. 2B).We found that r = 25µm is the minimal radius allowing to capture cellular architecture (Fig. 2B).This suggests that tumor micro-architecture emerges on a scale of 2-4 cells.

Archetype analysis
Archetype analysis [26] aims to fit a d-dimensional simplex as tightly as possible to n data points ⃗ x.The simplex has p endpoints ⃗ b k ∈ R d , k = 1, ..., p which represent the endpoints, also known as archetypes [27].By definition, each point ⃗ x within the simplex can be written as a weighted average of the endpoints with the weights α constrained by 0 ≤ α k ≤ 1 and p k=1 α k = 1.We used the tumor samples projected onto the 3 first PCs as input for archetype analysis.We used the Archetypal Analysis python package [58], with the parameters: n_archetypes = 4, tolerance = 0.001, max_iter = 200, random_state = 0, C = 0.0001, initialize = 'random', redundancy_try = 30.The output of this algorithm contains a dataset of α k weights for each tumor sample and the coordinates of the endpoints ⃗ b k in the reduced space of 3 PCs.
We set the number of endpoints using elbow criteria on the fraction of variance in the local cellular composition explained by a different number of endpoints.When varying the number of endpoints p, the number d of PCs used for fitting the simplex was always d = p − 1 because p − 1 dimensions are generally needed to describe a simplex with p endpoints.

Assessing the robustness of niches to sampling intensity
In order to robustly identify niches while optimizing computation time, we performed an error analysis as a function of the sampling intensity -defined as the ratio of the total area of sites to the tissue area -to test how deeply tissues need to be sampled.To do so, we first over-sampled the tissue by collecting a number of sites such that the total area covered 1000% of the tissue area, and used these sites to determine reference niches for the MIBI dataset of Keren et al. [21].We then sampled sites such that the total area covered 300%, 100%, 30%, 10%, 3% and 1% of the tissue area.At each sampling intensity, sites were sampled 100 times and niches were computed, producing 100 sets of four niches per sampling intensity.The niche estimation error was computed as the root mean squared error (RMSE) to the reference niches.We plotted the root mean squared error averaged over the 100 repeats at each sampling intensity (Fig. S1K).We find that collecting a number of sites equivalent to at least 30% of the tissue area identifies niches with low error while speeding up computations(Fig.S1L).If computation time is not an issue, we recommend sampling a number of sites equivalent to 100% of the tissue area as the error is slightly smaller compared to sampling 30%.
In order to robustly identify niches while optimizing computation time, we performed an error analysis as a function of the sampling intensity -defined as the ratio of the total area of sites to the tissue area -to test how deeply tissues need to be sampled so as to control for niche cellular composition error.
To minimize the sampling error, we first over-sampled the tissue by collecting a number of sites such that the total area covered 1000% of the tissue area.Over-sampling the tissue minimizes the sampling error because, even when sampling at 100% intensity, random positioning of sites may leave certain tissue areas uncovered by any site.Sites sampled at 1000% intensity were used to determine reference niches for the MIBI dataset of Keren et al. [21].
We then sampled sites such that the total area covered 300%, 100%, 30%, 10%, 3% and 1% of the tissue area.At each sampling intensity, sites were sampled 100 times and niches were computed, producing 100 sets of four niches per sampling intensity.The niche estimation error was computed as the root mean squared error (RMSE) to the reference niches in terms of cellular composition.
We plotted the root mean squared error averaged over the 100 repeats at each sampling intensity (Fig. S1K).A 30% sampling intensity -which we used in analyzing the MIBI data of Keren et al. [21] identified niches with small enough an error to robustly characterize the biology of each niche (Fig. S1L) while speeding up computations.If computation time is not an issue, we recommend sampling a number of sites equivalent to 100% of the tissue area or more, as the error is slightly smaller compared to sampling 30%.

Classifying tumor samples into mixed vs compartmentalized using niche weights
To associate NIPMAP's niche segmentation with the previously proposed mixed vs compartmentalized classification of tumor architecture of Keren et al. [21], we sorted samples according to the contribution of tumor-immune interfaces relative to the total prevalence of immune niches, following the methodology described by Keren et al. [21].More specifically, for each sample, the NIPMAP mixing score m was computed as where α 1 , α 2 , α 3 represent the weight of the TLS, inflammatory and cancer niches at a given site, and averaging is performed over sites.The NIPMAP mixing score matched the mixed vs compartmentalized classification of Keren et al. for 37 out of the 40 samples (Fig. S1C).We then tested whether the findings of Keren et al. on the association between mixed vs compartmentalized samples (reported in Fig. 5B, 5E and 5H of the original study) and the immuno-signaling environment could be reproduced using the NIPMAP mixing score.All three associations reported by Keren et al. could be reproduced using the NIPMAP mixing score (Fig. S1D).Cold samples were excluded from the analysis, following the exclusion criteria of Keren et al..

Comparing the spatial variation in cellular composition captured by different numbers of community-ecology-and clustering-based niches
We compared how NIPMAP and clustering captured spatial variation in the composition of m cell types across 4000 sites sampled from 40 triple negative breast tumors analyzed by MIBI.
Iterating over the number of niches (p = 2, . . ., 17 niches), NIPMAP was performed using p − 1 PCs The percentage of variance in cellular composition explained by NIPMAP niches was computed as follows.For each site k, we compute the niche weights ⃗ α k ∈ R p that best fit the site's cellular where ⃗ c 0 is the average cellular composition of sites (we performed centered PCA).The difference between the site's best-fitted cellular composition U B⃗ α k + ⃗ c 0 and the observed site composition ⃗ c k is defined as the error 100%− the ratio of the squared error to the total sum of squares of site cellular is the fraction of explained variance by the niches, where i = 1, . . ., m represents the cell type.
To determine how k-means clustering captures the spatial variation in cellular composition, we perform clustering to find p niches ⃗ b j ∈ R m in the cellular abundance of our 4000 sites.The fraction of variance explained by p niches -the clusters -was computed as 100%− the ratio of the within-clusters sum of squares to the total sum of squares of the site composition data, where ν(k) is the cluster j to which site k belongs.
We note that NIPMAP's simplex model requires slightly more free parameters compared to kmeans clustering, which helps increase the percentage of explained variance.We note that the goal of this analysis is not to find the best trade-off between the number of parameters and goodness of fit, but rather to identify a data structure that summarizes phenotypic spatial architecture in a concise fashion, using a small number of niches that fit well within the cognitive limitations of the humans.

Power analysis of the probability to capture a rare niche
To study the power of NIPMAP to capture a rare niche, we simulated tissue data in which the prevalence of one niche varied while the prevalence of the remaining niches was set to be equal to each other.As a rare niche is expected to be more difficult to identify with little tissue data, we also varied the amount of tissue data in the simulation.
An existing approach to simulate tissue data [59] requires spatial co-occurrence statistics of cells of different types as an input.Tuning co-occurrence statistics so as to (a) specify the number and cellular composition of niches and (b) vary the abundance of a specific niche while keeping the other niches constant is not trivial.To address this, we designed a tissue simulation approach that can accommodate these two requirements, as follows.
We first simulated the spatial distribution of niche weights ⃗ α(x, y), where α i is the weight of niche i at position (x, y) of the tissue, and i α i = 1.Simulated tissues should show continuous regions in which a given niche dominates, with smooth spatial transition into the contiguous niche.Drawing upon classical mathematical models of spatial patterns [60], we reasoned that a reaction-diffusion system in which niches compete locally with each other and diffusion enforces smoothness of niche weights in space could simulate realistic spatial distributions of niche weights.
Experimenting with different reaction-diffusion systems lead to the following equation: where we set β = 1/day.
The positive term in the first bracket represents cooperative logistic growth.When the weight α i of niche i is close to 0, there is near-zero niche growth.There is a step-like increase in growth as α i approaches K, which rapidly saturates around 1, due to the high exponent of α i (power of 5).By setting K = 1/n where n is the number of niches, we can thus establish growth dynamics in which only one niche wins at each location (x, y).
The negative term in the first bracket of the equation prevents niche weights from growing to infinity, by ensuring that there are two stable fixed points in the absence of diffusion (that is when D := 0): α i = 0 -niche i is absent at (x, y) -and α i ≃ 2K -niche i is present at (x, y).
The second bracket adds diffusion, to enforce smooth variation in niche weights with respect to space.We set the spatial domain in both x and y to [0, L] with L = 800µm, the size of a MIBI image, using periodic boundary conditions.
We simulated this system numerically on a 50x50 lattice, that is dx = dy = L/50 = 16µm until convergence using R's ode.2D solver from the deSolve library.At convergence, the ⃗ αs were normalized to sum up to 1 at each position (x, y).We used n = 4 niches due to practical relevance to our reanalysis of the MIBI data of Keren et al. [21].Setting D = 40µm 2 /day produced niches whose spatial architecture resembles that of MIBI data (Fig. S1O).
We define the prevalence of niche i as L −2 α i (x, y)dxdy.To simulate tissues in which one niche is more rare -less prevalent -than the others, we altered the initial condition ⃗ α 0 (x, y).We randomly initialized ⃗ α 0 (x, y) so that, at each (x, y), the weight of one niche was 1 and the weight of all other niches was 0. To simulate tissues in which niche 1 was less prevalent then other niches, we varied the probability f that α 1 (x, y) was 1.The probability that α i (x, y) = 1 for the other niches i ̸ = 1 was set to be equal.This resulted in initial conditions ⃗ α 0 (x, y) in which the prevalence of niche 1 was f and the prevalence of niche 2, 3 and 4 was (1 − f )/(n − 1).
As expected, tissues represented by the initial condition ⃗ α 0 (x, y) were unrealistic and unstructured: niches showed no spatial contiguity as niche weights varied abruptly from 0 to 1 from one location of the lattice to the next.Simulating the reaction-diffusion dynamics of ⃗ α(x, y) defined by Equation 3to convergence caused niches to compete locally and laterally in space and thereby to establish contiguous areas in which a given niche dominated (Fig. S1O).
Varying f in the initial condition ⃗ α 0 (x, y) from 0.04 to 0.25 in 9 logarithmic steps and simulating the sytem to convergence generated 9 tissues ⃗ α(x, y) in the form of matrices 50x50xn in which the prevalence of niche 1 ranged from 0.24% to 30.4% (Fig. S1O).
From these 9 simulated tissues ⃗ α(x, y), we simulated multiplex histology data.Multiplex histology data has the form of a table X with n s rows representing sites and columns representing the local abundance of the different cell types.
To simulate this table X for a given niche prevalence and a given number of MIBI images n i (amount of tissue data), we first computed the number of sites n s needed to cover the area of the n i images, n s = n i L 2 /(πr 2 ), where r = 25µm is the sites radius.Simulating sites from more tissue images is equivalent to collecting more sites from a given tissue image in the present scenario that different tissue images are made of the same niches.
We positioned n s sites in the tissue by sampling their position (x, y) from a uniform distribution between 0 and L. For each site, we determined the local niche weights ⃗ α(x, y) by linear interpolation, to generate a matrix A of sites (rows) x niches (columns).We simulated local cellular densities as We passed X to NIPMAP to estimate 4 niches ⃗ bi , i = 1, . . ., 4. We computed the RMSE ϵ on the estimated cellular composition of niche 1 from where 17 is the number of cell types.The min operator ensures that the estimated niche closest to niche 1 is used to compute the error.This is necessary because niche indices are arbitrary in NIPMAP so that ⃗ b 1 doesn't necessarily match ⃗ b1 .
We repeated this procedure 100 times for each niche prevalence and number of images n i .Inspecting the estimated niche composition B as a function of the estimation error ϵ suggested that a threshold ϵ < 4.5 × 10 −4 distinguishes simulations in which niche 1 was accurately captured from simulations in which niche 1 failed to be captured.Thus, we estimated the probability to identify the rare niche 1 as the fraction of these 100 simulations for which ϵ < 4.5 × 10 −4 (Fig. S1P).

Identifying interface regions
Sites located at interface regions have high weights for more than one niche.Thus, to find sites at the interface of two niches, we compute the product of the niches' weight and look for sites where this product is high (Fig. 2F).
Interface regions can be defined in two ways.Under one definition, interfaces are found at the contact of two niches.To find these interfaces, we multiply the weights of pairs of niches.In this definition, interfaces are influenced by local cellular density.For example, a low concentration of immune cells next to cancer cells would not qualify as an immune-cancer interface because a low concentration denotes fibrotic regions (Fig. S1J).
To identify interfaces between immune and cancer cells based only on cellular composition and independently of cellular density, we can exclude the contribution of the niche of low cellular density (here the fibrotic niche) by setting its weight to 0 at all sites and renormalizing the weights of the other niches to sum up to 1.The benefit of this definition is that the interface regions identified through this process fit better with a visual impression, as the visual impression is guided more by cell types (colors) than cellular density (Fig. 2A, Fig. 2F).
4.9 Processing and analysis of CyTOF data from Wagner et al.
We downloaded the summarized version of the CyTOF experiments of Wagner et al. [24].The data table contains cellular proportions of cell types identified by 73 markers in 144 breast tumor samples.
Cellular composition was organized in hierarchies, for example, the proportion of live cells among all cells, the proportion of cells of the M (myeloid) cluster among live cells, the proportion of M1 cells among the M cluster, and so on.
Wagner et al. [24] assigned cell types -Tumor Associated Macrophages, CD4+ T regulatory cells, ... -to cell clusters (leaf nodes) -M01, T01, and so on.We took over these cellular assignments from Fig. 2D-L of Wagner et al. [24] in order to compute the relative composition of each tumor in terms of 12 cell types, chosen to be as similar as possible to the cell types profiled by Keren et al. [21]: cancer cells, fibroblasts, endothelial cells, CD4 T cells, CD8 T cells, NK cells, dendritic cells, macrophages, B cells, plasma B cells, healthy tissue, other immune cells.One sample contained less than 50% live cells and was thus removed, keeping 143 samples for the analysis.
We performed PCA, centered and unscaled using the ade4 package of the data analysis software R [20].Unscaled PCA was used because all features have the same units (fractional abundance is unit-less).We explored other transformations such as scaling by the standard deviation and logtransformation.Scaling by the standard deviation destroyed much of the covariance structure expected from breast tumor biology, presumably by amplifying sampling noise in low-abundance cell types.
Log-transformation resulted in similar niches to the ones presented in the present article but produced curved simplexes which require developing new algorithms in order to fit the simplex to the data.
4.10 Showing that inter-patient variation in macroscopic cellular architecture of tumors is constrained by a simplex whose end-points are the microscopic niches If inter-patient variation in the macroscopic cellular architecture of tumors is explained by patientspecific usage of universal niches, inter-patient variation in tumor cellular composition must be constrained by a simplex whose end-points are the microscopic niches.
To see why, let α j (⃗ x) be the local weight of niche j at location ⃗ x of the tumor.All the weights can be collected into a vector ⃗ α, with Σ j α j = 1 and α j > 0. We collect the cellular composition of each niche into a matrix B whose entries b ij indicate the density of cell type i in niche j, in units of inverse volume (1/µm 3 ).With this notation, the local cellular composition ⃗ c(⃗ x) at location ⃗ x of the tumor is The macroscopic cellular composition of the tumor is then obtained by integrating the microscopic cellular composition ⃗ c(⃗ x) over the tumor volume Here, one can show that all θ j are positive and sum up to one.First, since α i ≥ 0, Therefore, the macroscopic cellular composition of the tumor ⃗ C is the weighted average of the niches B. Macroscopic tumor composition must be bounded by a simplex.
In silico dissection of healthy tissue.Direct comparison of microscopic niches and macroscopic cellular composition data is not possible because the tumor samples of Wagner et al. [24] partially include healthy tissue from the tumor margin whereas healthy tissue was not imaged in the samples of Keren et al. [21].To enable a comparison of the two datasets, we mathematically dissect healthy tissue out of the tumor samples.
After projection onto n − 1 PCs ⃗ v i , a CyTOF tumor sample ⃗ C (vector of proportions of 12 shared cell types) can be written as where ⃗ µ is the vector of average proportions of each cell type and u i represents the contribution of PC i to the sample C. We then rewrite the first term as a weighted average of four endpoints (cancer, immunity, healthy, fibrotic) computed by archetype analysis in the space of the first 3 PCs to obtain where ⃗ b j is the jth endpoint.To dissect the healthy endpoint (endpoint 3), we remove its contribution from the weighted averages: Finally, we compute cellular proportions − → C d after dissecting the healthy endpoint as For 15 out of the 143 CyTOF samples, the weight of the healthy endpoint was >50%, suggesting that healthy tissue dominated the cellular composition of these samples.These samples were discarded from further analysis because the lower weight of the non-healthy endpoints risked increasing the dissection error.
Mapping cells types across two datasets.The cell types profiled in the CyTOF data of Wagner et al. [24] and MIBI data of Keren et al. [21] overlap only partially: while some cell types are common to both data sets -CD8 T cells for example -other cell types were either only profiled in one dataset, or profiled at a different level of specificity -all B cells vs distinguishing B and plasma cells, all CD4 cells vs distinguishing CD4 cells and Tregs.This creates a challenge in comparing the two datasets.
To address this, we mapped cell types across the two datasets to compute cellular composition based on cell types common to both datasets.
To do so, we created an incidence matrix G of dimensions m × n with m cell types from MIBI data as rows and n cell types from CyTOF as columns.The entries of the G matrix are set to 0 if the corresponding cell types from the two datasets are different and to 1 if they are identical.A column or a row can have more than one 1 if the MIBI and CyTOF datasets differ in their granularity for the corresponding cell type.This incidence matrix can be represented as a bipartite graph (Fig. S1Q).
From G, we then derive two matrices G k , G w that allow projecting cell proportions from the initial MIBI X k and CyTOF data X k (respectively) onto the shared sets of cell types Y k = X k G k and Y w = X w G w by matrix multiplication.
To compute the projection matrix G k for the MIBI data, we initialize G k := G.We then sum up all the rows of G. Rows where the sum is larger than 1 represent MIBI cell types that map to multiple, more granular CyTOF cell types.For these rows, we keep all columns with 0s and the first column with 1 in order to keep only the least granular cell type of the two datasets.The kept column is then named accordingly to the row.To compute the projection matrix G w , we perform the same procedure, reversing rows and columns.
Applying this procedure produces the following cell types to compare the MIBI and CyTOF To quantify how phenotypic heterogeneity associates with space, we tested how accurately each phenotypic cluster predicted the niche of a given cell.We considered that a given cell was located in a given niche if the weight of that niche was at least 0.5.By tabulating how often cluster membership matched niche location, we computed the sensitivity and specificity of each cluster in predicting the different niches.
Phenotypic clusters are identified without regard to the niche location of cells.We thus asked whether a combined analysis of niche location and phenotypic markers could identify better predictors of the niche location of cells.To do so, we trained linear predictors of the niche weight of each cell based on the intensity of all 18 phenotypic markers.Changing the cut-off on the predicted niche weight beyond which a cell was considered to localize in that niche, we computed how different niche weight cut-offs achieved different sensitivities and specificities (ROC curves).

Quantifying the niche weights of individual cells
To associate cell phenotypes and niches (see next section), the niche weights of each cell needs to be determined.
The niche and interface weights are computed for all cells of the dataset, by centering sites on each cell of the dataset.The cellular composition ⃗ c of each site is determined and the contributions of the different niches to each site ⃗ α is computed as described above (Eqn.4) by solving the matrix equation ⃗ c = Bα, with B, a matrix of the cellular composition of the different niches.
We solve for ⃗ α, α i = 1 by quadratic programming using the qpsolvers Python library with the default solver "quadprog".Cells labeled as Unidentified were discarded from the analysis.

Identifying niche-phenotype associations
To identify associations between phenotypic markers and niches in a given cell type, we iterate over markers, niches and cell types.
For each marker-niche-cell-type triplet, we compute Spearman's rank correlation ρ between marker intensity and niche weight in individual cells of the dataset.Here, a niche can be an individual niche i (quantified as α i ) or an interface region between two niches i and j (quantified as α i α j ).We only consider combinations of cell types and niches for which there was at least one example of a cell of that type located mostly in this niche (α i > 1/2).For interfaces, the maximal weight of each niche is 1/2 (not 1 as in niches): thus we only consider combinations of cell types and interfaces i, j for which there was at least one example of a cell of that type with α i α j > 1/2(1/2) 2 = 1/8.In the heatmap visualization of niche-phenotypes associations, the ρ correlation of the marker-niche-cell-type triplets that don't meet this criteria is set to 0.
Statistical significance is quantified as a p-value using a two-sided asymptotic t test approximation.The calculation is repeated for all combinations of phenotypic markers of all cell types, and all niches/interfaces.
To correct for multiple testing and focus analysis on the clearest niche-phenotype associations, we compute the false discovery rate [33] and keep only q-values smaller than 0.01.We also require the Spearman correlation to be at least 0.3, a threshold beyond which visual intuition confirms nichephenotype associations.
We considered higher-order interfaces between niches (3 niches and more) but chose not to report them here because niche-phenotype associations were weaker compared to niches and their pairwise interfaces.
In the analysis of the MIBI data of Keren et al. [21], to prevent false-positives in niche-phenotype associations due to spatial signal bleed-over of cancer cell markers to neighboring cells, we filtered out the Keratin 6 and beta-catenin markers in niches dominated by cancer cells, that is the cancer To do so, we iterate through all the faces of the simplex, defined by all combinations of three endpoints i, j, k.
We exclude sites located farthest from the face, as the projection of sites located far from a face provides little insight regarding how well the simplex's face fits these sites.To do so, we only keep sites for which the combined weight of the endpoints that define the face is at least 50%.
From the coordinates of these endpoints in PC-space, we determine an orthonormal basis for the three endpoints of the face and position the endpoints on that basis: i at (0, 0), endpoint j at (x j , y j ) and endpoint k at (x k , y k ).Then, the position ⃗ p of sites on the face is determined from niche weights ⃗ α as ⃗ p = α i (0, 0) + α j (x j , y j ) + α k (x k , y k ).
Fetal lung ISS data of Sountoulidis et al. [43]: ethical permit was obtained by the authors of the initial study from the Swedish National Board of Health and Welfare.The analysis was approved by the Swedish Ethical Review Authority (2018/769-31).The clinical staff of the initial study acquired informed written consent by the donor.
CyTOF data of Wagner et al. [24]: tissue were collected after obtaining written informed consent from patients at the University Hospital Basel (Switzerland), the University Hospital Zurich (Switzer-

Figure 1 :
Figure 1: Clustering can identify the cellular and phenotypic architecture of tissues from multiplex histology data but can misinterpret the niche-interface structure of tissues.A. To segment histological sections into niches, local cellular composition is surveyed at multiple sampling sites.Sites with similar cellular composition are clustered, and each cluster is interpreted as a histological niche.B-C.Local cellular composition does not necessarily form clusters.B. When niches cover broad tissue areas with little interface, sites mainly fall within a certain niche and their cellular composition thus clusters by niche.C.When niches feature large interface regions, sites often cover more than a single niche so that site cellular composition is a mix of the two niches.Consequently, in the case of two niches, the cellular composition of sites describes a linear segment.Sites from the core region of a given niche are found at the extremities of the segment while sites in the middle of the segment represent interface regions.

Figure 2 :
Figure 2: Community ecology-inspired niches offer a quantitative framework to interpret cellular tissue architecture from multiplex histology data.A. MIBI of tumor sections identify the position and the type of single cells over regions of 800x800 µm 2 .Shown is sample #5 out of 40 samples from Keren et al. [21].B. Covariance structure emerges when considering the cellular composition of sampling sites 25µm in size or larger.While the number of PCs (y-axis) is discrete, the present heatmap interpolation helps visualize how the fraction of explained variance in cellular composition depends on the number of PCs and the size of sampling sites.C.Representing the cellular composition of sites on three axes (PCs) reveals that site cellular composition does not form clusters but instead describes a continuum constrained by a 3D simplex.D. The finding that site cellular composition is constrained by a 3D simplex suggests a view of tumor architecture in which local cellular composition is a weighted average of four histological niches, the end-points of the simplex.Sites located close to the endpoints localize at the core of the corresponding niche.Sites located at the interface between niches fall on the edges of the simplex.E. Coloring tissue sections according to the local weight of the four niches segments tissue sections into niches.Spatial variations in local cellular composition can be interpreted as space-dependent changes in the weight of the four niches.Colors represent local niche weight, as in C-D.F. Cells located at interface regions -here the inflammation x cancer interface -are automatically identified as cells where the product of the weights of the inflammation and cancer niches is high.G.The cellular composition of each niche has a histo-pathological interpretation: cancer, fibrotic/necrotic, inflammatory, and tertiary lymphoid structure (TLS).H. Community-ecology-inspired niches capture tissue cellular architecture more concisely than clustering-based niches.I. Community-ecology-inspired niches address artifacts that can occur with clusteringbased niches.J. Triple-negative breast tumors from different patients employ the same niches: the same four niches explain intra-tumor spatial heterogeneity and inter-patient variation in the cellular composition of tumors.K. Niches generalize across breast cancer types and explain inter-patient variation in the macroscopic cellular composition of tumors.The cellular composition of 128 breast tumors characterized by Wagner et al.[24] was projected on its first two PCs after removing the contribution of healthy tissue.The four niches computed from the MIBI data of Keren et al.[21] were then projected on the same space.

Figure 3 :
Figure 3: Niche-phenotype mapping identifies spatial phenotypes and summarizes the phenotypic architecture of breast tumors.A. Different cell types have phenotypes (columns) with specific associations to the different histological niches and their interfaces (rows).Shown are all phenotypes with a niche-phenotype correlation of at least 0.3 and a false discovery rate of less than 1%.B-G.Niche-phenotype mapping reveals expected (B-D) and novel (E-G) niche-phenotype associations, which are visualized by overlaying phenotypic marker intensity (dot color) and tissue segmentation (background color).Cells of types other than the type indicated in each panel are not shown for clarity.B. B cells are positive for HLA-DR in the TLS niche.C. Keratin-positive tumor cells are positive for HLA-DR / MHC class II in the inflammatory niche.D. Keratin-positive tumor cells are positive for HLA class I at the interface of the cancer and inflammatory niches.E. CD45RO intensity in macrophages is associated with the inflammatory niche.F. Keratin 6 in dendritic cells is associated with the inflammatory niche.G. PD-L1 intensity in neutrophils associates with the interface of cancer and inflammatory niches.Marker intensity represents Z-scored marker abundance, as quantified in the original study [21].

Figure 4 :
Figure4: Niche-phenotype mapping reveals two fundamental properties of tissue phenotypic architecture.A-D.The spatial context of cells is a stronger determinant of phenotype than cell-autonomous effects.Dendritic cells are illustrated here and other cell types appear in Fig.S2D-E.A. Niche-phenotype mapping (left) and (spatially-agnostic) phenotypic clusters (right) identify common phenotypic markers -CD45RO, Keratin6 -but also divergent markers -HLA class 1, HLA-DR, CD138.B. If cellautonomous effects determine phenotypes more than spatial context, spatial context will poorly associate with phenotypes.Thus, phenotypic clusters -which are built independently of spatial context -will predict a cell's niche as poorly as a random predictor.The sensitivity and specificity of phenotypic clusters is computed using cluster membership as a predictor of niche membership.C. If the spatial niche context of a cell is the main determinant of phenotypes, phenotypes will strongly associate with niches.Thus, the niche of a cell can be predicted from a cell's phenotypic cluster.D. The data supports the hypothesis that the spatial context of cells is a stronger determinant of phenotype than cell-autonomous effects.E. Both niches and their interface structure the spatial architecture of phenotypes.The contribution of a niche or interface to phenotypic architecture is quantified by summing the squared correlations of all phenotypes with that niche (rows of the matrix shown in Fig.3A).Squared correlations are expected to be higher for niches and lower for interfaces if niches structure phenotypic architecture.Conversely, if phenotypic architecture is structured by interfaces, squared correlations will be higher for interfaces and lower for niches.The data suggest that both niches and interfaces contribute to phenotypic architecture.

Figure 5 :
Figure 5: NIPMAP generalizes to RNA-based spatial profiling of healthy tissue.A. The cellular and phenotypic architecture of the human developing lung was characterized by in situ RNA sequencing.Data: Sountoulidis et al. [43].B. The developing lung can be segmented into 5 histological niches.C. Each niche has a specific cellular composition suggesting the histological basis of that niche.D. Niche-phenotype mapping recovers known spatial associations between niches, such as the epithelial niche separating the smooth muscle niche from ductal/alveolar space.Dots: sites projected on the face of the 5-niches simplex defined by the ductal/alveolar, epithelial and smooth-muscle niches.Colored areas: contours of dots density.The sequential organization of the ductal, epithelial, and smooth muscle niches is reflected by a depletion of sites at the ductal-smooth muscle interface relative to the ductal-epithelial interface.Thus, from the ductal/alveolar space, we first encounter the epithelial niche.Beyond the epithelial niche, the smooth muscle niche increases in weight.E. Projecting phenotypes onto niches identifies cell types and phenotypes that associate with specific niches.F. Arterial cells express JAG1 when located in the arterial niche but not when located in other niches.Dots: arterial cells.Color bar: JAG1 RNA count per cell.Background color: niche segmentation.
where B is the matrix of niche cellular composition, whose rows represent cell types and columns represent niches.For realism, the cellular composition of the niches B = ⃗ b 1 , ⃗ b 2 , ⃗ b 3 , ⃗ b 4 was set to the four niches and 17 cell types of the MIBI data of Keren et al. [21].
Hierarchical clustering of cell phenotypes and spatial specificity of phenotypesTo compare phenotypic clustering to the spatial phenotypes identified by NIPMAP, hierarchical clustering was performed on the intensity of 18 phenotypic markers previously classified as functional markers as opposed to lineage markers[61].Marker intensities were Z-scored within each cell type to facilitate the visualization of phenotypic clusters and assess marker significance.Hierarchical clustering was performed on Z-scored intensities of all 18 phenotypic markers, in 3 cell types (dendritic cells, NK cells, and neutrophils) using euclidean distance and Ward linkage.To serve as a well-controlled comparison to the 10 niches and interfaces found by NIPMAP, 10 phenotypic clusters were determined for each cell type by cutting the hierarchical clustering dendrogram at the height needed to split the dendrogram in 10 groups using R's cutree function (dendextend package).
land), and in collaboration with the Patient's Tumor Bank of Hope (PATH, Germany) at the breast cancer centers at St. Johannes Hospital Dortmund and Institute of Pathology at Josefshaus (Germany) and the University Hospital Giessen and Marburg, Marburg site (Germany).Collection was approved by the Ethics Committee Northwest/Central Switzerland (#2016-00067), the Ethics Committee Zurich (#2016-00215), and the faculty of medicine ethics committee at Friedrich-Wilhelms-University Bonn (#255/06).

Table 1 :
NIPMAP concisely summarizes the cellular and phenotypic architecture of tissue samples as a table of niches/interfaces and associated cellular phenotypes.The table reports the niche-phenotype associations with a correlation of at least 0.3 and a false discovery rate of at most 1%, as shown on Fig.3A.