## Main

Lung cancer remains the leading cause of cancer-related death, accounting for greater than 20% of all cancer mortalities10. Lung adenocarcinoma (LUAD), a type of non-small cell lung cancer (NSCLC), is the most common subtype and is characterized by distinct cellular and molecular features11. The tumour immune microenvironment (TIME) is a major source of LUAD heterogeneity and influences both disease progression and response to therapy1,3,5. The positioning of immune cells within tumours is known to dictate their function12,13,14; therefore, understanding the spatial landscape of the lung TIME would provide mechanistic insights into disease progression, reveal novel therapeutic vulnerabilities and unveil biomarkers of response to existing treatments. Here, using highly multiplexed imaging mass cytometry (IMC), we interrogated spatially resolved features of the TIME that are associated with clinical outcomes in patients with LUAD. Using a deep neural network model, we demonstrated that various clinical outcomes, such as progression, can be predicted using features that an artificial intelligence-based system can extract from raw IMC images. The ability to identify patients who will progress with a high degree of certainty could guide future post-surgical management.

To spatially characterize the cellular landscape of the lung TIME, we applied IMC to samples from 416 patients with LUAD (Fig. 1a, Extended Data Fig. 1a and Supplementary Table 1). We optimized a 35-plex antibody panel to identify cancer cells, stromal cells, and innate and adaptive immune lineages with diverse functional substates (Extended Data Figs. 1b–d and 24 and Supplementary Table 2). In total, we detected 1,644,178 cells and used a supervised lineage assignment approach to classify 14 distinct immune cell populations, along with tumour cells and endothelial cells using canonical lineage markers (Fig. 1b–f and Extended Data Figs. 1c and 5a).

Consistent with previous work15, high-grade solid tumours had the greatest immune infiltrate (44.6%) compared with micropapillary, acinar, papillary and lepidic architectures (37.0%, 39.7%, 32.8% and 32.7% respectively; Fig. 1g). This was driven by shifts within the myeloid compartment, as there were no significant differences in the average frequency of total lymphoid cells across histological patterns (Fig. 1h,i). In particular, macrophages were the most frequent cell population within the lung TIME, representing 12.3% of total cells (Fig. 1e) and 34.1% of immune cells (Fig. 1f), consistent with their critical role in the NSCLC niche16. We found the highest enrichment of CD163+ macrophages (putative ‘M2-like’ or protumorigenic) in solid tumours, which are one of the most aggressive architectures (Extended Data Fig. 5b, Supplementary Table 3, Supplementary Fig. 1, xii). The prevalence of CD163+ macrophages was strongly correlated with FOXP3+ immunoregulatory T cells (Treg cells) in the solid pattern (Extended Data Fig. 5c, box 1). This relationship was much less pronounced in low-grade lepidic and papillary architectures, which had a strong correlation between CD163+ macrophages and cytotoxic CD8+ T cells (Extended Data Fig. 5c, box 2). These associations suggest a potential interplay between macrophage and T cell populations in the TIME across LUAD patterns. Of note, solid tumours were also enriched for additional myeloid components, including neutrophils, non-classical monocytes and intermediate monocytes (Supplementary Fig. 1, ii, xiv and xv). Similarly to macrophages, these myeloid populations all exhibit diverse functional states in NSCLC biology4 and exemplify the complex heterogeneity that exists in the lung TIME.

We next assessed the relationship between immune populations and clinical or pathological variables by interrogating the frequency of individual cell types as a percentage of total cells within each image (Fig. 2a,b, Supplementary Tables 1 and 4 and Supplementary Fig. 1). Each image was cross-referenced with clinical data from patients, including sex, age, body mass index (BMI), smoking status, stage, progression, survival and histological subtype. Although we discovered established survival associations for several cell types17,18, most were driven by an enrichment in specific clinical or pathological groups. For example, although mast cells were associated with prolonged survival, they were overrepresented in non-smokers, early-stage patients and those with lepidic tumours (Fig. 2b, box 1)—all clinical variables associated with good outcomes. Similarly, CD163+ macrophages, non-classical monocytes and intermediate monocytes were enriched in solid tumours, which have poor outcomes (Fig. 2b, box 2). By contrast, B cell frequency was most significantly associated with better overall survival, independent of any confounding clinical or pathological variables (Fig. 2b, box 3).

Beyond survival associations, we found additional relationships between cell frequencies and specific clinical subgroups. For example, T cell subsets exhibited specific enrichment based on sex and age. Consistent with previous reports19, CD4+ helper T cells were significantly enriched in female patients (Fig. 2b, box 4), who have better overall survival than male patients20,21—an association also observed in our dataset (Extended Data Fig. 5d). Moreover, older patients (more than 75 years of age) had fewer intratumoural CD8+ T cells (Fig. 2b, box 5), reminiscent of immune ageing that is linked to reduced expression of co-stimulatory molecules, antigen receptor diversity and immunotherapy response22,23,24,25. Overall, these data reveal new relationships and add biological insight into established associations between cell frequencies and clinical outcomes within the TIME.

To gain insight into the cellular architecture and spatial organization of the LUAD TIME, we characterized direct interactions and communication patterns between single cells by quantifying cell–cell spatial relationships. Using permutation testing, we assigned the likelihood of interaction or avoidance behaviours between cell pairs across LUAD architectures ranging from least to most aggressive (Fig. 2c,d). Tumour cells had a universal tendency towards homotypic interactions and a relative avoidance with other cell types (Fig. 2d), consistent with spatial analyses in breast cancer12. Homotypic interactions were high across several immune cell populations, suggestive of a spatially coordinated TIME. Many of these interactions were discordant with the pattern of cell frequencies, revealing that the spatial relationship of cell–cell interactions may hold greater prognostic value than frequency alone. In higher-grade histological patterns (solid and micropapillary), both neutrophils and endothelial cells had an increased tendency for interactions with cancer cells compared with lower-grade subtypes (Fig. 2d, box 1a,b). These relationships are consistent with the ability of neutrophils to facilitate tumour cell extravasation into blood vessels, thereby promoting haematogenic metastasis26,27, and solid-predominant LUAD has the highest rate of metastasis compared with other histologies28.

In low-grade lepidic and papillary tumours, CD8+ and CD4+ T cells had a stronger tendency for interaction with cancer cells than high-grade solid LUAD (Fig. 2c,d, box 2), despite the fact that overall CD8+ and CD4+ T cell frequencies were not associated with progression (Fig. 2b, box 6). This relationship echoes previous findings that the spatial interaction of T cells and tumour cells is a stronger indicator of non-recurrence than T cell density alone29. Moreover, despite the co-occurrence of CD163+ macrophages and CD8+ T cells in low-grade tumours (Extended Data Fig. 5c, box 2), their tendency to directly interact was strongest in high-grade tumours and decreased as tumors became lower grade (Fig. 2d, box 3). This is consistent with the role for CD163+ macrophages in suppressing CD8+ T cell function within the TIME30. Similarly, B cells exhibited a greater tendency to interact with CD163+ macrophages in high-grade tumours (Fig. 2d, box 4), despite the observation that high B cell frequency was indicative of prolonged survival (Fig. 2b, box 3). In patient tumours specifically enriched in mature antigen-presenting CD40+ B cells31, these cells became generally more interactive across TIME populations (Extended Data Fig. 5e). Finally, endothelial cells tended to interact with many immune populations in high-grade tumours compared with low-grade tumours, including CD163+ macrophages and monocytes (Fig. 2d, box 5); these interactions may be reminiscent of innate regulation of vascular inflammation, consistent with our observation that immune infiltration was highest in the solid pattern (Fig. 1g), driven largely by differences within the myeloid compartment (Fig. 1h). Together, these analyses paint an overall picture of how stromal interactions shift among histological patterns and exemplify how spatial relationships, rather than cell frequency alone, are important to understand TIME biology.

## LUAD architecture and survival outcomes

To complement our analyses of cell frequencies and interactions, we next explored how cellular phenotypes within the microenvironment relate to survival. We extracted all microenvironmental populations represented in our dataset (including endothelial, myeloid or lymphoid compartments) and performed t-distributed stochastic neighbour embedding (t-SNE) based on functional markers in our antibody panel (Extended Data Fig. 6a–c). First, outside the immune compartment, we observed a distinct population of proliferative Ki-67+ endothelial cells, whose frequency was associated with poor overall survival (Fig. 3a and Supplementary Table 5) and high-grade solid tumours (Extended Data Fig. 6d). Proliferation of the endothelium underlies angiogenesis in response to hypoxia, a common feature of aggressive tumours32. We therefore explored vascular interactions in high-grade patterns and found an enrichment in endothelial cell interactions with neutrophils (Fig. 2d, box 6), leading us to question how specific neutrophil subsets may respond to hypoxic conditions. We observed several neutrophil states based on the expression pattern of three markers: HIF1α+, ARG1+ and MMP9+ (Fig. 3b and Extended Data Fig. 6b). Despite a high frequency of total neutrophils not being correlated with survival in our cohort (Fig. 2b, box 7), an increase in the proportion of the HIF1α+ subset was significantly associated with worse overall survival (Fig. 3b and Supplementary Table 5), which may reflect cases in which angiogenesis is insufficient to alleviate hypoxia. Neutrophils and other granulocytes are sensitive to low-oxygen conditions, and can adopt immunosuppressive behaviours against T cells in this setting33. Indeed, we observed that neutrophils exhibit a higher tendency to interact with immunosuppressive Treg cells in high-grade tumours (Fig. 2d, box 7). Phenotypic analysis within the lymphoid compartment revealed active ERK signalling within a subset of CD4+ T cells associated with prolonged survival (Fig. 3c and Supplementary Table 5), which is known to suppress differentiation into Treg cells34. Consistently, pERK+CD4+ T cells were enriched in low-grade lepidic tumours (Extended Data Fig. 6e) where neutrophil–Treg cell interactions were the lowest (Fig. 2d, box 7), and reduced in high-grade solid tumours (Extended Data Fig. 6e) where Treg cells were most abundant (Fig. 2b, box 8). Together, these findings provide a snapshot of spatially resolved phenotypic programs associated with more aggressive tumours, as they relate to tumour hypoxia and an immunosuppressive niche.

Beyond pairwise interactions, our data hint at the existence of larger cellular communities that are distinctively organized within the TIME across LUAD subtypes. To assess this, we followed a canonical approach to establish cellular neighbourhoods by first identifying the ten nearest spatial neighbours for each individual cell12,14. We then reclassified cells on the basis of their spatially defined cellular neighbourhood (CN). Using this approach, we discovered ten CNs that recapitulated both new and known tissue architectures, which we named: tumour boundary (CN1), undefined (CN2), pan-immune hotspot 1 (CN3), lymphoid enriched (CN4), tumour core (CN5), macrophage enriched (CN6), neutrophil enriched (CN7), pan-immune hotspot 2 (CN8), B cell enriched (CN9) and vascular niche (CN10) (Extended Data Fig. 7a,b). To identify CNs associated with survival, we performed Kaplan–Meier analysis by designating the frequency of CNs for each patient as high (CNhigh; z-score ≥ 0) or low (CNlow; z-score < 0). Consistent with our findings related to B cell frequency (Fig. 2b, box 3, and Supplementary Table 4), CN9high (B cell-enriched) was significantly associated with increased overall survival (Extended Data Fig. 7a and Supplementary Table 6), despite minimal differences in CN9 representation across histological patterns (Extended Data Fig. 7c, box 1). CN3high (pan-immune hotspot 1) and CN4high (lymphoid enriched) were also significantly correlated with increased overall survival across LUAD histologies (Extended Data Fig. 7a,c, box 2). When survival was analysed within histological patterns, associations with increased survival were noted for CN4low (lymphoid enriched) within the lepidic pattern, CN9high (B cell enriched) in the acinar pattern and CN2low (undefined) and CN4high (lymphoid enriched) in the solid pattern (Supplementary Fig. 2).

We were particularly interested in dissecting B cell neighbourhoods in greater detail, given the prognostic value of B cells in our dataset (Fig. 2b, box 3). Two variables that affect CN analysis include the number of interacting cells within a neighbourhood (denoted as n) and the number of total neighbourhoods (denoted as tCN). To further explore the spatial relationship between CNs and survival, we first altered the number of nearest spatial neighbours for each individual cell (n) while maintaining a constant number of neighbourhoods (tCN = 10). Across a wide range of n values (n = 3–30), CNs enriched in B cells were significantly associated with survival (Extended Data Fig. 8a), with the most significant association resulting from n = 10 and tCN = 10 (Extended Data Fig. 7a). To resolve B cell interactions that drive this survival advantage, we increased the tCN to 30. Using this approach, we were able to resolve four B cell-enriched neighbourhoods (CN7, CN11, CN21 and CN25) (Fig. 3d). Across these neighbourhoods, the survival advantage associated with B cells was negated when CNs concurrently displayed an enrichment in Treg cells (CN7 and CN21), whereas the survival advantage was maintained for CN11 (P = 0.0389) and CN25 (P = 0.0034) where Treg cells were lower (Fig. 3e and Supplementary Table 7). When comparing these two neighbourhoods, we noted a greater survival advantage for CN25, which was also enriched for CD4+ helper T cells (by contrast, CN11 was enriched for B cells alone). To determine whether the improved survival benefit associated with CN25 was related to the interaction of B cells and CD4+ helper T cells or to the prevalence of both cell types independent of their interaction, we plotted the survival association of patients who were B cell-high and CD4+ helper T cell-high versus patients who were B cell-high and CD4+ helper T cell-low and observed no significant difference (P = 0.644; Extended Data Fig. 8b). Moreover, the correlation between T cells and B cells in our cohort was low with an R2 of 0.210, thus making it less likely that CD4+ helper T cells and B cells are interacting as a result of a strong correlation in the prevalence of both cell types (Extended Data Fig. 8c,d). These data suggest that the improved survival association for CN25 is related to the interaction of B cells and CD4+ helper T cells beyond the prevalence of both cell types alone. However, an enrichment in Treg cells was still sufficient to negate this survival benefit (CN21), emphasizing the importance of multicellular B cell interactions within the TIME. Finally, there was no significant association between these B cell-enriched CNs and any other clinical variable, including histological subtype (Extended Data Fig. 9a,b and Supplementary Table 7). Together, these findings suggest that the spatial organization of TIME interactions may provide additional insight into individual patient survival beyond histological subtype classifications and cell prevalence.

## Predicting outcomes using deep learning

Given our finding that spatial neighbourhoods are predictive of survival regardless of LUAD architectures, we wondered whether we could leverage spatial data to predict clinical outcomes by using a deep-learning approach (Fig. 4a). We took advantage of transfer learning by using a pretrained convolutional neural network model. We chose the deep residual networks35 architecture pretrained on the ImageNet dataset36. Using the k-fold cross-validation method, we split the data into five folds, with 20% of the data for each fold. In our experiments, we considered four of the folds (80% of the patients) as the training data and the remaining fold (20%) for testing to evaluate the prediction accuracy. We repeated this for all possible combinations. For proof of principle, we first assessed whether the frequency of cells alone within each image would be sufficient to predict clinical variables. We tested routine clinical variables that demonstrated some variation in cell-type frequencies including histological subtype, sex, survival, BMI, cancer progression, cancer stage, age and smoking. Our goal was to increase the ability to predict clinical outcomes above the baseline prediction score, which reflects the chance of predicting the major class over the total number of examples involved for that specific variable. However, we saw negligible increases in prediction score above baseline for most of the clinical variables, suggesting that cell frequency alone does not capture the tumour architecture with enough resolution to predict clinical variables with high confidence (Fig. 4b and Supplementary Table 8).

We next investigated whether the integration of spatial information, obtained by inputting all the raw lineage marker images into our model, would help recognition performance. Using this approach, we observed a significant boost in performance for all clinical variables tested compared with cell frequency alone, suggesting that single-cell positional information that is encoded in each of the multiplex image scans is critical to interpreting the complex TIME that underlies clinical correlates (Fig. 4c and Supplementary Table 9). Spatial information did not confer an increase above baseline for sex, indicating that the TIME that underlies tumours from male individuals versus female individuals is indistinguishable using our model. Finally, to compare whether additional channels from our IMC images would lead to better recognition performance (that is, integration of spatially resolved immune functional substates), we used all the markers in our panel and repeated the predictions. The additional channels did not boost performance, suggesting that there is a certain threshold beyond which additional markers do not add value to clinical predictions based on our model (Fig. 4d and Supplementary Table 10).

We next sought to leverage our deep-learning approach to address a clinically meaningful problem. Although adjuvant chemotherapy has long been demonstrated to improve overall survival in patients with NSCLC who have stage IIA to IIIA disease, patients with stage I tumours smaller than 4 cm currently do not receive adjuvant chemotherapy37,38, despite that many of these patients are at increased risk for progression39. In fact, these patients currently have minimal approved peri-adjuvant therapeutic options37. Even with complete lung tumour resection, a significant proportion of these patients will relapse39. Therefore, we next assessed whether we could predict progression in patients with stage IA–IB lung cancer. The use of standard clinical information in our model (sex, age, BMI, smoking status, pack-years, surgery type, maximum tumour size, tumour grade, predominant histological pattern and stage) was insufficient to predict progression over baseline (Fig. 4e, Extended Data Fig. 10a and Supplementary Table 11). Consistent with our previous results, the frequency of cells alone was also insufficient to significantly predict progression (Fig. 4e, Extended Data Fig. 10b and Supplementary Table 11). When we included spatial information, however, our model predicted progression with 95.9% accuracy from a single 1-mm2 tumour core, smaller than most standard needle biopsies used to establish a diagnosis of lung cancer (Fig. 4e, Extended Data Fig. 10c and Supplementary Table 11). Additional markers, again, did not boost the prediction accuracy (Fig. 4e, Extended Data Fig. 10d and Supplementary Table 11), indicating that there is a minimum threshold of markers required to ascertain accurate predictions, a promising finding for translational practicality.

To validate our findings and to assess how heterogeneity within the lung TIME may affect our predictions, we performed IMC on an independent validation cohort consisting of 60 patients with primary LUAD that included two spatially distinct cores per tumour (Supplementary Table 12). In this new dataset, after training on our discovery cohort, the use of raw images from our lineage markers was able to predict progression with high accuracy (94.2% accuracy; Fig. 4f and Supplementary Table 13), with no added benefit from integrating the entire panel (93.3% accuracy). Of note, our validation cohort was more balanced, with a lower baseline prediction score (75.0%), confirming that there are spatially defined features within these images that probably reflect clinical outcomes. When assessing intratumour heterogeneity, we found substantial agreement in the predictions between two distinct cores from the same tumour (91.7%, lineage markers model). Despite these promising results, we acknowledge that tumour heterogeneity remains a challenge for accurate clinical and pathological diagnosis, an area of research that may benefit greatly from the application of artificial intelligence approaches.

One limitation of highly multiplexed imaging is the impracticality of translating discoveries into a clinically actionable assay that is broadly accessible. We thus sought to determine the minimum threshold of markers that could be used to predict progression without compromising prediction accuracy, with the goal of reducing our panel to approximately five markers (which is more likely to be amenable to clinical pathology practice). We first assessed the predictive performance of the spatial information derived from each individual marker in our antibody panel. Not surprisingly, we found that CD20 (a B cell marker) was most associated with an improved prediction of progression in our discovery cohort (Supplementary Table 14). Next, on the basis of the ranking of individual prediction scores, we combined the top two, three, four or five markers and tested whether combinations could predict progression in our validation dataset with high accuracy. Using this approach, we did not reach the level of accuracy that was achieved when all lineage markers were used (Fig. 4g and Supplementary Table 15). As an alternative approach, we took advantage of the spatial information embedded within our dataset, by using our CN analysis as a guide to identify rational combinations of markers whose spatial distribution are strongly correlated with survival (Fig. 3d and Supplementary Table 7). We reasoned that specific interactions may have prognostic value and would therefore be informative in predicting progression. Using this approach, we discovered that the combination of five markers—CD14, CD16, CD94, αSMA and CD117 (enriched in CN23, the neighbourhood most significantly associated with overall survival)—resulted in 90.8% accuracy (Fig. 4g and Supplementary Table 15). When we added CD20 (the individual marker demonstrating the highest predictive potential for progression), we increased accuracy to 93.3%, with 95.6% precision and recall. Overall, these data suggest that spatially resolved single-cell datasets may be highly valuable in the future to help to inform personalized peri-operative care plans to minimize toxicity for those destined to be cured, or to increase cure rates for those destined to recur.

## Discussion

Here we applied highly multiplexed IMC to characterize the cellular landscape of the LUAD TIME. We identified cellular dynamics and spatial features that correlate with distinct clinical outcomes including patient survival. Our data represent a valuable resource that adds to a quickly evolving body of literature supporting the importance of spatially resolved single-cell datasets in understanding how the TIME architecture relates to tumour biology. As lung cancer remains by far the largest cause of cancer-related death, there is untapped value in combining single-cell technology with deep-learning approaches to develop intelligent predictive algorithms to help to triage patients onto the therapeutic regimens that are best suited for their individual cancer. Our findings utilize a 5-µm section of a single 1-mm2 core of formalin-fixed paraffin-embedded tumour tissue to predict recurrence with high accuracy, which can be obtained from surgical resection or a biopsy. Nevertheless, clinical sampling bias remains a challenge in studies in which small regions of tumours are captured within a small amount of material. Future work will focus on using lower-plex technologies while attempting to maintain predictive accuracy to achieve translational feasibility. Our findings represent an important advance over existing prediction tools that use clinical and pathological variables and may enable more effective utilization of a growing armamentarium of peri-adjuvant systemic therapies to improve cancer outcomes40,41.

## Methods

### Clinical cohort

A cohort of 416 patients with LUAD were included in this study with follow-up time ranging from February 1996 to July 2020. For the validation cohort, 60 patients with LUAD with follow-up time ranging from February 2012 to May 2022 were included with two distinct cores per patient. All samples obtained were primary treatment-naive LUADs diagnosed by a board-certified pathologist following surgical resection or biopsy. Clinical information on all patients included can be found in Supplementary Tables 1 and 12. Tissue microarrays were constructed by selecting one 1-mm2 core from the surgical tumour specimen. Patient samples and clinical information were obtained following written informed patient consent. The protocols for human sample biobanking were approved (ethics, scientific and final) through the IUCPQ Biobank, protocol number IRB #2022-3474, 22090, and the MUHC protocol numbers IRB #2014-1119 and 2019-5253.

### Sample staining and IMC

Formalin-fixed paraffin-embedded (FFPE) slides were deparaffinized at 70 °C by incubation in EZ Prep solution (Roche Diagnostics) followed by antigen retrieval at 95 °C in standard cell conditioning 1 solution (Roche Diagnostics). The Ventana Discovery Ultra auto-stainer platform (Roche Diagnostics) was used for antigen retrieval. Slides were rinsed with 1× PBS and incubated for 45 min in Dako serum-free protein block solution (Agilent). Slides were stained with a cocktail containing metal-tagged antibodies at optimized dilutions overnight at 4 °C. All conjugations were performed by the Single Cell and Imaging Mass Cytometry Platform at the Goodman Cancer Institute (McGill University), using Maxpar Conjugation Kits (Fluidigm). Information on the antibodies used can be found in Supplementary Table 2. Slides were then washed with 0.2% Triton X and 1× PBS. An optimized dilution of the secondary antibody cocktail containing metal-conjugated anti-biotin was prepared in Dako antibody diluent. After a 1-h incubation, slides were washed with 0.2% Triton X and 1× PBS. Before IMC acquisition, Cell-ID Intercalator-Ir (Fluidigm) at a dilution of 1:400 was used to counterstain slides in 1× PBS for 30 min at room temperature. Slides were then rinsed for 5 min with distilled water and air-dried. IMC images were acquired at a resolution of roughly 1 μm. Cores were laser-ablated at a frequency of 200 Hz using the Hyperion Imaging System (Fluidigm) and raw data were compiled using the Fluidigm commercial acquisition software. Of note, in our validation cohort, we stained with alpha cleaved H3 (176Yb) instead of histone H3 (176Yb). Accordingly, this marker was excluded from validating our deep-learning predictions of progression.

### Antibody optimization

Antibodies were optimized on control tissues including the spleen, tonsil, appendix, placenta, thymus, normal lung and LUAD. Multiplex quality-control staining of positive and negative control tissue can be seen in Extended Data Figs. 24, with four representative images staining for each of the 35 markers in our panel.

### Data transformation and normalization

Data presented were not transformed. All analyses were based on raw IMC measurements. For heatmap visualization, expression data were normalized to the 95th percentile and z-scored cluster means were plotted. Single-cell marker expressions were summarized by mean pixel values for each channel.

### Cell segmentation and lineage assignment

All markers underwent a staining quality check before cell segmentation (Extended Data Figs. 24). A small number of markers did not consistently stain every sample in our cohort, so we chose not to make any conclusions based on those markers (GM-CSFR, PD-1, PD-L1 and B7-H3). Note that CD163 (a putative ‘M2-like’ marker) was chosen to subdivide macrophage populations on the basis that this marker is often upregulated in tumour-associated macrophages and has been used to categorize macrophages in multiplex imaging studies14,42,43. Although the terms ‘M1/pro-inflammatory’ and ‘M2/anti-inflammatory’ have traditionally been used to classify macrophage activation states, these terms are outdated and were therefore avoided44,45. Using a novel cell segmentation pipeline that combines classical and modern machine-learning-based computer vision algorithms, we segmented all cells contained within the IMC images. The model used is a fully automated hybrid cell detection model that eliminates subjective bias and enables high-throughput image segmentation. It allows us to accurately distinguish cells across diverse tissue microenvironments and resolve low-resolution structures. The details of our image segmentation approach can be found here: https://biorxiv.org/cgi/content/short/2022.02.27.482183v1. Owing to existing phenotyping challenges for highly multiplexed imaging, we created a cell phenotyping pipeline to assign cell phenotypes. Our strategy relies on canonical lineage markers and uses a supervised hierarchal approach that integrates the staining quality, the expected population abundance and cell lineage maturation to assign cells. We used k-means clustering46 and a mixture of generalized Gaussian models47 to generate a mask or level for each marker within a multi-level image stack created based on staining intensity. This allowed us to evaluate the existence of a marker at a particular location. Each marker in our panel was assessed using six levels and the appropriate mask was subsequently manually curated for each marker. Each mask is produced using the following procedure:

1. (1)

The greyscale image channel is convolved with a median filter with a particular window size (3 × 3).

2. (2)

Each pixel in the image is clustered into six groups of intensity levels using the k-means algorithm.

3. (3)

For each channel, we then selected all groups up to a particular level as foreground (1) and the rest as background (0).

4. (4)

To obtain smoother binary masks, we also applied a morphological blob removal process in which binary blobs of a particular area are removed from masks to avoid noisy regions.

5. (5)

To further refine the accuracy of select markers, additional channel-specific morphological operations were applied by computing an additional binary mask obtained using the adaptive binarization method with a sensitivity of 0.4. This mask is then amalgamated with the mask obtained in step 4.

As a formula, for each cell $${c}_{i}$$, we consider the curated mask for each lineage marker $${M}_{k}$$, where $$k=1,\ldots ,n$$ and n is the number of lineage markers. Let us assume $${p}_{{c}_{i}}^{j}$$ be the jth pixel that lies in the surrounding of $${c}_{i}$$ and each pixel has the following presence vector based on the lineage markers:

$$E({p}_{{c}_{i}}^{j})=\{{p}_{{M}_{1}}^{j},{p}_{{M}_{2}}^{j},\ldots ,{p}_{{M}_{n}}^{j}\}$$

where $${p}_{{M}_{i}}=\{0\,\text{or}\,1\}$$, which determines whether the pixel $${p}_{{c}_{i}}^{j}$$ is positive for a particular marker. Next, to determine whether each pixel within a cell is positive or negative for a given marker, we determined the majority vector by summing the presence of all vectors as:

$${M}_{{c}_{i}}=\left\{\mathop{\sum }\limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{1}}^{j},\mathop{\sum }\limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{2}}^{j},\ldots ,\mathop{\sum }\limits_{j=1}^{{N}_{{c}_{i}}}{p}_{{M}_{n}}^{j}\right\}$$

where $${N}_{{c}_{i}}$$ is the number of pixels in the cell $${c}_{i}$$. The maximum value in vector $${M}_{{c}_{i}}$$ determines the cell-type assignment. Cell lineages were assigned in rank priority order (Extended Data Fig. 1c). See the ‘Code availability’ section for additional details.

### Cell–cell pairwise interaction

We performed a permutation-test-based analysis of spatial single-cell interactions to identify significant pairwise interaction–avoidance between cells12. Interacting cells were defined as those within six pixels. P values less than 0.01 were deemed significant.

### Neighbourhood identification

To generate CNs, we used a ‘window’ capture strategy consisting of the number of cells (n) in closest proximity to a given cell as previously described14. Each window is a frequency vector consisting of the types of X (as indicated) closest cells to a given cell. Obtaining all the window vectors for each cell, initial cells (Extended Data Fig. 7a) were clustered using Scikit-learn, a software machine-learning library for Python, and MiniBatchKMeans clustering algorithm version 0.24.2 with default batch size = 100 and random_state = 0. Subsequent CN analysis was performed using the MiniBatchKMeans clustering algorithm version 1.1.2 with default batch size = 1,024 and random_state = 0. Every cell was subsequently allocated to a CN based on their defining window. The prevalence of each neighbourhood in each core was normalized so that the sum of neighbourhood prevalence for that core was 100%. Values were then z-scored and cores with a z-score above or equal to 0 and below 0 were compared for survival outcomes.

### t-SNE

All t-SNE plots were generated in MATLAB (version 2019b) using default parameters. For visualization, expression data were normalized to the 95th percentile.

### Deep learning

All deep-learning analysis steps were performed in Python (version 3.7.12). We used the TensorFlow (version 2.8.0) framework alongside Keras, which now acts as an interface for the TensorFlow library. We have two modes of data for our experiments: (1) raw IMC images, and (2) cell frequencies obtained from cell phenotyping. For raw IMC images, the pretrained ResNet-50 model with weights pretrained on ImageNet is first utilized to extract embeddings from each channel within the multiplex IMC channels. Each channel is fed to the three-channels of ResNet-50 and the embeddings are computed before the classificationlayers are obtained. Each channel produces an embedding vector size of 2,048 and then these are all concatenated into a single vector of features representing that specific core. We then reduced the dimensionality of the extracted feature vectors using mini-batch sparse principal components analysis to a specific number of principal components (for most applications we tried nine principal components). Principal components were later used to train a support vector machine with a radial basis function kernel with the parameters specified in our codebase. For the imbalanced datasets, we used a random oversampling to achieve an equal number of samples for each class during the training. The function used is RandomOverSampler (version 0.9.1) and it is available at: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.RandomOverSampler.html. To compare with cell frequencies, we imagined that cell-frequency vectors also represent a core (in which each vector is simply a vector of cell prevalence of each type). Similar to images, we reduced the dimensionality of the extracted feature vectors to nine principal components and then trained a support vector machine with a radial basis function kernel with the same parameters. Various classes of Scikit-learn (version 1.0.2) machine-learning libraries have been utilized for the tasks of splitting the dataset, dimensionality reduction and training support vector machines for the prediction tasks. All feature extraction and training steps were performed on Google Cloud GPU/TPU servers. See the ‘Code availability’ section for additional details.

### Statistical analysis and workflow

All image analysis steps were performed in MATLAB (version 2019b) and Python (version 3.7.12). Statistical analyses were performed using RStudio version 4.2.2 and GraphPad Prism 9 statistical software. Data are expressed as mean ± s.e.m. or mean ± s.d.; P < 0.05 was considered significant unless otherwise indicated. All statistical tests are indicated in the figure legends. Survival data were analysed by log-rank (Mantel–Cox) test.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.