Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

# Spatial domain analysis predicts risk of colorectal cancer recurrence and infers associated tumor microenvironment networks

## Abstract

An unmet clinical need in solid tumor cancers is the ability to harness the intrinsic spatial information in primary tumors that can be exploited to optimize prognostics, diagnostics and therapeutic strategies for precision medicine. Here, we develop a transformational spatial analytics computational and systems biology platform (SpAn) that predicts clinical outcomes and captures emergent spatial biology that can potentially inform therapeutic strategies. We apply SpAn to primary tumor tissue samples from a cohort of 432 chemo-naïve colorectal cancer (CRC) patients iteratively labeled with a highly multiplexed (hyperplexed) panel of 55 fluorescently tagged antibodies. We show that SpAn predicts the 5-year risk of CRC recurrence with a mean AUROC of 88.5% (SE of 0.1%), significantly better than current state-of-the-art methods. Additionally, SpAn infers the emergent network biology of tumor microenvironment spatial domains revealing a spatially-mediated role of CRC consensus molecular subtype features with the potential to inform precision medicine.

## Introduction

Colorectal Cancer (CRC) is the fourth most common type of cancer and the second leading cause of cancer-related deaths worldwide1. This multi-factorial disease like other carcinomas, develops and progresses through the selection of epithelial clones with the potential to confer malignant phenotypes in the context of a reciprocally coevolving tumor microenvironment (TME) comprising immune and stromal cells2,3. CRC patients are staged using the well-established tumor-node-metastases (TNM) classification4,5. However, there is significant variability in patient outcomes within each stage. For example, CRC will recur in up to 30% of Stage II patients despite complete tumor resection, no residual tumor burden and no signs of metastasis6. In contrast, more advanced CRC has been known to show stability or indeed even to spontaneously regress6,7.

The intrinsic plasticity of the TME underlying this variability in outcome is controlled by complex network biology emerging from the spatial organization of diverse cell types within the TME and their heterogeneous states of activation8,9,10. The important role of the TME in CRC progression and recurrence has recently been highlighted by the identification of four consensus molecular subtypes (CMS)11,12, functional studies defining the critical role of stromal cells in determining overall survival13, and the development of Immunoscore®14 which quantifies tumor-infiltrating T lymphocytes in different regions of the tumor and associates their infiltration with CRC recurrence14,15. However the TME can be further harnessed to significantly improve CRC prognosis through the identification of biomarkers mechanistically linked to disease progression and the development of novel therapeutic strategies.

Deeper understanding of the TME may arise from imaging methods capable of labeling >7 cellular and tissue components in the same sample (hyperplexed16 (HxIF) fluorescence and other imaging modalities)16,17,18,19,20. To fully extract the intrinsic information within each primary tumor we have developed a spatial analytics computational and systems pathology platform (SpAn) applicable to all solid tumors to analyze the spatial relationships throughout TME signaling networks. SpAn constructs a computationally unbiased and clinical outcome-guided statistical model enriched for a subset of TME signaling networks that are naturally selected as dependencies of the corresponding malignant phenotype. Traditionally, advances in experimental and systems biology have been made by identifying associations between differential biomarkers expressions/correlations, or clusters in T-SNE plots, with particular outcome-specific malignant phenotypes, such as cancer progression or recurrence phenotypes. Instead of using this association-driven paradigm, SpAn introduces an outcome-driven approach to predict 5-year risk of CRC recurrence in patients with resected primary tumor that also enables inference of recurrence-specific network biology.

## Results

### Hyperplexed immunofluorescence imaging of tissue microarrays

The acquired data were generated using GE Cell DIVETM, also known as MultiOmyx19 (GE Healthcare, Issaquah, WA), hyperplexed16 immunofluorescence (HxIF) imaging and image processing workflow instrument. Cell DIVETM can perform hyperplexed imaging of greater than 60 biomarkers via sequentially multiplexed imaging of 2–3 biomarkers plus DAPI nuclear counterstain through iterative cycles of label–image–dye inactivation visualized in Supplementary Fig. 119 (see the “Methods” section for more details). Extensive validation of this approach has demonstrated that a majority of epitopes tested are extremely robust to the dye-inactivation process. The biological integrity of the samples was preserved for at least 50 iterative cycles19.

In this study we use 55 biomarkers, which include markers for epithelial, immune, and stromal cell lineage, along with those in categories, which include (1) biomarkers sampling the network biology of signaling pathways, (2) biomarkers associated with extracellular transport and metabolism, (3) biomarkers associated with tumor suppressive potential, (4) biomarkers associated with oncogenic potential, (5) biomarkers associated with cell–cell adhesion, cellular and stromal structure, (6) biomarkers associated with post-translational modifications (PTM), and (7) biomarkers associated with cell types and their states. They are detailed in Supplementary Fig. 2, with additional details in Supplementary Table 1. Figure 1a shows the HxIF image stack of a 5-µm thick and 0.6-mm wide tissue microarray21 (TMA) spot from resected primary tumor of a Stage II CRC patient labeled with the 55 biomarkers plus DAPI. Figure 1b highlights a sub-region of this patient TMA spot enabling optimal visualization of the 55 HxIF biomarker images resulting from the iterative label–image–chemical-inactivation cycles.

Cell DIVETM was employed to generate HxIF image stacks of FFPE tissue microarrays from resected tissue samples from 432 chemo-naive CRC patients at single-cell resolution. This 55-dimensional spatial profiling of the patient-level tumor microenvironment served as input in our study. The 432 patient cohort was retrospectively acquired from Clearview Cancer Institute of Huntsville Alabama, and included Stage I through III CRC patients, who were followed between the years of 1993 and 2002. As shown in Supplementary Table 2, the median patient age and gender proportions were similar across all stages, with CRC recurring in 65 patients. The outcome distribution of the patients and their clinical attributes across the CRC stages are detailed in Supplementary Table 2. The use of chemo-naive (no administration of neoadjuvant or adjuvant therapies for the 5+ years of follow-up) CRC patient cohort provides SpAn the opportunity to interrogate unperturbed primary tumor biology.

### Recurrence-guided and spatially informed CRC prognosis

SpAn performs a virtual three-level spatial dissection of the tumor microenvironment, by first explicitly decomposing the TMA into epithelial and stromal regions as detailed in Methods and illustrated in Supplementary Fig. 3. The cells in the epithelial region are identified using E-cadherin cell–cell adhesion labeling and pan-cytokeratin, with individual epithelial cells segmented using a Na+K+ATPase cell-membrane marker, ribosomal protein S6 cytoplasmic marker, and DAPI-based nuclear staining. The resulting epithelial spatial domain of the TMA in Fig. 1a is shown in Fig. 1c. The remaining cells are assigned to the stromal domain and are visualized in Fig. 1d. These stromal cells have diverse morphologies19. Based on the epithelial and stromal domains, SpAn also identifies a third epithelial-stromal domain, shown in Fig. 1e, to explicitly capture a 100 µm boundary wherein the stroma and malignant epithelial cells interact in close proximity. Together these three intra-tumor spatial domains comprise the virtual three-level spatial dissection of the tumor microenvironment that forms the basis for the SpAn spatial model overviewed in Fig. 1f and detailed in Fig. 2.

Utilizing expression of the 55 hyperplexed biomarkers, SpAn first computes the corresponding 55 mean intensities and 1485 Kendall rank-correlations as features characterizing each of the three spatial domains (see Fig. 2a). The mean intensity captures the average domain-specific expression profile of each biomarker, while the Kendall rank-correlations22 measure strength of association between any two biomarkers without presuming linearity (see Methods for details). Importantly, computation of domain-specific rank-correlations as explicit features for SpAn is used in place of the more typical approach of implicitly incorporating correlations as interactions between covariates (average biomarker expressions) within the prediction model23. These explicit features not only detect the association between two biomarkers presumably mediated by intracellular and intercellular networks all within the same spatial domain but also by mediators (e.g., exosomes) derived from another spatial domain. As an example, SpAn finds enrichment of KEGG ‘microRNAs in cancer’ pathway in the epithelial and epithelial-stromal domains (see below), while concurrently selecting correlation between CD163 and PTEN as a feature in the stromal domain for recurrence prognosis (see Fig. 3). As has been reported in gastrointestinal cancers, tumor cell-derived exosomal miRNAs mediate crosstalk between tumor cells and the stromal microenvironment, and induce polarization of the macrophages to the anti-inflammatory and tumor-supportive M2 state via activation of the PTEN-PI3K signaling cascade under hypoxic conditions resulting in enhanced metastatic capacity24,25.

SpAn then uses CRC-recurrence-guided learning to determine those specific spatial-domain features that constitute the optimal subset for prognosis via model selection based on L1-penalized Cox proportional hazard regression method (Fig. 2b)26,27. See Methods for details on penalized regression, Supplementary Fig. 4 for validity of the proportional hazard assumptions, and Supplementary Fig. 5 for determination of threshold for concordance with recurrence outcome. A follow-up analysis of the selected features is performed to test the stability of their contribution to recurrence prognosis through testing the stability of the sign of the corresponding coefficients at the 90% threshold. The final domain-specific features are shown in Fig. 3 and Supplementary Table 3 with additional details described in Methods.

The coefficients that control the contribution of the selected features to each of the domain-specific models for assessing recurrence outcome were learned under L1 penalization and their values are, therefore, dependent on all 1540 features. To remove this dependence, SpAn relearns each of the three domain-specific model coefficients using L2 penalty in our penalized Cox regression model with only the optimally selected features as input. This L2-regularized learning allows SpAn to estimate optimal contribution of the selected features that are 90% concordant with the recurrence outcome. The resulting domain-specific coefficients are shown in Supplementary Fig. 6. As detailed in Methods, SpAn combines these domain-specific features weighted by their corresponding coefficients into a single recurrence-guided spatial-domain prognostic model, whose performance is shown in Fig. 4a. The results were obtained by bootstrapping (sampling with replacement) patient dataset to generate 500 pairs of independent training and testing sets using stratified sampling that ensured the proportion of patients in whom cancer recurred in each of the 5 years remained the same in each bootstrap. For each bootstrap, SpAn used the training data for learning and the independent testing data to compute the receiver operating characteristic (ROC) curve. These ROC curves are shown in Fig. 4a along with the mean ROC curve. The mean area under the curve (AUC) for bootstrapped ROC curves is 88.5% with a standard error of 0.1%, demonstrating the stable performance of SpAn. We also maximized Youden’s index28 to identify the clinically relevant operating point on the ROC curves that minimized the overall misdiagnosis rate. Figure 4b shows the resulting sensitivity and specificity values for all bootstrap runs, with mean values, respectively, of 80.3% (standard error of 0.4%) and 85.1% (standard error of 0.3%). High specificity limits SpAn from misidentifying no-evidence-of-disease patients as being at high risk of CRC recurrence, while at the same time good sensitivity allows SpAn to not miss high-risk patients. This is emphasized by a high positive likelihood ratio value of 7.2 (standard error of 0.23), which quantifies the large factor by which odds of CRC recurring in a patient go up, when SpAn identifies the patient as being at risk of CRC recurrence. At the same time a small negative likelihood value of 0.22 (standard error of 0.003) quantifies the decrease in odds of CRC recurrence in a patient when SpAn identifies the patient as being at low risk. Finally, these results are brought together in Fig. 4c, which show the large separation in recurrence-free survival curves of patients identified by SpAn at low and high risk of 5-year CRC recurrence.

### Validating the rationale behind SpAn

The rationale behind our “virtual-dissection followed by combination of the three specific spatial domains” approach is motivated by the acknowledged active role of the microenvironment and its spatial organization, and the differential role played by the epithelial and stromal domains in tumor growth and recurrence2,13,29. We tested the validity of this rationale within the context of our data by comparing the performance of SpAn with the null model, which is based on recurrence-guided learning of the spatially undissected patient TMA spot. We note that the learning procedure for the null model was identical to the domain-specific learning within SpAn. In addition, we also compared the performance of SpAn with four other models that included a clinical model, biomarker expression model, biomarker expression + clinical model, and SpAn + clinical model. The input into the clinical model were clinical features associated with age, gender and TNM stage, and the learning procedure was based on Cox proportional hazard regression30. The biomarker expression model input were biomarker expression intensities alone and the learning procedure was identical to SpAn. The biomarker expression + clinical, and SpAn + clinical models, respectively, combined biomarker expression and SpAn with the clinical model. Figure 4d shows the AUC violin and boxplots of the bootstrapped ROC curves achieved by each model. The figure illustrates the improvement SpAn achieves over the performance of other models. To quantify the statistical significance of this improvement, we performed Dunn’s pairwise multiple comparison post hoc analysis between the models based on non-parametric Kruskal–Wallis test31. Our analysis shows that the improvement in performance achieved by SpAn over all other models is statistically significant at the 99% confidence interval with a p-value much less than 0.005 (Supplementary Table 4). We specifically note that this is true for SpAn performance in comparison to the null model based on the spatially undissected patient TMA spot without spatial-domain context. This improved performance of SpAn highlights the importance of explicitly modeling the epithelial, stromal and epithelial-stromal spatial domains associated with the TME. Interestingly, beyond supporting our rationale, this comparative test also demonstrates that joint utilization of biomarker expressions and their correlations results in superior performance of both SpAn and its null model over clinical features and biomarker expressions alone (also see Supplementary Fig. 7). We note that published state-of-the-art approaches that include Immunoscore®14,15 rely on biomarker expressions. Finally, we observe that the marginal performance improvement over SpAn achieved by including clinical features with SpAn—the SpAn + clinical model—is not statistically significant with a p-value of 0.082.

### SpAn predicts 5-year recurrence in Stages I–III CRC patients

The ability to identify patients in whom CRC will recur, especially for those patients in Stages II and III of tumor progression is highly clinically relevant. Figure 4e shows that, although the modeling of SpAn is TNM stage-independent, SpAn can consistently identify patients in whom risk of CRC recurrence is high for Stages I through III, with mean AUC of bootstrapped ROC curves for the three stages, respectively, being 82.1%, 89.4%, and 88.6%. Standard error of these mean AUC values, respectively, is 0.4%, 0.2%, and 0.2%, demonstrating the stability of SpAn performance. Although the overall performance across all three stages is highly significant with the potential of improving prognosis, the relative reduction in Stage I performance may be a consequence of the small cohort of only ten patients in Stage I with CRC recurrence.

The ability of SpAn to predict risk of recurrence in individual patients from all three Stages, is relevant in the context of administering adjuvant therapy, especially for Stage II patients. Current guidelines for treating Stage II CRC patients from The National Comprehensive Cancer Network (NCCN)32, the American Society of Clinical Oncology (ASCO)33, and the European Society of Medical Oncology (ESMO)34 do not recommend routine adjuvant chemotherapy for Stage II patients, but do state that it should be considered for sub-population of Stage II patients that are at higher risk and might benefit from being put on adjuvant therapy regimen35. The personalized prognostic potential of SpAn implies that we could triage Stage II patient cohorts into low and high-risk groups, with the latter being further considered for therapy. Furthermore, SpAn could help with postoperative surveillance of high-risk Stage II patients with more intensive follow-up regimes36.

While 20–30% of Stage II CRC patients are at high risk of recurrence, there are Stage III patients that have good prognoses of stable 5-year recurrence-free survival. SpAn, therefore, could also be used to fine-tune their postoperative surveillance and adjuvant chemotherapy regimens.

### Prognostic performance of SpAn remains stable over 5 years

A majority of CRC recurrence occurs in the first 5 years, with 90% occurring in the first four37,38. We, therefore, consider the time-dependent performance39 of SpAn during the first 5-year period. Figure 4f plots the AUC for time-dependent ROC performance. The performance of SpAn in predicting risk of recurrence remains consistent and stable (95% confidence interval shown) with only a small, and gradual reduction in time-dependent AUC values as we move away from the resection and imaging timepoint. This result suggests SpAn captures the critical biological underpinnings of recurrence in the primary tumor. Supplementary Fig. 8 shows the time-dependent AUCs for domain-specific temporal performance of SpAn.

### SpAn infers spatial-domain networks underlying CRC recurrence

Given the high prognostic performance of SpAn, we took a systems perspective to understand and to explain the underlying network biology responsible for this performance within each of the three spatial domains. For each domain, we quantified the unique associations between biomarkers included in the selected features through partial correlations between every biomarker pair, when controlling for other biomarkers as described in Methods. This approach was performed for all patients. The resulting partial correlation for every biomarker pair was separated into two groups according to no-evidence-of-CRC and CRC-recurrence patient cohorts and the information distance based on Jensen–Shannon divergence40 was computed between them (see Methods for more details). The resulting domain-specific distance matrices, shown in Fig. 5a–c, define associated graphs with the nodes being the biomarkers and edge weights quantifying the differential change, the information distance, in biomarker association between patients in which CRC recurred and those in which there was no evidence of recurrence. The stronger the weights, the larger the distance, and the more significant the differential change in association between the two markers for the two patient cohorts. We defined the graphs generated by the distance matrices thresholded at the 99th percentile as the spatial-domain networks that were most significant for CRC-recurrence prognosis. Figure 5d–f shows the resulting networks for the three spatial domains that reveal the heterogeneous nature of the cell populations and signaling pathways leveraged by SpAn in CRC-recurrence prognosis.

The epithelial-stromal domain network is comprised of three dominant subnetworks associated with tumor-invading T lymphocytes41, disruption in DNA mismatch repair cellular process, and the role of cancer-associated fibroblasts (CAFs) in the desmoplastic microenvironment as indicated by the strong edge weight between smooth muscle actin (SMA) and collagen IV. CAFs are well-known to promote EMT42 and the differential expression of beta-catenin and phosphorylated-MET in Fig. 5f is also consistent with the epithelial-stromal domain supporting the mesenchymal phenotype43. These features have also been identified with those distinguishing consensus molecular subtype (CMS) 4 that is associated with a poor prognosis in comparison to the other 3 subtypes in the transcriptome-based classification11,12. Interestingly, the epithelial-stromal spatial domain also reveals the presence of DNA mismatch repair network that has been associated with regulation of T-lymphocyte infiltration, a prominent feature of CMS1. Thus, the epithelial-stromal spatial domain associated with recurrence combines two features, where in contrast, each alone is associated with two different CMS subtypes. This theme extends to the epithelial spatial domain in Fig. 5d, where metabolic deregulation, a prominent feature of CMS3, and DNA mismatched repair, a hallmark of CMS1 are evident. The association of these two subnetworks in the epithelial domain has the potential to promote tumor cell growth while escaping immune surveillance. Finally, we observe a prominent tumor associated macrophage (TAM) network in the stromal spatial domain (Fig. 5e). TAM polarization toward the M2 phenotype regulated by AKT/PTEN has been associated with poor prognosis in CRC that could result from their immunosuppressive and matrix remodeling phenotypes25.

### Spatial-domain networks reveal domain-specific CRC network biology

We used STRING44 and KEGG45 databases to identify pathways enriched by biomarkers within each of the spatial-domain networks and further corroborate their connections to prominent features in the CMS subtype classification. Figure 6 shows the pathways enriched in each of the three spatial domains, and further identifies those that are common to a majority of at least two of the three spatial domains. Since their identification is based on the spatial-domain networks we computationally identified as significant for CRC-recurrence prognosis, these pathways play a differentially important role in prognosis of CRC recurrence.

CMS2 tumors are associated with chromosomal instability pathway and enrichment of genes associated with cell cycle and proliferation. Interestingly, both Pi3k-Akt signaling and cell cycle pathways enriched in our analysis are associated with CMS2 tumor subtype, with almost 60–70% of CRCs associated with dysregulation of Pi3k-Akt signaling pathways46.

Tumors associated with the CMS4 mesenchymal phenotype show upregulated expression of genes involved in epithelial-to-mesenchymal transition along with increased stromal invasion, angiogenesis and transforming growth factor-β (TGF-β) activation11,12. Interestingly, proteoglycans in cancer, focal adhesion and microRNAs in cancer pathways enriched in our analysis enable the mesenchymal phenotype. For example, non-coding microRNAs both regulate and are targets of upstream regulators for modulating the epithelial-to-mesenchymal phenotype by targeting EMT-transcription factors such as ZEB1, ZEB2, or SNAIL47. Similarly, the focal adhesion pathway through the integrin family of transmembrane receptors mediates attachment to the extracellular matrix, and when dysregulated promotes cell motility and the mesenchymal phenotype48,49. Furthermore, extracellular and cell surface proteoglycans with their interaction with cell surface proteins such as CD44 have been known to promote tumor cell growth and migration50,51.

Our analysis suggests that by capturing correlation-based crosstalk between heterocellular signaling pathways, SpAn leverages the interconnections between the subtypes for a high performing CRC-recurrence prognosis and reveals a synergistic role of the CMS subtypes in CRC progression and recurrence. We note that the ability of SpAn to leverage these interconnections is due to the spatial-context-preserving sampling of a diverse set of CRC-relevant biomarkers enabled by HxIF imaging.

Interestingly, this network biology paradigm also shows enrichment of pathways specific to a single spatial domain whose oncogenic or tumor suppressive roles in CRC is an active area of research but whose differential role in CRC recurrence has not been widely studied. For example, in the epithelial domain our analysis shows the enrichment of Thyroid hormone signaling pathway that has been associated with a tumor suppressive role in CRC development52,53. In contrast, the bacterial invasion pathway, enriched in epithelial-stromal boundary region, has been implicated in the oncogenic role of the colonic microbiome in CRC development54,55.

Our analysis also reveals enrichment of certain other pathways, such as the hypoxia-inducible factor 1 (HIF-1), human epidermal growth factor receptor 2 (HER2) and T-cell receptor signaling pathways in the epithelial domain. Hypoxia is typical in many solid tumors in CRC with HIF-1 regulating tumor adaptation to hypoxic stress56. Alterations in Her-2 signaling, either through genomic amplification or mutations is tumor promoting, and anti-HER2 therapies for preventing CRC recurrence and are a focus of on-going work57. We finally note that MAPK and PI3K-AKT signaling cascades are implicated in many of the above discussed pathways.

## Discussion

This study highlights the importance of spatial context of the primary tumor microenvironment in conferring distinct malignant phenotypes such as recurrence in CRC. We show how a computationally unbiased approach can be implemented through statistical modeling of spatially defined domains leading to a highly specific and sensitive platform for prognostic and diagnostic tests, as well as potentially inferring therapeutic strategies (Fig. 7). Although type, density and location of immune cells within CRC tumor samples have previously been utilized to predict patient outcome58, SpAn is novel in concept and distinct in approach for a few inter-related reasons. Unlike studies that are association-based, for example, associating recurrence with immune profiling of the CRC tumor58, SpAn is an outcome-driven method that utilizes the recurrence outcome to implement a systems approach to both predict CRC recurrence in patients and infer domain-dependent network biology most significant for this prediction. This outcome-driven systems approach, therefore, allows formal modeling of CRC recurrence as an emergent phenotype of the underlying TME, its spatial context and its molecular and cellular diversity. Furthermore, identifying spatial-domain networks that capture differential change most significant for recurrence, allows SpAn to be a hypothesis generating systems pathology platform that provides testable hypotheses regarding how spatial association of common networks could potentially lead to emergent signaling networks that confer malignant phenotypes in CRC patients. For example, an epithelial domain network coupling cell metabolism and DNA repair is consistent with tumor cell growth at the expense of T-cell exclusion and functional deficiency11,12 (Figs. 5 and 6). Likewise, the hijacking of CAFs to support EMT in the context of diminished immune surveillance in the epithelial-stromal spatial domain42,59,60 (Figs. 5 and 6) and the PI3K/AKT-mediated polarization of TAMs within the stromal domain (Figs. 5 and 6) can conspire to facilitate migration of tumor initiating cells to promote both local and distant recurrence61.

SpAn, when used in combination with a non-destructive hyperplexed imaging platform such as Cell DIVETM allows mechanistic hypotheses to be tested through iterative probing of the same spatial domains with additional biomarkers inferred by the pathway analyses (Figs. 6 and 7). We expect even more specific mechanistic biomarkers to emerge based on the iterative hyperplexed imaging approach that incorporates finer stage-based focus, thereby reducing the total number of biomarkers needed for optimal analyses. We will be pursuing this in subsequent studies. This feature of SpAn combined with a hyperplexing imaging platform will not only allow refinement of its prognostic ability, but since the iterative analysis can be potentially conducted in real time with further advances in the technologies, it may allow the prognosis to be specific to individual patients. Importantly, by enabling the testing of mechanistic hypotheses in patient samples directly connected to a specific clinical outcome, SpAn may inform therapeutic strategies to prevent the outcome. For example, immunotherapy has shown benefits in microsatellite instable (MSI) CRC patients but remains refractory in microsatellite stable (MSS) CRC patients62. By combining non-destructive hyperplexed imaging with iterative analysis, SpAn has the potential to identify spatial-domain networks that are differentially expressed  in MSI and MSS CRC patient cohorts. The next phase of this work will take advantage of sampling multiple regions of primary tumors with larger TMAs and/or whole side sections, exploring other spatial analytics ranging from simple to sophisticated spatial heterogeneity metrics63 and incorporating a combination of protein and nuclei acid biomarkers64.

The ability of SpAn to exploit spatial context of the tumor makes it suitable to study cancers that progress via spatially mediated signaling interactions with their TME. SpAn, therefore, is applicable to solid tumors including sarcomas, carcinomas, and lymphomas, which co-evolve with the TME65 of their abnormal tissue mass. We will pursue applications of SpAn to solid tumors beyond colorectal adenocarcinoma in subsequent studies. Our present retrospective study provides the foundation for such studies in other solid tumors. It also establishes feasibility of implementing SpAn in prospective studies predicting disease outcomes in patients with CRC and other malignant solid tumors. The high specificity and sensitivity of SpAn lies in its ability to unbiasedly identify emergent networks that appear to be closely associated and likely to be mechanistically linked to recurrence. We anticipate that hyperplexed datasets based on multiple imaging modalities will be generated faster and become less expensive as the technology evolves to become a mainstay tool to analyze solid tumors.

## Methods

### Patient cohort and tissue microarray (TMA)

As detailed in Gerdes et al.19, the CRC cohort in this analysis was collected from the Clearview Cancer Institute of Huntsville Alabama from 1993 until 2002 with 747 patient tumor samples collected as paraffin-embedded specimens. Tissue microarrays were constructed by Applied Genomics (later part of Clarient Lab) to facilitate large scale biomarker analysis. Cores with 0.6-mm diameters from the patient samples were distributed across seven slides. After quality control measures were taken, 694 TMA patient spots remained for analysis. Sample attrition was due to insufficient tumor fraction in the representative TMA core. Of the remaining samples, 450 were chemo-naive CRC patients that were treated with surgery alone, and the remaining 244 patients were treated with 5-fluorouracil-based chemotherapy regimens. Four-hundred and thirty-two chemo-naive patients were used in this study. Supplementary Table 2 details the median age, gender, recurrence, recurrence time, and survival times for the 432 Stage I–III CRC patients. As can be seen the patient cohort is balanced in age and gender across the three stages.

### Antibody validation

We ensured that the correct biomarker expression was captured by the imaging system using an antibody standardization process19. Specifically, antibodies were selected based on their staining specificity and sensitivity, compatibility with the two-step antigen retrieval, and resilience during 1, 5, and 10 rounds of dye-inactivation chemistry. Depending on the marker, a variety of specificity tests were conducted including, immunogen peptide blocking before incubation with tissue, drug-treated fixed cell lines, fixed cell lines with gene amplification or deletion, phosphatase treatment of samples to verify phospho-specificity, and visual inspection by expert pathologists of expected localization patterns. Furthermore, fluorescent dyes were conjugated to the primary antibody at several initial dye substitution ratios and specificity of each conjugate was verified and sensitivity compared with levels found in previous experiments. Staining performance was assessed by expert biologists and poor or non-specific staining was excluded.

### Cell DIVE-based hyperplexed imaging (HxIF) of tissue microarrays (TMA)

The 55 biomarkers plus DAPI nuclear counterstain included in this study are described in Supplementary Fig. 2 and Supplementary Table 1. HxIF imaging of a TMA slide was performed using sequentially multiplexed labeling and imaging of 2–3 biomarkers along with DAPI counterstain through a label–image–chemical-inactivation iterative cycle19 visualized in Supplementary Fig. 1 and detailed in Supplementary Table 5. Broadly, the supporting information details the hyperplexed immunofluorescence workflow with information on iterative cycles of antibody labeling of single 5 µm formalin-fixed and paraffin-embedded tissue sections and TMA slides, autofluorescence removal, imaging, and dye inactivation in tissue. All samples were stained and imaged in a single batch for 2–3 biomarkers and DAPI at a time.

### Image processing and single-cell analysis

DAPI-based nuclear staining was used to register and align sequentially labeled and imaged TMA spots prior to downstream image analysis steps19. Autofluorescence was removed from the stained images19,66, which were then segmented into epithelial and stromal regions (Supplementary Fig. 3), differentiated by epithelial E-cadherin staining. This was followed by segmentation of individual cells in both the epithelium and stroma. Epithelial cells were segmented using Na+K+ATPase-based cell-membrane staining to delineate cell borders and membrane regions, the cytoplasmic ribosomal protein S6 for cytoplasm identification, and DAPI stain for nuclear regions. Protein-expression level and standard deviation were subsequently quantified in each cell. The epithelial-stromal domain was identified via a three-step process. First, a tessellation of the patient TMA spot was performed using partially overlapping circles with a diameter of 50 µm. Second, only those circles with both positive and negative E-cadherin staining were retained. Finally, union of these circles resulted in a contiguous epithelial-stromal domain with a width of 100 µm.

### Quality checks and data normalization

Following single-cell segmentation, several data pre-processing steps were conducted. These included cell filtering, spot exclusion, log2 transformation and slide to slide normalization. Cells were included for downstream analysis if their size was greater than 10 pixels at ×20 magnification. The hyperplexing process can result in the tissue being damaged, folded, or lost. Image registration issues can also result in poor-quality cell data. Therefore, a tissue quality index based on the correlation of that image with DAPI was calculated for each cell for each round. Only those cells whose quality index equals to or greater than 0.9 (meaning that at least 90% of the cells overlapped with DAPI) were included. All the slides for all the biomarkers were adjusted to a common exposure time per channel. The data were then log2 transformed. A median normalization that equalizes the medians of all the slides was performed to remove slide to slide non-biological variability.

### SpAn input features

For each of the epithelial, stromal, and epithelial-stromal spatial domains, SpAn used M = 1540 domain-specific biomarker feature vector f as input. This input feature vector comprised of (1) mean intensity value of 55 biomarkers averaged across all cells within the spatial domain, and (2) 1485 (=55*54/2) Kendall rank-correlations between all 55 biomarker pairs. Kendall rank-correlation was chosen as the correlation metric because it is a non-parametric measure of association between two biomarkers. Moreover, its use of concordant and discordant pairs of rank-ordered biomarker expression for computing correlation coefficients allows it to robustly capture biomarker associations in presence of measurement noise and small sample size. Rank-correlation for each pair of biomarkers was computed for each spatial domain from all cells across the spatial domain expressing the biomarkers. This approach is distinctly different from prediction models that typically consider correlations via interactions, implicit within the models, between mean biomarker intensity expressions—with the biomarker expressions being the only covariates of the model23. We emphasize that we did not compute correlations through mean intensity biomarker expression across the spatial domain, but instead used biomarker expressions across individual cells of the spatial domain to explicitly compute domain-specific rank-correlation values between every pair of biomarkers to form the SpAn correlation feature set.

Before computing these two sets of features, SpAn analysis workflow included an initial intensity threshold step to ensure feature robustness. Specifically, we computed intensity-based distribution of cell-level biomarker expression separately for every biomarker across each patient TMA spot. Only intensities above the 85th percentile on this distribution were considered as biomarker expression and included in computing the intensity features. This focus on the right-tail of the intensity distribution was deliberately conservative, and although it might have potentially excluded low-intensity biomarker expression, it minimized inclusion of false-positive expressions into the analysis.

### Penalized Cox proportional hazard regression

For each spatial domain, SpAn implemented the Cox proportional hazard model via the partial likelihood function $$L\left( {\mathbf{\beta }} \right) = \mathop {\prod}\nolimits_{k = 1}^K {\frac{{e^{\left( {{\mathbf{f}}_{i_k}^{\mathrm{T}}{\mathbf{\beta }}} \right)}}}{{\mathop {\sum }\nolimits_{i \in R_k} e^{\left( {{\mathbf{f}}_i^{\mathrm{T}}{\mathbf{\beta }}} \right)}}}}$$ with the penalty given by $$P_{\lambda ,\alpha }\left( {\mathbf{\beta }} \right) = \mathop {\sum}\nolimits_{m = 1}^M \lambda \left( {\alpha \left| {\beta _m} \right| + \frac{1}{2}\left( {1 - \alpha } \right)\beta _m^2} \right)$$, and α = {0,1}. (The validity of using the Cox proportional hazard regression model is demonstrated in Fig. S4.) Given feature vector f as input, the partial likelihood L(β) quantifies the conditional probability of observing CRC recur in a patient at time tk (proportional to the numerator $$e^{\left( {{\mathbf{f}}_{i_k}^{\mathrm{T}}{\mathbf{\beta }}} \right)}$$ of L(β)), given the risk that a patient will recur from the set Rk of patients at risk at time tk (proportional to the denominator $$\mathop {\sum}\nolimits_{i \in R_k} {e^{({\mathbf{f}}_i^{\mathrm{T}}\beta )}}$$), over all time tk, k = 1,…,K, as quantified by the product over time index k. The partial likelihood is a function of the coefficient vector β, whose penalized estimate is then used to compute the proportional hazard ratio HR = $$e^{\left( {{\mathbf{f}}^{\mathrm{T}}\beta } \right)}$$. SpAn computed this estimate via a two-step process that first selects the parsimonious set of features required for optimally predicting the risk of recurrence, and then learns the model predicting the risk of recurrence based on the selected features.

The feature selection step is implemented via L1-penalized (LASSO) Cox regression where α is set to 1 in penalty Pλ,α(β). LASSO-based L1-penalized model selection performs feature selection by forcing the coefficients of vector β that play a minimal role in predicting risk of recurrence to zero. This is done in a principled manner by minimizing the model deviance along the LASSO regularization path27,67. The features corresponding to the non-zero coefficients in β are the features selected by SpAn to define the final functional form of Cox proportional hazard model. Model learning based on this functional form is performed in the second step via maximizing the partial likelihood function with L2-regularization as the penalty, implemented by setting α to 0 in the penalty term27,67. L2-regularization allows SpAn to learn the Cox proportional hazard model while avoiding over-fitting. An advantage of this two-step process is the decoupling of feature selection from estimation of beta coefficient values, resulting in the latter not being conditioned on the complete set of 1540 features but being dependent only on the selected features.

To ensure the stability of the selected features, SpAn repeated model selection over 500 bootstraps, and included only those features that were consistently concordant at the 90% level with the recurrence outcome. (The rationale for 90% concordance is discussed in Supplementary Fig. 5.) SpAn next performed a stability check on the beta-coefficients estimated in the second step. Specifically, the stability of the coefficient sign in 90% of the 500 bootstrap runs was tested, and only features that passed this threshold (Fig. 3) were included in the spatial-domain model. SpAn performed this process independently for each of the three spatial domains resulting in domain-specific recurrence-guided features (Fig. 3) and their coefficients (Fig. S6).

### SpAn is computationally unbiased

SpAn begins penalized Cox proportional hazard regression by including the full 1540 features. It then utilizes LASSO-based shrinkage to parsimoniously optimize the full model along the L1 regularization path by minimizing model deviance67. By combining this principled shrinkage via L1-penalized Cox proportional hazard regression, with bootstrapping to establish the stability of the selected subset of features at 90% concordance with the recurrence outcome (Supplementary Fig. 5), SpAn avoids typical biases associated with many model selection approaches based on stepwise variable selection, backward elimination, and forward selection68. These biases include R2 values being biased high, F and χ2 test-statistics not having their associated distributions, p-values being biased toward zero, and standard errors of regression coefficient estimates being biased low, while absolute values of regression coefficients being biased high.

### SpAn is robust to dataset imbalance

Many machine learning algorithms underpinning prediction models can potentially result in suboptimal performance for imbalanced datasets, where the number of resected CRC patients with no evidence of disease in the first 5 years (low risk) is not similar in number to patients in whom CRC recurred within the first 5 years (high risk). SpAn, however, is relatively robust to this imbalance due to its use of Cox proportional hazard model for predicting 5-year CRC recurrence. Cox proportional hazard model is based on the hazard function which utilizes the conditional probability of CRC recurrence in a patient at time t, given a risk set at that time and the knowledge that CRC has not recurred in the patient until time t27. Therefore, in Cox proportional hazard model the timing is more critical than the number of recurrences per se. As an example, consider the hypothetical scenario, where, instead of actual 65 high-risk CRC patients in the study, all 432 CRC patients in the cohort were high risk, with no patient at low risk. The Cox proportional hazard model will, in principle, remain valid because it models the conditional rate of CRC recurrence as a function of time and not the number of recurrences themselves. However, the SpAn workflow does ensure that in this study the size of the high-risk CRC patients is large enough for us to be able to sample high-risk CRC patients in each recurrence risk set corresponding to each of the 5 years. Supplementary Table 6 illustrates a typical stratified sampling result employed by SpAn to construct the training and testing sets. As can be seen this stratified sampling not only ensures that high-risk patients are approximately equally distributed between the training and testing datasets, but it also ensures that high-risk CRC patients are captured in each recurrence risk set corresponding to each of the 5 years. Thus, SpAn implements a form of risk set sampling69. It is also instructive to note that, as highlighted by CRC epidemiological studies37,38, our CRC patient cohort along with our sampling strategy captures the real-world trend that a majority of CRC recurrence in patients occurs in the first 5 years of primary tumor resection.

### Spatial model

Each of the three recurrence-guided domain-specific models defined a hazard risk given by $$e^{\left( {{\mathbf{f}}_{{\mathrm{epithelial}}}^{\mathrm{T}}{\mathbf{\beta }}_{{\mathrm{epithelial}}}} \right)}$$, $$e^{\left( {{\mathbf{f}}_{{\mathrm{stromal}}}^{\mathrm{T}}{\mathbf{\beta }}_{{\mathrm{stromal}}}} \right)}$$, and $$e^{\left( {{\mathbf{f}}_{{\mathrm{epi}} - {\mathrm{stromal}}}^{\mathrm{T}}{\mathbf{\beta }}_{{\mathrm{epi}} - {\mathrm{stromal}}}} \right)}$$ for the epithelial, stromal, and epithelial-stromal domains, respectively. SpAn then defined the final overall risk of recurrence model as $$\prod_{s \in S} e^{\left( {{\mathbf{f}}_s^{\mathrm{T}}\beta _s} \right)}$$, with S = {epithelial, stromal, epi–stromal}.

### Partial correlations and spatial-domain networks

For each spatial domain, the selected features identified a set of biomarkers specific to predicting risk of CRC recurrence. SpAn used them to define a space of biomarkers within which partial correlations between every pair was computed by controlling for confounding effect of biomarkers not defining the pair70. The process performed on each patient was as follows: Let the set of biomarkers identified by the selected features be N (≤55). Using the already computed Kendall rank-correlations between the 55 biomarkers, an N × N correlation matrix C corresponding to the N biomarkers was constructed, with small shrinkage-based modification to guarantee its positive definiteness, and therefore, its invertibility. Next, the N × N precision matrix P was computed by inverting C. The partial correlation between any two biomarkers bmi and bmj within the set identified by the selected features, was then computed using $$\rho _{{\mathrm{bm}}_i,{\mathrm{bm}}_{\mathrm{j}}} = \frac{{ - p_{{\mathrm{bm}}_i,{\mathrm{bm}}_j}}}{{\sqrt {p_{{\mathrm{bm}}_i,{\mathrm{bm}}_i} \cdot p_{{\mathrm{bm}}_j,{\mathrm{bm}}_j}} }}$$, where $$p_{{\mathrm{bm}}_i,{\mathrm{bm}}_j}$$ is the (i,j)th element of the precision matrix P. The partial correlations were performed for all patients and were then separated into two groups corresponding to patients with no evidence of disease and those patients in which CRC recurred. Probability distributions of the partial correlations—on the compact set [−1,1]—within each group were computed and the information distance between these two distributions was computed using the Jensen–Shannon divergence. This information distance defines the differential change in the association—partial correlation—between biomarkers bmi and bmj in the two patient cohorts. Greater the distance, larger the differential change. Repeating this process for all N(N−1)/2 biomarker pairs resulted in the information distance matrices shown in Fig. 5a–c for the three spatial domains. These information distance matrices were thresholded at the 99th percentile resulting in the computationally inferred spatial-domain networks shown in Fig. 5d–f. The high percentile was chosen to ensure that most discriminative networks are captured.

### Enrichment analysis

The STRING database44 was queried with the set of proteins identified by the spatial-domain networks generated by thresholding the domain-specific information distance matrices at the 99th percentile, to perform functional enrichment analysis using Fisher’s exact test with multiple testing correction.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

Due to the multi-terabyte size of the dataset, the data that supports the findings of this study can only be made available upon reasonable request.

## Code availability

The computational and systems pathology intellectual property is owned by the University of Pittsburgh and is exclusively licensed to SpIntellx Inc., Pittsburgh, PA as TumorMapr™. Use of TumorMapr code requires a licensing agreement with SpIntellx (info@spintellx.com).

## References

1. 1.

Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2020. CA. Cancer J. Clin. 70, 7–30 (2020).

2. 2.

Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).

3. 3.

Tape, C. J. The heterocellular emergence of colorectal cancer. Trends Cancer 3, 79–88 (2017).

4. 4.

Weiser, M. R. AJCC 8th edition: colorectal cancer. Ann. Surg. Oncol. 25, 1454–1455 (2018).

5. 5.

Brierley, M. K., Gospodarowicz, J. D. & Wittekind, C. TNM Classification of Malignant Tumours (Wiley, 2017).

6. 6.

Mlecnik, B., Bindea, G., Pagès, F. & Galon, J. Tumor immunosurveillance in human cancers. Cancer Metastasis Rev. 30, 5–12 (2011).

7. 7.

Bir, A. S., Fora, A. A., Levea, C. & Fakih, M. G. Spontaneous regression of colorectal cancer metastatic to retroperitoneal lymph nodes. Anticancer Res. 29, 465–468 (2009).

8. 8.

Stanta, G. & Bonin, S. Overview on clinical relevance of intra-tumor heterogeneity. Front. Med. 5, 85 (2018).

9. 9.

Carmona-Fontaine, C. et al. Metabolic origins of spatial organization in the tumor microenvironment. Proc. Natl Acad. Sci. USA 114, 2934–2939 (2017).

10. 10.

Marusyk, A. et al. Spatial proximity to fibroblasts impacts molecular features and therapeutic sensitivity of breast cancer cells influencing clinical outcomes. Cancer Res. 1457, 2016 (2016).

11. 11.

Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015).

12. 12.

Dienstmann, R. et al. Consensus molecular subtypes and the evolution of precision medicine in colorectal cancer. Nat. Rev. Cancer 17, 79 (2017).

13. 13.

Tauriello, D. V. F., Calon, A., Lonardo, E. & Batlle, E. Determinants of metastatic competency in colorectal cancer. Mol. Oncol. 11, 97–119 (2017).

14. 14.

Pagès, F. et al. International validation of the consensus Immunoscore for the classification of colon cancer: a prognostic and accuracy study. Lancet 391, 2128–2139 (2018).

15. 15.

Jorissen, R. N., Sakthianandeswaren, A. & Sieber, O. M. Immunoscore—has it scored for colon cancer precision medicine? Ann. Transl. Med. 6, S23 (2018).

16. 16.

Gough, A. et al. in The Molecular Basis of Cancer. 369–392 (Elsevier, 2015).

17. 17.

Schubert, W. et al. Analyzing proteome topology and function by automated multidimensional fluorescence microscopy. Nat. Biotechnol. 24, 1270 (2006).

18. 18.

Lin, J.-R., Fallahi-Sichani, M. & Sorger, P. K. Highly multiplexed imaging of single cells using a high-throughput cyclic immunofluorescence method. Nat. Commun. 6, 8390 (2015).

19. 19.

Gerdes, M. J. et al. Highly multiplexed single-cell analysis of formalin-fixed, paraffin-embedded cancer tissue. Proc. Natl Acad. Sci. USA 110, 11982–11987 (2013).

20. 20.

Goltsev, Y. et al. Deep profiling of mouse splenic architecture with CODEX multiplexed imaging. Cell 174, 968–981 (2018).

21. 21.

Bubendorf, L., Nocito, A., Moch, H. & Sauter, G. Tissue microarray (TMA) technology: miniaturized pathology archives for high-throughput in situ studies. J. Pathol. 195, 72–79 (2001).

22. 22.

Kendall, M. G. A new measure of rank correlation. Biometrika 30, 81–93 (1938).

23. 23.

Chambers, J. M. & Hastie, T. J. in Statistical Models in S 13–44 (1992).

24. 24.

Wang, X. et al. Hypoxic tumor-derived exosomal miR-301a mediates M2 macrophage polarization via PTEN/PI3Kγ to promote pancreatic cancer metastasis. Cancer Res. 78, 4586 LP–4584598 (2018).

25. 25.

Maia, J., Caja, S., Strano Moraes, M. C., Couto, N. & Costa-Silva, B. Exosome-based cell-cell communication in the tumor microenvironment. Front. Cell Developmental Biol. 6, 18 (2018).

26. 26.

Goeman, J. J. L1 penalized estimation in the Cox proportional hazards model. Biometrical J. 52, 70–84 (2010).

27. 27.

Simon, N., Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39, 1–13 (2011).

28. 28.

Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).

29. 29.

Peddareddigari, V. G., Wang, D. & Dubois, R. N. The tumor microenvironment in colorectal carcinogenesis. Cancer Microenviron. 3, 149–166 (2010).

30. 30.

Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox Model (Springer-Verlag, 2000).

31. 31.

Dunn, O. J. Multiple comparisons using rank sums. Technometrics 6, 241–252 (1964).

32. 32.

Benson, A. B. et al. Colon cancer, version 1.2017, NCCN clinical practice guidelines in oncology. J. Natl Compr. Cancer Netw. 15, 370–398 (2017).

33. 33.

Benson, A. B. et al. American society of clinical oncology recommendations on adjuvant chemotherapy for stage II colon cancer. J. Clin. Oncol. 22, 3408–3419 (2004).

34. 34.

Labianca, R. et al. Early colon cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann. Oncol. 24(Supp 6), VI64–VI72 (2013).

35. 35.

Varghese, A. Chemotherapy for Stage II colon cancer. Clin. Colon Rectal Surg. 28, 256–261 (2015).

36. 36.

Gan, S., Wilson, K. & Hollington, P. Surveillance of patients following surgery with curative intent for colorectal cancer. World J. Gastroenterol. 13, 3816–3823 (2007).

37. 37.

Ryuk, J. P. et al. Predictive factors and the prognosis of recurrence of colorectal cancer within 2 years after curative resection. Ann. Surg. Treat. Res. 86, 143–151 (2014).

38. 38.

Hammond, K. & Margolin, D. A. The role of postoperative surveillance in colorectal cancer. Clin. Colon Rectal Surg. 20, 249–254 (2007).

39. 39.

Heagerty, P. J., Lumley, T. & Pepe, M. S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344 (2000).

40. 40.

Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37, 145–151 (1991).

41. 41.

Sevinsky, C. et al. Abstract 1467: multiplexed immunofluorescence quantitation and validation of multiple immune cell types in colon cancer epithelium and stroma. Cancer Res. 76, 1467 LP–1461467 (2016).

42. 42.

Tommelein, J. et al. Cancer-associated fibroblasts connect metastasis-promoting communication in colorectal cancer. Front. Oncol. 5, 63 (2015).

43. 43.

Bradley, C. A. et al. Transcriptional upregulation of c-MET is associated with invasion and tumor budding in colorectal cancer. Oncotarget 7, 78932–78945 (2016).

44. 44.

Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).

45. 45.

Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K. & Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 47, D590–D595 (2019).

46. 46.

Colakoglu, T. et al. Clinicopathological significance of PTEN loss and the phosphoinositide 3-kinase/Akt pathway in sporadic colorectal neoplasms: is PTEN loss predictor of local recurrence? Am. J. Surg. 195, 719–725 (2008).

47. 47.

Vu, T. & Datta, P. K. Regulation of EMT in colorectal cancer: a culprit in metastasis. Cancers (Basel) 9, 171 (2017).

48. 48.

Bonnans, C., Chou, J. & Werb, Z. Remodelling the extracellular matrix in development and disease. Nat. Rev. Mol. Cell Biol. 15, 786–801 (2014).

49. 49.

Morgan, M. R., Humphries, M. J. & Bass, M. D. Synergistic control of cell adhesion by integrins and syndecans. Nat. Rev. Mol. Cell Biol. 8, 957–969 (2007).

50. 50.

Chen, C., Zhao, S., Karnad, A. & Freeman, J. W. The biology and role of CD44 in cancer progression: therapeutic implications. J. Hematol. Oncol. 11, 64 (2018).

51. 51.

Theocharis, A. D., Skandalis, S. S., Tzanakakis, G. N. & Karamanos, N. K. Proteoglycans in health and disease: novel roles for proteoglycans in malignancy and their pharmacological targeting. FEBS J. 277, 3904–3923 (2010).

52. 52.

Krashin, E., Piekiełko-Witkowska, A., Ellis, M. & Ashur-Fabian, O. Thyroid hormones and cancer: a comprehensive review of preclinical and clinical studies. Front. Endocrinol. 10, 59 (2019).

53. 53.

Brown, A. R., Simmen, R. C. M. & Simmen, F. A. The role of thyroid hormone signaling in the prevention of digestive system cancers. Int. J. Mol. Sci. 14, 16240–16257 (2013).

54. 54.

Iftekhar, A., Sperlich, A., Janssen, K.-P. & Sigal, M. Microbiome and Diseases: Colorectal Cancer BT—the Gut Microbiome in Health and Disease (ed. Haller, D.) 231–249 (Springer International Publishing, 2018).

55. 55.

Dejea, C., Wick, E. & Sears, C. L. Bacterial oncogenesis in the colon. Future Microbiol. 8, 445–460 (2013).

56. 56.

Vaupel, P. The role of hypoxia-induced factors in tumor progression. Oncol 9, 10–17 (2004).

57. 57.

Ross, J. S. et al. Targeting HER2 in colorectal cancer: the landscape of amplification and short variant mutations in ERBB2 and ERBB3. Cancer 124, 1358–1373 (2018).

58. 58.

Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–964 (2006).

59. 59.

Chen, X. & Song, E. Turning foes to friends: targeting cancer-associated fibroblasts. Nat. Rev. Drug Discov. 18, 99–115 (2019).

60. 60.

Haviv, I., Polyak, K., Qiu, W., Hu, M. & Campbell, I. Origin of carcinoma associated fibroblasts. Cell Cycle 8, 589–595 (2009).

61. 61.

Vergadi, E., Ieronymaki, E., Lyroni, K., Vaporidi, K. & Tsatsanis, C. Akt signaling pathway in macrophage activation and M1/M2 polarization. J. Immunol. 198, 1006–1014 (2017).

62. 62.

Sveen, A., Kopetz, S. & Lothe, R. A. Biomarker-guided therapy for colorectal cancer: strength in complexity. Nat. Rev. Clin. Oncol. 17, 11–32 (2019).

63. 63.

Spagnolo, D. et al. Pointwise mutual information quantifies intratumor heterogeneity in tissue sections labeled with multiple fluorescent biomarkers. J. Pathol. Inform. 7, 47 (2016).

64. 64.

Janiszewska, M. et al. In situ single-cell analysis identifies heterogeneity for PIK3CA mutation and HER2 amplification in HER2-positive breast cancer. Nat. Genet. 47, 1212–1219 (2015).

65. 65.

Balkwill, F. R., Capasso, M. & Hagemann, T. The tumor microenvironment at a glance. J. Cell Sci. 125, 5591–5596 (2012).

66. 66.

Woolfe, F., Gerdes, M., Bello, M., Tao, X. & Can, A. Autofluorescence removal by non-negative matrix factorization. IEEE Trans. Image Process. 20, 1085–1093 (2011).

67. 67.

Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

68. 68.

Harrell, F. E. Regression Modeling Strategies (Springer, 2015).

69. 69.

Langholz, B. & Goldstein, L. Risk set sampling in epidemiologic cohort studies. Stat. Sci. 11, 35–53 (1996).

70. 70.

Whittaker, J. Graphical Models in Applied Multivariate Statistics (Wiley, 1990).

## Acknowledgements

The work of S.U. was supported in part by NIH/NCI R01CA232593. The work of D.M.S., A.G., D.L.T., and S.C.C. was supported by grant NIH/NCI U01CA204836. S.C.C. was also supported in part by NIH/NHGRI U54HG008540, and UPMC Center for Commercial Applications of Healthcare Data 711077. D.L.T. was supported in part by NIH P30CA047904, and PA DHS 4100054875. D.M.S. was supported in part by NIH/NIBIB 5T32EB009403-07. F.G. was supported by NIH/NCI RO1CA208179. The computational and systems pathology intellectual property is owned by the University of Pittsburgh and is exclusively licensed to SpIntellx Inc., Pittsburgh, PA as TumorMapr™. The authors are grateful to the following team members for their role in development and generation of this multiplexed dataset: Elizabeth McDonough, Sireesha Kaanumalle, Jingyu Zhang, Denise Hewgley, Lori Corwin, Qing Li, and Michael Gerdes for antibody validation and review; Alberto Santamaria-Pang and Yousef al-Kofahi for single-cell segmentation; Alex Corwin, Kevin Kenny, Steve Zingelewicz for Cell DIVE acquisition and processing software; and Yunxia Sui and Brian Ring for data QC and compilation for analysis.

## Author information

Authors

### Contributions

S.U., A.M.S., D.L.T., and S.C.C. conceived and designed the study and wrote the paper. S.U. and S.C.C. developed the computational platform. S.U. performed the formal analyses. S.U., A.M.S., F.P., D.L.T., and S.C.C. developed the systems biology framework. C.J.S. and F.G. selected and validated the biomarkers, performed imaging and acquired the data, quality checks and preliminary data analysis. S.U., C.J.S., S.F., F.P., D.M.S., and L.N. were responsible for data quality control and providing analysis support. A.G., C.J.S., and F.G. helped with technical issues and edited and reviewed the paper.

### Corresponding authors

Correspondence to Shikhar Uttam or S. Chakra Chennubhotla.

## Ethics declarations

### Competing interests

S.C.C. and D.L.T. have ownership interest in SpIntellx, Inc., a computational and systems pathology company. The other authors declare no competing interests.

Peer review information Nature Communications thanks Edwin Roger Parra and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Reprints and Permissions

Uttam, S., Stern, A.M., Sevinsky, C.J. et al. Spatial domain analysis predicts risk of colorectal cancer recurrence and infers associated tumor microenvironment networks. Nat Commun 11, 3515 (2020). https://doi.org/10.1038/s41467-020-17083-x

• Accepted:

• Published: