Spatial analysis and CD25-expression identify regulatory T cells as predictors of a poor prognosis in colorectal cancer

Regulatory T cells (Tregs) are a heterogeneous cell population that can either suppress or stimulate immune responses. Tumor-infiltrating Tregs are associated with an adverse outcome from most cancer types, but have generally been found to be associated with a good prognosis in colorectal cancer (CRC). We investigated the prognostic heterogeneity of Tregs in CRC by co-expression patterns and spatial analyses with diverse T cell markers, using multiplex fluorescence immunohistochemistry and digital image analysis in two consecutive series of primary CRCs (total n = 1720). Treg infiltration in tumors, scored as FOXP3+ or CD4+/CD25+/FOXP3+ (triple-positive) cells, was strongly correlated to the overall amount of CD3+ and CD8+ T cells, and consequently associated with a favorable 5-year relapse-free survival rate among patients with stage I–III CRC who underwent complete tumor resection. However, high relative expression of the activation marker CD25 in triple-positive Tregs was independently associated with an adverse outcome in a multivariable model incorporating clinicopathological and known molecular prognostic markers (hazard ratio = 1.35, p = 0.028). Furthermore, spatial marker analysis based on Voronoi diagrams and permutation testing of cellular neighborhoods revealed a statistically significant proximity between Tregs and CD8+-cells in 18% of patients, and this was independently associated with a poor survival (multivariable hazard ratio = 1.36, p = 0.017). These results show prognostic heterogeneity of different Treg populations in primary CRC, and highlight the importance of multi-marker and spatial analyses for accurate immunophenotyping of tumors in relation to patient outcome.


INTRODUCTION
The immune system has been established as a central player in cancer development and progression, standing at a crossroads between pro-and anti-tumorigenic functions 1 . High densities of lymphocytes in the tumor microenvironment are associated with a good patient prognosis across several cancer types 2 . However, it has become increasingly clear that cancer cells modulate the immune microenvironment and foster anti-tumor immune evasion by a variety of mechanisms 3 . The identification of immune checkpoints and development of inhibitors against these molecules has revolutionized the treatment of certain cancers. Patients who earlier had a dismal prognosis can now experience long-term survival 4 . The microsatellite instability (MSI) phenotype predicts response to immune checkpoint inhibition, and was in 2017 the first biomarker to be approved as a treatment indication agnostic of cancer site 5 . MSI tumors generally have a dense immune infiltration 6 , associated with the high mutational burden and presentation of diverse neoantigens to the immune system.
Colorectal cancer (CRC) is responsible for a large subset of MSI tumors [7][8][9] . Approximately 15-20% of CRCs are of the MSI subtype 9,10 , but high levels of immune infiltration can also be found in microsatellite stable (MSS) CRCs. The Immunoscore scores pathological specimens according to the amount of infiltrating CD8 + and CD3 + lymphocytes in the invasive front and central tumor 2 , and has been extensively validated as a prognostic marker of CRC beyond the MSI phenotype 11 . However, the tumor immune contexture includes a variety of cell types beyond the CD8 + and CD3 + populations, and both the composition and organization of the total leukocyte infiltrate can have prognostic relevance 12 .
Regulatory T cells (Tregs) are a subset of T cells that play a central role in dampening the inflammatory response 13,14 . In line with this immune suppressive function, Tregs have been associated with a poor patient prognosis in several studies and across cancer types, including in a pan-cancer meta-analysis 15 . This meta-analysis also suggested that Tregs are associated with a prolonged overall survival in certain cancer types, including CRC, head and neck cancer, and esophageal cancer 15 . The choice of markers for analysis of the Treg population has been debated 16 . Traditionally, CD4 + CD25 + -cells have been recognized as the immune cell population with the suppressive function of Tregs 17 , but it was later demonstrated that the transcription factor FOXP3 is critical to the development of these cells 18 . FOXP3 is currently the most widely used marker to analyze Tregs by immunohistochemistry (IHC). However, expression levels of CD25 vary among FOXP3-expressing cells, and it has been shown that the subset of FOXP3 + -cells with the highest levels of CD25 also have the phenotype with the strongest immune-suppressive activity 19 . CD25 High -cells have been shown to suppress tumor-associated antigen responses and to be associated with recurrence also from CRC 20 . Furthermore, direct interaction with effector cells, such as cytotoxic T cells, may be needed for the Tregs to exert their immune-suppressive functions [21][22][23] . This implies a critical impact on the spatial context of lymphocyte populations in the tumor microenvironment, but this has not been evaluated in large series of CRCs.
We hypothesized that tumor immunophenotyping with multiple Treg markers and spatial analysis relative to markers of cytotoxic T cells would improve the prediction of outcome from CRC. Tumor immunophenotyping was performed by fluorescencebased multiplex IHC and digital image analyses of tissue microarrays (TMA) of two single-hospital series of primary CRCs (total n = 1720).

MATERIALS AND METHODS
This study follows the Reporting recommendations for tumor marker prognostic studies (REMARK) 24 (Supplementary Table 1).

Patients and tumor tissue samples
Two independent, single-hospital series of primary tumor samples from patients treated by major surgical resection for stage I-IV CRC at Oslo University Hospital, Norway were analyzed on TMAs. Samples for the Norwegian series 1 were collected between 1993 and 2003 (n = 922), and samples for the Norwegian series 2 between 2003 and 2012 (n = 798). Patients were treated with adjuvant chemotherapy according to national guidelines, as previously described 25 . All tumor samples were retrospectively retrieved from the diagnostic biobank at Oslo University Hospital, and construction of TMAs from formalin-fixed paraffin-embedded (FFPE) tumor material has previously been described, with tumor tissue cores of 0.6 mm and 1.0 mm in diameter on the Norwegian Series 1 and 2 TMAs, respectively 25 . Briefly, all available tumor blocks from each patient were assessed by an expert pathologist (A. Svindland) and a representative block with tissue from the central tumor region was selected as the donor block. The recipient blocks (TMA) included a single tissue core from each patient, taken from an area of the donor block marked as representative by the pathologist. Mucinous tumors were included in the TMAs (n = 19 of the total 1720 samples) and in the analysis of T cells, as long as they met the technical requirements outlined in Supplementary Fig. 1. The two series have previously been analyzed for MSI status by PCR-based analysis of five microsatellite loci (BAT25, BAT26, D2S123, D5S346, and D17S250), as well as KRAS (exons 2 and 3) and BRAF V600E mutation hotspots [26][27][28] in DNA extracted from FFPE material and parallel fresh frozen tissue samples for series 1 and 2, respectively. MSI and BRAF V600E mutation status based on PCR were available for 1179 and 1114 patients, respectively. IHC was performed to increase the number of samples with known MSI and BRAF V600E mutation status. MSI status was determined by IHC in the Norwegian series 2 using antibodies targeting the four mismatch repair proteins MLH1 (clone ES05, DAKO/Agilent, Santa Clara, CA, USA), MSH2 (clone FE11, DAKO/Agilent), MSH6 (clone EP49, DAKO/Agilent), and PMS2 (clone EP51, DAKO/Agilent) 25 . BRAF V600E expression was analyzed in both series using the Ventana anti-BRAF V600E (clone VE1) mouse monoclonal antibody (Roche, Basel, Switzerland). Concordance between IHC and PCR-based results for MSI status and BRAF V600E mutation status was 97% and 96% in the 315 and 750 tumors with both data types available, respectively; the results from PCR analyses were used in cases of discrepancy.

Multiplex immunohistochemistry
Two five-color multiplex IHC stains were performed on 4μm thick sections of the TMAs. Stain 1 (pan/cytotoxic T cell stain) was performed using antibodies against CD56 (clone MRQ-42, Cell Marque, Rocklin, CA, USA), CD8 (clone C8/144B, DAKO/Agilent), CD3 (clone F7.2.38, DAKO/Agilent), and a cocktail of antibodies targeting the epithelial cancer cells (E-cadherin clone 36 (BD-Biosciences, Franklin Lakes, NJ, USA)/cytokeratin C-11 (Abcam, Cambridge, UK)/cytokeratin Type I/II clone AE1/AE3 (Thermo Fisher Scientific, Waltham, MA, USA)). Notably, data on CD56 (neural lineage marker expressed on a subset of immune cells, including natural killer cells) was not analyzed in the present study. Stain 2 (Treg stain) was performed with antibodies against CD4 (clone EP204, Cell Marque), FOXP3 (clone D6O8R, Cell Signaling Technology, Danvers, MA, USA), CD25 (clone EP218, Cell Marque), and CD8 (clone C8/144B, DAKO/Agilent). Both stains also included incubation with DAPI for staining of cell nuclei prior to mounting. A detailed description of the antibodies and reagents used is provided in Supplementary Table 2. The stains were carried out using a four-plex kit (NEL810001KT, PerkinElmer/Akoya, Marlborough, MA, USA), together with Opal 620 reagent (FP1495001KT, PerkinElmer/Akoya) as previously described 29 . The Opal protocol (PerkinElmer/Akoya) was followed with the exception that slide deparaffinization, antigen retrieval, and antibody stripping were all performed in a PT-link module (DAKO/ Agilent, Santa Clara, CA, USA) at 97°C for 20 min. All primary antibodies were incubated for 30 min at room temperature. For stain 1, CD56 was visualized with Opal 620, CD8 with Opal 520, CD3 with Opal 570, and the cocktail targeting the epithelial cancer cells with Opal 690. For stain 2, CD4 was visualized with Opal 520, FOXP3 with Opal 690, CD25 with Opal 620, and CD8 with Opal 570. The staining process is outlined in Supplementary  Table 3. Briefly, this protocol involves cycles of antibody hybridization and staining, followed by removal of the bound antibodies. The fluorescent probes bind covalently to the tissue, and several stains can be performed on the same tissue section prior to mounting and imaging. Determination of optimal concentrations of the antibodies for staining (Supplementary Table 2), and verification of complete antibody removal between staining cycles were performed in separate experiments on test-TMAs prior to staining of Norwegian series 1 and 2 (data not included).

Digital image analysis
Stained TMAs were multispectral imaged using the Vectra 3.0 system (PerkinElmer/Akoya). A single 20× (0.5 μm/pixel) image was taken for each sample of the Norwegian Series 1 TMA (0.6 mm diameter cores). A 2 × 2 image field was captured for each sample of the Norwegian Series 2 TMA (1.0 mm diameter cores). Samples were spectrally unmixed, including removal of tissue autofluorescence, and analyzed in inForm software (PerkinElmer/Akoya). Two batch-analysis algorithms were built based on a subset of the TMA samples, one for each stain. Samples were randomly selected and used to optimize each algorithm, until the algorithms were found to be sufficiently generalizable upon addition of new images (~20 samples were sufficient for algorithm development). The algorithms were then applied to the full sample sets in batch-analysis mode. For stain 1, samples were segmented into stromal and epithelial cancer cell regions based on staining by the epithelial cocktail (empty glass was used to segment out the background). For stain 2 (no epithelial markers), the whole tissue cores were segmented from the background. For both stains, individual nuclei were segmented using signal from DAPI, while segmentation of the cell membrane and cytoplasm was aided by signals from the other markers. Following batch analysis of the two TMA-cohorts, images were manually checked and regions to be excluded from analysis (including tissue folds and necrotic regions) were marked. Marked images were submitted to re-analysis. Samples with inadequate technical quality were removed from downstream analyses ( Supplementary Fig. 1).

Marker scoring
Data tables with raw mean fluorescence intensity values per marker per cell for each TMA core and spatial coordinates per cell for each TMA core, were exported from inForm and further processed in R (v. 3.6.3). For stain 1, each individual cell in each individual TMA core was scored as positive/negative for CD3 and CD8 based on the mean signal intensity of their corresponding fluorophores. Cancer cell and area fractions were also calculated based on this stain, by dividing the number of cells in the epithelial cancer tissue region by the total number of cells, and by dividing the area occupied by the epithelial cancer tissue region by the total area, respectively. For stain 2, each individual cell was scored as positive/negative for CD4, CD25, FOXP3, and CD8. Thresholds for each marker were set within each stain and patient series individually, based on visual inspection of the images and the intensity distributions of the markers ( Supplementary Fig. 2). CD8 analyzed in stain 1 showed good concordance with CD8 analyzed in stain 2 (Pearson's r = 0.92 in Norwegian series 2; Supplementary Fig. 3, and 0.76 in Norwegian series 1; Supplementary Fig. 4). A lower concordance in Norwegian series 1 is expected due to the smaller size of the TMA cores. An infiltration score for each immune cell marker was calculated for each sample (tissue core) as the number of positive cells/mm 2 . For stain 1, scores were also calculated for the intraepithelial and stromal regions separately. Infiltration scores were log2 transformed (after addition of a constant of 1 for inclusion of samples with 0 positive cells) since the distributions across samples were heavily rightskewed (data not shown). Tumors were grouped by the median log2 scores of CD3 and CD8 across the sample sets, in line with previous studies 30 , while determination of thresholds delineating high/low infiltration of other cell populations is indicated where applicable. Tregs were scored both as a FOXP3 + cell population and as a CD4 + /CD25 + /FOXP3 + (triple-positive) cell population. The triple-positive Tregs were categorized into four groups based on high and low FOXP3 and CD25 expression, using the 75th percentile of positive cells as a threshold for both markers. Triple-positive Tregs with high levels of both markers (CD4 + CD25 High FOXP3 High ) represented 16% of the total triple-positive Treg population. The mean expressions of FOXP3 and CD25 across all triple-positive cells per sample were also calculated.

Gene expression-based estimation of tumor-infiltrating immune cells
A subset of the tumors (parallel fresh frozen samples) has previously been analyzed for gene expression on the Human Exon 1.0 ST (n = 187) and Human Transcriptome 2.0 arrays (n = 167, Affymetrix Inc., Santa Clara, CA, USA) 31 . Background correction, quantile normalization, and gene-level probe summarization of the raw data CEL files were performed using the robust multi-array average method 32 implemented in the R package Affy (v 1.58.0) 33 , using custom Entrez CDF files (v22) from Brainarray 34 . Gene expression data generated on the two different platforms were merged by Entrez IDs and adjusted by batch correction using the ComBat function implemented in the R package sva (v 3.28.0) 35 . Entrez IDs were converted to HGNC gene symbols using the org.Hs.eg.db package (v 3.8.0) from Bioconductor 36 .
Gene expression-based scores for Tregs in each tumor were estimated with the algorithms from Danaher et al. 37 , CIBERSORTx 38 and ImmuCellAI 39 , and used to determine the concordance with IHC-based estimates. Danaher et al. estimated Treg scores as the log2 expression of the single marker gene FOXP3. For CIBERSORTx, abundances of immune cell populations were calculated using the online platform (https://cibersortx. stanford.edu/) in high-resolution mode with 100 permutations and in absolute mode with the default LM22 mixture file. Gene expression values on log2 scale were exponentially transformed to the linear scale using the exp function in the base R package for input to CIBERSORTx. ImmuCellAI was run via the online platform (http://bioinfo.life.hust.edu.cn/ImmuCellAI) using the "Immune cell abundance in sample" mode.

Spatial analysis
Spatial analysis of markers on TMAs was performed similarly to the approach described by Enfield et al. 40 . Tissue segmentation maps were exported from inForm software and used to construct tissue outlines in R software using the packages EBImage (v 4.28.1) and raster (v 3.4-5). Cell nuclei coordinates of all cells within each sample, obtained from the image analysis in inForm described above, and the tissue outlines were used to construct Voronoi diagrams with the R packages ggvoronoi (v 0.8.3), sp (v 1.4-4), spdep (v 1.1-5), and rgeos (v 0.5-5). These diagrams were then used to determine cellular neighborhoods for each individual cell. First, cells were placed into one of the following four categories based on marker expression: CD8 + , FOXP3 + , triple-positive Treg (CD4 + /CD25 + / FOXP3 + ), or other. Then, the numbers of CD8 + -cells neighboring at least one FOXP3 + -cell and CD8 + -cells neighboring at least one triple-positive Treg cell were determined based on output from the voronoi_polygon function in the ggvoronoi package and the poly2nb function in the spdep package. To assess whether the neighbor frequencies of these cell populations were higher than expected by chance, permutation testing was performed by Monte Carlo sampling (n = 2000 permutations) within each individual tumor sample. In each permutation, cell positions and the number of cells in each of the four categories were fixed based on the observed data, while the category of each individual cell was randomized in the cell map (output from the voronoi_polygon function) using the function sample in the base R package. The numbers of CD8 + -cells neighboring at least one FOXP3 + -cell or triple-positive Treg were determined in each randomized cell map, and the 2000 random cell maps represented the permuted distribution of cell neighbors. The sample was classified with a significant interaction between the cell types investigated if the observed number of cell neighbors was higher than the 95th percentile of the permuted distribution (illustrated in Fig. 3C). Samples without CD8 + -cells (n = 56) were not included for spatial analysis, while samples without FOXP3 + -cells (n = 75) or triple-positive Tregs (n = 212) were classified in the non-significant interaction subgroup.
For comparison, spatial interactions were also calculated based on the fraction of CD8 + -cells neighboring at least one FOXP3 + -cell or triplepositive Treg per sample. However, these estimates were strongly associated with the absolute numbers of marker positive cells in each sample ( Supplementary Fig. 5). The scoring of significant spatial proximities based on Monte-Carlo permutations was less influenced by the number of marker positive cells, illustrated by a more even distribution of samples with significant proximities relative to the absolute abundances of CD8 + -cells and Tregs per sample (Fig. 3D), and was chosen for further analyses.

Survival analyses
Survival analyses included 1316 of the totally 1720 patients, after exclusion of patients based on technical data quality (n = 276) and clinicopathological parameters (n = 128; Supplementary Fig. 1). For patients with stage I-III CRC, only those with a negative resection margin >1 mm (R0 status) were included. Patients with stage IV CRC were analyzed separately. Patients with synchronous primary CRCs at diagnosis, and those treated with pre-operative radiotherapy were excluded. Among patients with stage I-III CRC and R0 status, there were no significant differences in the clinicopathological or molecular marker distributions between included and excluded cases, with the exception of a more frequent exclusion of BRAF wild-type tumors (Supplementary Table 4). Five-year relapse-free survival (RFS) was evaluated as endpoint for stage I-III CRC, and calculated as time from surgery to recurrence or death from any cause, as defined by Punt et al. 41 . Overall survival was calculated as time from surgery to death from any cause and evaluated as endpoint in stage IV patients. Uni-and multivariable Cox proportional hazards models were estimated using the coxph function in the survival R package (v. 2.44.1.1), and the assumption of proportional hazards was tested using the cox.zph function. Kaplan-Meier plots were made using the survminer package (v. 0.4.6).
For the best possible comparison with the Immunoscore, the TMA cores were evaluated for combinations of CD3 and CD8 scores in the intraepithelial and stromal tissue compartments. The optimal marker combination for prediction of 5-year RFS was evaluated by the Akaike Information Criterion. Potentially confounding variables included in multivariable analysis of CD3 and CD8 (Table 2) were the same as used in the international validation of the Immunoscore 11 ; pT and pN stage, MSI status, patient age and sex. Notably, this study included both colon and rectal cancers, and tumor location was therefore added as a variable. Multivariable survival analyses were stratified by cohort (Norwegian Series 1 and 2).
For exploratory analyses of Tregs, Norwegian Series 2 served as a discovery series due to its larger TMA-cores, providing more robust estimation of tumor-infiltrating immune cells per sample. Norwegian Series 1 served as a historical validation series. For the final multivariable survival model (Table 3), three variables involving Tregs (CD25-expression in triple-positive Tregs, spatial associations between CD8 + -cells and FOXP3 + -cells and spatial associations between CD8 + -cells and triple-positive Tregs) and BRAF V600E status were considered together with the variables included for evaluation of CD3 and CD8 scores, described above. Final selection of variables was performed by bootstrap sampling (n = 1000) combined with a backward selection procedure implemented in the mfp package (v. 1.5.2), in line with the methodology described in Sauerbrei and Schumacher 42 . This procedure indicated a nonsignificant prognostic distinction between pT stage 1 and 2, and the two subcategories were therefore combined in the final model. Variables that were retained in less than 30% of the runs were excluded from the final model. Patients were excluded from multivariable analysis if data was missing for any of the variables included. KRAS mutation status was missing in 41% of cases and was only used to test for associations with the immune cell scores, and not included in multivariable survival analyses.

Statistics
All statistical analyses were performed in RStudio v. 1.1.383 with R v. 3.6.3. All statistical tests were two-sided (except for the determination of spatial proximity, which was one-sided) and p values less than 0.05 were considered significant. Infiltration scores of different immune cell populations were evaluated both as continuous variables and after dichotomization into high/low categories. The correlation-matrices for immune cell populations were based on Pearson's correlations and produced using the corrplot package (v. 0.84).

RESULTS
Multiplexed analysis of CD3 and CD8 on tissue microarrays recapitulates the prognostic value of T cells Immune marker infiltration scores per mm 2 tumor tissue in the two CRC series (n = 1720) are presented in Table 1 Tables 5 and 6). Akaike's information criterion indicated that the optimal MSI microsatellite instable, MSS microsatellite stable, pN regional lymph node classification, pT primary tumor classification, TNM Tumor-node-metastasis, tp-Treg triple-positive Treg. a n(Norwegian series 1) = 757 and n(Norwegian series 2) = 687; including all samples that were not excluded due to technical reasons ( Supplementary Fig. 1) prognostic combination of markers was intraepithelial CD8 and stromal CD3 (Supplementary Tables 5 and 6), and patients with a low expression of both had the worst survival (Fig. 1). Both intraepithelial CD8 and stromal CD3 infiltration were inversely associated with depth of tumor infiltration (pT) and nodal involvement (pN), specifically in MSS cancers (Supplementary Table 7). Both markers retained prognostic value in multivariable survival models when analyzed as categorical (Table 2) and/or continuous variables (Supplementary Table 8), but with strongest statistical significance for intraepithelial CD8. A stratified analysis indicated that MSI status was associated with a favorable patient survival only in the subgroup with low immune cell scores (low intraepithelial CD8 and stromal CD3), and this was not attributed to differences in the immune cell densities between the MSI and MSS tumors ( Supplementary  Fig. 6). Table 9). This could not solely be attributed to the different distributions of certain clinicopathological characteristics (patient age and tumor location) or MSI status between stage IV and stage I-III cancers, and stage IV cancer remained a significant predictor of a low infiltration of the various T cell populations in linear models incorporating these characteristics (Supplementary  Table 10). Interestingly, the largest difference in T cell abundances between stages I-III and IV was found within the intraepithelial compartment (Supplementary Table 10). Neither intraepithelial CD8 infiltration nor stromal CD3 infiltration were associated with 5-year overall survival in stage IV patients ( Supplementary Fig. 7) and these patients were not included in further analyses.
Tregs were also analyzed on the transcriptomic level in parallel fresh frozen samples from a subset of the tumors (n = 270), using three common algorithms for estimation of immune cell populations. There was no correlation between IHC-based FOXP3 infiltration and level of Treg infiltration estimated by the transcriptomic signatures from Danaher et al. (FOXP3 only) or CIBERSORTx ( Supplementary Fig. 11A, B, respectively), but a weak correlation with the ImmuCellAI Treg signature ( Supplementary  Fig. 11C). However, the ImmuCellAI signature was also correlated with IHC-based CD8 infiltration, indicating estimation of a broader immune cell population (Supplementary Fig. 12). Notably, the gene expression-based Treg signatures correlated weakly with Fig. 1 Infiltrating lymphocytes, scored by tissue-microarray analysis, are strongly associated with CRC patient prognosis. Intraepithelial (ie) CD8-and stromal (s) CD3-scores were dichotomized at the median within the pooled cohorts, resulting in four groups; ieCD8 High sCD3 High , ieCD8 High sCD3 Low , ieCD8 Low sCD3 High and ieCD8 Low sCD3 Low . Representative images of tissue-cores for each of these categories are displayed (red; CD3, green; CD8, purple; epithelial (tumor) markers, white; DAPI). Scale-bar equals 200μm. each other (Supplementary Fig. 11D-F), while correlations for signatures of CD8 + -cells were stronger ( Supplementary Fig. 13), supporting that Tregs are a particularly difficult cell population to accurately score across methods.
High scores for the strongly correlated Treg populations estimated by IHC of FOXP3 and CD4/CD25/FOXP3 were both associated with a higher 5-year RFS rate among patients with stage I-III CRC in Norwegian Series 2 ( Supplementary Fig. 14A). Neither of the Treg populations had a prognostic impact within the subgroup of patients with high intraepithelial CD8 and stromal CD3 scores, inconsistent with the hypothesis that Tregs have a negative influence on the activity of other T cell populations ( Supplementary  Fig. 14B). Considering the large variation in CD25 and FOXP3 expression among the triple-positive Tregs, these cells were further divided into four groups according to the expression levels of FOXP3 and CD25 (described in methods). High infiltration of CD4 + CD25 High-FOXP3 High -cells was also associated with a better 5-year RFS ( Supplementary Fig. 15). However, analysis of CD25 by its mean intensity value among triple-positive Tregs showed that patients with a high mean CD25 expression (mean-CD25 High ) had a significantly worse prognosis than patients with a low mean CD25 expression (mean-CD25 Low ) or no triple-positive Tregs (HR = 1.55, CI = 1.10-2.18, p = 0.012, Fig. 2B, left). Corresponding analysis of FOXP3 expression did not reveal any prognostic associations (Supplementary Fig. 8D). The negative correlation between mean CD25 expression in triple-positive Tregs and CD8 + -cell infiltration identified in the Norwegian series 2 was validated in the Norwegian series 1 (Supplementary Fig. 16), and patients in the Norwegian series 1 with mean-CD25 High samples also had inferior 5-year RFS compared to those with mean-CD25 Low or no triple-positive Treg samples (HR = 1.54, CI = 1.09-2.18, p = 0.013, Fig. 2B, right).

Spatial proximity between Tregs and cytotoxic T cells in a subset of CRCs
Spatial proximity to CD8 + -cells is essential for some of the immune suppressive functions of Tregs, and was initially analyzed on the TMA with largest tumor cores (Norwegian series 2; Methods; Fig. 3A-C). A statistically significant proximity between CD8 + -cells and FOXP3 + -cells or triple-positive Tregs was found in 47% and 30% of intraepithelial-CD8 High /stromal-CD3 High tumors, respectively (Supplementary Table 11). This corresponded to 40 and 22% of the total series of stage I-III tumors with R0-status (Supplementary Table 11). Spatial proximity was more frequent in samples with high densities of CD8 + -and FOXP3 + -cells or triple-positive Tregs (Fig. 3D and Supplementary Fig. 17), but was not associated with MSI status (Supplementary Table 11; p = 0.91 and 0.50, respectively). Notably, there was no significant difference in the cancer cell fraction or cancer area fraction of tumors according to the spatial proximity groups (Supplementary Fig. 17).
Patients whose tumors had a significant proximity between CD8 + -and FOXP3 + -cells or between CD8 + -cells and triple-positive Tregs had lower 5-year RFS compared to patients with CD8 + tumors and non-significant proximity to Tregs, although not statistically significant ( Supplementary Fig. 18, top). In the Norwegian series 1, a similar prognostic association for spatial proximity was found only with the triple-positive Treg population ( Supplementary Fig. 18, bottom). Data for the two series were pooled (Fig. 3E), and subgroup analyses according to T cell abundances supported the indications of a worse survival for patients with significant proximity between CD8 + -cells and triplepositive Tregs irrespective of the level of intraepithelial CD8 and stromal CD3 infiltration (Supplementary Fig. 19). Tregs and spatial proximity to cytotoxic T cells are associated with poor survival in multivariable analysis A multivariable model for prediction of 5-year RFS in the pooled series of stage I-III CRCs (n = 1088) was built based on the various prognostic immune cell estimates and clinicopathological factors. A high mean CD25 expression in triple-positive Tregs was found in 14% (n = 157) of the total tumor series (Supplementary Table 12). Mean-CD25 High samples were found more frequently in patients with higher pN stage (p = 0.005, Supplementary Table 12), and were less infiltrated by intraepithelial CD8-and stromal CD3-cells (p < 0.0001 for both populations, Supplementary Table 12). Significant proximity between CD8 + -cells and triple-positive Tregs was found in 18% of tumors (n = 183 of the 1032 evaluated samples; n = 56 samples with no detected CD8 + -cells were excluded from this analysis) (Supplementary Table 13). This measure was associated with intraepithelial CD8 (p = 0.0004) The cutoffs were set individually for each series, at the point that appeared as a shoulder for Norwegian series 2 and at the point between the two maxima for Norwegian series 1. Two samples with mean expression >20 were excluded from the density plot for Norwegian series 2, and one from the plot for the Norwegian series 1, for visualization purposes. Kaplan-Meier survival analysis was performed in the Norwegian series 2 (left) and Norwegian series 1 (right) individually; patients with mean-CD25 High samples were compared to those with mean-CD25 Low and those with no triple-positive Tregs (tp-Treg Negative ) combined.
and stromal CD3 (p < 0.0001) also in the pooled patient series, but not with MSI status (p = 0.92) or any of the other clinicopathological features evaluated (Supplementary Table 13). Bootstrapped sampling and a backward selection procedure of the full model was implemented to determine the optimal combination of predictors (described in methods, results presented in Supplementary Table 14). Intraepithelial CD8 + -cell infiltration, mean CD25-expression level in triple-positive Tregs and spatial proximity between CD8 + -cells and triple-positive Tregs were all included in the final multivariable model, together with MSI status, pT-and pN-status, and patient age (Table 3 and Supplementary  Tables 14 and 15). Stromal CD3 + -cell infiltration was included in 42% of the models during bootstrapping and backward selection, but did not add independent value to the final model ( Supplementary Tables 14 and 16). The model was also evaluated upon inclusion of adjuvant chemotherapy as a possible confounder, but this variable did not have a significant impact (Supplementary Table 17). All three immune cell estimates had independent prognostic value in the final model, and CD8 + -cells were strongly associated with prolonged 5-year RFS, while each of the triple-positive Treg measures were independently associated with shorter RFS (Table 3). Notably, multivariable subgroup analyses according to MSI status indicated that proximity between CD8 + -cells and triple-positive Tregs was prognostic only in MSI (Supplementary Table 18), while high mean expression of CD25 in triple-positive Tregs was prognostic only in MSS (Supplementary  Table 19).

DISCUSSION
This study presents a detailed multi-marker analysis of Tregs in primary CRC, and shows that both the expression level of CD25 in CD4/CD25/FOXP3 triple-positive Tregs and spatial proximity to cytotoxic T cells are key factors to understand the prognostic impact of these immune cells. Tregs are a heterogeneous cell population that can both stimulate and suppress immune responses depending on their phenotype and marker expressions 16 . It has been indicated that Tregs have T-cell-suppressive functions in CRC 20,43 , but prognostic studies have shown that high amounts of tumor-infiltrating Tregs are associated with an improved patient outcome 15,44,45 . This is apparently contradictory to the thoroughly validated favorable prognosis associated with infiltrating CD3 + and CD8 + T cells in CRC 11 . This study supported the favorable prognosis associated with both the cytotoxic and regulatory T cell populations, possibly attributed to their strong correlation in infiltration densities among tumors. Furthermore, Tregs identified both by FOXP3 expression alone and in combination with CD4 and CD25 (triple-positive) showed similar results, again associated with a strong correlation in densities of the two immune populations among tumors, and indicating that the choice of markers does not confound the analysis. However, subsets of FOXP3 + -and/or CD25 + -cells can have very different functions depending on their expression levels and the specific set of co-expressed molecules 16 . Our analyses showed that the large variation in CD25-expression in triple-positive Tregs was both inversely correlated to CD8 + -T cell infiltration and strongly prognostic, with a high CD25 expression level apparently conferring a negative impact of Tregs on patient survival. Expression of CD25 on Tregs can deprive other T cells of IL-2, a cytokine with anti-apoptotic effects 46 . Furthermore, suppressive Tregs can also promote the destruction of T cells and exert other negative influences on surrounding immune cells 21 . The subpopulation of immune suppressive Tregs is therefore a potential drug target, and anti-CD25 antibodies have been shown to synergize with anti-PD1 treatment in mouse models transplanted with CRC cell lines 47 . Previous versions of anti-CD25 antibodies have failed to provide clinical responses against solid cancers, which can be explained by a negative bystander effect on IL-2 receptor signaling in effector T cells 48 . New anti-CD25 antibodies have been optimized to selectively deplete Tregs, while preserving IL-2 signaling in other effector T cells, and have shown exciting results in pre-clinical models 48 . Another potential strategy to inhibit the function of Tregs in the tumor microenvironment is through treatment with COX inhibitors 43 .
The spatial relationship with CD8 + -cells was found to be another discriminatory factor in the prognostic evaluation of Tregs, consistent with exertion of immune suppressive functions via direct cell-cell interactions 21 . Due to the exploratory nature of these analyses, our study could not disentangle a separate impact of CD25 expression and spatial proximity with cytotoxic T cells on the prognostic value of Tregs. However, both Treg estimates showed independent prognostic value in multivariable models. Fig. 3 Spatial association analysis of CD8 + -cells and triple-positive Tregs. One example of a sample displaying significant spatial proximity between CD8 + -cells and triple-positive Tregs is shown on the left, and one displaying non-significant spatial proximity is shown on the right (A-C). A Spectrally unmixed composite images of the samples exported from inForm software. Scale-bar equals 100 μm in A, B. B Voronoi diagrams constructed in R. C Density plots of the permuted data distribution were used to determine whether the observed number of neighbors between cell types was higher than could be expected by chance. The sample on the left had higher amounts of CD8 + -cells neighboring triple-positive Tregs than the 95th percentile of the permuted distribution and is therefore classified as having a significant association between the cell types. Conversely, the sample on the right had lower amounts of CD8 + -cells neighboring triple-positive Tregs than the 95th percentile of the permuted distribution and is therefore classified as having a non-significant association. D Plot of CD8 + -cell infiltration versus triple-positive Treg infiltration, colored according to whether a sample was classified as having significant spatial proximity between CD8 + -cells and triple-positive Tregs, or not. E Kaplan-Meier survival analysis according to the groups defined by spatial proximity between CD8 + -cells and triple-positive Tregs. Analysis performed within the pooled Norwegian series 1 and 2. Abbreviation: tp-Tregs; triplepositive Tregs. Subgroup analyses also indicated a dependence on MSI status. A prognostic impact of spatial proximity primarily in MSI tumors may reflect the higher overall level of immune infiltrates in this subtype, while an impact of CD25 expression levels primarily in MSS may reflect a more subtle functional effect associated with this marker. Independent studies are needed to validate these associations. This study is the first to analyze the in situ proximity of cytotoxic and regulatory T cells in relation to patient outcome in a large series of CRCs. We adapted a previously published approach 40 to determine the statistical rigor of marker positive cell neighbors by Monte-Carlo simulations. Tumor epithelial markers were not included in the stains used to analyze Tregs in association with CD8 + -cells. Accordingly, we did not take into account a potential difference in the spatial distribution of the immune cells between tumor epithelial and stromal regions. However, using data obtained from the neighboring tissue section (i.e., stain 1 described in Supplementary Table 3), no significant difference in cancer cell fractions were found in the samples determined to have significant vs nonsignificant neighbor associations between CD8 + -cells and triple-positive Tregs. Gene expression profiles of neighboring tumor tissue samples supported the notion that Tregs are particularly difficult to score. Correlations of Treg scores from three different algorithms for estimation of immune cell abundances were relatively poor, and weaker than for scoring of cytotoxic lymphocytes. Correspondence with IHC-based Treg estimates were also poor, including for FOXP3 expression on the gene and protein levels, although this could be attributed to intra-tumor heterogeneity resulting from analyses of parallel tumor samples.
A limitation of this study is the analysis of tissue cores from the central tumor region rather than whole-tissue sections. A formal comparison with the Immunoscore could therefore not be made, although we noted that the intraepithelial CD8 and stromal CD3 scores showed a similar prognostic power in multivariable models compared to separate reports on the Immunoscore in stage I-III colon cancer 11 . Use of TMAs facilitated a higher throughput for multi-marker analyses, and illustrated the potential for improved prognostic stratification of CRCs beyond the subgroups defined by CD8 + and CD3 + T cells. Analysis of Tregs according to CD25 expression and spatial associations with cytotoxic T cells refined the prognostic value of immune cell scoring in colorectal cancer, indicating the need for a more detailed tumor immunophenotyping in relation to patient outcome.
In conclusion, we have demonstrated heterogeneity in the prognostic associations of different Treg populations in primary CRC by multiplex immunohistochemistry and spatial marker analysis. A high total abundance of Tregs is associated with a favorable patient outcome, but triple-positive Tregs with a high mean expression of CD25 or spatial proximity to CD8 + -cells are independent markers of a poor prognosis. If validated in independent studies, these results could serve as grounds to explore the use of anti-CD25 antibodies to improve anti-tumor immunity in a subset of CRC patients.

DATA AVAILABILITY
The data in this study are available from the corresponding author upon reasonable request. Only limited clinical data for individual patients can be shared due to national legislation regarding privacy protection.