Introduction

Cell function and behavior cannot be fully understood without the context of their microenvironment. Communication between cells and their surroundings allows the functional organization of cells into tissues and organs. It also plays a vital role in maintaining tissue homeostasis by generating signals that suppress and revert malignant phenotypes1. Experiments in animal and cell culture models have demonstrated that certain conditions of the microenvironment can cause potent cancerous cells to revert to an almost normal phenotype2,3. Although the normal tissue microenvironment is known to be resilient to tumorigenesis, false signals in the microenvironment can disrupt tissue homeostasis and subsequently initiate tumors. The microenvironment in which tumor exists is both complex and heterogeneous, inhabited by a multitude of cellular and non-cellular components including tumor cells, extracellular matrix, tumor stroma, blood vessels, inflammatory cells, signaling molecules4,5,6. Studies over the last decade have increasingly recognized the role of these different components in the development and progression of tumors5. This paper adds to this evidence and shows how its quantification may be automated.

Metastasis is the major cause of morbidity and death in colorectal cancer (CRC). The 5-year survival rate in CRC patients with distant metastasis is approximately 10%, considerably smaller than 70% with regional metastasis and 90% without metastasis7. Patients at high risk of developing distant metastasis could benefit from appropriate adjuvant and follow-up treatments if stratified accurately. The literature reports several histopathological features carrying prognostic value for CRC progression. Each of the features reflects competing cellular stimuli that influence tumor progression or suppression within the microenvironment. Type, density, and relative locations of different tissue components in the tumor microenvironment are crucial in determining progression and patient survival in CRC. For instance, the number of cytotoxic and memory T cells in the tumor center and the invasive margin have been linked to an improved prognosis of CRC8. Similarly, numerous studies have reported cancer-associated fibroblasts (CAFs) and desmoplasia to be important histopathological features associated with an unfavorable prognosis for CRC and an increased mortality rate9,10,11,12,13. Analogous to a wound that never heals14,15, tumors stimulate many associated responses, wherein normal fibroblasts have been reported to acquire a cancer-associated phenotype5,16. Furthermore, the extent of necrosis in CRC has been reported to correlate strongly with cancer progression and patient survival13,17,18. The link between necrosis and tumor progression is possibly due to the hypoxic nature of tumors, which drives tumor infiltrating inflammatory cells, namely phagocytic macrophages and granulocytes, to secrete pro-inflammatory cytokines which in turn promote cell proliferation4.

In this study, we investigate the significance of tissue phenotypic and morphometric features, exploring in particular cellular heterogeneity in tumor microenvironments, in determining metastatic potential in CRC patients diagnosed with advanced primary tumors. Based on the AJUCC/UICC-TNM staging system19, this group of patients have a primary tumor that has grown through the outer lining of colon wall (T3/T4), have no lymph nodes that are affected by cancer cells (N0), and no clinical evidence of distant metastasis at the time of diagnosis (M0). Detailed quantitative analysis was performed on whole slide images (WSIs) of CRC histology slides, stained with routine Hematoxylin & Eosin (H&E) dyes in a fully quantitative manner, using bespoke image analysis methods to provide an objective and reproducible assessment. Quantitative analysis of various types of cell population reveals novel tissue phenotypic features, derived from both cell-cell connection frequencies and tissue appearance, with significant association with metastasis incidence and distant metastasis-free survival (DMFS) in the advanced primary CRC tumors.

Results

Quantifying tissue phenotypic signatures of CRC tumors

In this study, WSIs of Hematoxylin and Eosin (H&E)-stained histological sections from 102 patients with advanced node negative primary CRC tumors (T3/T4, N0, M0) were acquired from two independent cohorts from two different institutes: University Hospitals Coventry and Warwickshire (UHCW, 72 patients) and Hamad General Hospital (HGH, 30 patients). Summary details of the cohorts and clinical information are given in Table 1.

Table 1 A summary of clinicopathological data.

CRC, like other solid tumors, is a disease of substantial heterogeneity20,21. Different parts of the same tumor can exhibit different features including cellular morphology, gene expression, metabolism, motility, angiogenic, proliferative, immunogenic and metastatic potential22. The tumor microenvironment is composed of diverse cell types; each plays a different role in tumor development and progression — some support and promote tumor progression while others play host protective roles5. The biological functions of cells are not only determined by their type but are also greatly influenced by their surrounding context. It follows that tissue morphometric signatures measuring tumor heterogeneity could be computed from the analysis of distributions and relative locations of cellular populations in the tumor microenvironment.

Here, we outline the quantification of digital tissue phenotypic signatures (see Methods for details). We divided each tumor histology image (i.e., each WSI) into small square regions or sub-images (Fig. 1a) and analyzed the small sub-images to obtain local characteristics that were then summarized to characterize the entire tumor section. We first applied our artificial intelligence (AI) based algorithm23, which was recently shown to be the state-of-the-art in detecting and distinguishing between four types of cells based on their morphology and context, to each sub-image. The four types of cells were: malignant epithelial cells, spindle-shaped cells (normal fibroblasts, cancer-associated fibroblasts and smooth muscle cells), inflammatory cells (eosinophils, lymphocytes and neutrophils), and necrotic debris (Fig. 1b). This allowed us to do quantification of tissue morphological characteristics associated with tumor, based on both distributions and relative spatial locations of diverse cell types. For each small tissue region (sub-image) in the large WSI, we then constructed a cell network (Fig. 1c). Each vertex of the network represents a cell of a certain type, and an edge denotes a cell-cell connection between immediately neighboring cells. Based on the distribution of cell-cell connections in the network (Fig. 1d), we then grouped the local tissue regions into different phenotypes using an unsupervised learning approach. The six resulting connection frequency (CF) based tissue phenotypes were visually discernible with each phenotype corresponding mainly to local areas of smooth muscle, inflammation, tumor-stroma interface, tumor, stroma, or necrosis (Fig. 1e). Finally, we used the ratio of the area of each CF tissue phenotype to the total tissue area to give digital tissue phenotypic signature of each tumor sample (Methods).

Figure 1
figure 1

Profiling tissue morphometric phenotypes. A WSI was divided into small regions of size 200 × 200 μm2 (a). Cellular components in the image were localized and classified into 4 different cell types, including malignant epithelial cell, inflammatory cell, spindle-shaped cell and necrotic debris, based on their nuclear morphology and surrounding tissue context (b). A cell network was subsequently constructed from the cell detection and classification results, in which nodes in the network represent cells and edges conceptualize relationships among them (c). A distribution of cell-cell connections was calculated for each small region (d). According to their distributions of cell-cell connections, tissue regions were profiled into 6 different phenotypes (e).

To further examine the extent to which the aforementioned automatically derived cell-cell CF tissue phenotypes correlate with known tissue types, we also quantified the tissue types by means of appearance based (AP) tissue segmentation. The tissue content of each WSI was automatically segmented into the following eight categories: tumor, stroma, loose connective tissue, normal/hyperplastic mucosa, smooth muscle, necrosis, fat, and inflammation (Fig. S1). We then investigated correlation between the CF and AP based tissue phenotypes. These are smooth muscle, inflammation, tumor, stroma, and necrosis. The Spearman correlation coefficients for individual pairs of CF and AP features range from 0.427 to 0.698 (Fig. S2), indicating moderate correspondence between the automatically-derived phenotypes and the underlying tissue types.

In addition to the phenotypic and standard clinical features, we considered the following automatically-derived features: Morisita index24, stroma-tumor ratio9,11,12, and necrosis-tumor ratio13,17,18. These features have previously been identified as having prognostic significance for CRC or other malignancies. Morisita index measures the spatial coexistence of inflammatory cell and malignant epithelial cells24. Stroma-tumor ratio is defined as the proportion of the total area of stroma to the total area of combined stroma and tumor in the tissue. Necrosis-tumor ratio is defined in the similar manner as that of stroma-tumor ratio (Methods). It is worth noting that in the above studies, the stroma-tumor ratio and necrosis-tumor ratio were semi-quantitatively assessed on manually selected small regions of histological slides. In contrast, we measured these quantities with greater precision and using all regions of our WSIs, thus avoiding subjective bias.

Association between phenotypic and clinical features

Here, we determined the strength of association between the CF tissue phenotypic features and standard clinical features normally used in routine prognostication of colon cancer (Table 2). The clinical features included tumor differentiation, tumor histological type, and primary tumor (T) stage. For example, to check whether there is association between the CF inflammation ratio and the T stage, we test if the distribution of CF inflammation ratio of the group of samples that are annotated as pT3 stage is significantly different from the distribution of samples that are annotated as pT4 stage using Mann-Whitney U test (also known as Wilcoxon rank-sum test).

Table 2 Association between the CF tissue phenotypic features and standard clinical features.

We found statistically significant association between CF tumor-stroma interface ratio and T stage (p = 0.027). Nonetheless, the relatively small values of the coefficients of determination (r2 = 0.048) indicate that CF tumor-stroma interface ratio are only weakly associated with T stage. There is no statistically significant association between other pairs of the CF phenotypic features and the standard clinical features. Altogether, these results suggest that the CF tissue phenotypic features are not strongly associated with standard clinical features and, therefore, are potentially new features whose prognostic significance is worth further investigation.

Logistic regression analysis

To assess the significance of each phenotypic feature in identifying a patient’s risk of subsequent distant metastasis, we carried out logistic regression analysis. Odds ratio factor and 95% confidence interval (CI) estimates were obtained for each feature to quantify the risk of distant metastasis incidence associated with the phenotypic features (Methods). In the multivariate analysis, the effect of individual features was adjusted for the effect of standard clinical features, as well as, the cohort membership since samples from two cohorts (UHCW and HGH) were used in the analysis.

The results show that CF smooth muscle and inflammation ratios are statistically significant (p < 0.05) in univariate and multivariate analyses (p < 0.05, Table 3 and Table S1). The interquartile change in CF smooth muscle ratio increases the odds of distant metastasis by a factor of 1.889 (95% CI: 0.903–3.95) in univariate analysis and by 2.101 (95%CI: 0.919–4.801) in multivariate analysis. The interquartile change in CF inflammation ratio, on the other hand, decreases the odds by a factor of 0.3 (95% CI: 0.119–0.758) in the univariate analysis and 0.305 (95%CI: 0.11–0.846). Despite the fact that CF smooth muscle and inflammation ratios are separately shown to be statistically significant in both the univariate and multivariate analyses, when considered together in the multivariate model (Table S1), their joint contribution towards the prediction of metastasis development becomes less clear. This is likely due to a moderate degree of correlation (ρ = −0.64, Fig. S3) between the features. Thus, when one is used, the other should probably be disregarded.

Table 3 Prognostic values of different features according to the logistic regression analysis.

We investigated if the above statistical results could be achieved by means of the AP smooth muscle ratio and AP inflammation ratio features. Only the AP inflammation ratio is shown to be statistically significant in the univariate and multivariate analyses (p < 0.05, Table 3).

Distant metastasis-free survival analysis

Next, we investigated the prognostic significance of various features, using DMFS as a criterion. The analysis was carried out on all cases from the UHCW cohort (72 cases), for which survival data were available. In our multivariate analysis, the effect of individual features was adjusted for the effect of standard clinical features.

The tissue CF smooth muscle ratio feature and the AP inflammation ratio feature were shown to be influencing features in determining the DMFS probability of the patients under Cox proportional hazards models (p < 0.05, Table 4 and Table S2). The effect of the interquartile change in CF smooth muscle ratio is to increase the hazard by 1.770 times (95% CI: 0.676–4.635) in the univariate analysis and by 2.106 times (95% CI: 0.793–5.595) in the multivariate analysis. The effect of interquartile change in AP inflammation ratio on the DMFS probability is to reduce the hazard by a factor of 0.376 (95% CI: 0.191–0.741) in the univariate analysis and by a factor of 0.389 (95% CI: 0.189–0.803) in the multivariate analysis. In addition, when CF smooth muscle and AP inflammation are compared together in the same multivariate model, the effect of each feature on the DMFS probability vanishes (Table S2). There is a statistically significant difference between the survival distributions of cases when stratified by AP inflammation ratio (log-rank p < 0.05, Table 4, Fig. 2). Stratification by other features does not yield statistically significant results (Fig. S4).

Table 4 Prognostic values of different features according to the Cox proportional hazards regression analysis on the UHCW cohort.
Figure 2
figure 2

Prognostic values of the AP inflammation ratio in the univariate survival analysis. (Left) A 5-year DMFS estimate. The gray shaded regions indicate the 95% confidence intervals of the estimates. (Right) Kaplan-Meier curves. A log-rank p-value was computed for each pair of Kaplan-Meier estimates.

In summary, CF smooth muscle and AP inflammation ratios are shown to be important prognostic factors for DMFS across in the univariate and multivariate Cox regression analyses. Nonetheless, they are not shown to be independent of each other and therefore when one is used, the other should probably be disregarded.

Discussion

The goal of this study was to investigate the prognostic significance of novel image-based quantitative morphometric features derived from diverse cellular populations that constitute the tumor microenvironment of CRC with advanced primary tumors (T3/T4, N0, M0).

Digital phenotypic features vs histological features

To fully explore the rich microscopic level information available in a tissue section, we have developed an automated system to provide quantitative measurements and to avoid subjectivity from visual assessment. The analysis was conducted on WSIs of H&E-stained formalin-fixed paraffin-embedded (FFPE) histological sections. Unlike previous works that identify diverse cellular components in a tumor section25,26, our morphometric features are not limited to tumor cells, lymphocytes, and stromal cells, but also include other types of inflammatory cells, spindle-shaped cells, and necrotic debris. In addition, we explored the relationship between these cellular components through a cell-cell network in order to characterize the morphological and tissue phenotypic heterogeneity of tumor. Our system did not adopt a commonly used approach27,28 that calculates a large number of features followed by feature selection methods to select a handful of features suitable for the objectives of the analysis. Although such an exploratory approach has proved successful in some applications27,28, the resulting features may not be easily interpretable in clinical terms. Moreover, if sufficiently many features are tried, it is likely that one of them will turn out to be “statistically significant” and so this approach requires follow-up tests of reproducibility. Instead, we investigated a small set of 8 features (6 CF + 2 AP phenotypic features), automatically found through unsupervised phenotyping and segmentation (see Methods for details). These features are visually meaningful as they correspond to distinct histological patterns of CRC tissue (Fig. 1).

Our systematic analysis shows that (a) the CF smooth muscle, CF inflammation ratios, and AP inflammation ratio are potentially independent markers predicting the occurrence of distance metastasis (binary logistic regression analysis) and (b) the CF smooth muscle and AP inflammation ratios are potential prognostic markers of 5-year DMFS for CRC patients diagnosed with advanced primary tumor (Cox proportional hazards regression analysis). CF smooth muscle ratio essentially measures the amount of the smooth muscle that is part of the colon wall. It quantifies the extent of spread and potential advancement of the tumors — the concept is related (but not similar) to other measures such as T stage, tumor-stroma ratio9,11,12, and tumor border configuration29,30. Low CF smooth muscle ratio is strongly associated with favorable prognosis. CF inflammation and AP inflammation ratios largely measure the amount of inflammation within the tumor tissue. High inflammation ratio is strongly associated with favorable prognosis, which supports the host-protective role of inflammatory cells in CRC that has been described by several studies8,31,32. From this observation, one may hypothesize about the biological relevance of each of our automatically derived tissue phenotypes for tumor development and progression.

In logistic regression and survival analyses, the obtained AUCs for CF smooth muscle, CF inflammation and AP inflammation ratios are approximately within the range of 0.57–0.64. This indicates that the classifiers and survival predictors perform better than random but considered not satisfactory in general. This may imply that the metastasis risk of CRC cannot be rigorously assessed by only a single or a few variable(s). Leveraging other sources of information, such as molecular data, clinical record, and other imaging modalities, in conjunction with the proposed tissue phenotypic signature is one of the possibilities to address the performance issue.

The prognostic value of stroma-tumor ratio9,11,12 and necrosis-tumor ratio13,17,18 could not be confirmed in this study. It should be emphasized that, in those studies, both the ratios were semi-quantitatively measured in manually selected tumor-rich areas and were inevitably prone to observer bias. By contrast, our study measured these quantities in a fully automated and quantitative manner from all regions of the tumor section and therefore can be considered to be more objective and reproducible.

Uncertainty found in our analysis pertaining to the prognostic impact of standard clinical factors has also been confirmed in existing literature33,34,35,36,37,38. Despite the fact that tumor differentiation has been consistently shown to be a prognostic feature independent of stage39,40,41,42,43, the conventional grading process is subjective by its very nature and can exhibit a substantial degree of observer variability33,34. It is also worth noting that according to the revised WHO criteria44, only poorly differentiated tumor histology without mismatch repair protein deficiency is considered a high-risk factor. Presence of the mucinous histologic type in general is not an independent prognostic factor, given that available results are contradictory35,36. Recent data have demonstrated the primary tumor extent (T4 stage) to be a likely prognostic factor for recurrence/metastasis45,46,47. Nevertheless, like other semi-quantitative features, there have been reports of variability in assessment of the degree of tumor extent37,38. Results from our analysis also indicate that T4 tumors have adverse DMFS outcome compared to T3 tumors, though the difference is not statistically significant.

The samples in this study come from patients diagnosed with stage II (Dukes stage B) CRC. This is characterized by advanced primary tumor with neither lymph node nor distant metastasis involvement (T3/T4, N0, M0). Stage II CRC consists of a heterogeneous population; some subgroups appear more likely to develop distant metastasis than others. Although adjuvant chemotherapy treatment is effective in other stages of the disease, there is a limited incremental benefit that stage II CRC patients could derive from this type of treatment in general48,49,50. Due to the high financial cost and morbidity of the treatment coupled with uncertainty over which patients will relapse, there has long been a debate as to whether adjuvant chemotherapy treatment should be given to the patients, since a majority of the patients will already have been cured by surgical resection alone. In the absence of molecular or genetic predictive markers for chemotherapy response45,51,52, improved prognostication accuracy seems to be the only key to better identify candidates who could potentially benefit the most from systemic therapies and thereby avoid unnecessary overtreatment as well as provide more efficient use of healthcare resources.

Even though several histological features have been demonstrated to be potential prognostic markers for recurrence or distant metastasis in stage II CRC, their prognostic significance is less clear and needs further validation. Primary tumor (T) stage and the number of lymph nodes examined have been recommended as risk factors by the National Comprehensive Cancer Network53. Moreover, there is a controversy as to whether examining more lymph nodes can, in fact, reduce tumor staging error and in turn result in improved stage II patient survival54,55. High-frequency microsatellite instability has been associated with improved disease-free survival in one study52 while in another study the effect was the opposite51. Gene expression profile is another factor that has shown promise for prediction of recurrence45,46.

Study limitations

Based on the makeup of our dataset and the results from our analysis, we hypothesize that high CF smooth muscle ratio and low CF or AP inflammation ratios are potential risk factors for distant metastasis in stage II CRC. There are nevertheless some limitations of this study as described below.

Firstly, while it is possible that there may be subjectivity in the sample selection by the pathologist when selecting slides showing the deepest invasion into the bowel wall and/or the worst differentiated parts of the tumor, that was considered as part of the inclusion criteria in this study (Methods). Furthermore, most tumor sections used in this study were of 2–3 cm2 × 4-5 μm across the face of the tumour in the horizontal plane. In volumetric terms, this is clearly a small proportion of the tumour and it is possible that there is sampling bias, as is the case for most such studies. It is also worth mentioning that the limited number of metastatic cases (n = 27) in this study may have rendered the analyses underpowered to detect prognostic effects of some features.

Secondly, although our cell detection and classification approach23 was developed to be robust to a certain degree of variation of images arising from factors such as stain inconstancy, batch effects, failed autofocus, and artefacts in the tissue preparation process, it remains to be tested if the degree of variation is excessive. Good image quality is therefore critical if the system is to produce accurate results. This issue can be addressed by careful tissue preparation and slide scanning.

Thirdly, due to the nature of the H&E stain and cellular morphology, our system is capable of identifying only a limited number of cell categories that are somewhat coarse. IHC stains could provide an effective means of identifying more specific cell types, such as different types of immune cells and fibroblasts (normal fibroblasts or CAFs), at the additional costs of IHC slide preparation and associated antibodies.

Fourthly, the phenotyping proposed in this work was done on the basis of local cell-cell connection frequencies and also on the basis of appearance and other important contextual information such as tissue textures. This, on the one hand, can be seen as a limitation of the proposed quantitative tissue phenotyping approach, as it relies on local cell populations to generate global statistics. On the other hand, a number of studies have reported that normal cells of various types undergo transformation when coming into contact with tumor cells, thus resulting in some of the previously normal cells exhibiting new biological functions different from the original ones. The proposed approach focuses on cellular morphology and cellular context and avoids influences from other possibly misleading contextual information.

Finally, our analysis was based on a single dataset consisting of two independent cohorts from different institutes. To further confirm the reproducibility of the results and generalizability of our automated histologic quantification system, large-scale validation using independent cohorts from multiple institutes is required. To be translated into clinical practice, these limitations will need to be carefully addressed.

The outlook

With the increasing uptake of digital slide scanning technology in histopathology laboratories, digitized WSIs will gradually replace glass slides in routine pathology workflow56. This presents an opportunity to advance image analytical techniques and computational algorithms for quantitative analysis of tissue morphology and consequently to provide an accurate and reproducible means for the diagnosis and prognostication of cancers. This is the first step towards effective treatment, decision-making, and personalized medicine with computational support. In this work, we have demonstrated the usefulness of such morphometric tools to reveal prognostic features in CRC. Our morphometric analysis is not restricted to images of FFPE CRC tissues but is also applicable to frozen tissue images as well as to images from different types of cancers. This morphometric approach was not designed to replace pathologists, but rather to provide additional information to assist in their diagnostic decision-making and risk stratification. Another potentially important direction would be to investigate potential associations between genomic alterations and digital tissue phenotypic signatures reflecting measurable aspects of in the tumor microenvironment.

Methods

Experimental design

The main objective of this study was to assess the significance of tissue phenotypic features for determining distant metastasis in advanced primary CRC. Specifically, we asked what quantitative tissue phenotypic features are biologically meaningful and important in predicting the subsequent development of distant metastasis and the distant-metastasis-free survival. Based on results from our statistical analyses, we have shown that digital tissue phenotypic features are independent prognostic factors for distant metastatic potential in CRC patients with advanced primary tumors (T3/T4, N0, M0). The sample size for logistic and Cox proportional hazards regression analyses was calculated based on the concept of events per variable57,58,59 which60 indicates that a minimum of 30 metastatic subjects would be sufficient to control for a type I error rate at 7%, 95% CI coverage of 93%, and a relative bias of 7% of the estimate in the Wald test. We retrospectively recruited 130 CRC subjects (90 UHCW + 40 HGH) with advanced primary tumors. The enrollment was stopped when the calculated sample size was reached. We excluded cases without a 5-year distant metastasis status or with clinical evidence of metastasis at the time of diagnosis. We further excluded outlier cases whose tissue section had no tumor. In total, 28 cases (18 UHCW + 10 HGH) were excluded and there were 27 metastatic cases left. Our analyses were conducted on H&E-stained WSIs of tumor sections. In view of the limited number of cases, randomization was not used in any experiments.

Patient and clinical information

This study involved two independent cohorts of CRC patients from two institutes. The first cohort consisted of 72 patients initially admitted for CRC treatment during the years 2006 to 2010 at University Hospitals Coventry and Warwickshire (UHCW), Coventry, UK. The second cohort comprised 30 patients admitted during the years 2007 to 2012 at Hamad General Hospital (HGH), Doha, Qatar. For each case, clinical data included tumor histological type, differentiation, stage of the primary tumor (T), lymph node metastasis (N), and distant metastasis (M). The 5-year DMFS data were available only for UHCW cases. All CRC patients were diagnosed with locally advanced tumors (T3/T4) and negative lymph node (N0), and distant metastasis free (M0). The TNM classification was reviewed and conducted according to the AJUCC/UICC-TNM staging system19. Summary details of the clinical information are given in Table 1.

The data used for this study including the WSIs and clinical information was provided after de-identification and informed patient consent was obtained from all subjects. Ethics approval for this study was obtained from the National Research Ethics Service North West (REC reference 15/NW/0843) and the Medical Research Center (RC/35213/2013) for the HGH cohort. All the experiments were carried out in accordance with approved guidelines and regulations.

Histological samples and imaging

For each case, tissue sections were prepared from an FFPE tumor tissue block and were then stained with H&E. Each tissue section was prepared in the pathology laboratory of the UHCW hospital. Histological slides were digitally scanned using the Omnyx VL120 Scanner (GE Omnyx, LLC) with an ×40 setting (equivalent to 0.275 μm/pixel). The scanned images were manually reviewed to control for failed autofocus. The tumor slides of all the cases were reviewed by the pathologists (DS, YT and IM) and the slides showing the deepest invasion into the bowel wall and/or the worst differentiated parts of the tumor, were selected for analysis. The reviewing pathologists agreed with the selection of slides in all cases.

Detection and classification of cells based on nuclear appearance

Two separate convolutional neural networks (CNNs) were trained, one for detection and another for classification of cells23. A spatially-constrained CNN produced a probability map assigning to each pixel the probability of being the center of a cell. Subsequently, the locations of cells were estimated by the local maxima of the probability map. To classify a detected cell, multiple small sub-images in the neighborhood of the detected cell were extracted and then fed to the neighboring ensemble predictor (NEP). The NEP was trained to classify 4 cell types: malignant epithelial cells, inflammatory cells (including eosinophils, lymphocytes, and neutrophils), spindle-shaped cells (including normal fibroblasts, CAFs and smooth muscle cells), and necrotic debris.

The training and validation of the two algorithms were carried out on a dataset consisting of more than 20,000 cells, annotated by an experienced pathologist and a trained observer. The pixel resolution of images in the dataset was reduced to 0.55 μm (equivalent to using a ×20 microscope objective). This dataset consisted of certain H&E-stained WSIs from cases that were initially excluded from the study. Based on a 2-fold cross-validation, the cell detection algorithm achieved an F1-score of 0.802 and the cell classification algorithm a multiclass AUC score61 of 0.917. For more details of the cell detection and classification method and the running time of the methods, see Sirinukunwattana et al.23 and Table S3.

Quantifying local tissue characteristic

We first split a WSI into small non-overlapping image tiles of size 200 × 200 μm2 (Fig. 1a), which was within the limit of effective intercellular communication distance62. For each image tile, a cell network (in computational terms, a graph) was constructed based on cell detection and classification results (Fig. 1b). The vertices of the network represent cells of different types. The network itself is the associated Delaunay triangulation (Fig. 1c), so that an edge represents a connection between a pair of neighboring cells. The edges connecting cells in one tile with cells in an adjacent tile were not considered. Since there are 4 cell classes, there are 10 possible pairs of cell-cell connections in the network. We then used the distribution of different cell-cell connection types (Fig. 1d) to characterize a given image tile.

Tissue phenotyping using cell-cell connection frequencies

In order to group image tiles into different phenotypes, we first calculate a feature vector based on cell-cell connection frequencies. We consider the 4-element set A = {M, I, S, N}, where M denotes the malignant epithelial type, I the inflammatory type, S the spindle-shaped type, and N the necrotic debris type. We also identify A with 1, 2, 3, 4 and define an indexing set Q = {(i, j)|i ≥ j}. Let h = [h(i,j)|(i, j) Q] R10 be the ten-dimensional cell-cell connection frequency vector representing the frequencies of all cell-cell connections, where h(i,j) [0, 1] denotes the proportion of connection frequencies between cells of types i and j. We calculated this vector for every image tile extracted from every WSI in the dataset.

Next, we performed k-medoid clustering on all frequency vectors, calculated as above, for all tiles in all WSIs in the dataset in order to group image tiles into different phenotypes. This unsupervised algorithm (we used the k-medoid algorithm implemented in Matlab 2016b) automatically finds a set of medoids — representative frequency vectors for tile phenotypes within the data — and assigns a phenotype label to each tile according to its nearest medoid. We employed the Chi-squared distance between a frequency vector h and a medoid m given by:

$$d(h,m)=\sum _{k\in Q}\,\frac{{({h}_{k}-{m}_{k})}^{2}}{{h}_{k}+{m}_{k}}$$

We initialized the medoids randomly and ran the clustering algorithm 100 times for each trial. We then used the results from the replicate that yielded the smallest total sum of distances between the frequency vectors and their corresponding medoids. The criteria used to determine the number of phenotypes k were the similarity between the phenotypes and the correlation between tissue morphometric features derived from the phenotypes (described below). The similarity between a pair of phenotypes was measured in terms of the Chi-squared distance between the pair of medoids representing the phenotypes. Correlation between a pair of features was measured by the Spearman correlation coefficient. In order to find a suitable number of distinct phenotypes k, we chose the maximum number of phenotypes that produced relatively high values of Chi-squared distance and relatively low values of correlation between distinct features. A distance value less than 0.2 and a correlation coefficient value greater than 0.8 were considered undesirable. We found that k = 6 is the maximum number of phenotypes that satisfies both criteria (Fig. S5).

Examples of image tiles from different tissue phenotypes discovered using cell-cell connection frequencies are shown in Fig. 1e. As can be observed in Fig. 1e, the six connection frequency (CF) based phenotypes found automatically corresponded well with the following distinct tissue phenotypes: smooth muscle, inflammation, tumor-stroma interface, tumor, stroma, and necrosis.

Tissue phenotyping based on appearance

We also trained a deep learning based CNN for patch-based tissue phenotyping, in which the following 9 categories of image patches were explicitly considered: normal, non-tissue denotes the proportion of connectionbackground, loose connective tissue (submucosa), fat (adipose), stroma (desmoplasia), inflammation, necrosis, smooth muscle, and tumor. Each image patch was of size 32 × 32 pixels with a pixel resolution of 2.2 μm/pixels (~5× objective). The architecture of the CNN was a simplified version of that proposed by Simonyan et al.63.

In developing this appearance (AP) based approach to tissue phenotyping, we used a dataset consisting of 193 sub-images, each of size 1,346 × 982 pixels. These images were extracted from WSIs of cases that were initially excluded from the study. A trained observer (KS) annotated all images. We randomly split the images into three parts with 52.5% for training, 17.5% for validation, and 30% for testing. Each WSI contributed images to only one part of the split. For training and validation, we extracted multiple patches of size 32 × 32 pixels from the training and validation images. We selected the version of the algorithm that yielded the best performance on the validation part. In testing, for each test image, we extracted patches in a sliding-window fashion and classified each of them separately before merging the results together to obtain a segmentation result for the whole image. The correct classification accuracies for the 9 tissue phenotypes were as follows: normal 98.9%, non-tissue background 99.9%, loose connective tissue (submucosa) 98.4%, fat (adipose) 97.9%, stroma (desmoplasia) 90.4%, inflammation 99.3%, necrosis 98.2%, smooth muscle 97.5%, and tumor 96.0%.

We ran the trained segmentation algorithm on the 108 H&E-stained WSI images, used in the analyses. Examples of the segmentation results can be seen in Fig. S1. Furthermore, as a quality control, segmentation results of 10 images (out of 108 images) were randomly selected and then reviewed by expert pathologists (DS, IC).

Automatically-derived tissue phenotypic features

The CF and AP based tissue phenotypic features were calculated as follows:

$${\rm{phenotype}}\,{\rm{ratio}}=\frac{{\rm{area}}\,{\rm{of}}\,{\rm{the}}\,{\rm{tissue}}\,{\rm{phenotype}}}{{\rm{total}}\,{\rm{tissue}}\,{\rm{area}}}$$

Here, the tissue area was computed from all tissue types excluding the normal and fat regions. The other tissue phenotypic features were quantified as follows:

$${\rm{stroma}}-{\rm{tumor}}\,{\rm{ratio}}=\frac{{\rm{stroma}}\,{\rm{area}}}{{\rm{stroma}}\,{\rm{area}}+{\rm{tumor}}\,{\rm{area}}}$$
$${\rm{necrosis}}-{\rm{tumor}}\,{\rm{ratio}}=\frac{{\rm{necrosis}}\,{\rm{area}}}{{\rm{necrosis}}\,{\rm{area}}+{\rm{tumor}}\,{\rm{area}}}$$

where stroma, tumor and necrosis areas were obtained from the AP based phenotyping results.

Statistical analyses

Our analysis did not distinguish well differentiated from moderately differentiated tumors—as recommended by Compton et al.64,65, this helps to avoid contradictory labelling by two different observers, or even by a single observer, looking at the same sample on two different occasions. Missing data were filled in with 100 imputed values using the multiple imputation method implemented in the R ‘mice’ library66. Analyses were performed on every imputed dataset and the results were combined to yield an overall estimate67. The significance level was set to 0.05 for all the tests described below.

Association between the tissue phenotypic and standard clinical features was tested by the Mann-Whitney test and the strength of association was determined through coefficients of determination (r2) of the test68,69. The median p-value and r2 were reported for a variable with multiple imputed values. We used the ‘rms’ library in R70 to fit logistic regression models, to calculate the area under the receiver operating characteristic curve (AUC), and to perform survival and bootstrap analyses.

Logistic regression analysis was performed to assess the predictive power of each phenotypic feature in identifying patients with a propensity for distant metastasis development. Effects of the automatically-derived features were gauged after adjusting for the standard clinical variables and cohort indicator variable in multivariate logistic regression models. A total of 102 cases (72 UHCW and 30 HGH) were used in the analysis. The 5-year metastasis status was treated as a binary outcome and features were treated as predictors in regression models. Estimated odds ratio and its 95% CI were obtained for each feature to quantify the risk of distant metastasis development associated with the feature. We reported the factor of change in odds ratio when the value of a feature changes from the baseline value to the new value. For a continuous feature, the baseline and the changed values were set to the 1st and 3rd quartiles of the feature. Furthermore, likelihood ratio p-values were computed to assess goodness of fit of predictive models contributed by various features.

Survival analysis was performed to determine the prognostic value for DMFS associated with each feature. Univariate and multivariate Cox proportional hazards regression analyses were conducted on 72 cases from the UHCW cohort for which DMFS data were available. The former was used to evaluate the prognostic impact of each feature separately while the latter was used to assess the prognostic value of image-based tissue phenotypic features while adjusting for the effects of the clinical features. Rao’s score test and Wald test were employed in the univariate and multivariate analyses, respectively, to test whether the regression coefficient corresponding to a particular feature in the Cox proportional hazards model was nonzero. Note that the score test is equivalent to the log-rank test when only a single categorical feature is considered in the model71. Hazard ratio and 95% CI estimates were obtained for each feature. To internally validate the performance of each fitted Cox proportional hazards model in predicting the survival probability, a bootstrap routine72 with 100 resampling replicates was employed to estimate the AUC. The statistical significance difference between survival stratifications was determined through the log-rank test using the R ‘survival’ library73. The cutoff with minimum p-value was used for stratification, and the p-value was adjusted according to Altman’s correction74 in case of a continuous feature.