Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients

Tsai, Pei-Chen; Lee, Tsung-Hua; Kuo, Kun-Chi; Su, Fang-Yi; Lee, Tsung-Lu Michael; Marostica, Eliana; Ugai, Tomotaka; Zhao, Melissa; Lau, Mai Chan; Väyrynen, Juha P.; Giannakis, Marios; Takashima, Yasutoshi; Kahaki, Seyed Mousavi; Wu, Kana; Song, Mingyang; Meyerhardt, Jeffrey A.; Chan, Andrew T.; Chiang, Jung-Hsien; Nowak, Jonathan; Ogino, Shuji; Yu, Kun-Hsing

doi:10.1038/s41467-023-37179-4


Article
Open access
Published: 13 April 2023


Nature Communications volume 14, Article number: 2102 (2023)


Abstract

Histopathologic assessment is indispensable for diagnosing colorectal cancer (CRC). However, manual evaluation of the diseased tissues under the microscope cannot reliably inform patient prognosis or genomic variations crucial for treatment selections. To address these challenges, we develop the Multi-omics Multi-cohort Assessment (MOMA) platform, an explainable machine learning approach, to systematically identify and interpret the relationship between patients’ histologic patterns, multi-omics, and clinical profiles in three large patient cohorts (n = 1888). MOMA successfully predicts the overall survival, disease-free survival (log-rank test P-value<0.05), and copy number alterations of CRC patients. In addition, our approaches identify interpretable pathology patterns predictive of gene expression profiles, microsatellite instability status, and clinically actionable genetic alterations. We show that MOMA models are generalizable to multiple patient populations with different demographic compositions and pathology images collected from distinctive digitization methods. Our machine learning approaches provide clinically actionable predictions that could inform treatments for colorectal cancer patients.

Introduction

Colorectal cancer (CRC) is the second most common cause of cancer death in the United States, accounting for nearly 53,000 deaths annually¹. Histopathologic evaluation remains a cornerstone for diagnosing and staging CRC, and the histology subtypes and genetic variations are the keys to treatment selection². However, inter-rater variability in histopathology diagnoses has been reported^{2, 3}, and the genomic profiling process requires days to weeks to complete and is not available to every hospital in the developing world. These limitations have hindered CRC patients from receiving timely and appropriate treatments.

With the recent development of reliable whole-slide pathology scanners and high-performing computer vision techniques, quantitative pathology evaluation has become increasingly feasible⁴. Several studies using machine learning techniques reported remarkable diagnostic accuracy for various cancer types, such as lung, breast, ovarian, renal cell, and colorectal carcinomas^{5,6,7,8,9,10,11}. Previous works also demonstrated unexpected correlations between histopathology image features and clinically actionable molecular variations, such as microsatellite instability and PTEN gene deletion, in colorectal carcinoma samples^{12, 13}. These studies indicate that high-resolution pathology images contain underutilized biomedical signals useful for personalizing cancer care^{14,15,16,17,18,19}.

Nonetheless, many computational challenges hinder the extraction of useful histopathology signals, and several reports expressed concerns about the generalizability of deep learning models²⁰. Typical high-resolution digital pathology whole-slide images of colorectal cancer tissues contain up to billions of pixels, making it infeasible for standard convolutional neural networks to process the whole image at once. In addition, deep learning models are highly complex, and it is difficult to connect the image patterns discovered by these data-driven models with biological knowledge²¹. Furthermore, since there are a large number of parameters that researchers need to optimize in data-driven machine learning models, generalizability to other image acquisition methods remains a substantial challenge to many digital pathology models²². The lack of extensive validation in different patient cohorts can diminish the applicability of machine learning models in clinical settings.

In this study, we propose the Multi-omics Multi-cohort Assessment (MOMA) system, an explainable machine learning framework for analyzing digital pathology images at scale. Our informatics methods successfully predict the prognoses of early-stage colorectal cancer patients and achieve state-of-the-art performance in identifying the genomics and proteomics status of cancer samples using a weakly supervised prediction framework. We connect high-resolution digital pathology images with clinically actionable multi-omics aberrations, and we identify interpretable pathology predictors of patients’ survival outcomes. We further validate our framework in multiple large patient cohorts and demonstrate its generalizability in different populations and using different image acquisition methods. Our study provides a robust and flexible machine learning framework for scalable histopathology image analyses.

Results

Overview and patient characteristics

We develop the Multi-omics Multi-cohort Assessment (MOMA) machine learning framework for predicting clinically actionable variations in cancer genomics, proteomics, and patient prognoses using histopathology images. Figure 1A and 1B show an overview of our interpretable machine learning methods. In brief, MOMA leverages robust image pre-processing (tiling, color normalization, and feature extraction), multiple-instance learning, and vision transformers to connect whole-slide pathology images with clinical and molecular profiles of interest. We further quantify the importance of each microenvironment component in each prediction task (Fig. 1C, D). To demonstrate the generalizability of our methods, we apply MOMA to multiple cohorts, including TCGA colorectal cancer cohorts (TCGA-COAD and TCGA-READ), the PLCO cohort, and the NHS and HPFS cohorts. Table 1 summarizes the demographic, molecular, and clinical profiles of patients in each cohort.

**Fig. 1: An overview of the Multi-omics Multi-cohort Assessment (MOMA) machine learning framework.**

Table 1 Patient characteristics of our study cohorts


MOMA predicts patients’ overall survival and progression-free survival

Early-stage colorectal cancer patients have heterogeneous survival outcomes. Although many clinical and molecular predictors have been proposed, they cannot fully explain the divergent prognoses. To address this challenge, we employ MOMA to predict both overall survival and progression-free survival outcomes of stage I-II colorectal cancer patients. Results show that MOMA successfully identifies patients’ overall survival outcomes in the TCGA held-out test set (Fig. 2A), with a concordance index (c-index) of 0.67 and log-rank test p-value of 0.01 between the two predicted prognostic groups. We further validate our model in two independent external cohorts: NHS-HPFS (Fig. 2B; P = 0.0495) and PLCO (Fig. 2C; P = 0.046), demonstrating the generalizability of our approaches. We visualize our models and show that dense clusters of adenocarcinoma cells are highly indicative of worse overall survival outcomes (Fig. 2D, E). Analyses that stratify colon cancer and rectal cancer samples show similar prediction performance in both cancer groups (Supplementary Data 1). Quantitative concept-based analyses reveal that regions of carcinoma cells, tumor-associated stroma, and interactions of carcinoma cells with smooth muscle cells in the cancerous regions are related to unfavorable overall survival (Fig. 1D).
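For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's c-index in plain Python. The function name `concordance_index` and the convention that a higher risk score means shorter predicted survival are illustrative assumptions; this is not taken from the MOMA code base.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    times: observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if right-censored
    risk_scores: higher score = shorter predicted survival (assumed convention)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in the score count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of the comparable pairs.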

**Fig. 2: MOMA predicts overall survival outcomes of stage I and II colorectal cancer patients using digital histopathology images, with validation in multiple independent cohorts.**

In addition, MOMA reliably predicts the progression-free survival outcomes of the same cohorts of patients. In the TCGA held-out test set, our progression-free survival outcome prediction model achieves a c-index of 0.88 and a log-rank test p-value of 0.02 in distinguishing the prognostic groups (Fig. 3A). We further demonstrate the applicability of our model in the NHS-HPFS cohorts (Fig. 3B; c-index=0.6, P < 0.005). When stratifying the datasets into colon cancer and rectal cancer groups, our approaches successfully identify the prognostic differences in both groups (Supplementary Data 1). A sensitivity analysis that was restricted to a surgery-only subgroup demonstrates the robustness of our results (Supplementary Fig. 1). Attention visualization shows that morphology patterns in tumor-associated stroma and groups of adenocarcinoma cells are highly indicative of progression-free survival (Fig. 3C, D). Compared with the overall survival prediction, our progression-free survival model puts more emphasis on infiltrating lymphocytes and regions associated with extracellular mucin in its prediction.
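The log-rank p-values quoted in this section compare two predicted prognostic groups. As a rough illustration of how such a test works, here is a minimal two-group log-rank sketch using only the standard library. The function name `logrank_test` is hypothetical, and the study's own statistical software is not specified in this section.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)                                  # subjects still at risk
        n_a = sum(1 for _, _, g in at_risk if g == 0)     # of which in group A
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

The statistic compares observed and expected event counts in one group at every event time; a small p-value indicates that the two survival curves differ.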

**Fig. 3: Quantitative histopathology imaging predicts stage I and II colorectal cancer patients’ progression-free survival outcomes.**

Furthermore, we employ MOMA to predict both overall survival and progression-free survival outcomes of stage III colorectal cancer patients. Results show that MOMA successfully identifies patients’ overall survival outcomes in the TCGA held-out test set (Fig. 4A), with a c-index of 0.66 and log-rank test p-value of 0.02 between the two predicted prognostic groups. We successfully validate our model in two independent external cohorts: NHS-HPFS (Fig. 4B; P = 0.0495) and PLCO (Fig. 4C; P = 0.04). On model visualization, we show that dense clusters of adenocarcinoma cells are highly indicative of worse overall survival outcomes (Fig. 4D, E). Similarly, MOMA successfully predicts patients’ progression-free survival outcomes (Fig. 5A), with a c-index of 0.74 and log-rank test p-value of 0.02 between the two predicted prognostic groups in the TCGA held-out test set. These results are validated in our independent external cohorts from NHS-HPFS (Fig. 5B; P = 0.003). Similar to our overall survival results, model visualization shows that dense clusters of adenocarcinoma cells are highly indicative of worse progression-free outcomes (Fig. 5D, E). Quantitative concept-based analyses reveal that regions of tumor-associated stroma and interactions of carcinoma cells with smooth muscle cells in the cancerous regions are related to unfavorable progression-free survival.

**Fig. 4: MOMA predicts overall survival outcomes of stage III colorectal cancer patients using digital histopathology images, with validation in multiple independent cohorts.**

**Fig. 5: MOMA predicts progression-free survival outcomes of stage III colorectal cancer patients using digital histopathology images, with validation in independent patient cohorts.**

MOMA provides improved prediction of MSI status using histopathology images

Immune checkpoint inhibitors have shown substantial survival benefits among a fraction of colorectal cancer patients. However, not all patients respond to this treatment modality, and it can cause substantial immune-related adverse events. High-level microsatellite instability (MSI) status has been identified as a biomarker that predicts the response to immune checkpoint inhibitors. To facilitate the prediction of treatment effectiveness for immune checkpoint inhibitors, we employ MOMA to predict the MSI status of each patient. Results show that the AUROC of the TCGA held-out test set is 0.88 ± 0.06 (Fig. 6A), and in the NHS-HPFS dataset, the AUROC is 0.76 ± 0.04 (Fig. 6B and Supplementary Table 1). Our methods improve the AUROC by 4% compared with the state-of-the-art methods by Kather et al.¹² (Supplementary Table 2). In both colon cancer and rectal cancer groups, MOMA shows correlations between histopathology images and MSI status (Supplementary Data 1). Model visualization further demonstrates that MOMA attends to lymphocytes, stroma, mucosa, and cancer regions when predicting MSI status (Fig. 6C, D).
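The AUROC values reported for the classification tasks can be read as the probability that a randomly chosen positive sample (for example, an MSI-high tumor) receives a higher predicted score than a randomly chosen negative sample. A minimal sketch of that pairwise definition follows; the function name `auroc` and the toy data are illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This quadratic-time form is only meant to make the definition concrete; rank-based implementations give the same value more efficiently.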

**Fig. 6: MOMA predicts MSI status in colorectal cancer patients.**

MOMA predicts copy number alterations (CNAs) and expression levels of key genes in cancer development

We further examine the performance of MOMA in predicting copy number alterations (CNAs), whole-genome doubling, and overexpression of the BECN1 gene using histopathology images. CNAs of many key genes, including FHIT and PTEN, have been implicated in carcinogenesis. Here we show that MOMA predicts CNAs in FHIT and many other tumor suppressor genes (Fig. 7A–C). Compared with PC-CHiP, a commonly used image-based CNA prediction method, MOMA attains substantially improved prediction performance (Supplementary Table 3). In addition to the previously reported histopathology-CNA associations, MOMA further predicts amplifications in NOL4L, HM13, and FOXS1, and deletions in WWOX and CCER1, among many others (Fig. 7D–F). Furthermore, MOMA demonstrates improved prediction performance for whole-genome doubling, compared with PC-CHiP (Supplementary Table 4).

**Fig. 7: MOMA provides improved copy number alteration prediction compared with the current state-of-the-art methods and predicts additional copy number alterations not achieved in previous studies.**

Moreover, MOMA reveals the correlation between histopathology image patterns and the expression levels of BECN1 (Supplementary Fig. 2A), with the results validated in the NHS-HPFS dataset (Supplementary Fig. 2B and Supplementary Table 1). Stratified analyses by colon and rectal cancers show similar prediction performance in both cancer groups (Supplementary Data 1). In both BECN1-high and BECN1-low tumors, the model focuses on tumor and mucus regions; however, in BECN1-high tumors, the model also focuses on regions occupied by lymphocytes, while in BECN1-low tumors the model focuses on the stroma (Supplementary Fig. 2C and Supplementary Fig. 2D).

MOMA identifies the histopathology patterns associated with BRAF mutation status

Genomic variations of proto-oncogenes and tumor suppressor genes are central to the development of colorectal cancers. For example, mutations in the BRAF gene propagate cell growth signals and are associated with reduced patient survival²³. Several targeted therapy drugs focusing on BRAF inhibition have been developed, and combinatorial targeted therapy trials are underway. To identify the morphological impact of clinically important genomic variations, we leverage MOMA to systematically predict the mutation status of BRAF, HIF1A, and PIK3CA. Results show that MOMA identifies a moderate histopathology signal for predicting BRAF c.1799T > A (p.V600E) mutation in the TCGA test set, with an AUROC of 0.71 ± 0.07 (Supplementary Fig. 3A and Supplementary Table 1). To further identify the morphological patterns associated with this actionable genetic aberration, we visualize the attention distribution of our models in Supplementary Fig. 3B and Supplementary Fig. 3C. The concept scores of mucus, stroma, and tumor regions for detecting the BRAF c.1799T > A (p.V600E) mutation are 19.89, 18.94, and 16.87, respectively (Fig. 1D). When distinguishing samples with a BRAF mutation at any locus (n = 529) from those without a BRAF mutation, we also show that MOMA can identify the morphological signals associated with BRAF mutations in general (Supplementary Fig. 4A, B). Similar approaches also identify the relationship between histopathology and the mutation status of HIF1A and PIK3CA (Supplementary Figs. 5 and 6).

MOMA correlates histopathology patterns with the CpG island methylator phenotype

CpG island methylator phenotype (CIMP) colorectal cancer is a subtype characterized by widespread hypermethylation of promoter CpG islands. This hypermethylation pattern inactivates many tumor suppressor genes and causes global gene expression dysregulations and metabolic alterations. Previous studies suggest that patients with CIMP-high status have worse prognoses under the standard treatments²⁴. To identify the histopathology patterns indicative of CIMP-high status, we employ MOMA to predict CIMP-high status and visualize the resulting model. Results show that the AUROC of the held-out test set in the TCGA cohort is 0.66 ± 0.06 (Supplementary Fig. 7A), and in the independent NHS-HPFS validation dataset, the AUROC is 0.63 ± 0.03 (Supplementary Fig. 7B and Supplementary Table 1). Furthermore, regions of lymphocytes and cancer cells are highly indicative of CIMP-high status (Supplementary Fig. 7C and Supplementary Fig. 7D).

MOMA predicts consensus molecular subtypes using histopathology patterns

The consensus molecular subtype (CMS) is a commonly used molecular subtyping system for colorectal cancer that addresses inconsistencies in gene-expression-based classifications and reflects the biological differences in tumor characteristics²⁵. To identify the histopathology patterns indicative of the CMS subtypes, we employ MOMA to classify the major CMS subtypes with sufficient numbers of samples (CMS2 and CMS4). Results show that MOMA achieved an AUROC of 0.66 ± 0.04 in the held-out test set not participating in the model development process (Supplementary Fig. 8A and Supplementary Table 1). When stratifying the analysis by the colon and rectal cancer groups, we see a slightly improved performance in CMS prediction (AUROC = 0.74–0.77; Supplementary Data 1). MOMA indicates that regions of cancer-associated stroma and mucus are highly indicative of CMS2 and CMS4 (Supplementary Fig. 8B and Supplementary Fig. 8C).

Comparisons of regions predictive of key clinical and multi-omics profiles

We summarize the regions indicative of patients’ prognostic outcomes and molecular profiles identified by our interpretable machine learning framework. Our approaches provide a quantitative measurement of the relative importance of each region in predicting these outcomes of interest. For example, we show that histological patterns of the tumor, stroma, and mucus regions are relevant to the prediction of overall survival and disease-free survival, while regions with lymphocytes and mucus provide signals for predicting CIMP-high status. Figure 1D visualizes the regions of importance for each prediction task.

Discussion

In this study, we designed the MOMA framework for molecular characterization and clinical prognostic prediction using histopathology images of colorectal cancer, and we further validated our models in two independent patient cohorts. Our results demonstrate that interpretable machine learning approaches can predict patients’ survival outcomes and clinically important molecular profiles²⁶. Our methods can automatically identify informative regions from whole-slide pathology images without the need for detailed region-level annotations. In addition, we employed the vision transformer and obtained significantly improved performance compared with that of standard deep learning methods¹². Our multi-cohort validation showed the generalizability of our data-driven approaches for analyzing high-resolution digital pathology images.

Our models demonstrated that high-resolution histopathology slides contain useful predictive signals for genetic aberrations and survival outcomes. Because genetic profiling requires additional tissue samples, processing time, and costs, our prediction models that use only the H&E-stained histopathology slides can provide timely decision support for treatment selection in resource-limited settings or in clinical scenarios with limited tissue availability. In addition, our stage-stratified survival outcome prediction successfully identified patients with shorter overall and disease-free survival under the standard treatments. These results showed that our machine learning approaches extracted stage-independent morphological signals indicative of patients’ clinical outcomes. Because patient prognosis depends on many clinical factors, no prediction model can perfectly identify the survival outcomes of individual patients. Nonetheless, our approach unveiled histopathology patterns related to patient prognosis, which could be useful in guiding clinical decision-making. For example, clinicians may provide closer follow-up to patients with suboptimal clinical prognoses, consider more aggressive treatment options, or enroll them in ongoing clinical trials²⁷.

Compared with previously published methods, our approaches achieved substantially improved prediction performance. For instance, we first reproduced a widely used patch-based convolutional neural network¹² for MSI prediction using the TCGA dataset, and we showed that MOMA achieved a 4% improvement on the same dataset (Supplementary Table 2). For CNA and WGD prediction, our approaches outperform models derived by the state-of-the-art PC-CHiP methods²⁸ by 7–29% (Fig. 7). Wilcoxon signed-rank tests confirmed that the performance difference is statistically significant in many clinically important genetic alterations, including BCL2L1 amplification^{29, 30} and FHIT deletion³¹ (Supplementary Table 3). Furthermore, we successfully predicted the copy number alterations of 14 additional genes and connected our attention-based deep learning framework with time-to-event models for survival prediction. These methods have the potential to guide clinical decision-making, suggest clinical trial enrollment, and reduce costs attributed to sequencing by serving as a screening tool. We further validated our models in two independent patient populations, i.e., the NHS-HPFS and the PLCO cohorts, which demonstrated the reliability of our approaches when applied to previously unseen populations^{32,33,34,35}.

Our approaches provide several advantages compared with conventional methods. First, we embedded color normalization approaches in our end-to-end sample processing pipeline, which contributed to the improved robustness of our survival prediction models. In addition, we developed a tumor detection model trained on slide-level annotations and employed this model to identify the regions of interest for multi-omics and survival prediction tasks. Our approaches effectively reduced the need for detailed annotations by pathologists.

Due to the large number of parameters in deep learning models³⁶, they are largely viewed as black boxes with limited interpretability³⁷. To enhance our understanding of model behaviors, we developed concept scores to quantitatively investigate the relevance of each region to the prediction tasks of interest, and we connected these regions with pathologists’ annotations to provide biological insights into our data-driven models. Our results demonstrated that regions occupied by tumors, supporting stroma, and mucus are crucial for survival prediction. These findings are consistent with observations that tumor invasiveness and tumor-stromal interactions are related to tumor progression³⁸. We further revealed that lymphocyte-infiltrated regions are associated with MSI status, BECN1 overexpression, and CpG island methylator phenotype (CIMP) status. The MSI-high status is an established biomarker for responses to immune checkpoint blockade³⁹, while BECN1 is a key regulator of autophagy⁴⁰ and has been proposed as a potential target for immunotherapy⁴¹. Our findings confirmed the relevance of these biomarkers to immune cell infiltration and suggested a role for digital pathology profiling in predicting the response to immunotherapy.

Our study has a few limitations. First, our study is based on patient populations in North America. Although our results are validated in two large-scale, independent, and diverse patient cohorts, additional studies that focus on specific patient populations are needed to evaluate the applicability of our models in the targeted clinical settings. In addition, recent studies on self-supervised machine learning hinted at the potential for enhanced representation learning for efficient machine learning^{42,43,44,45}, which may be useful for enhancing deep learning-based pathology feature extraction. Future research can investigate the benefits and potential caveats of these methods. Furthermore, incorporating patients’ radiology imaging data, pathology profiles, molecular aberrations, and clinical characteristics may further improve the prognostic prediction for colorectal cancer patients. Additional research is required to identify the optimal prognostic prediction methods and enable personalized treatments and advance care planning.

In summary, we presented an interpretable machine-learning framework that systematically identifies the relationships between histopathology, molecular variations, and patients’ survival outcomes. We successfully predicted key genetic aberrations, gene expression profiles, overall survival, and progression-free survival in colorectal cancer patients, with the results validated in two independent validation cohorts. Our approaches can be extended to characterize the prognostic-informing quantitative pathology patterns of other complex diseases.

Methods

Data sources

We obtained histopathology images of colon and rectal cancer patients from The Cancer Genome Atlas (TCGA) tissue slide dataset⁴⁶, the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)⁴⁷, the Nurses’ Health Study (NHS)⁴⁸, and the Health Professionals Follow-up Study (HPFS)^{49, 50}. We acquired the digital whole-slide pathology images, whole-exome sequencing results, and RNA-sequencing data of TCGA patients from the National Cancer Institute Genomic Data Commons Portal (https://portal.gdc.cancer.gov/). Mutation status, copy number alterations (including genetic amplifications and deletions), microsatellite instability, and CpG island methylator phenotypes (CIMP) of both colon and rectal adenocarcinomas were extracted from the cBioPortal (https://www.cbioportal.org/). Whole genome doublings and consensus molecular subtypes (CMS) of colorectal cancers were obtained from a previous TCGA publication⁵¹.

In addition, we obtained PLCO data from the National Cancer Institute Cancer Data Access System, and we collected clinical and genomic profiles, immunohistochemistry, and hematoxylin-and-eosin (H&E) stained tissue microarray (TMA) images from the NHS and the HPFS coordinated by Harvard T.H. Chan School of Public Health, Harvard Medical School, and Brigham and Women’s Hospital. Notably, colorectal tumor tissue blocks in the NHS and the HPFS were retrieved from over a hundred hospitals throughout the U.S. with variable tissue age, which increased the generalizability of our findings⁵². For each histopathology sample from the NHS and HPFS cohorts, two experienced colorectal cancer pathologists reviewed the cancer samples and selected the cores to ensure that they were representative. Thus, the TMA images include regions of tumor cells, stroma, tumor/stromal interfaces (i.e., microscopic tumor invasive edges), lymphocyte infiltration, and other pathological changes characteristic of the tumor sample from which the core was generated. Our multi-center study was approved by the Institutional Review Boards of Harvard Medical School (IRB20-1509). Our study protocol was also approved by the Brigham and Women’s Hospital, Harvard T.H. Chan School of Public Health, and the participating registries as required.

Overview of the Multi-omics Multi-cohort Assessment (MOMA) Framework

We designed a Multi-omics Multi-cohort Assessment (MOMA) machine learning framework to enable robust predictions of cancer genomics, proteomics, and important clinical outcomes. In this framework, we first pre-processed the whole-slide histopathology images by tiling each image into patches with 1000×1000 pixels, and we employed the color normalization proposed by Macenko et al.⁵³ to account for the staining differences across tissues and convert pixel values to a similar space in optical density. We used convolutional neural networks and vision transformers to extract pathology image features from each tile and connect these features with genomics, gene and protein expression levels, as well as patients’ overall survival and disease-free survival outcomes.
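The pre-processing described above can be sketched in a few lines. The example below enumerates 1000×1000 tile origins and converts an RGB pixel to optical density, which is the first step of Macenko-style normalization. Several details are assumptions made for illustration and are not stated in the text: tiles are non-overlapping, partial tiles at the slide edge are dropped, the background intensity is 255, and the helper names are hypothetical.

```python
import math

TILE_SIZE = 1000  # pixels per side, as stated in the text


def tile_origins(width, height, tile=TILE_SIZE):
    """Top-left corners of non-overlapping tiles that fit fully inside the slide.

    Assumption: partial tiles at the right and bottom edges are discarded.
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def rgb_to_optical_density(rgb, background=255.0):
    """Convert one RGB pixel to optical density, OD = -log10(I / I0).

    Assumption: I0 = 255 (pure white background).
    """
    # Clamp to 1 so that fully black channels do not produce log(0).
    return tuple(-math.log10(max(c, 1.0) / background) for c in rgb)
```

In optical-density space, stain concentrations combine approximately linearly, which is why stain normalization methods operate there rather than on raw RGB values.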

One key feature of MOMA is the integration of multiple-instance learning⁵⁴, multi-modality outcome prediction frameworks³², and biological interpretations of the prediction models. Leveraging state-of-the-art vision transformer models, we extracted informative features from the whole-slide images, and we connected them with genomic mutations, copy number alterations, transcriptomic profiles, and survival outcomes by incorporating relevant statistical models (e.g., Weibull models for survival prediction) to predict the molecular profiles and clinical outcomes of each individual. We further provided biological interpretations of the model predictions using pathology concepts of the tumor microenvironment⁵⁵. Below we describe our methodology in detail.

Multi-omics characterization via histopathology

Using the MOMA platform, we conducted multi-omics subgroup predictions on colorectal cancer patients, with a focus on clinically actionable molecular aberrations. Specifically, we predicted the microsatellite instability (MSI) status (65 MSI-high patients; 389 non-MSI-high patients in TCGA), CpG island methylator phenotype (CIMP; 58 CIMP-high tumor patients; 396 CIMP-low/negative tumor patients), BRAF c.1799T > A (p.V600E) mutation (48 patients with BRAF c.1799T > A (p.V600E) tumor; 529 patients with BRAF wild-type tumor), and the most prevalent Consensus Molecular Subtypes (CMS; 152 CMS2 (canonical) patients; 105 CMS4 (mesenchymal) patients) of colorectal cancer. These tasks were constructed as weakly-supervised classification tasks.

We employed the ResNet-50 network, a residual neural network with 50 layers⁵⁶, to extract 2048 features from the image patches. To mitigate the impact of artifacts in the whole-slide images, we applied the k-means algorithm to cluster the extracted feature vectors into 10 clusters, because typical colorectal cancer pathology images contain 10 different types of regions with biological significance (lymphocyte, stroma, debris, mucus, muscle, tumor, adipose tissue, background, normal, and others), and we used vision transformers⁵⁷ to derive the informative features for each cluster. The clusters whose weights were in the top three (strong positive association) or bottom three (strong negative association) were used in the downstream analyses. After the vision transformer, the dimension of the feature vectors of each cluster was reduced to 512 to obtain efficient image representations for multiple-instance learning^{58,59,60}. We used a trainable attention-based pooling operation to aggregate these feature vectors. Our transformer encoder layers contain 512 neurons in the hidden layer, 8 heads, and 2048 neurons in the multi-layer perceptron, with a dropout rate of 0.1.
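To make the attention-based pooling step more concrete, here is a minimal pure-Python sketch of one common formulation, in which each cluster feature vector h_k receives a weight a_k = softmax_k(w · tanh(V h_k)) and the slide-level representation is the weighted sum of the h_k. This particular form, the parameter names `V` and `w`, and the function name are assumptions; the text only states that a trainable attention-based pooling operation was used.

```python
import math


def attention_pool(features, V, w):
    """Aggregate K feature vectors into one vector with attention weights.

    features: list of K vectors of dimension D (e.g. one per tissue cluster)
    V: H x D projection matrix (trainable in a real model)
    w: length-H scoring vector (trainable in a real model)
    Returns (pooled vector of dimension D, list of K attention weights).
    """
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the K scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(features[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

The attention weights sum to one, so they can also be inspected directly to see which clusters contributed most to a prediction.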

Finally, we applied two loss functions in the prediction tasks. The first one was the bag loss function of standard binary or multiclass cross-entropy with the inverted class weights informed by the number of tiles in each class. The inverted class weights enabled machine learning models to account for the classes with fewer instances and prevent the models from biasing toward predicting all instances as the majority class. The other was the instance loss function of the tile-level classifiers. To compute the instance loss function, we first ranked the weights obtained from the attention-based multiple-instance learning to select the top three clusters with positive labels and the bottom three clusters with negative labels. Next, we employed the smooth support vector machine⁶¹ with varying hyperparameter tau optimized for each task. We computed the total loss of the model as the sum of the bag loss function and the instance loss function.
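The two ingredients described above, class-weighted cross-entropy and the selection of the most and least attended clusters, can be sketched as follows. The specific weighting formula (total count divided by the number of classes times the class count) is one common convention and is an assumption here, as are the function names; the smooth SVM instance loss is not reproduced.

```python
import math


def inverse_class_weights(counts):
    """Weight each class by total / (n_classes * count), so rare classes count more.

    Assumption: this is one common form of inverse-frequency weighting.
    """
    total = sum(counts)
    return [total / (len(counts) * c) for c in counts]


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one bag, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


def select_extreme_clusters(attention, k=3):
    """Indices of the k highest- and k lowest-attention clusters."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return order[:k], order[-k:]
```

With class counts of 10 and 90, for example, the minority class receives a weight of 5.0 and the majority class about 0.56, so misclassifying a minority sample costs roughly nine times more.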

To develop our models, we first split the TCGA dataset into 60% training, 20% validation, and 20% test sets. All tiles from the same whole-slide image were placed in the same partition to prevent information leakage. We trained our models on the training set, selected the optimal hyperparameters using the validation set, and reported our results on the untouched test set. We further validated our models in independent validation cohorts (please see the Multi-cohort external validation section). We reported the area under the receiver operating characteristic curve (AUROC) on the test set for each classification task. We used a stochastic gradient descent (SGD) optimizer with a learning rate of 1e-3, a momentum of 0.9, a batch size of 1, and a weight decay of 5e-4. We trained all models for 250 epochs with a cosine annealing learning rate scheduler. We implemented our methods using Python 3.6 with PyTorch 1.6.0 on a single-GPU system with an NVIDIA Titan RTX. To make MOMA easily accessible to pathologists, oncologists, and biomedical informatics researchers, we further developed a web portal (https://rebrand.ly/MOMA_demo) that allows users to upload pathology images and employ our trained models to generate predictions. Our source code for data analyses and our trained models can be found at https://github.com/hms-dbmi/MOMA.
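The slide-level partitioning described above can be sketched as follows. The split is drawn over slide identifiers rather than tiles, so that no slide contributes tiles to more than one partition. The function names, the seed, and the toy identifiers are hypothetical.

```python
import random

def split_slides(slide_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign unique slide IDs to train/validation/test partitions."""
    unique = sorted(set(slide_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])
    return train, val, test

def assign_tiles(tiles, train, val, test):
    """Route each (tile_id, slide_id) pair to the partition of its parent slide."""
    parts = {"train": [], "val": [], "test": []}
    for tile_id, slide_id in tiles:
        if slide_id in train:
            parts["train"].append(tile_id)
        elif slide_id in val:
            parts["val"].append(tile_id)
        else:
            parts["test"].append(tile_id)
    return parts

# Toy example: 10 slides with 5 tiles each.
tiles = [(f"slide{s}_tile{t}", f"slide{s}") for s in range(10) for t in range(5)]
train, val, test = split_slides([slide for _, slide in tiles])
parts = assign_tiles(tiles, train, val, test)
```

Splitting at the tile level instead would let near-identical neighboring tiles from one slide appear in both training and test data, which is the leakage the slide-level split avoids.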

Overall survival and progression-free survival prediction

To demonstrate the extensibility of our MOMA platform to different prediction tasks, we connected our machine learning framework with the Weibull modeling methods⁶² to predict overall survival and progression-free survival outcomes of early-stage (stage I and stage II) and stage III colorectal cancer patients. We distinguished patients in the same stage groups into a “predicted longer-term survival group” and a “predicted shorter-term survival group,” and we used the log-rank test to evaluate their differences in actual survival outcomes. Stage IV patients received heterogeneous treatments and were thus not included in our stratified survival outcome prediction analyses. The Weibull distribution is a probability distribution with shape (kappa) and scale (lambda) parameters. Combinations of the shape and scale parameters can model different hazard functions for survival analyses. We modified our machine learning framework to estimate these two statistical parameters in the Weibull survival model. We used a trainable attention-based pooling operation to aggregate the image feature vectors and employed the exponential activation function for lambda and softplus for kappa⁶³. Our deep learning-based Weibull modeling approach can handle right censoring and accommodate different patterns of death rate over time (e.g., increasing failure rate, decreasing failure rate, and constant failure rate). Due to the smaller sample size in the TCGA dataset for the survival prediction task (337 patients with stage I or II cancer and 181 stage III patients), we first conducted a 5-fold cross-validation on TCGA before validating our approaches in external validation cohorts. We divided the prediction results into shorter-term survival and longer-term survival groups using the median predicted survival index, and we tested the survival differences between the predicted groups using the log-rank test. 
We used an RMSprop optimizer with a learning rate of 1e-5 and trained the model for 5 epochs using a batch size of 1. We implemented our training and testing procedures using Python 3.6 with TensorFlow 2 on the same single-GPU system.
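To make the Weibull formulation concrete, the sketch below writes out a per-patient negative log-likelihood with right censoring, using the exponential activation for lambda and softplus for kappa as described above. It assumes the standard continuous-time Weibull density and survival function; the exact loss used in the paper may differ (for example, a discrete-time variant), and the function names are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); keeps the shape parameter positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def weibull_params(raw_lambda, raw_kappa):
    """Map unconstrained network outputs to (scale lambda, shape kappa)."""
    return math.exp(raw_lambda), softplus(raw_kappa)

def weibull_nll(t, event, lam, kappa):
    """Negative log-likelihood of one observation at time t > 0.

    event = 1: death observed at t, uses the log density
               log(kappa/lam) + (kappa - 1) * log(t/lam) - (t/lam)**kappa
    event = 0: right-censored at t, uses the log survival -(t/lam)**kappa
    """
    z = (t / lam) ** kappa
    if event:
        log_density = math.log(kappa / lam) + (kappa - 1.0) * math.log(t / lam) - z
        return -log_density
    return z

# With kappa = 1 the model reduces to an exponential with constant hazard 1/lam.
nll_event = weibull_nll(2.0, 1, 2.0, 1.0)
nll_censored = weibull_nll(2.0, 0, 2.0, 1.0)
```

A shape parameter above 1 corresponds to an increasing failure rate, below 1 to a decreasing one, and exactly 1 to a constant one, which matches the three patterns listed in the text.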

Multi-cohort external validation

To investigate the generalizability of our machine learning models, we harnessed two additional cohorts collected at different hospitals. Specifically, we used the pathology tissue microarray, genetic, immunohistochemistry, and clinical datasets from the NHS and HPFS to validate our trained prediction models. We further validated our survival prediction models using the whole-slide histopathology and survival information from the PLCO cohort. We applied the same image tiling, pre-processing, and color normalization methods to preprocess histopathology images from these external cohorts. We reported the AUROC (for classification tasks), concordance index (c-index), and log-rank test p-value (for survival prediction tasks) in these independent validations.
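For readers unfamiliar with the concordance index, a minimal sketch of Harrell's c-index is given below. It assumes that a higher score means higher predicted risk and that tied scores receive half credit; the paper does not state its tie handling, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also received the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.3, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.3, 0.7, 0.9])
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk.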

Model visualization and interpretation

We further identified human-interpretable pathology features employed by our machine learning models to obtain biological insights into the connections between histopathology morphology and molecular profiles. We developed a model interpretation method that incorporates model-derived concept scores and expert-annotated concepts based on prior pathology knowledge. Specifically, we first quantified the importance of each image region by occluding all pixels in the region and computing the extent to which the predicted outcome changed. We defined the importance index of each image region as the numerical change in the predicted probability due to the occlusion of that region. To connect crucial regions with pathology interpretation, we leveraged 100,000 histopathology images annotated by gastrointestinal pathologists with seven concepts: colorectal adenocarcinoma epithelium, cancer-associated stroma, lymphocytes, smooth muscle, mucus, adipose tissue, and tissue debris. We developed a deep learning model that classified image regions into these pathology concepts with an accuracy of 99.38%, and we employed this model to compute the concept scores for regions with importance indices greater than 0.7. We scaled our concept scores to a range of [0, 100], where 100 indicates that the region has the highest relevance to the concept of interest. Thus, our concept scores indicate the amount of attention our machine learning model pays to different regions of pathology changes in making the prediction, and they are not directly related to the area occupied by each pathology pattern in the slides. We repeated this process for each machine learning task we performed. We used the standard color map to visualize the importance index and overlaid it on the original histopathology images. These approaches provide intuitive model interpretations using well-established concepts in cancer pathology.
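The occlusion-based importance index can be sketched in a few lines. The sketch assumes that the index is the baseline probability minus the probability after occlusion and that occluded pixels are set to a constant fill value; both are assumptions, as are the function names and the toy predictor, which stands in for a trained model.

```python
def occlusion_importance(image, regions, predict, fill=0.0):
    """Importance of each region = drop in predicted probability when it is occluded.

    image: 2D list of pixel values
    regions: list of (row_start, row_end, col_start, col_end), end-exclusive
    predict: callable mapping an image to a probability (stand-in for a model)
    """
    baseline = predict(image)
    importances = []
    for r0, r1, c0, c1 in regions:
        occluded = [row[:] for row in image]  # copy so the input stays intact
        for r in range(r0, r1):
            for c in range(c0, c1):
                occluded[r][c] = fill
        importances.append(baseline - predict(occluded))
    return importances

def toy_predict(image):
    """Hypothetical model: mean intensity of the top-left 2x2 quadrant."""
    values = [image[r][c] for r in range(2) for c in range(2)]
    return sum(values) / len(values)

image = [[1.0] * 4 for _ in range(4)]
quadrants = [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]
importances = occlusion_importance(image, quadrants, toy_predict)
```

Because the toy predictor only looks at the top-left quadrant, only that region receives a non-zero importance; with a real model, regions above a chosen threshold would then be passed to the concept classifier.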

Statistics & reproducibility

No data were excluded from the analyses. All available samples were included in the machine learning analyses, and the experiments were not randomized. The investigators were blinded to the labels of the samples in the test set before the final model evaluation.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

TCGA histopathology, molecular, and clinical data used in this study are available through the Genomic Data Commons portal [https://portal.gdc.cancer.gov/]. Mutation status, copy number alterations (including genetic amplifications and deletions), microsatellite instability, and CpG island methylator phenotypes (CIMP) of TCGA samples were extracted from the cBioPortal [https://www.cbioportal.org/]. The PLCO data is available at the National Cancer Institute Cancer Data Access System [https://cdas.cancer.gov/plco/]. The data from Nurses’ Health Studies and the Health Professionals Follow-up Study are available under restricted access due to patient privacy considerations. Procedures to access the data are described at https://www.nurseshealthstudy.org/researchers (contact email: nhsaccess@channing.harvard.edu) and https://sites.sph.harvard.edu/hpfs/forcollaborators/.

Code availability

The code for our data analyses and our trained models can be found at https://github.com/hms-dbmi/MOMA. The demo website of our MOMA system is at https://rebrand.ly/MOMA_demo.

References

Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer Statistics, 2021. CA Cancer J. Clin. 71, 7–33 (2021).
Benson, A. B. et al. Colon Cancer, Version 2.2021, NCCN Clinical Practice Guidelines in Oncology. J. Natl Compr. Canc. Netw. 19, 329–359 (2021).
Otálora, S., Atzori, M., Andrearczyk, V., Khan, A. & Müller, H. Staining Invariant Features for Improving Generalization of Deep Convolutional Neural Networks in Computational Pathology. Front Bioeng. Biotechnol. 7, 198 (2019).
Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, 26286 (2016).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Han, Z. et al. Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model. Sci. Rep. 7, 4172 (2017).
Yu, K.-H. et al. Deciphering serous ovarian carcinoma histopathology and platinum response by convolutional neural networks. BMC Med. 18, 236 (2020).
Marostica, E. et al. Development of a Histopathology Informatics Pipeline for Classification and Prediction of Clinical Outcomes in Subtypes of Renal Cell Carcinoma. Clin. Cancer Res. 27, 2868–2878 (2021).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Wulczyn, E. et al. Interpretable survival prediction for colorectal cancer using deep learning. NPJ Digit Med. 4, 71 (2021).
Chuang, W.-Y. et al. Identification of nodal micrometastasis in colorectal cancer using deep learning on annotation-free whole-slide images. Mod. Pathol. 34, 1901–1911 (2021).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Ektefaie, Y. et al. Integrative multiomics-histopathology analysis for breast cancer classification. NPJ Breast Cancer. 7, 147 (2021).
Jang, H.-J., Lee, A., Kang, J., Song, I. H. & Lee, S. H. Prediction of clinically actionable genetic alterations from colorectal cancer histopathology images using deep learning. World J. Gastroenterol. 26, 6207–6223 (2020).
Yu, K.-H. et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J. Am. Med. Inform. Assoc. 27, 757–769 (2020).
Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
Bychkov, D. et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 8, 3395 (2018).
Yamashita, R. et al. Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study. Lancet Oncol. 22, 132–141 (2021).
Yu, G. et al. Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images. Nat. Commun. 12, 6311 (2021).
Tizhoosh, H. R. & Pantanowitz, L. Artificial Intelligence and Digital Pathology: Challenges and Opportunities. J. Pathol. Inform. 9, 38 (2018).
Turki, H., Taieb, M. A. H. & Aouicha, M. B. Developing intuitive and explainable algorithms through inspiration from human physiology and computational biology. Brief. Bioinforma. 22, bbab081 (2021).
Stacke, K., Eilertsen, G., Unger, J. & Lundstrom, C. Measuring Domain Shift for Deep Learning in Histopathology. IEEE J. Biomed. Health Inf. 25, 325–336 (2021).
Ducreux, M. et al. Molecular targeted therapy of BRAF-mutant colorectal cancer. Ther. Adv. Med. Oncol. 11, 1758835919856494 (2019).
Juo, Y. Y. et al. Prognostic value of CpG island methylator phenotype among colorectal cancer patients: a systematic review and meta-analysis. Ann. Oncol. 25, 2314–2327 (2014).
Guinney, J. et al. The consensus molecular subtypes of colorectal cancer. Nat. Med. 21, 1350–1356 (2015).
Phipps, A. I. et al. Association between molecular subtypes of colorectal cancer and patient survival. Gastroenterology 148, 77–87.e2 (2015).
Yu, K.-H. & Snyder, M. Omics Profiling in Precision Oncology. Mol. Cell. Proteom. 15, 2525–2536 (2016).
Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer. 1, 800–810 (2020).
Zhang, H. et al. Genomic analysis and selective small molecule inhibition identifies BCL-X(L) as a critical survival factor in a subset of colorectal cancer. Mol. Cancer. 14, 126 (2015).
Cho, S.-Y. et al. A Novel Combination Treatment Targeting BCL-X and MCL1 for -mutated and -amplified Colorectal Cancers. Mol. Cancer Ther. 16, 2178–2190 (2017).
Mendelaar, P. A. J. et al. Whole genome sequencing of metastatic colorectal cancer reveals prior treatment effects and specific metastasis features. Nat. Commun. 12, 574 (2021).
Yu, K.-H. et al. Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma. Cell Syst. 5, 620–627.e3 (2017).
Thrift, A. P., Kanwal, F. & El-Serag, H. B. Prediction Models for Gastrointestinal and Liver Diseases: Too Many Developed, Too Few Validated. Clin. Gastroenterol. Hepatol. 14, 1678–1680 (2016).
Yu, K.-H. et al. Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation. J. Med. Internet Res. 22, e16709 (2020).
Roberts, K. et al. Biomedical informatics advancing the national health agenda: the AMIA 2015 year-in-review in clinical and consumer informatics. J. Am. Med. Inform. Assoc. 24, e185–e190 (2017).
Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719–731 (2018).
Castelvecchi, D. Can we open the black box of AI? Nature 538, 20–23 (2016).
Zhou, Y. et al. Single-Cell Multiomics Sequencing Reveals Prevalent Genomic Alterations in Tumor Stromal Cells of Human Colorectal Cancer. Cancer Cell. 38, 818–828.e5 (2020).
Ganesh, K. et al. Immunotherapy in colorectal cancer: rationale, challenges and potential. Nat. Rev. Gastroenterol. Hepatol. 16, 361–375 (2019).
Sooro, M. A., Zhang, N. & Zhang, P. Targeting EGFR-mediated autophagy as a potential strategy for cancer therapy. Int. J. Cancer. 143, 2116–2125 (2018).
Tan, P. et al. Myeloid loss of Beclin 1 promotes PD-L1hi precursor B cell lymphoma development. J. Clin. Invest. 129, 5261–5277 (2019).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning (eds. Daumé III, H. & Singh, A.) vol. 119 1597–1607 (PMLR, 2020).
Ciga, O., Xu, T. & Martel, A. L. Self supervised contrastive learning for digital histopathology. Mach. Learn. Appl. 7, 100198 (2022).
Dehaene, O. et al. Self-Supervision Closes the Gap Between Weak and Strong Supervision in Histology. arXiv. Preprint at https://doi.org/10.48550/arXiv.2012.03583 (2020).
He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 9729–9738 (2020).
Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
Black, A. et al. PLCO: Evolution of an Epidemiologic Resource and Opportunities for Future Studies. Rev. Recent Clin. Trials. 10, 238–245 (2015).
Mehta, R. S. et al. Association of Dietary Patterns With Risk of Colorectal Cancer Subtypes Classified by Fusobacterium nucleatum in Tumor Tissue. JAMA Oncology. 3, 921–927 (2017).
Bao, Y. et al. Origin, Methods, and Evolution of the Three Nurses’ Health Studies. Am. J. Public Health. 106, 1573–1581 (2016).
Väyrynen, J. P. et al. Prognostic Significance of Immune Cell Populations Identified by Machine Learning in Colorectal Cancer Using Routine Hematoxylin and Eosin-Stained Sections. Clin. Cancer Res. 26, 4326–4338 (2020).
Taylor, A. M. et al. Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell. 33, 676–689.e3 (2018).
Liu, L. et al. Utility of inverse probability weighting in molecular pathological epidemiology. Eur. J. Epidemiol. 33, 381–392 (2018).
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. 1107–1110 (2009).
Carbonneau, M.-A., Cheplygina, V., Granger, E. & Gagnon, G. Multiple instance learning: A survey of problem characteristics and applications. Pattern Recognition. 77, 329–353 (2018).
Blonska, M., Agarwal, N. K. & Vega, F. Shaping of the tumor microenvironment: Stromal cells and vessels. Semin. Cancer Biol. 34, 3–13 (2015).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition 770–778 (2016).
Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR. (2021).
Dietterich, T. G., Lathrop, R. H. & Lozano-Pérez, T. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89, 31–71 (1997).
Zhang, G. et al. Multi-instance learning for skin biopsy image features recognition. In 2012 IEEE International Conference on Bioinformatics and Biomedicine 1–6 (2012).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Lee, Y.-J. & Mangasarian, O. L. SSVM: A Smooth Support Vector Machine for Classification. Comput. Optim. Appl. 20, 5–22 (2001).
Carroll, K. J. On the use and utility of the Weibull model in the analysis of survival data. Control. Clin. Trials. 24, 682–701 (2003).
Martinsson, E. Wtte-rnn: Weibull time to event recurrent neural network. (Chalmers University of Technology & University of Gothenburg, 2016).

Acknowledgements

We thank Li-Jin Huang for her exploratory data analyses, Shih-Yen Lin for his suggestions on manuscript revision, and Shannon Gallagher for her administrative support. K-H.Y. is partly supported by the National Institute of General Medical Sciences grant R35GM142879, Google Research Scholar Award, and the Blavatnik Center for Computational Biomedicine Award. We thank the AWS Cloud Credits for Research, Microsoft Azure for Research Award, the NVIDIA GPU Grant Program, and the Extreme Science and Engineering Discovery Environment (XSEDE) at the Pittsburgh Supercomputing Center (allocation TG-BCS180016) for their computational support. The Nurses’ Health Study and the Health Professionals Follow-up Study were supported in part by U.S. National Institutes of Health (NIH) grants (P01 CA87969, UM1 CA186107, P01 CA55075, UM1 CA167552, U01 CA167552, R35 CA197735, R01 CA151993, R01 CA248857); by Cancer Research UK Grand Challenge Award (UK C10674/A27140 to the OPTIMISTICC Team). Funds for this work were provided to P.-C.T., T.-H.L., K.-C.K., F.-Y.S., J-H.C. by the National Science and Technology Council (NSTC), Taiwan (MOST 110-2634-F-006-021 and NSTC 111-2634-F-006-011) and National Center for High-performance Computing (NCHC), Taiwan. We would like to thank the participants and staff of the Nurses’ Health Study and the Health Professionals Follow-up Study for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. The authors assume full responsibility for the analyses and interpretation of these data.

Author information

These authors jointly supervised this work: Jonathan Nowak, Shuji Ogino, Kun-Hsing Yu.

Authors and Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Pei-Chen Tsai, Eliana Marostica & Kun-Hsing Yu
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan ROC
Pei-Chen Tsai, Tsung-Hua Lee, Kun-Chi Kuo, Fang-Yi Su & Jung-Hsien Chiang
Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan ROC
Tsung-Lu Michael Lee
Division of Health Sciences and Technology, Harvard-Massachusetts Institute of Technology, Boston, MA, USA
Eliana Marostica
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Tomotaka Ugai, Mingyang Song & Shuji Ogino
Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
Tomotaka Ugai, Melissa Zhao, Mai Chan Lau, Yasutoshi Takashima, Seyed Mousavi Kahaki, Jonathan Nowak, Shuji Ogino & Kun-Hsing Yu
Cancer and Translational Medicine Research Unit, Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
Juha P. Väyrynen
Department of Medicine, Dana Farber Cancer Institute, Boston, MA, USA
Marios Giannakis & Jeffrey A. Meyerhardt
Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Kana Wu
Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Andrew T. Chan
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Andrew T. Chan
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Shuji Ogino

Authors

Pei-Chen Tsai
Tsung-Hua Lee
Kun-Chi Kuo
Fang-Yi Su
Tsung-Lu Michael Lee
Eliana Marostica
Tomotaka Ugai
Melissa Zhao
Mai Chan Lau
Juha P. Väyrynen
Marios Giannakis
Yasutoshi Takashima
Seyed Mousavi Kahaki
Kana Wu
Mingyang Song
Jeffrey A. Meyerhardt
Andrew T. Chan
Jung-Hsien Chiang
Jonathan Nowak
Shuji Ogino
Kun-Hsing Yu

Contributions

P-C.T., T-H.L., and K-C.K. performed the analyses and wrote the manuscript. F-Y.S. and E.M. interpreted the results and edited the manuscript. T.U., M.Z., M.C.L., J.P.V., M.G., Y.T., S.M.K., K.W., M.S., J.A.M., and A.T.C. contributed to the collection of data from the Nurses’ Health Study and the Health Professionals Follow-up Study and edited the manuscript. J-H.C. designed the study, supervised the work, and edited the manuscript. J.N. and S.O. contributed to the collection of data from the Nurses’ Health Study and the Health Professionals Follow-up Study, interpreted the results, and edited the manuscript. K-H.Y. conceived and designed the study and analyses, obtained the data, interpreted the results, supervised the work, and wrote and revised the manuscript.

Corresponding authors

Correspondence to Jung-Hsien Chiang or Kun-Hsing Yu.

Ethics declarations

Competing interests

K-H.Y. is an inventor of US 16/179,101, entitled “Quantitative Pathology Analysis and Diagnosis using Neural Networks.” This patent is assigned to Harvard University. K-H.Y. was a consultant of Curatio.DL. K.W. is currently a stakeholder and employee of Vertex Pharmaceuticals. This study was not funded by this entity. All other authors have nothing to disclose.

Peer review

Peer review information

Nature Communications thanks Manuel Rodriguez-Justo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tsai, PC., Lee, TH., Kuo, KC. et al. Histopathology images predict multi-omics aberrations and prognoses in colorectal cancer patients. Nat Commun 14, 2102 (2023). https://doi.org/10.1038/s41467-023-37179-4

Received: 04 August 2022
Accepted: 03 March 2023
Published: 13 April 2023
DOI: https://doi.org/10.1038/s41467-023-37179-4
