DeePaN: deep patient graph convolutional network integrating clinico-genomic evidence to stratify lung cancers for immunotherapy

Immuno-oncology (IO) therapies have transformed the therapeutic landscape of non-small cell lung cancer (NSCLC). However, patient responses to IO are variable and influenced by a heterogeneous combination of health, immune, and tumor factors. There is a pressing need to discover the distinct NSCLC subgroups that influence response. We have developed a deep patient graph convolutional network, which we call “DeePaN”, to discover NSCLC complexity across data modalities impacting IO benefit. DeePaN uses high-dimensional data derived from both real-world evidence (RWE)-based electronic health records (EHRs) and genomics across 1937 IO-treated NSCLC patients. DeePaN effectively stratified patients into subgroups with significantly different (P-value of 2.2 × 10−11) median overall survival of 20.35 months and 9.42 months after IO therapy. Multiple non-graph-based unsupervised methods did not yield significant differences in IO outcome. Furthermore, we demonstrate that patient stratification from DeePaN has the potential to augment the emerging IO biomarker of tumor mutation burden (TMB). Characterization of the subgroups discovered by DeePaN indicates potential to inform IO therapeutic insight. This includes the enrichment of mutated KRAS in the IO beneficial subgroup and of high blood monocyte count in the IO non-beneficial subgroup. Our work provides proof of concept that graph-based AI is feasible and can effectively integrate high-dimensional genomic and EHR data to meaningfully stratify cancer patients by distinct clinical outcomes, with potential to inform precision oncology.

Supplementary Fig. 2: Kaplan-Meier survival plots of patient subgroups at different settings. For crossing survival curves, the log-rank test is not appropriate for calculating the test statistic. It weights all time points equally, so survival differences of opposite sign before and after the crossing point cancel out. We therefore used the Fleming-Harrington test 4 to calculate the p-values for the crossing Kaplan-Meier survival curves.

As clinical references, we used three recent (2019 and 2020) FDA-approved NSCLC IO trials. In these three trials, the median overall survival (OS) of the IO-treated vs control groups was:
17.1 vs 14.9 months (nivolumab plus ipilimumab 5 );
20 vs 12.2 months (pembrolizumab 6 );
20.2 vs 13.1 months (atezolizumab 7 ).
As shown in Fig. 4b, DeePaN discovered two subgroups with median survival of 20.8 vs 10.8 months. The better survival group has a median survival of 20.8 months. This is comparable with the median survival of the IO-treated groups in these trials and therefore indicates a clinically relevant IO benefit.
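The Fleming-Harrington test mentioned in the Supplementary Fig. 2 legend is the G(p, q) family of weighted log-rank tests. Each event time is weighted by S(t−)^p (1 − S(t−))^q, where S is the pooled Kaplan-Meier estimate. With p > 0 the test emphasizes early differences; with q > 0 it emphasizes late differences. This keeps differences on either side of a crossing from cancelling out. The sketch below is a minimal pure-Python two-sample version. The function name `fleming_harrington_test` is hypothetical, and the defaults p = q = 1 are our assumption, since the weights used for Supplementary Fig. 2 are not stated.

```python
import math
from typing import List, Sequence, Tuple


def fleming_harrington_test(
    time_a: Sequence[float], event_a: Sequence[int],
    time_b: Sequence[float], event_b: Sequence[int],
    p: float = 1.0, q: float = 1.0,  # assumed defaults, not taken from the paper
) -> Tuple[float, float]:
    """Two-sample Fleming-Harrington G(p, q) weighted log-rank test.

    event_* holds 1 for an observed death and 0 for a censored patient.
    Returns (chi-square statistic with 1 df, p-value).
    p = q = 0 gives the ordinary (unweighted) log-rank test.
    """
    # Pool both groups as (time, event, group) with group 0 = A, 1 = B.
    data: List[Tuple[float, int, int]] = (
        [(t, e, 0) for t, e in zip(time_a, event_a)]
        + [(t, e, 1) for t, e in zip(time_b, event_b)]
    )
    event_times = sorted({t for t, e, _ in data if e})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current event time
    u = 0.0     # weighted observed-minus-expected deaths in group A
    var = 0.0   # null variance of u
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, pooled
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # deaths, pooled
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        w = surv ** p * (1.0 - surv) ** q                         # FH weight
        u += w * (d_a - d * n_a / n)
        if n > 1:
            frac_a = n_a / n
            # Hypergeometric variance of d_a, scaled by the squared weight.
            var += w * w * d * frac_a * (1.0 - frac_a) * (n - d) / (n - 1)
        surv *= 1.0 - d / n  # step the pooled Kaplan-Meier curve down
    if var == 0.0:
        return 0.0, 1.0
    chi2 = u * u / var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Setting p = q = 0 recovers the ordinary log-rank test. This makes it easy to sanity-check the sketch against an established survival package before using the weighted variant. The per-time-point loops are quadratic in the number of patients. That is acceptable for a cohort of about 2,000 but could be vectorized.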

Supplementary Note 5: DeePaN-based patient clustering is generally robust
In this study, we aim to demonstrate the feasibility and application of GCN to patient subtype detection. To test the robustness of our approach, we performed a ten-round Adjusted Rand Index (ARI) test 8 , a commonly used method for evaluating clustering robustness. In each round, we randomly removed 5% of patients (97 of the 1,937), leaving a reduced cohort of 1,840 patients. We then re-ran the DeePaN framework on the reduced dataset to obtain five new subgroups. We calculated the ARI between this new clustering and the reference clustering labels (i.e. the original clustering results reported in the manuscript) for the retained patients. The results indicate that in each round the clustering outcome matches the original clustering reasonably well. The mean ARI across the ten rounds is 0.91.
• Note: the Adjusted Rand Index is at most 1, which indicates a perfect match between two clustering results. Values close to 0 are expected for random labeling, and negative values indicate less agreement than expected by chance.
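The robustness procedure above can be sketched in pure Python:

1. Randomly drop 5% of patients (round(1937 × 0.05) = 97, leaving 1,840).
2. Re-cluster the reduced cohort.
3. Score the new labels against the reference labels on the retained patients with the standard Hubert-Arabie ARI.

The re-clustering step itself is not shown. The helper names `drop_fraction` and `ari_on_retained` are hypothetical, and patient IDs are assumed to be integers.

```python
import random
from collections import Counter
from math import comb
from typing import Dict, Hashable, List, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Hubert-Arabie Adjusted Rand Index between two labelings of the same patients."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same >= 2 patients")
    # Count patient pairs placed together by both labelings, and by each one alone.
    pairs_same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_same_a * pairs_same_b / comb(len(labels_a), 2)
    max_index = (pairs_same_a + pairs_same_b) / 2
    if max_index == expected:  # degenerate labelings, e.g. a single cluster each
        return 1.0
    return (pairs_same_both - expected) / (max_index - expected)


def drop_fraction(patient_ids: List[int], fraction: float = 0.05, seed: int = 0) -> List[int]:
    """Randomly remove a fraction of patients (1,937 * 0.05 rounds to 97 removed)."""
    n_drop = round(len(patient_ids) * fraction)
    dropped = set(random.Random(seed).sample(patient_ids, n_drop))
    return [pid for pid in patient_ids if pid not in dropped]


def ari_on_retained(reference: Dict[int, int], new_labels: Dict[int, int]) -> float:
    """ARI of a re-run clustering vs the reference labels, over retained patients only."""
    retained = list(new_labels)
    return adjusted_rand_index([reference[p] for p in retained],
                               [new_labels[p] for p in retained])
```

A ten-round run would do the following:

1. Call `drop_fraction` with ten different seeds.
2. Re-cluster each reduced cohort.
3. Average the ten `ari_on_retained` values.

Because the ARI is invariant to relabeling, cluster IDs from the re-run do not need to be matched to the original IDs first.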

Supplementary Note 6: DeePaN shows better performance than k-medoids clustering on identification of significant IO beneficial and non-beneficial subgroups
We compared DeePaN against k-medoids clustering using cosine similarity. The comparison evaluates how well each method identifies significant IO beneficial and non-beneficial subgroups.
As described in the Methods section of the manuscript, our goal is to provide actionable insight to support clinical decisions on immunotherapy. That means clustering patients into subgroups and deciding which subgroups are IO beneficial vs IO non-beneficial. We therefore assessed performance with three measures relevant to IO outcomes, visualized in a volcano plot (Supplementary Fig. 4):
1) the difference in median survival time between an identified cluster and the overall cohort as baseline (x axis). Positive values indicate a tendency toward IO beneficial outcomes and negative values a tendency toward IO non-beneficial outcomes;
2) the statistical significance of the observed survival difference between an identified cluster and the overall cohort (y axis);
3) the percentage of patients assigned to significant IO beneficial or IO non-beneficial clusters, using a P-value cutoff of 0.05.
Better performance means identifying more patients with significant IO beneficial and non-beneficial outcomes, with stronger statistical significance and a larger median survival difference relative to the overall cohort.
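The three volcano-plot criteria above can be computed from per-subgroup summaries. Each subgroup needs its size, its median survival minus the whole-cohort median, and its log-rank P-value against the whole cohort. The sketch below follows the Supplementary Fig. 4 legend in using the Benjamini-Hochberg procedure, and takes FDR < 0.05 as the significance cutoff. The `Subgroup` record and `volcano_summary` function are hypothetical names, and the survival statistics are assumed to be computed elsewhere.

```python
import math
from typing import List, NamedTuple, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step-up: walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


class Subgroup(NamedTuple):
    size: int
    median_diff_months: float  # subgroup median OS minus whole-cohort median OS
    p_value: float             # log-rank p-value, subgroup vs whole cohort


def volcano_summary(groups: List[Subgroup], cutoff: float = 0.05
                    ) -> Tuple[List[Tuple[float, float]], float, float]:
    """Volcano points (x, -log10 FDR) and fractions of patients in significant groups."""
    fdr = benjamini_hochberg([g.p_value for g in groups])
    total = sum(g.size for g in groups)
    beneficial = sum(g.size for g, q in zip(groups, fdr)
                     if q < cutoff and g.median_diff_months > 0)
    non_beneficial = sum(g.size for g, q in zip(groups, fdr)
                         if q < cutoff and g.median_diff_months < 0)
    points = [(g.median_diff_months, -math.log10(q)) for g, q in zip(groups, fdr)]
    return points, beneficial / total, non_beneficial / total
```

Under this sketch, a method scores better when its significant subgroups cover a larger fraction of patients and sit further from the origin of the volcano plot.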
Based on these criteria, DeePaN performed better than k-medoids: it identified more patients with significant IO beneficial and non-beneficial outcomes, and with stronger statistical confidence (Supplementary Fig. 4). In particular, the IO beneficial subgroup identified by DeePaN is more significant than the one identified by k-medoids clustering (circles located in the top right corner). Its P-value was 3.2 × 10−6 in DeePaN vs 8.7 × 10−5 in k-medoids. DeePaN also identified more patients with significant outcomes than k-medoids:
400 vs 376 significant IO beneficial patients in DeePaN and k-medoids, respectively;
896 in total vs 534 significant IO non-beneficial patients in DeePaN and k-medoids, respectively.
Supplementary Fig. 4: DeePaN outperforms k-medoids clustering by identifying more patients with significant IO beneficial and IO non-beneficial outcomes, with stronger statistical confidence. We used a volcano plot for the performance comparison. Each bubble represents a patient subgroup. The x axis shows the difference between the estimated median survival time of a patient subgroup and that of the overall cohort as baseline. The vertical line marks zero median survival difference. Bubbles to the right of the line show a tendency toward IO beneficial outcomes, and bubbles to the left a tendency toward IO non-beneficial outcomes. The y axis is the −log10(FDR) of the log-rank test between a subgroup and the overall cohort, with multiple-comparison adjustment by the Benjamini-Hochberg procedure. It represents the statistical significance of the observed survival difference. The horizontal dashed line marks the statistical significance cutoff of FDR = 0.05.

Supplementary Note 7: Design details of our graph neural network in comparison with other design alternatives
We use a marginalized graph autoencoder (MGAE) as the implementation of our graph convolutional network (GCN). We compared the performance of MGAE with related alternatives: the graph autoencoder (GAE), the autoencoder (AE), and the denoising autoencoder (DAE). The design details of these models are as follows.
For MGAE, we tuned two major hyperparameters, the number of hidden layers and the number of patient clusters, and selected three hidden layers and five clusters to optimize performance. Following the literature recommendation 9 , we set the following MGAE parameters:
noise corruption level: 0.4;
regularization parameter lambda for network training: 1e-5;
number of hidden-layer nodes: 275.
GAE used the same settings as MGAE, except that no noise was added to the feature inputs.
For AE and DAE, the input is a matrix in which each row represents a patient and each column represents a feature. Categorical features are one-hot encoded. The input layer contains 275 features, the embedding part of the AE or DAE consists of three hidden layers of 100 nodes each, and the output layer contains 275 features. For DAE, noise drawn from the standard normal distribution (zero mean, standard deviation 1) is multiplied by 0.5 and added to the dataset. The result is then clipped between 0 and 1. The embeddings were then extracted and k-means clustering was applied to obtain the clustering results. For comparison with MGAE, we set the number of clusters to five for all the other methods.
The autoencoder (AE) network architecture in our experiment therefore consists of a 275-feature input layer, three hidden layers of 100 nodes each, and a 275-feature output layer.
The denoising autoencoder (DAE) network architecture is the same as that of the AE, except for an additional component that corrupts the input data.
…subgroups, as well as to characterize individual subgroups to inform precision medicine, instead of characterizing the entire patient population.
Because lung cancer is a heterogeneous disease, our work focuses on the discovery of patient subgroups to inform precision oncology.
In fact, building a predictive model and patient subtype modeling tend to be complementary and can be combined for synergy. For instance, the enriched clinico-genomic features characterizing the IO beneficial vs non-beneficial subgroups derived from DeePaN can serve as pre-selected input features to enhance predictive modeling.

Supplementary Note 10: Defining the IO beneficial vs non-beneficial subgroups discovered by DeePaN based on clinical relevance
To assess whether the IO beneficial vs non-beneficial patient subgroups (green vs red group in Fig. 2d) discovered by DeePaN are clinically relevant, we used three recent (2019 and 2020) FDA-approved NSCLC IO trials as references. In these three trials, the median overall survival (OS) of the IO-treated vs control groups was:
17.1 vs 14.9 months (nivolumab plus ipilimumab 5 );
20 vs 12.2 months (pembrolizumab 6 );
20.2 vs 13.1 months (atezolizumab 7 ).
As shown in Fig. 2d, DeePaN discovered two subgroups with median survival of 20.35 vs 9.42 months. The better survival group has a median survival of 20.35 months. This is comparable with the median survival of the IO-treated groups in these trials and therefore indicates a clinically relevant IO benefit, so we define this group as the IO beneficial subgroup. The worse survival group has a median survival about 11 months shorter than the better survival group (20.35 − 9.42 = 10.93 months), so we define it as the IO non-beneficial subgroup.

Supplementary Fig. 1: NSCLC IO treated patient cohort and visualization of their clinical and genomic features. A) Cohort identification: an illustration of how the patient cohort was identified using inclusion and exclusion criteria in this study. B) Visualization of clinical and genomic features in the study cohort. Features are categorized into molecular pathology features, blood test features, etc. Gray indicates missingness in the feature. Note that "Tumor response" is not included as an input feature. C) Waterfall plot of DNA alterations in the study cohort; genes are sorted by frequency.
2C: KM plots for five patient subgroups from t-SNE
2D: KM plots for five patient subgroups from UMAP
2E: KM plots for five patient subgroups from Autoencoder (AE)
2F: KM plots for five patient subgroups from Denoising Autoencoder (DAE)
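Supplementary Note 7 specifies the input preparation for the AE and DAE baselines. Categorical features are one-hot encoded and, for the DAE, standard-normal noise scaled by 0.5 is added and the result clipped to [0, 1]. The sketch below makes that step concrete. Reading "applied 0.5 times multiplication" as scaling the noise by 0.5 before adding it is our assumption, as is the assumption that numeric features are already scaled to [0, 1]. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def one_hot(values: Sequence[str]) -> List[List[float]]:
    """One-hot encode one categorical feature; categories are sorted for determinism."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == j else 0.0 for j in range(len(categories))]
            for v in values]


def corrupt(matrix: List[List[float]], noise_factor: float = 0.5,
            seed: int = 0) -> List[List[float]]:
    """DAE input corruption (assumed form): clip(x + noise_factor * N(0, 1), 0, 1)."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, x + noise_factor * rng.gauss(0.0, 1.0))) for x in row]
            for row in matrix]
```

The DAE is trained to reconstruct the clean matrix from `corrupt(matrix)`. The clipping keeps the corrupted values in the same [0, 1] range as one-hot columns.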