Harnessing synthetic lethality to predict the response to cancer treatment

While synthetic lethality (SL) holds promise in developing effective cancer therapies, SL candidates found via experimental screens often have limited translational value. Here we present a data-driven approach, ISLE (identification of clinically relevant synthetic lethality), that mines TCGA cohort to identify the most likely clinically relevant SL interactions (cSLi) from a given candidate set of lab-screened SLi. We first validate ISLE via a benchmark of large-scale drug response screens and by predicting drug efficacy in mouse xenograft models. We then experimentally test a select set of predicted cSLi via new screening experiments, validating their predicted context-specific sensitivity in hypoxic vs normoxic conditions and demonstrating cSLi’s utility in predicting synergistic drug combinations. We show that cSLi can successfully predict patients’ drug treatment response and provide patient stratification signatures. ISLE thus complements existing actionable mutation-based methods for precision cancer therapy, offering an opportunity to expand its scope to the whole genome.

that a true SL when both its partnering genes are down-regulated, will have a detrimental effect on tumor growth and thus improve overall patient survival.
For example, consider the case of evaluating the clinical relevance of the SLi between AKT1 and BRCA1, which is frequently mutated in breast cancer. Experimentally, this can be interrogated in breast cancer cell lines via (A) AKT1 shRNA knockout in BRCA1-isogenic cancer cell line, or (B) double AKT1-BRCA1 knockout screen. In our evaluation of step I and step II, the co-inactivation state was determined in two different ways following the respective experimental screens (A and B). (A) In the case of an isogenic cell line, the co-inactivation of the gene pair is determined based on (1) BRCA1 mutation and AKT1 down-regulation by using AKT1 expression, and (2) BRCA1 mutation and AKT1 copy number loss using its SCNA values. (B) Similarly, in the case of a double knockout screen, the co-inactivation is determined based on (3) down-regulation of both genes using their expression, and (4) copy number loss of both genes using their SCNA values. Then, we evaluate clinical relevance in (i) pancancer samples and in (ii) breast cancer samples of TCGA cohort, resulting in 4 different statistical tests for a given experimentally identified SL pair. That means, a candidate SL pair that is identified by an isogenic cell line screen (or double knockout screen) are evaluated by steps I and II using both pancancer and breast cancer data, where the co-inactive state is determined by 1 and 2 (or 3 and 4, respectively). If significance was observed in at least one of these four tests, the in vitro SL pair was marked as significant for each of steps I and II, independently. We (i) included mutation data and (ii) considered pancancer and cancer type specific significance not to miss any important clinically relevant SL pairs. The total number of pairs that pass each of step I and II of To overcome above limitations, we inferred candidate SL pairs by capitalizing on the large-scale single gene knockout screens. We inferred the candidate SL pairs (Initial Set II) from 5 large-scale shRNA/sgRNA single gene knockout datasets (see Methods), and we select a gene pair significant if it passes false discovery correction in at least one of the 5 datasets (FDR<0.2). This is because each of the 5 shRNA/sgRNA datasets covers a different range of cancer types and genes. For example, the latest Achilles shRNA screening 18 covers 216 cell lines of 19 cancer types but the data includes only 4,898 gene knock-outs (because only those genes passed the quality control), whereas the previous version of Achilles 19 covers less number of cell lines and cancer types, but covers more gene knock-downs (N=6,885). Similarly, a recent shRNA screening from Marcotte et al. 20 covers essentiality scores of all 20,000 protein-coding genes but only for breast cancer cell lines, while their old screening 21 includes the essentiality score of 15,110 genes but covers breast, pancreatic, and ovarian cancer cell lines. The recent sgRNA screen includes all protein-coding genes in 33 cell lines 22 . Hence we included a candidate gene-pair if it is statistically significant in at least one dataset. For the inferred SL pairs that do not belong to Initial Set I, we applied standard ISLE steps I and II to evaluate the clinical relevance (as described in the Methods,

Identifying under-represented candidate SL pairs
To determine a gene to be underactive, we controlled for the different basal expression levels in each cancer type. Specifically, we determine inactive genes in each cancer type separately, which controls for cancer-type-specific different basal expression levels, and performed a single hypergeometric test on all cancer types together, as follows. First, we compared the expression level of that gene to its expression in samples of the same cancer type, and denoted the gene is down-regulated if it falls below 1/3-quantile. Second, once the down-regulated gene is identified in each cancer type, we performed a hypergeometric test for each pair of genes across all samples, combining all cancer types, to assess whether a given pair is observed jointly down-regulated significantly less than expected. We applied the same approach for mRNA expression level and SCNA data and used FDR<0.2 based on Benjamini-Hochberg 23 for both cases.

Identifying phylogenetically linked candidate SL pairs
The purpose of including the gene phylogeny screen is to further refine SL prediction. This enables us to better distinguish the true SL-partners from spurious ones because many gene co-expression and correlated SCNA of proximal genes.
To determine the phylogenetic distance between a pair of genes, we used non-negative matrix factorization (NMF) to consider the structure of the tree of life. Let's consider two cases of phylogenetic profiles of genes A and B, each having five orthologues in different species: (Case 1) the five orthologues of genes A and B are widely spread in multiple domains of life encompassing archaea, bacteria and eukaryotes, and (Case 2) the five orthologues of genes A and B are focused on primates. Simple similarity measures such as Hamming distance will find that the two cases have the same phylogenetic distance. However, they are vastly different when considering the structure of the tree of life, with Case 1 obviously being much more conserved. To capture this important difference we used NMF, following a standard procedure in the literature 24 .
To determine the threshold for phylogenetic similarity, we used the experimentally identified gold standard SL set (Initial Set I as described in Methods) -a total of 154,707 pairs, among which 6,033 pairs constitute the positive set (and the rest belongs to the negative set). We first identified the phylogenetic similarity of all the pairs and found the cut-off value of phylogenetic distance that best separates the positive set and negative set, with the objective being minimizing the p-value in the hypergeometric test. The optimal value was the top 50-percentile of the randomly selected gene pairs (a phylogenetic distance of 10.5). ISLE uses this criterion to determine phylogenetically linked pairs.

ISLE-identified cSL pairs predict patient survival
We investigated the performance of the ISLE-identified cSL pairs in predicting patient survival in an independent breast cancer dataset (Molecular Taxonomy of Breast Cancer International Consortium; METABRIC) 25 . To this end, we characterized a cSL pair to be functionally active if both its partnering cSL genes are underactive in that sample (determined from that sample's transcriptomic profile). Analyzing the METABRIC collection, we found that tumors with many functionally active cSLi exhibit significantly better patient survival than tumors with few functionally active cSLi, as predicted (Cox hazard ratio= 1.35 (P<3.45E-10)) controlling for age, metastasis to lymph nodes and breast cancer. Supplementary Figure 10B demonstrates that the signal is robust versus different thresholds for dividing patient groups in KM analysis, where the patients with high cSL-score consistently show better prognosis than the patients with low cSL-score (logrank P<1E-6 in all thresholds), demonstrating ISLEbased prediction does not strongly depend on the parameter choices. This analysis was performed using the core cSL network by applying FDR<0.1 to ISLE pipeline ( Figure 1B).

Validating ISLE with in vitro drug response data
In this analysis, we focused on the five inhibitor compounds whose response is highly variable across different cell lines (standard deviation within top 75-percentile). We used the cSL-pair-score of the P -T pair to predict the association between the low expression of P and better response to the drug targeting T (i.e. the labels, 1 if the low expression of P is associated with the better response to the drug targeting T, 0 otherwise) using Wilcoxon rank sum test with FDR<0.2 and fold change >0. 35, which leads to 2.3% of P -T pairs to be the positive set (1527 out of a total of 66,664 pairs), and all the remaining P -T pairs in the negative set.

Validating ISLE with in vivo drug response data
For in vivo analysis, we analyzed the large-scale mouse xenograft dataset, which collects 36 single drug response screening of 375 mouse samples in 15 cancer types 26 . Following Procedure 1 (see 'ISLE-based drug response prediction' in Methods) using drug-cSL network, we evaluated our prediction (cSL-score deduced from their transcriptomic and copy number profiles in the pre-treatment tumor samples) vs. experimentally measured drug response. Specifically, we marked a gene is inactive when both its expression and copy number is less than 1/3 quantile across samples in each cancer type. Based on the in vivo pathological drug response annotation, the samples were divided into responders (CR, PR, SD) vs. non-responders (PD) and their cSL-score were compared using Wilcoxon rank sum test. We focused on 7 drugs that have a limited number of drug targets (≤3), sufficient variability in best average response (BestAvgResponse) 26 across different samples per drug (variance >10percentile), and sufficient sample size (at least 4 responders and 10 nonresponders). We further filtered out the drugs whose average response rate and maximal response rate have large values (more than 50% and 275%, respectively) in terms of the same response measure (BestAvgResponse), and focused on the drugs that were primarily used in a single cancer type. In addition, we also tested our prediction against in vivo Progression Free Survival (PFS) data, which measures the time for the tumor to grow double of the baseline tumor size (see Methods in Gao et al. 26 ). We tested whether high cSL-score is associated with improved survival using Cox regression analysis while controlling for other confounders such as cancer subtypes.

Benchmarking ISLE with DREAM7 challenge
We evaluated the prediction accuracy of ISLE in comparison to existing supervised methods tested in DREAM7 challenge for single 27 and double 28 drug response screens. For single drug response, the challenge was to predict the ranking of effectiveness of the 28 drugs in 18 test cell lines based on the drug response data provided for 28 drugs in 35 training cell lines and various types of omics data available for each training cell line. We focused our analysis on a set of 15 inhibitor compounds that have highly specific targets (Ntarget<3) and significant SL partners predicted by ISLE. We followed Procedure 1 (see 'ISLE-based drug response prediction' in Methods) using cSL network to make ISLE-based drug response prediction with cSL-score, which denotes the number of its inactive cSL-partners, as inferred from the cell-line's transcriptomic data. We resort to both gene expression and RNAseq data to increase the coverage of cell lines because a particular data type was not available across all cell lines. We then evaluated and compared the performance using weighted probabilistic concordance index (wpc-index), that is the collective measure of concordance-index for each drug, taking the variance of the response to individual drugs into account, as defined in Costello et al. 27 . The resulting ISLE-based prediction ranks fairly high versus the top supervised algorithms reported in DREAM7 27 ( Figure 2C (left columns)).
For drug combination, the task was to predict drug synergism in a single cancer cell line using transcriptomic profiles before and after the treatment. We hypothesized that a pair of drugs will be synergistic if there exists a strongly predicted SL between any of their gene targets (Methods). To this end, we used the ISLE-inferred SL significance between the drug targets as the predictor for synergism (cSL-pair-score), focusing on 12 inhibitor compounds (66 combinations) excluding 2 non-specific drugs from the data. We followed Procedure 2 (see 'ISLE-based drug response prediction' in Methods) to make synergistic compound prediction. For multi-target drugs, we assigned the minimum cSL-pair-score between the target genes as the synergy prediction for a given drug pair. We then evaluated and compared the performance using probabilistic concordance index (pc-index), taking the variance of the response to individual drugs into account, as defined in Bansal et al 28 . ISLE prediction accuracy ranks at the top compared to the top 5 algorithms reported in the DREAM7 28 ( Figure 2C (right columns)). In both the cases, the performance of randomly selected, DAISY-identified and most importantly, ncSL partners is markedly inferior to that of ISLE (Supplementary Figure 3). Notably, ISLE does not require any post-treatment transcriptomic profiles that were used to train all the other competing approaches in the original challenge, making it broadly applicable for drug synergy prediction without the need of further training.

Experimentally testing ISLE-based gene essentiality and drug prioritization prediction
We show that SL-network can predict context-specific gene essentiality using an shRNA knockdown (KD) screening focused on known cancer drivers 29,30 in a liv7k oral cancer cell line under hypoxic and normoxic conditions. We performed an SL-based computational analysis that predicts the effects of each gene KD on cellular proliferation in each condition -this was done using cSL-score, i.e, the count of predicted cSL partners of each gene whose expression is down-regulated in that specific cell-line. Starting from the initial SL pool, an FDR threshold of 0.2 was applied to identify cSL interactions between the genes whose essentiality was measured and the candidate SL partners that are inactive in at least one replicate (since the remainder does not change cSLscore). The result presented for predicting general growth reduction are for genes that show a robust essentiality score within each condition (< variance 45-percentile in both conditions). We focused on the genes which have at least 2 SL partners to guarantee sufficient variation in cSL-score. We compared the performance of ISLE in predicting gene essentiality ( Figure 3A, the individual t-test p-values between each bin are P=0.02, 0.06, 0.01, and 0.07, respectively) to that of DAISY, ncSL and random networks using the SL-score based on these respective networks as prediction for essentiality. DAISY and ncSL partners do not show a significant difference in essentiality between the genes of different SL-score, and the random networks (randomly assigned SL-partners) do not achieve the significance of ISLE (empirical P<1E-3). In predicting context-specific essentiality, we considered 38 experimentally differentially essential genes (Wilcoxon rank sum P<0.2, |log(fold change)|>0.1) that are not differentially expressed (|log (fold change)|<0.05) to rule out the genes that are differentially essential merely due to the differential expression. We focused on the genes whose cSL-scores are differential between the two conditions (Wilcoxon rank sum P<0.1). We made an analogous comparison of ISLE's performance ( Figure   3B) to that of DAISY, ncSL, and random networks . The fold change of cSL-scores quantified based on DAISY   and ncSL networks do not show a significant correlation to the fold change of essentiality in the two different   conditions, and ISLE's performance is not reached by random SL partners (empirical P<1E-3).
For the drug prioritization prediction in oral and breast cancer cell lines, the drug response was predicted as cSLscores of drug's target genes. For conditional effectiveness of each drug, we focused on the drugs whose cSLscore is different in the two conditions (N=20). We evaluated the accuracy of the prediction in a manner analogous to that of conditional essentiality. The fold change (FC) values of predicted cSL-score and the measured drug response in hypoxia to that of normoxia show a significant correlation (Spearman R=0.36 (P<0.01)). The drugs whose cSL-score is higher in one condition indeed are more effective based on the measured drug response in the condition -19 out of 20 drugs shows concordance -the same sign of log(FC) of predicted cSL-score and measured growth inhibition.

Experimentally testing ISLE-based drug synergy prediction in patient-derived cell lines
We show ISLE predicts personalized novel synergistic drug combinations. We tested the top hits of ISLE in melanoma using patient-derived cell lines (PDC). Starting from the drug cSL-network, we first filtered the actionable targets relevant to melanoma, prioritizing the pathway alterations associated with the key melanoma drivers 31 . We ranked the pairs using cSL-pair-score (defined in Methods) based on pancancer and melanoma ISLE screens to take into account the melanoma-specific features. We used their combined score as the final criteria to select the pairs, where averaged rank of pancancer cSL-pair-score and melanoma cSL-pair-score was normalized between 0 and 1 (as presented in the 3 rd column of Supplementary Data 9). Among the top five cSL pairs, we selected two new combinations between highly specific inhibitor compounds that are marked to be the most synergistic based on the predicted cSL interactions -AKT1 and PIK3CA, and BCL2L1 and IGF1R   Figure 7D for Fa-CI plots, and Supplementary Data 12 for raw data).

Predicting erlotinib response in lung cancer patients
To compare the predictive performance of the a 76-gene epithelial-mesenchymal-transition (EMT) signature reported in the original study 33 , we first identified top 76 significant SL-partners of EGFR based on their cSLpair-score (Methods). We applied the resulting drug-cSL network to predict the response of EGFR wild-type 25 patients with recurrent or metastatic Non-Small Cell Lung Cancer (NSCLC) to the EGFR-inhibitor erlotinib 33,34 .
The erlotinib cSL-score predicts the specific response to erlotinib: for an independent arm of the same trial, in which 37 NSCLC patients were treated with sorafenib, a VEGFR inhibitor, the predictions were not associated with months-to-progression in this case (log rank P>0.95, Supplementary Figure 9C). We observed that the expression levels of EGFR itself do not have a predictive signal (R = 0.03, P = 0.5), and even though mutated KRAS is a biomarker of resistance to EGFR-inhibitors 33

Threshold for gene underactivation
We selected the simplest possible approach that is free from assuming a specific background distribution to minimize the number of parameters needed. For every gene, across samples in each cancer type, we marked the bottom 1/3-quantile as under-expressed. We applied the exact same threshold (1/3-quantile) for determining inactivity from gene expression and SCNA data across all datasets analyzed. We verified that the outcome of our analysis does not depend on the threshold for gene under-activation using the METABRIC breast cancer cohort.
We divided the patients into two groups of high cSL-score (top 50%) and low-cSL score (bottom 50%) and compared the difference in their survival using five different underexpression thresholds (Supplementary Figure   10A), demonstrating that ISLE's predictive signal does not depend on the parameter choices for gene underactivation. This analysis was performed using the core cSL network by applying FDR<0.1 to ISLE pipeline ( Figure 1B). (2) uses a similar set of data as those used to construct ISLE's initial set II. To make a fair comparison between DAISY and ISLE, that is not biased with the types and size of the input dataset, we used the exactly same dataset for step (1) and step (2) of ISLE to implement DAISY. The SL partners were identified using FDR threshold of 0.2 in an analogous manner to ISLE following the DAISY's Methods described in the original paper 36 . DAISY SL-pair-score was calculated in an analogous manner as ISLE's cSL score, namely, DAISY SL pair score = (3) + (6) + (7) ,

Implementation of DAISY
where DAISY-SL-pair-score is a qualitative measure combining the significance levels, r (3) , r (6) and r (7) denote the rank-normalized values (between 0 and 1, with 1 representing a pair with the highest significance and 0 with the lowest) of the statistical significance levels across all gene pairs tested for step (1), (2), and (3) of DAISY, respectively.

Supplementary Note 2
We provide interactive maps of the network, which can be explored using the freely available Cytoscape software 37,38 . The maps visualize different subnetworks and include various gene properties and annotations, as well as alternative views that dissect the network hubs or genes with specific characteristics. The interactive maps are accessible through https://github.com/jooslee/ISLE/.
The network includes genes (nodes) and their SLi (edges), and the largest subnetwork is displayed in Figure 1B.  ANKRD57 , where (i) SL-partners were randomly assigned (gray) and (ii) SL-partners of other drugs were assigned (blue) in predicting DREAM7 challenge drug response to single agents (left columns) and combinations (right columns). For drug combinations, only the randomly shuffled SL network of the drug targets were used as random control (blue). The performance was measured using weighted probabilistic concordance index for single agents and probabilistic concordance index for double treatments as was done in the original papers 27,28 .  The KM plots shows that patients with high cSL-scores (top 50%; solid line) have better prognosis than those with low cSL-scores (bottom 50%; dashed line) with the with five different gene under-expression thresholds (50% (red), 1/3 (orange), 25% (green), 20% (blue), and 10% (purple)). (B) Thresholds for high vs low cSL-score groups: The KM plots shows that patients with high cSL-scores (solid line) have better prognosis than those with low cSL-scores (dashed line), using five different cSL score thresholds, namely top/bottom 10% (red), 20% (orange), 30% (green), 40% (blue), and 50% (purple). The 1/3quantile threshold was used to determine gene underexpression. Both analyses examined the core clinical-SL-network inferred with FDR<0.1 ( Figure 1B).