Synthetic lethality prediction in DNA damage repair, chromatin remodeling and the cell cycle using multi-omics data from cell lines and patients.

Markowska, Magda; Budzinska, Magdalena A.; Coenen-Stass, Anna; Kang, Senbai; Kizling, Ewa; Kolmus, Krzysztof; Koras, Krzysztof; Staub, Eike; Szczurek, Ewa

doi:10.1038/s41598-023-34161-4


Open access
Published: 29 April 2023

Scientific Reports volume 13, Article number: 7049 (2023)

Abstract

Discovering synthetic lethal (SL) gene partners of cancer genes is an important step in developing cancer therapies. However, identifying SL interactions is challenging due to the large number of possible gene pairs and to inherent noise and confounding factors in the observed signal. To discover robust SL interactions, we devised SLIDE-VIP, a novel framework combining eight statistical tests, including a new patient data-based test, iSurvLRT. SLIDE-VIP leverages multi-omics data from four different sources: gene inactivation cell line screens, cancer patient data, drug screens and gene pathways. We applied SLIDE-VIP to discover SL interactions between genes involved in DNA damage repair, chromatin remodeling and the cell cycle, and their potentially druggable partners. The top 883 ranking SL candidates had strong evidence in cell line and patient data, a roughly 250-fold reduction of the initial space of over 200K pairs. Drug screen and pathway tests provided additional corroboration and insights into these interactions. We rediscovered well-known SL pairs such as RB1 and E2F3 or PRKDC and ATM and, in addition, proposed strong novel SL candidates such as PTEN and PIK3CB. In summary, SLIDE-VIP opens the door to the discovery of SL interactions with clinical potential. All analyses and visualizations are available via the online SLIDE-VIP WebApp.

Introduction

Synthetic lethality (SL) is classically defined as an interaction of two genes, where the co-inactivation of both genes results in cell death, while inactivation of either gene alone results in a viable phenotype¹. Discovering synthetic lethal gene pairs is an important step in developing new targeted cancer therapies^1,2. The therapeutic window for SL-based cancer therapies arises because one gene of the SL pair is already inactivated by an endogenous mutation in the tumor cells, but not in the normal cells throughout the rest of the body. Thus, a drug that targets the SL partner of that gene is expected to selectively kill cancer cells while leaving normal cells viable.

The first approved targeted cancer therapy inhibiting the DNA damage response harnesses the SL interaction between BRCA1 or BRCA2 (genes often mutated in breast and ovarian cancer) and the PARP1/2 genes^1,3. Therapies using PARP inhibitors exhibit enhanced efficacy in patients with pre-existing BRCA deficiencies. Other ongoing clinical trials evaluate drugs targeting genes involved in the DNA damage response (DDR), such as ATR inhibitors, in patients with ATM-mutated cancers^2,4,5,6. Further efforts to discover novel SL interactions are needed to propose new candidate pairs and develop new SL-based, biomarker-driven cancer therapies.

Assuming around 20 thousand genes in the human genome, the space of potential SL partners consists of around 200 million gene pairs. Thus, experimental discovery of new SL partners without prior computational screening is infeasible. On the other hand, statistical analysis of all potential gene pairs can return many false positives. In our work, we aim to find potential SL interactions that can be used for cancer therapy. We therefore focus on genes that take part in molecular processes important for cancer development and progression (further referred to as focus genes) or that can be considered therapeutic targets (druggable genes). The focus gene set comprises genes related to DDR, chromatin remodeling and the cell cycle. The DDR pathway includes genes that are responsible for recognizing and repairing DNA damage in the cell⁷, such as the known tumor suppressors TP53, ATM, ATR, BRCA1 and BRCA2. Genes from the chromatin remodeling pathway, such as ARID1A, ARID1B, SMARCA2 and SMARCA4 from the SWI/SNF complex, encode proteins forming molecular machinery that alters chromatin structure and thus takes part in regulating gene expression as well as DNA repair⁸. Mutations in cell cycle genes (for example TP53, RB1, CDKs or transcription factors from the E2F family) can deregulate the cell cycle, which underlies four cancer hallmarks: Self-Sufficiency in Growth Signals, Insensitivity to Antigrowth Signals, Limitless Replicative Potential and Genome Instability⁹.
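The estimate of around 200 million pairs is simply the number of unordered pairs that can be drawn from roughly 20,000 genes, $\binom{20000}{2} = 20000 \cdot 19999 / 2$. A one-line Python check, assuming unordered pairs and taking the approximate gene count quoted above:

```python
from math import comb

# Approximate number of human genes, as quoted in the text
N_GENES = 20_000

# Unordered gene pairs: n * (n - 1) / 2
n_pairs = comb(N_GENES, 2)
print(f"{n_pairs:,} candidate gene pairs")  # 199,990,000 candidate gene pairs
```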

The computational methods for detecting SL gene pairs can be classified according to the input data source they utilize. The first type of methods, such as SLIdR¹⁰, Synlet¹¹ or DepMap¹², exploit high-throughput gene dependency screens performed on cell lines, generated by large experimental efforts such as project DRIVE¹³ or DepMap¹². The second group of methods, such as SurvLRT¹⁴, analyzes cancer patients' survival and tumor genomic data (gene expression or somatic alterations). Pipelines such as MiSL¹⁵, DAISY¹⁶, DiscoverSL¹⁷ with its online tool SL-BioDP¹⁸, or ISLE¹⁹ combine several statistical methods to analyze different types of pan-cancer tumor and cancer cell line data. Finally, many methods do not directly use experimental data, but instead mine databases containing biological networks²⁰, gene ontologies, protein-protein interactions or known SL interactions and employ machine learning techniques to detect patterns similar to those present for the known SL gene pairs^21,22.

Integrating multiple statistical tests performed on several independent data sources is expected to provide better statistical power and increase the robustness of SL discovery from imperfect and noisy biological data. In this study, we devised SLIDE-VIP (Synthetic Lethality Integrated Discovery Engine-Verified In Patients), a framework for SL interaction discovery combining eight statistical tests, including a novel patient data-based test, iterative SurvLRT (iSurvLRT). Our framework leverages multi-omics data from four different sources: gene inactivation (knockout and knockdown) cell line screens (two tests on two different datasets), cancer patient data (four tests), drug screens (one test on two datasets) and genomic pathways (one test; Fig. 1). We carefully assess the quality of the outcomes for each source, rank the results with combined and adjusted p-values and choose potentially clinically relevant SL pairs that are best supported by the most reliable results. Additionally, we offer an easily accessible SLIDE-VIP WebApp with pre-computed results and visualizations of eight SL tests for 224,169 gene pairs. By closely examining the known and newly discovered SL pairs, we show that our comprehensive framework has the potential to discover new biomarkers or targets for cancer therapy.

Results

SLIDE-VIP: a workflow for identifying synthetic lethal partners of genes involved in DNA damage response, chromatin remodeling and cell cycle from integrated cell line and patient data

We devised a uniquely comprehensive workflow for identifying clinically relevant synthetic lethal interactions (Fig. 1). We performed two series of synthetic lethality tests (Fig. 1B). In the first series, we closely investigated interactions between genes involved in carcinogenesis by testing pairs of focus genes (genes involved in DNA damage response, chromatin remodeling and cell cycle; “Methods” and Supplementary Table 1). The second series was more exploratory: we searched for SL partners of focus genes among all other genes, provided they presented therapeutic potential (focus vs druggable genes, i.e., genes with known targeting drugs in advanced stages of development; “Methods” and Supplementary Table 2). For each gene pair, gene A was chosen from the focus genes and was required to carry at least 20 loss-of-function (LoF) alterations in the dataset (see “Methods” for a definition of LoF alteration). The second gene in the pair, referred to as gene B, was either a focus or a druggable gene, and it was considered a candidate synthetic lethal partner of gene A.
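The pair construction described above can be sketched as a simple filter. This is a minimal illustration only: the input mapping, the set arguments and the helper `candidate_pairs` are hypothetical, the LoF counts in the usage line are toy values, and the actual pipeline applies further per-dataset testing criteria described in “Methods”.

```python
MIN_LOF = 20  # gene A must carry at least 20 LoF alterations in the dataset


def candidate_pairs(lof_counts, focus_genes, druggable_genes):
    """Yield (gene_a, gene_b, series) tuples for the two test series.

    lof_counts: hypothetical mapping gene -> number of LoF alterations.
    focus_genes, druggable_genes: sets of gene symbols.
    """
    for gene_a in sorted(focus_genes):
        if lof_counts.get(gene_a, 0) < MIN_LOF:
            continue  # too few LoF alterations to test gene A
        for gene_b in sorted(focus_genes | druggable_genes):
            if gene_b == gene_a:
                continue
            # Series 1: focus vs focus; series 2: focus vs druggable
            series = "focus" if gene_b in focus_genes else "focus_vs_druggable"
            yield gene_a, gene_b, series


# Toy LoF counts, for illustration only
pairs = list(candidate_pairs(
    {"TP53": 50, "RB1": 25, "ATM": 10},
    focus_genes={"TP53", "RB1", "ATM"},
    druggable_genes={"PIK3CB"},
))
```

Gene B needs no LoF alterations of its own, so a focus gene that is too rarely altered to serve as gene A can still appear as a partner.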

First, a total of 224,169 gene pairs in both series (126,175 focus gene pairs and 97,994 focus vs druggable gene pairs) were assessed for SL interactions using two cell line tests, Synthetic Partner Inactivation Dependency (SPID) and Synthetic Partner Enrichment Analysis (SPEA), applied to the Achilles CRISPR knock-out and the DEMETER2 RNAi knock-down screen data (see “Methods”). This number of pairs results from including only those pairs that could be tested on both datasets. Next, we combined the resulting four p-values (two tests on each of the two datasets) using Fisher's method²³ and then applied the Benjamini-Hochberg multiple testing correction²⁴ to the combined p-values of all 224,169 gene pairs (focus and focus vs druggable gene pairs together). With this procedure, we obtained 24,193 pairs (14,556 focus, 9637 focus vs druggable) with a combined p-value $< 0.05$, and 1833 pairs (1267 focus and 566 focus vs druggable) with a combined and corrected p-value $< 0.05$. We consider these 1833 pairs to be the top cell line-based SL candidates.
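For readers who want to reproduce the combination step, Fisher's method and the Benjamini-Hochberg procedure fit in a few lines of standard-library Python. This is an illustrative sketch, not the SLIDE-VIP code: it assumes the four cell line p-values of a pair are independent and strictly positive, and it uses the closed-form chi-square tail for even degrees of freedom instead of a statistics library. The p-values in the usage example are made up.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)  # requires every p > 0
    # Chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: four cell line p-values per pair (made-up numbers)
pairs = {"pair_1": [1e-5, 0.004, 0.001, 0.006], "pair_2": [0.3, 0.6, 0.2, 0.9]}
combined = {name: fisher_combine(ps) for name, ps in pairs.items()}
adjusted = dict(zip(combined, benjamini_hochberg(list(combined.values()))))
```

In the study the correction runs over all 224,169 combined p-values at once, so the same code would simply receive a much longer list.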

Next, we screened the 224,169 gene pairs for confirmation in the patient data. First, four patient tests were performed using 7256 tumor samples from The Cancer Genome Atlas (TCGA). Since each of the tests has different application criteria (see “Methods”), each could be applied to a different number of pairs (see Table 1). Notably, the novel iSurvLRT test allowed us to use patient survival data for a much higher number of pairs than the existing SurvLRT test. We then applied the Benjamini-Hochberg multiple testing correction separately for each of the four patient tests across all tested pairs (focus and focus vs druggable gene pairs together; see Table 1). Finally, we checked which of the top 1833 cell line-based pairs showed evidence of SL in at least one of the patient tests (see “Methods” for SL evidence criteria). In this manner, we obtained 883 clinically relevant SL gene pairs (683 focus and 200 focus vs druggable) that we consider the top SL candidates with strong evidence in both cell line and patient data.
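The two-stage selection (corrected combined cell line p-value below 0.05, plus SL evidence in at least one patient test) amounts to a set filter, after which pairs are ordered by the corrected combined cell line p-value, the criterion used for the final ranking. In this sketch the per-test evidence criteria from “Methods” are represented by hypothetical precomputed booleans, and the pair names and values are toy inputs.

```python
ALPHA = 0.05  # threshold on the corrected combined cell line p-value


def top_sl_candidates(cellline_adj, patient_evidence):
    """Return pairs passing the cell line filter and >= 1 patient test, best first.

    cellline_adj: mapping pair -> BH-corrected combined cell line p-value.
    patient_evidence: mapping pair -> list of booleans, one per applicable
        patient test (True if that test's SL evidence criteria were met).
    """
    top_cell_line = [pair for pair, q in cellline_adj.items() if q < ALPHA]
    confirmed = [pair for pair in top_cell_line if any(patient_evidence.get(pair, []))]
    # Rank by the corrected combined cell line p-value (smallest first)
    return sorted(confirmed, key=lambda pair: cellline_adj[pair])


# Toy inputs with hypothetical gene names
ranking = top_sl_candidates(
    {("A1", "B1"): 0.001, ("A2", "B2"): 0.0001, ("A3", "B3"): 0.2, ("A4", "B4"): 0.01},
    {("A1", "B1"): [False, True], ("A2", "B2"): [True],
     ("A3", "B3"): [True], ("A4", "B4"): [False, False]},
)
```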

Table 1 Summary of the performed SL tests.

To rank the top 883 clinically relevant SL gene pairs, we used the corrected combined p-value of the cell line-based tests. We did not include the other tests in the combined p-value used for ranking because they could not be performed for all of the pairs and each of them had a different sample size, resulting in varying statistical power. All results for the top pairs, sorted according to the final ranking, are presented in Supplementary Table 3 (683 focus pairs; referred to as the focus ranking) and Supplementary Table 4 (200 focus vs druggable pairs; referred to as the focus vs druggable ranking). Together with plots illustrating all performed tests (easily accessible through our SLIDE-VIP WebApp), they are a rich source of knowledge about relevant SL interactions among genes from the DDR, chromatin remodeling and cell cycle pathways and potentially druggable genes. To independently validate the top 883 clinically relevant SL pairs detected by our framework, we cross-checked the results against SL interactions identified experimentally in ref.¹⁹ using 17 focused in vitro SL screens (Supplementary Table 5). Due to our strict criteria, which include clinical relevance, SLIDE-VIP identified a relatively low number of SL pairs and thus achieved a low sensitivity. Nevertheless, it achieved a high specificity (99%) and accuracy (93%). The comparison to the alternative approaches DAISY¹⁶ and SL-BioDP¹⁸ was limited by the fact that these report only positive results (only pairs identified as SL, out of an unknown number of tested pairs). Among the three methods, SLIDE-VIP achieved the highest precision (the only quality measure that could be calculated for all three approaches). For completeness, all gene pairs tested by SLIDE-VIP are reported in Supplementary Table 6.
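For reference, the quality measures mentioned above follow from the confusion matrix of predicted versus experimentally validated pairs. The sketch uses made-up counts, not those underlying Supplementary Table 5. It also makes explicit why precision is the only measure available for methods that report positives alone: it needs only the pairs called SL (true and false positives).

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of validated SL pairs recovered
        "specificity": tn / (tn + fp),  # fraction of validated non-SL pairs rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),  # needs only the pairs called SL
    }


# Made-up counts: a conservative method that calls few pairs SL
metrics = confusion_metrics(tp=2, fp=1, tn=90, fn=7)
```

With such counts, sensitivity is low while specificity and accuracy stay high, the same qualitative pattern described for a strict, conservative caller.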

Additional evidence in cell line drug screening data

We checked for additional evidence for all potential SL pairs in public cell line drug screening data (“Methods”). A relatively small number of pairs (62,026 pairs for the Profiling Relative Inhibition Simultaneously in Mixtures database (PRISM)²⁵ dataset and 26,354 for the Genomics of Drug Sensitivity in Cancer database (GDSC)^26,27,28 dataset) included a gene B that was also targeted in publicly available drug screens (see Table 1). After performing the Synthetic Partner Drug Dependency (SPDD) test on the PRISM and GDSC datasets (see “Methods”), we applied the Benjamini-Hochberg multiple testing correction to the obtained SPDD p-values separately for each of the two datasets. The p-value histograms for both datasets (Supplementary Fig. 1) show a strong skew of the unadjusted p-value distribution towards 0, indicating that for many tested pairs there is a signal of SL interaction in these data. Accordingly, before multiple testing correction, there were 5833 pairs in the GDSC dataset and 12,293 pairs in the PRISM dataset (see Table 1) with p-values less than 0.05. However, after the Benjamini-Hochberg correction, only 220 pairs in the GDSC dataset and one pair in the PRISM dataset had an adjusted p-value less than 0.05. This is because, in contrast to the previous tests, the range of the original p-values does not include extremely low values. Since we consider this test an additional validation rather than a means of identifying new pairs, we used the unadjusted p-values for this test in further analyses. With that, we obtained 115 pairs among the top 1833 cell line-based pairs for the GDSC dataset and 98 pairs for the PRISM dataset (see Table 1). These pairs are especially interesting because drugs targeting gene B show potential for drug repurposing and targeted cancer therapy in patients with LoF alterations in gene A.
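The loss of significance after correction follows from the Benjamini-Hochberg step-up rule: the $k$-th smallest of $m$ p-values is rejected only if some $j \ge k$ satisfies $p_{(j)} \le j\alpha/m$. With $m = 62{,}026$ PRISM pairs and $\alpha = 0.05$, the smallest p-value alone must fall below about $8.1 \times 10^{-7}$ to be rejected on its own. The sketch below uses synthetic p-values (not the SPDD results) to show that a clear excess of p-values below 0.05 can still yield no rejections when none of them is extremely small.

```python
def bh_num_rejections(pvalues, alpha=0.05):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    k_max = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k * alpha / m:
            k_max = k  # step-up: keep the largest rank that meets its threshold
    return k_max


# Synthetic example with m = 10,000 tests: 1,000 p-values of 0.02 (twice the
# ~500 expected below 0.05 under the null) and the rest uninformative
moderate = [0.02] * 1_000 + [0.6] * 9_000
# Same number of tests, but with a handful of extremely small p-values
extreme = [1e-8] * 10 + [0.6] * 9_990
```

The moderate set yields no rejections, whereas the ten extreme p-values are all rejected, mirroring the contrast between the SPDD test and the cell line tests.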

Insights into biological function and pathway membership of the top SL pairs

Finally, to gain insight into the biological function of the identified pairs, we evaluated 1951 pathways from the c2 collection of the Molecular Signatures Database (MSigDB)²⁹ to identify significant common enrichment of potential SL partners in curated gene sets. We performed the Synthetic Partners Shared Pathways (SPSP; see “Methods”) test for 178,256 gene pairs and then applied the Benjamini-Hochberg multiple testing correction. Since our SL analysis focused on cell cycle, DDR and chromatin remodeling genes, an overlap is expected between the pathways of the genes in SL pairs. Still, these and additional identified pathways can widen our understanding of the mechanisms behind their SL interactions. Among the top 1833 cell line-based pairs, 212 SL gene pairs shared a highly significant number of pathways, which suggests that their SL interaction can be explained by their shared functionalities. The common pathways for these pairs may be important for conveying the SL interactions, i.e., the pair may be SL because these pathways can still function when only one of the genes is inactivated but fail when both are lost. The SLIDE-VIP WebApp includes an MSigDB data module that illustrates SPSP test results and names the pathways shared by the genes of a given pair.
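Shared-pathway enrichment of this kind is commonly assessed with a hypergeometric (one-sided Fisher's exact) test on pathway memberships. The exact SPSP statistic and pathway universe are defined in “Methods”; the sketch below assumes a hypergeometric test over all 1951 c2 pathways and is not guaranteed to reproduce the p-values reported in this study.

```python
from math import comb


def shared_pathway_pvalue(n_pathways, n_a, n_b, n_shared):
    """Upper-tail hypergeometric p-value P(overlap >= n_shared).

    Assumes gene B's n_b pathways are a random draw from n_pathways,
    of which gene A belongs to n_a.
    """
    total = comb(n_pathways, n_b)
    upper = sum(
        comb(n_a, k) * comb(n_pathways - n_a, n_b - k)
        for k in range(n_shared, min(n_a, n_b) + 1)
    )
    return upper / total  # exact integer arithmetic, divided once at the end


# Pathway counts for PRKDC/ATM and PTEN/PIK3CB as given in the text
p_prkdc_atm = shared_pathway_pvalue(1951, 14, 51, 5)
p_pten_pik3cb = shared_pathway_pvalue(1951, 44, 97, 22)
```

Under these assumptions both pairs fall far below 0.05; the exact values depend on the pathway universe and test actually used.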

Further, we checked whether SL gene pairs with positively or negatively correlated expression occur within the same pathway or across different pathways. Interestingly, we found that gene pairs with negatively correlated expression almost exclusively occur in different pathways (Supplementary Fig. 2A). In contrast, a much higher fraction of gene pairs with positively correlated expression (183 gene pairs) occurs in shared pathways (Supplementary Fig. 2B).

Previously known SL pairs for focus genes

Several synthetic lethal interactions between focus genes found using SLIDE-VIP have been identified previously, confirming the validity of our approach. Among the top 683 focus pairs, 30 are featured in the SynLethDBv2.0 database³⁰. In particular, the top four pairs are already confirmed SL gene pairs: TP53 and CHEK2, RB1 and E2F3, RB1 and SKP2, and TP53 and PLK1.

For DDR genes, the top identified focus pairs include known SL gene pairs such as TP53 and CHEK2 (first in the focus ranking^31,32), PRKDC and ATM (15th in the focus ranking³³) and TP53 and IGF2 (44th in the focus ranking^34,35). All three pairs have very strong confirmation in cell line tests and are confirmed by either the Survival of the Fittest (SoF) or the SurvLRT patient data test. For the cell cycle genes, we rediscovered known SL interactions between RB1 and E2F3 (second in the focus ranking^36,37,38) and RB1 and SKP2 (third in the focus ranking^37,39,40). These two pairs are confirmed by all four cell line tests and also by the SoF patient data test. Among the top-ranking pairs, there was also the known pair TP53 and PLK1 (fourth in the focus ranking^4,41), which has very good confirmation in the DEMETER2 cell line data tests and also in the SoF patient test.

Additionally, we verified our top focus pairs against a systematic experimental study that tested pairs of 73 selected cancer genes with combinatorial CRISPR-Cas9 perturbations in three cell lines⁴². Among the top 683 SLIDE-VIP focus pairs, 13 were tested in that study, and 12 of these have a negative Z score in at least one of the three cell lines, indicating a synthetic lethal relationship. The Z scores from⁴² for the top focus pairs are included in Supplementary Table 3.

Previous work identified a set of 31 gene pairs, consisting of paralogs that share common functionalities and thus display SL interactions²², as well as SL pairs in SWI/SNF chromatin remodeling complex-deficient cancers⁴³. Out of these 31 pairs, seven were among the 224,169 gene pairs tested using our framework. The remaining ones were excluded because they either did not belong to the focus or druggable sets or did not pass the testing criteria. To enrich the body of knowledge on these seven pairs, we investigated in detail the type of evidence for their SL interactions that could be found using SLIDE-VIP. Three of these seven were among the top 1833 cell line-based pairs, six were confirmed by at least one test on patient data, three found additional confirmation in drug screens and four in pathway-based tests (Table 2). In the SWI/SNF complex, the cell line tests confirmed the SL interaction for the known paralog pairs ARID1A and ARID1B^13,44,45 and SMARCA4 and SMARCA2^43,46. The ARID1A and ARID1B synthetic lethal interaction was also observed in patient tests. The SMARCA4 and AURKA synthetic lethal interaction was confirmed in both cell line and patient data.

Table 2 Paralogs and known SL pairs in SWI/SNF.

Previously known SL pairs for focus vs druggable genes

In contrast to the top 683 focus gene pairs, the top 200 focus vs druggable pairs are not necessarily expected to have been previously studied (Supplementary Table 4). The druggable partners may be involved in processes other than the focus DDR, chromatin remodeling or cell cycle pathways. Thus, the focus vs druggable test series was more exploratory and intended to provide candidates for drug repurposing. Of the top 200 focus vs druggable pairs, four were also found in the SynLethDBv2.0 database³⁰. Of those, the best ranked are TP53 and MAPKAPK5 (ninth in the focus vs druggable ranking) and TP53 and ITGB1 (49th in the focus vs druggable ranking), with additional confirmation in the SoF patient data test.

Additionally, we verified our top focus vs druggable pairs against the combinatorial CRISPR-Cas9 data of ref.⁴². Among the top 200 SLIDE-VIP focus vs druggable pairs, two were tested in that study, and both have a negative Z score in at least one of the three cell lines, indicating a synthetic lethal relationship. The Z scores from⁴² for the top focus vs druggable pairs are included in Supplementary Table 4.

SLIDE-VIP confirmed the synthetic lethal interaction between PRKDC and ATM

To illustrate the results of our framework, we selected one interesting pair from the top focus SL gene pair candidates, namely PRKDC and ATM (Fig. 2). According to the SynLethDBv2.0 database³⁰, this pair has a relatively high score of 0.65 and SL potential confirmed experimentally in a CRISPR screen. The waterfall plot of ATM dependency in cell line knock-out screens shows enrichment of negative dependency scores in cell lines with PRKDC LoF (red bars in Fig. 2A). For this pair, the SPID test on the Achilles dataset showed that dependency on ATM is significantly higher in cell lines with a PRKDC LoF alteration than in cell lines without it (p-value $9.77 \times 10^{-6}$; Fig. 2B, first plot). SPEA confirmed that cell lines with PRKDC LoF alterations are significantly enriched at the top of the list of cell lines ordered by their sensitivity to CRISPR-mediated ATM knock-out (p-value 0.005; Fig. 2C, first plot). Importantly, the same tests on the independent DEMETER2 dataset also confirmed the interaction (p-values 0.0009 for SPID and 0.005 for SPEA; Fig. 2B,C, second plots). Using SurvLRT, we showed that patients with simultaneous LoF alterations of PRKDC and ATM survive significantly better than expected given the survival of patients with individual LoF alterations and without LoF alterations of those genes (p-value 0.004; Fig. 2D). Evidence for the synthetic lethal interaction between PRKDC and ATM was also found in drug data, where the sensitivity of cell lines with PRKDC LoF alterations to the drug KU-60019, known to target ATM, was significantly larger than that of cell lines without PRKDC inactivation (p-value 0.043; Fig. 2E). This gives additional drug-repurposing insight: KU-60019, originally developed to target ATM, could be used to target cancers with PRKDC LoF alterations. Additional confirmation of the interaction was found in the SPSP test, which shows that the genes share a significantly high number of pathways (PRKDC is part of 14 pathways, ATM is part of 51, and they have 5 pathways in common; p-value 0.0004; Fig. 2F).
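The SPID comparison, that is, whether cell lines with a LoF alteration in gene A are more dependent on gene B than the remaining cell lines, can be illustrated with a one-sided permutation test on the difference in mean dependency scores. This is only a stand-in under our own assumptions: the statistic actually used by SPID is specified in “Methods”, the dependency values below are synthetic, and more negative scores are taken to mean stronger dependency, as in the waterfall plots.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def dependency_permutation_test(dep_lof, dep_other, n_perm=10_000, seed=0):
    """One-sided permutation p-value for 'LoF lines depend more (score lower) on gene B'."""
    observed = _mean(dep_lof) - _mean(dep_other)  # negative if LoF lines are more dependent
    pooled = list(dep_lof) + list(dep_other)
    n_lof = len(dep_lof)
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_lof]) - _mean(pooled[n_lof:])
        if diff <= observed + 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


# Synthetic dependency scores for gene B
dep_lof = [-1.0, -1.2, -0.9, -1.1, -1.3]
dep_other = [0.0, 0.1, -0.1, 0.05, 0.2, -0.05, 0.0, 0.1]
p_value = dependency_permutation_test(dep_lof, dep_other)
```

A permutation test makes no distributional assumption about dependency scores, at the cost of a p-value floor of $1/(n_\text{perm}+1)$.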

PRKDC, encoding the catalytic subunit of the DNA-dependent protein kinase DNA-PK, and ATM are both kinases capable of sensing DNA damage and activating downstream effector kinases to initiate DNA repair. Loss of ATM has been tested in multiple clinical trials as a selection biomarker for drugs that target the DNA repair system, such as Olaparib⁴⁷. Pre-clinical research indicates that loss of ATM could sensitize cancer cells to DNA-PK inhibition^33,48. Because DNA-PK was not profiled in the DepMap Achilles dataset, we could not observe this previously described interaction; instead, we observed the reciprocal gene pair: loss of PRKDC as a sensitizer for ATM inhibition. Drugs targeting ATM have entered clinical development in recent years, so it would be of high interest to confirm this synthetic lethal interaction further.

PTEN and PIK3CB identified as a focus vs druggable pair with strong evidence for synthetic lethality

Another highly interesting candidate SL pair, PTEN and PIK3CB, ranked second in the focus vs druggable ranking (Fig. 3). The waterfall plot of PIK3CB dependency in cell line knock-out screens clearly shows enrichment of negative dependency scores in cell lines with PTEN LoF (red bars in Fig. 3A). The SPID test on both the Achilles and DEMETER2 data indicates that cell lines with PTEN LoF alterations are significantly more sensitive to inactivation of PIK3CB (p-values $6.2 \times 10^{-6}$ and 0.0004, respectively; Fig. 3B). This is confirmed on the same datasets using the SPEA test (both p-values equal to 0.005; Fig. 3C). For patient data, the iSurvLRT test found that patients with simultaneous PTEN LoF alteration and low expression of PIK3CB survive significantly better than expected from the survival of patients with only one or neither of these alterations (p-value 0.0004; Fig. 3D), indicating high clinical relevance of this SL interaction. Drug data also show evidence of a synthetic lethal interaction between the genes: the sensitivity of cell lines with PTEN LoF alterations to the drug AZD6482, known to target PIK3CB, was significantly larger than that of cell lines without PTEN inactivation (p-value $7.4 \times 10^{-8}$; Fig. 3E). AZD6482 is thus a strong candidate for SL-based repurposing, and this result suggests that it should be further evaluated for the treatment of patients with PTEN LoF alterations. Additional confirmation of the SL interaction was found in the SPSP test, which shows that the genes share a significantly high number of pathways (PTEN is present in 44 pathways, PIK3CB in 97, and they have 22 pathways in common; p-value $3.6 \times 10^{-15}$; Fig. 3F).

The PIK3 pathway regulates a myriad of cellular processes, including proliferation and survival, by initiating a phosphorylation signaling cascade⁴⁹. The lipid phosphatase PTEN is a negative regulator of PIK3 pathway activity. Consequently, deregulation of the PIK3 pathway in tumors is frequently caused by inactivation or loss of PTEN^50,51. Interestingly, it has previously been observed that PTEN-deficient cancer cells require the lipid kinase activity of PIK3CB for both PIK3 signaling and growth in vitro^52,53, which gives additional support for this potential SL interaction.

Synthetic lethality evidence network for focus genes

To visually inspect the evidence found for the most promising SL candidates and their interconnections, we created SL evidence networks for the top 50 pairs, both for the focus gene list (Fig. 4) and for the focus vs druggable gene list (Fig. 5). The evidence networks summarize all tests and data sources that supported the synthetic lethal interaction of each pair.

For the focus genes (Fig. 4), a set of cell cycle genes formed a separate cluster with RB1 as a hub, including partners such as CDK2 and E2F3. TP53 emerged as the largest hub in the network, with a cluster of partners involved in chromatin remodeling (including HDAC5, HDAC6 and UBR2). Around ATM, there was another cluster, the most densely connected one, of genes involved in chromatin remodeling, which included NOS1, KMT2D and NCOA6. Connected genes involved in the DDR include ARID1B and ARID1A (together with ATR via KMT2D). The network also contained a link between SMARCA2 and SMARCA4, paralog genes known to be synthetically lethal and involved in both chromatin remodeling and DDR.

Interestingly, the WRN gene formed an additional hub, with PRKDC, ARID1A, ARID1B and others as potential SL partners. Given that WRN dependency is typically associated with microsatellite instability (MSI), these interactions could indicate additional vulnerabilities of MSI tumor cells.

Synthetic lethality evidence network for focus vs druggable genes

Overall, compared to the SL evidence network for the focus genes, the SL evidence network for the focus vs druggable genes is less densely connected (Fig. 5). However, several interesting hubs emerged. First, again, TP53, a tumor suppressor involved in all three focus pathways, was identified as an SL partner of multiple druggable genes, five of which are targets of already approved drugs. Interestingly, two druggable genes not involved in any of the focus pathways emerged as hubs, meaning they were identified as potential partners of many focus genes. In particular, KEAP1 was found to be a candidate partner for five focus genes, four of which are involved in chromatin remodeling. DHODH was another frequent partner, identified in three pairs.

The proposed workflow identifies novel potential SL pairs

We identified several novel, interesting SL interactions involving genes with roles in chromatin modification. For example, we observed that cell lines with LoF inactivation of lysine methyltransferase 2D (KMT2D/MLL4) were more dependent on ATM than those with wild-type (WT) KMT2D/MLL4. Furthermore, evidence from SurvLRT, iSurvLRT and GDSC drug screening supported this potential SL interaction (Supplementary Table 3). Notably, previous studies suggested that inactivation of KMT2D, and also of KMT2C, increases sensitivity to Poly(ADP-ribose) polymerase inhibitors (PARPi)^54,55, highlighting an important link between chromatin modification and targeting the DNA damage response. Moreover, NCOA6-mutated cancer cell lines in the GDSC screen exhibited a higher sensitivity to the ATM inhibitor KU-60019 than those with WT NCOA6. This could therefore be an interesting pair to investigate further experimentally.

We also observed that LoF alterations in the chromatin remodeling enzyme chromodomain helicase DNA-binding protein 8 (CHD8) sensitized cancer cell lines to inactivation of the exonuclease MRE11. Although this pair lacks confirmation from patient tests, it was supported by GDSC drug screen data, where CHD8-mutated cell lines were more susceptible to the MRN complex inhibitor Mirin.

Discussion

Here, we introduced SLIDE-VIP, a novel comprehensive SL analysis framework combining eight statistical tests and leveraging multi-omics data from four sources. It includes an original, patient survival-based iSurvLRT test, which allows us to harness survival data for a higher number of potential pairs than our previous SurvLRT test. Both iSurvLRT and SurvLRT inform about the clinical relevance of the tested SL interaction. Previous approaches to SL identification combining several data sources^17,19 used less comprehensive workflows with fewer tests.

For a defined target space consisting of focus genes from the DDR, chromatin remodeling and cell cycle pathways, our framework identified known and novel pairs with clinical relevance. We comprehensively explored the interactions between genes important for carcinogenesis. In this category, we rediscovered many known SL gene pairs while revealing new ones that merit further investigation. The results obtained for focus vs druggable gene pairs are of a more exploratory nature. This test series extended the set of candidate SL partners beyond cancer-related genes. Additionally, gene pairs from this category may have higher clinical potential, because gene B is targeted by drugs in advanced stages of development.

Conveniently, all our analysis results (including tested pairs which do not meet the significance thresholds) are publicly accessible via SLIDE-VIP WebApp. The plots illustrating the results are easily available for manual inspection by a specialist interested in developing new cancer therapies. To complete the picture, the app also shows the results of the additional tests on drug screens and pathway information.

Our SLIDE-VIP framework employed a different, more conservative strategy than other approaches, such as DAISY¹⁶, ISLE¹⁹, or DiscoverSL¹⁷ and its online tool SL-BioDP¹⁸. It amalgamated the results of different tests, two of them novel, with the aim of eliminating false positives. Our discovery pipeline combined signals from both RNAi and CRISPR cell line data with several signals in patient data. The reason for this was that many pairs previously discovered from cell line screens alone, such as the SMARCA4 and SMARCA2 pair^43,46, were not validated by any of our patient data-based tests and thus may be considered false positives. Consequently, we return a lower number of SL candidates that are more comprehensively supported by the data. We hope that, through this strategy, the SLIDE-VIP framework will facilitate the discovery of clinically relevant SL gene pairs in the future. We would also like to emphasize that, in contrast to previous methods, we did not calculate a single score summarizing all tests. This is because such a score can be dominated by the outcome of a single test due to differences in p-value ranges. Instead, we facilitate the investigation of each test individually. In particular, through our WebApp, we provide the results and visualizations of each test so that users can decide what is relevant to their research questions.

Our framework must be viewed within the limitations of the available datasets. They are the results of either biological experiments or data collected from patients. Both categories are prone to confounding factors and include noise introduced either at the experimental stage or during bioinformatics preprocessing. The datasets also differ in sample size, not only between but also within them. For example, more cell lines and patients have LoF mutations in common cancer genes such as TP53 or RB1, which may partly explain the high position of these genes in the list ranked by combined p-values. Furthermore, the patient data are sparse, which we partly overcame by performing a pan-cancer analysis and using four patient tests, including a novel one, iSurvLRT, that uses different types of input. Larger sample sizes for each data type would allow us to apply SLIDE-VIP in a cancer indication-specific manner. Cancer type is potentially an important predictive factor, especially for the survival data used in the SurvLRT and iSurvLRT tests. Our testing framework is general and extendable to other lists of focus or druggable genes. Thus, future discoveries can be made using SLIDE-VIP for genes residing in cancer pathways other than those investigated in this study.

SLIDE-VIP uncovered gene pairs with excellent potential for developing new targeted cancer therapies. Our framework vastly reduces the space of candidate SL pairs and thereby enables focusing experimental verification on the most promising candidates. The pairs with additional verification in drug screens and pairs from the focus vs druggable genes are especially attractive for drug repurposing. In this way, SLIDE-VIP opens the door to the discovery of SL interactions with clinical potential.

Methods

Definitions

LoF alteration

For both cell line and patient data, we considered loss of function (LoF) alterations, i.e., alterations that result in a non-functional protein. LoF alterations were defined based on the Ensembl⁵⁶ classification of variants, and only variants with a predicted high-impact effect were used. These included the following alterations: splice site, nonsense and damaging missense mutations, frameshift insertions/deletions, start codon insertions/deletions/single nucleotide polymorphisms, and stop codon insertions/deletions. To assess the functional effect of missense variants and to define damaging ones, we used ANNOVAR⁵⁷. Five variant effect prediction methods included in ANNOVAR were used to increase the accuracy of the prediction and decrease the false-positive rate: PolyPhen-2⁵⁸, SIFT⁵⁹, PROVEAN⁶⁰, CADD⁶¹ and DANN⁶². A missense mutation was classified as damaging if at least three of these five methods predicted a damaging effect on protein function according to the damaging score cut-off established by each method.
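The majority-vote rule for missense variants can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each predictor's ANNOVAR score has already been turned into a binary damaging call using that method's own cut-off, and the variant-class labels in `LOF_CLASSES` are hypothetical placeholders rather than Ensembl consequence terms.

```python
from typing import Mapping, Optional

# The five predictors named in the text. Each call is assumed to be the binary
# outcome of applying that method's own damaging-score cut-off upstream.
PREDICTORS = ("PolyPhen-2", "SIFT", "PROVEAN", "CADD", "DANN")

# Hypothetical labels for the high-impact (non-missense) LoF variant classes.
LOF_CLASSES = {
    "splice_site", "nonsense", "frameshift_indel",
    "start_codon_change", "stop_codon_indel",
}


def is_damaging_missense(calls: Mapping[str, bool], min_votes: int = 3) -> bool:
    """True if at least `min_votes` of the five predictors call the variant damaging.

    Predictors missing from `calls` are counted as non-damaging (an assumption).
    """
    votes = sum(bool(calls.get(name, False)) for name in PREDICTORS)
    return votes >= min_votes


def is_lof(variant_class: str, calls: Optional[Mapping[str, bool]] = None) -> bool:
    """Classify one variant as LoF: high-impact class, or damaging missense by majority vote."""
    if variant_class == "missense":
        return is_damaging_missense(calls or {})
    return variant_class in LOF_CLASSES
```

For example, a missense variant called damaging by PolyPhen-2, SIFT and PROVEAN but not by CADD or DANN passes the three-of-five rule, whereas one flagged by only two predictors does not.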

Focus genes

Focus genes were defined as genes from DDR, chromatin remodeling and cell cycle pathways. Sets of genes involved in these pathways were collected based on published literature^43,63–67^ and curated databases (total 1241 genes; Supplementary Table 1; Fig. 1A).

Druggable genes

Druggable genes were defined as genes targeted by drugs at an advanced stage of development. Specifically, 953 druggable genes were selected from the Open Targets Platform database⁶⁸ (release 19.06), based on the development stage of the drug targeting each gene, including genes in the approved, advanced clinical and phase 1 clinical development stages (Buckets 1–3, Supplementary Table 2).

Data source and pre-processing

Cell line data

The dataset of genome-scale CRISPR knockout screens from Project Achilles (https://depmap.org/portal/achilles/) for 789 cell lines was obtained from the Cancer Dependency Portal (DepMap 20Q3 release²⁵, https://depmap.org/portal/download). Of these, 787 cell lines had genomic information and were therefore used in the analysis. Additionally, large-scale RNAi screening datasets, including the Broad Institute Project Achilles, Novartis Project DRIVE¹³ and the Marcotte et al.⁶⁹ breast cell lines, with genetic dependencies estimated using the DEMETER2 model⁷⁰, were downloaded from the same source, the DepMap portal (712 cell lines). The genomic characterization data of the cell lines was generated by the Cancer Cell Line Encyclopedia (CCLE) and obtained from DepMap 20Q3 (787 and 670 cell lines for the Achilles and DEMETER2 projects, respectively). The cell line alteration data was then organized into a cell line per gene binary matrix with the value 1 if the cell line acquired at least one LoF alteration in the gene, and 0 otherwise.
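The binary alteration matrix described above (and the analogous patient matrix) can be built from a long table of LoF calls roughly as below. This pandas sketch assumes hypothetical column names `sample` and `gene`; cell lines or genes with no LoF call are kept as all-zero rows or columns rather than dropped.

```python
import pandas as pd


def lof_binary_matrix(lof_calls: pd.DataFrame, samples, genes) -> pd.DataFrame:
    """Build a sample x gene 0/1 matrix from a long table of LoF calls.

    `lof_calls` is assumed to have one row per LoF alteration, with the
    hypothetical columns "sample" (cell line or patient) and "gene".
    """
    # Count LoF alterations per (sample, gene); several hits in one gene still give 1.
    counts = pd.crosstab(lof_calls["sample"], lof_calls["gene"])
    binary = (counts > 0).astype(int)
    # Samples or genes without any LoF call get 0 instead of being dropped.
    return binary.reindex(index=samples, columns=genes, fill_value=0)


calls = pd.DataFrame({
    "sample": ["CL1", "CL1", "CL2"],
    "gene": ["TP53", "TP53", "RB1"],
})
matrix = lof_binary_matrix(calls, samples=["CL1", "CL2", "CL3"], genes=["TP53", "RB1", "PTEN"])
print(matrix)
```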

Patient data

The somatic SNV (single nucleotide variant) alterations, mRNA gene expression and clinical data for cancer patients from The Cancer Genome Atlas (TCGA) Pan-Cancer cohort were downloaded via the University of California Santa Cruz (UCSC) Xena data portal⁷¹. Raw counts from the TCGA expression data were log2-scaled and quantile normalized to mitigate batch effects between different datasets (e.g., sequencing lab or sequencing time). Quantile normalization was performed using the “preprocessCore” R package⁷². The patient data were organized into a patient per gene binary matrix with the value 1 if the patient acquired at least one LoF alteration in the gene, and 0 otherwise.
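A minimal NumPy sketch of log2 scaling followed by quantile normalization is shown below. It is an illustration rather than a reimplementation of `preprocessCore`: the pseudocount of 1 (needed because log2(0) is undefined) is our assumption, and ties are resolved by sort order instead of being averaged.

```python
import numpy as np


def log2_quantile_normalize(counts, pseudocount: float = 1.0) -> np.ndarray:
    """log2-transform a genes x samples count matrix, then quantile normalize the samples.

    The pseudocount is an assumption; ties are broken by sort order here,
    whereas preprocessCore averages tied ranks.
    """
    x = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    # Reference distribution: mean of the k-th smallest value across all samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Row indices of each sample's values in ascending order.
    order = np.argsort(x, axis=0)
    normalized = np.empty_like(x)
    for j in range(x.shape[1]):
        # The k-th smallest value of sample j is replaced by the k-th reference value.
        normalized[order[:, j], j] = reference
    return normalized
```

After normalization, every sample has exactly the same empirical distribution; only the within-sample ranking of genes is retained.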

Drug screen data

The drug screening data for the drug sensitivity analysis were collected from the Genomics of Drug Sensitivity in Cancer database (GDSC; https://www.cancerrxgene.org/downloads)^26–28^ and from the Profiling Relative Inhibition Simultaneously in Mixtures database (PRISM²⁵) on the DepMap data portal. Specifically, we used both available GDSC experimental setups (GDSC1 and GDSC2) and the secondary PRISM Repurposing 19Q4 screen. GDSC datasets were curated using the Open Targets database⁶⁸ to unify gene and protein information and to assign gene targets to each drug. The screens included 518 unique drugs (367 in GDSC1 and 198 in GDSC2, with some overlap) targeting genes B in 988 unique cell lines in the GDSC database (987 cell lines in GDSC1 and 809 in GDSC2, with some overlap), and 1210 drugs targeting genes B in 481 cell lines in the PRISM database.

Pathway data

The gene pathways were collected from curated gene sets: Kyoto Encyclopedia of Genes and Genomes (KEGG^73–75^), Pathway Interaction Database (PID⁷⁶) and Reactome⁷⁷, available as part of the c2 collection in MSigDB²⁹. In total, we evaluated 1951 pathways: 186 from KEGG, 196 from PID and 1569 from Reactome.

Synthetic lethality tests

We used eight tests on four data sources (cell line, patient, drug screen and pathway data) to collect evidence of synthetic lethality: two tests on cell line data (SPID and SPEA), four tests on patient data (ExprSL, SoF, SurvLRT and iSurvLRT), one test on drug screens (SPDD, applied to the GDSC and PRISM data) and one pathway enrichment test (SPSP). Although several of these tests have previously been used for SL discovery by different frameworks^15–19^, here we introduced a new patient-based test (iSurvLRT) and a novel modification of GSEA, called SPEA, and combined all tests in a comprehensive framework for identifying clinically relevant SL pairs.

Synthetic partner inactivation dependency (SPID)

The SPID test applies a one-sided Wilcoxon rank-sum test to determine whether cell lines with a LoF alteration in gene A have significantly lower gene B dependency scores than cell lines without a gene A LoF alteration, i.e., whether they are more sensitive to gene B inactivation. In addition, we calculate the positive dependency percentage, i.e., the percentage of cell lines with a gene A LoF alteration that have a positive estimated dependency score upon gene B inactivation. A positive dependency score implies that the cells grow and divide more efficiently when gene B is inactivated. Thus, a high positive dependency percentage indicates that a given pair should not be considered SL.
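The SPID comparison for a single gene pair in a single screen might be coded as follows, using SciPy's Mann-Whitney U implementation of the Wilcoxon rank-sum test. The function name and inputs are hypothetical, and the positive dependency percentage is computed under our reading of the definition above, namely as the share of gene A LoF cell lines with a positive gene B dependency score.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def spid_test(dep_b, lof_a, min_altered: int = 20):
    """One-sided Wilcoxon rank-sum test for one gene pair (A, B) in one screen.

    dep_b: gene B dependency scores per cell line (lower = more dependent).
    lof_a: boolean gene A LoF status for the same cell lines.
    Returns (p_value, positive_dependency_pct), or None if the test does not apply.
    """
    dep_b = np.asarray(dep_b, dtype=float)
    lof_a = np.asarray(lof_a, dtype=bool)
    altered, wild_type = dep_b[lof_a], dep_b[~lof_a]
    if altered.size < min_altered:  # application criterion: >= 20 altered cell lines
        return None
    # H1: gene B scores in gene A LoF lines are stochastically lower than in the rest.
    p_value = mannwhitneyu(altered, wild_type, alternative="less").pvalue
    # Assumed reading: percentage of gene A LoF lines with a positive gene B score.
    positive_pct = 100.0 * np.mean(altered > 0)
    return float(p_value), float(positive_pct)
```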

Test application criteria To perform the SPID test for a given gene pair, gene A must have LoF alterations in at least 20 cell lines and gene B must be a knock-out target in the given cell line experiment. The minimum of 20 LoF-altered cell lines (SPID, SPEA and SPDD tests) or patients (SoF, SurvLRT and iSurvLRT) was set arbitrarily, but consistently across all tests, to ensure sufficient statistical power.

SL evidence criteria Gene pairs with a combined and adjusted p-value less than 0.05 are considered significant, where p-values are combined across two datasets (Achilles and DEMETER2) and two tests (SPID and SPEA).
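This section does not spell out the combination and adjustment procedures. The reference list cites Fisher's *Statistical methods for research workers* and Benjamini and Hochberg, so the sketch below assumes those two methods. For each pair, four p-values (SPID and SPEA in Achilles and DEMETER2) are combined, and the combined p-values are then adjusted across all tested pairs.

```python
import numpy as np
from scipy.stats import chi2


def fisher_combine(pvalues) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-squared with 2k degrees of freedom."""
    p = np.asarray(pvalues, dtype=float)
    statistic = -2.0 * np.log(p).sum()
    return float(chi2.sf(statistic, df=2 * p.size))


def benjamini_hochberg(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up procedure, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value downwards keeps the adjusted values monotone.
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(monotone, 1.0)
    return adjusted


# Hypothetical p-values for one pair: SPID and SPEA in Achilles and DEMETER2.
pair_pvalues = [0.01, 0.03, 0.02, 0.04]
combined = fisher_combine(pair_pvalues)
```

In a full run, `benjamini_hochberg` would be applied to the vector of `combined` values for all tested pairs, and pairs with an adjusted value below 0.05 would pass.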

Synthetic partner enrichment analysis (SPEA)

The SPEA test is a novel modification of the widely used Gene Set Enrichment Analysis (GSEA). GSEA identifies sets of genes that are over- or under-represented in lists ranked by gene expression²⁹. Specifically, given a list of genes ranked by the correlation of their expression with a phenotype of interest, and a set of genes, GSEA uses a permutation-based test and a Kolmogorov–Smirnov statistic to compute the significance of enrichment of the gene set at the top of the ranked gene list.

SPEA ranks cell lines according to their dependency score for gene B (rather than by the differences in expression between two samples typically used in GSEA), so that the cell lines most sensitive to gene B silencing are at the top of the list. We then ask whether the subset of cell lines that carry a gene A LoF alteration is enriched at the top of this ranked list. We calculate an enrichment score (ES) by walking down the list, increasing a running-sum statistic when we encounter a cell line from the subset and decreasing it when we encounter a cell line without a gene A LoF alteration. The ES is the maximum deviation from zero encountered in the random walk and corresponds to a weighted Kolmogorov–Smirnov-like statistic²⁹. The sign of the ES indicates the direction of enrichment: a positive score means that cell lines with a gene A LoF alteration are concentrated at the beginning of the ranking, i.e., are more sensitive to gene B knockdown. See Supplementary Text for a formal description of the ES calculation.

We estimate the statistical significance (nominal p-value) of the ES by comparing it with the set of scores $\text {ES}_{\text {rand}}$, computed with randomly assigned gene A LoF alteration status and for a reordered cell line list. We perform this permutation step 200 times, recompute $\text {ES}_{\text {rand}}$ of the gene set for the permuted data and compile all results to generate a null distribution for the ES. The empirical, nominal p-value of the observed ES is then calculated relative to this null distribution.
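The two SPEA steps (the running-sum ES and its permutation p-value) are sketched below. This is a hedged approximation, not the formal definition in the Supplementary Text: hit steps are weighted by |dependency score| raised to the power `weight`, following the GSEA convention, misses subtract a constant step, and the +1 correction in the empirical p-value is our choice rather than something stated here.

```python
import numpy as np


def enrichment_score(dep_b, lof_a, weight: float = 1.0) -> float:
    """GSEA-style running-sum ES of gene A LoF lines among lines ranked by gene B dependency."""
    dep_b = np.asarray(dep_b, dtype=float)
    lof_a = np.asarray(lof_a, dtype=bool)
    order = np.argsort(dep_b)  # most negative (most sensitive) cell lines first
    hits = lof_a[order]
    # Hit steps weighted by |score|**weight (GSEA convention; the exponent is an assumption).
    hit_w = np.abs(dep_b[order]) ** weight * hits
    step_hit = hit_w / hit_w.sum()
    step_miss = (~hits) / (hits.size - hits.sum())
    running = np.cumsum(step_hit - step_miss)
    # ES is the maximum deviation from zero, keeping its sign.
    return float(running[np.argmax(np.abs(running))])


def spea_test(dep_b, lof_a, n_perm: int = 200, seed: int = 0):
    """ES plus a one-sided permutation p-value obtained by shuffling gene A LoF labels."""
    lof_a = np.asarray(lof_a, dtype=bool)
    rng = np.random.default_rng(seed)
    es = enrichment_score(dep_b, lof_a)
    null = np.array([enrichment_score(dep_b, rng.permutation(lof_a)) for _ in range(n_perm)])
    # The +1 in numerator and denominator avoids reporting a p-value of exactly 0.
    p_value = (np.sum(null >= es) + 1) / (n_perm + 1)
    return es, float(p_value)
```

Shuffling the LoF labels while keeping the ranking fixed is equivalent to assigning the LoF status at random and reordering the cell line list, as described above.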

Test application criteria To perform SPEA test for a given gene pair, gene A must have LoF alterations in at least 20 cell lines and gene B must be a knock-out target in a given cell line experiment.

SL evidence criteria Gene pairs with a combined and adjusted p-value less than 0.05 are considered significant, where p-values are combined across two datasets (Achilles and DEMETER2) and two tests (SPID and SPEA).

Gene co-expression (ExprSL)

The ExprSL test computes the pairwise gene expression correlation, and its significance is evaluated using a two-sided t-test for the Spearman correlation. ExprSL is based on two assumptions. First, SL gene pairs are involved in closely related biological processes and are thus more likely to be significantly positively correlated. Second, SL partner genes may compensate for each other and can thus be significantly negatively correlated⁷⁸.

Test application criteria To perform the ExprSL test for a given gene pair, the expression of both gene A and gene B must be measured.

SL evidence criteria Gene pairs with the absolute Spearman correlation coefficient higher than 0.4 and adjusted p-value less than 0.05 are considered significantly correlated.
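ExprSL for one pair reduces to a Spearman correlation plus the thresholds above. The sketch relies on `scipy.stats.spearmanr`, whose default two-sided p-value is derived from a t-distribution approximation. The adjusted p-value passed to the hypothetical `exprsl_significant` helper is assumed to come from a multiple-testing correction across all pairs, which is not shown.

```python
from scipy.stats import spearmanr


def exprsl_stats(expr_a, expr_b):
    """Spearman correlation between two genes' expression and its two-sided p-value."""
    rho, p_value = spearmanr(expr_a, expr_b)
    return float(rho), float(p_value)


def exprsl_significant(rho: float, p_adjusted: float,
                       min_abs_rho: float = 0.4, alpha: float = 0.05) -> bool:
    """SL evidence criterion from the text: |rho| > 0.4 and adjusted p-value < 0.05."""
    return abs(rho) > min_abs_rho and p_adjusted < alpha
```

Using the absolute value of rho makes the criterion accept both positively correlated (shared process) and negatively correlated (compensating) pairs, matching the two assumptions stated above.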

Survival of the fittest (SoF)

SoF test assumes that cells with joint LoF alteration of gene A and reduced expression of gene B in a given SL pair will not survive in a tumor cell population⁷⁸. Thus, intuitively, SoF test assesses whether tumors with LoF alteration of gene A compensate for this loss by an increase of gene B expression. Specifically, SoF uses a one-sided Wilcoxon rank-sum test to examine whether gene B has a significantly higher expression in samples with LoF alteration in gene A compared to the rest of the samples.
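A corresponding sketch of the SoF test differs from SPID mainly in the direction of the alternative hypothesis: gene B expression is expected to be higher, not lower, in tumors with a gene A LoF alteration. The function and argument names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def sof_test(expr_b, lof_a, min_altered: int = 20):
    """Survival-of-the-fittest test: is gene B expression higher in tumors with gene A LoF?

    Returns the one-sided p-value, or None if fewer than `min_altered`
    patients carry a gene A LoF alteration.
    """
    expr_b = np.asarray(expr_b, dtype=float)
    lof_a = np.asarray(lof_a, dtype=bool)
    if lof_a.sum() < min_altered:
        return None
    # H1: expression in gene A LoF tumors is stochastically greater than in the rest.
    return float(mannwhitneyu(expr_b[lof_a], expr_b[~lof_a], alternative="greater").pvalue)
```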

Test application criteria To perform the SoF test for a given gene pair, gene A must have LoF alterations in at least 20 patients and the expression of gene B must be measured.

SL evidence criteria Gene pairs with SoF test adjusted p-value less than 0.05 are considered significant according to this test.

SurvLRT

We use the survival likelihood ratio test (SurvLRT)¹⁴ to estimate the tumor fitness with a given genotype g from survival data of patients. Here, the genotype $g = (g_A, g_B)$ is defined by alterations in gene A and gene B. Specifically, for a given patient, $g_A = 0$ if gene A is not altered in that patient’s tumor, and $g_A = 1$ otherwise, and similarly for $g_B$. In the original approach¹⁴, the alteration could be of any type. Here, it is strictly confined to the definition of LoF alteration. SurvLRT assumes a survival model of tumor fitness, stating that a decrease in tumor fitness due to LoF alteration in gene A and gene B is exhibited by a proportional increase of survival of the patients. Thus, the survival of patients with LoF alterations in both SL genes should be longer than expected from the survival of patients without LoF alteration in those genes or with only one gene altered.

Consider a reference survival function S(t), estimated from a cohort of patients who did not die of cancer, following¹⁴. Denote the fitness of a tumor with genotype g by $\Delta _g$ and the log fitness by $\delta _g = \log (\Delta _g)$. We assume that the survival of patients whose tumor carries genotype g is given by $S(t)^{\Delta _g}$. When there is no epistatic relation between genes A and B, we expect

$$\begin{aligned} \delta _{00} + \delta _{11} = \delta _{01} + \delta _{10}. \end{aligned}$$

(1)

In the case when gene A and gene B are in any epistatic relation (positive or negative), however, we expect that

$$\begin{aligned} \delta _{00} + \delta _{11} \ne \delta _{01} + \delta _{10}. \end{aligned}$$

(2)

Finally, for A and B being synthetic lethal partners, we expect that

$$\begin{aligned} \delta _{00} + \delta _{11} < \delta _{01} + \delta _{10}. \end{aligned}$$

(3)

SurvLRT is a likelihood ratio test based on analytical estimates of the parameters ${\bar{\Delta }}_{00}$, ${\bar{\Delta }}_{11}$, ${\bar{\Delta }}_{01}$ and ${\bar{\Delta }}_{10}$ (and, correspondingly, ${\bar{\delta }}_{00}$, ${\bar{\delta }}_{11}$, ${\bar{\delta }}_{01}$ and ${\bar{\delta }}_{10}$). It tests the null hypothesis given by Eq. (1) against the alternative hypothesis defined by Eq. (2). To decide whether a detected interaction is synthetic lethal, we compute the effect size $\delta = {\bar{\delta }}_{00} + {\bar{\delta }}_{11} - {\bar{\delta }}_{01} - {\bar{\delta }}_{10}$ and check whether the constraint $\delta < 0$ holds. If it holds, we set an “SL flag” to 1 and report the pair as synthetic lethal with the associated p-value from the likelihood ratio test. Otherwise, we set the “SL flag” to $-1$. For each investigated pair, we additionally report the log fitness of the double LoF genotype that would be expected in the absence of epistatic interaction, i.e., if Eq. (1) held, denoted $\delta _{11}^{\text {Expected}} = {\bar{\delta }}_{01} + {\bar{\delta }}_{10} - {\bar{\delta }}_{00}$.

Importantly, not all such SL interactions are clinically relevant. Inequality Eq. (3) can also hold when $\delta _{00} < \delta _{11}$. In such a case, the log fitness of the double LoF genotype, $\delta _{11}$, is smaller than the value expected in the absence of epistatic interaction, $\delta _{11}^{\text {Expected}}$, yet the fitness $\Delta _{11}$ remains higher than the fitness $\Delta _{00}$ of tumors without LoF alterations in either gene. Such an interaction is not clinically relevant: using treatment to turn off the synthetic lethal partner in addition to the first gene in the pair would cause patients to survive worse than patients without inactivation of either gene. Thus, we additionally set a clinically relevant (“CL”) flag, which is 1 if $\delta _{00} > \delta _{11}$ and $-1$ otherwise.
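Given estimated log fitness values, the effect size, the expected double-LoF log fitness and the two flags follow directly from Eqs. (1) to (3) and the rules above. The sketch below assumes the estimates are supplied by a SurvLRT implementation (not reproduced here); the numbers in the example are hypothetical and chosen to show one clinically relevant and one non-relevant SL interaction.

```python
from dataclasses import dataclass


@dataclass
class SurvLRTFlags:
    effect: float        # delta = d00 + d11 - d01 - d10
    d11_expected: float  # log fitness of genotype (1, 1) implied by Eq. (1)
    sl_flag: int         # 1 if effect < 0, else -1
    cl_flag: int         # 1 if d00 > d11, else -1


def survlrt_flags(d00: float, d01: float, d10: float, d11: float) -> SurvLRTFlags:
    """Derive the effect size and the SL/CL flags from estimated log fitness values.

    The estimates and the likelihood ratio p-value come from SurvLRT itself.
    """
    effect = d00 + d11 - d01 - d10
    d11_expected = d01 + d10 - d00  # solves Eq. (1) for delta_11
    sl_flag = 1 if effect < 0 else -1
    cl_flag = 1 if d00 > d11 else -1
    return SurvLRTFlags(effect, d11_expected, sl_flag, cl_flag)


# Hypothetical estimates: both pairs are SL, but only the first is clinically relevant.
relevant = survlrt_flags(d00=0.0, d01=0.5, d10=0.4, d11=-0.2)
not_relevant = survlrt_flags(d00=0.0, d01=0.5, d10=0.4, d11=0.3)
```

In the second case, $\delta_{11} = 0.3$ is below the expected 0.9, so the SL flag is set, but it exceeds $\delta_{00} = 0$, so the CL flag is $-1$.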

Test application criteria To perform SurvLRT for a given gene pair, gene A must have LoF alterations in at least 20 patients. Additionally, for each genotype, at least 5 patients with that genotype must be classified as deceased.

SL evidence criteria Gene pairs with an adjusted SurvLRT p-value less than 0.05 and with both the SL flag and the CL flag equal to 1 are considered to have significant evidence for clinically relevant SL according to this test.

iSurvLRT

Here, we propose a novel test, called iterative SurvLRT (iSurvLRT), which extends SurvLRT. The SurvLRT test¹⁴ has limited applicability, as it can only be used to test gene pairs in which both genes carry LoF alterations in a sufficient number of tumor samples. However, for a given gene A that is often altered in cancer, it is desirable to find a partner B that does not necessarily acquire alterations in tumors itself.

iSurvLRT addresses this by using LoF alterations in gene A and the expression of gene B. Specifically, the LoF alteration status of gene A in a given patient is defined as $g_A = 0$ if A is not altered in the patient and $g_A = 1$ if it is altered. The expression status of gene B in the same patient is defined by whether the expression of gene B, denoted $e_B$, is low in that patient, i.e., $g_B(t) = 0$ if $e_B \ge t$ and $g_B(t) = 1$ if $e_B < t$ for a given threshold t. To define the threshold t, we consider a grid of candidate thresholds given by quantiles of the empirical distribution of gene B expression. Specifically, the grid is defined by $t \in (q(0.05), q(0.1), \dots , q(0.5))$, where $q(\alpha )$ is the $\alpha$-th quantile of the distribution of $e_B$. We then iterate over this grid and define the genotypes $g(t) = (g_A, g_B(t))$ for each patient based on the current threshold t. In each iteration, we compute the SurvLRT p-value for the genotype g(t). Finally, we return the lowest p-value across all iterations, the threshold used in that iteration and the corresponding SL and CL flags.
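The threshold scan of iSurvLRT can be sketched as below. The SurvLRT computation itself is passed in as a callable, a hypothetical interface that receives the two 0/1 genotype vectors and returns the p-value and the SL and CL flags; the application criterion on deceased patients per genotype is not checked here.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical interface: (g_A, g_B) -> (p_value, sl_flag, cl_flag).
SurvLRT = Callable[[np.ndarray, np.ndarray], Tuple[float, int, int]]


def isurvlrt(lof_a, expr_b, survlrt: SurvLRT) -> Optional[Tuple[float, float, int, int]]:
    """Scan thresholds q(0.05), ..., q(0.5) of gene B expression and keep the best SurvLRT result.

    Returns (p_value, threshold, sl_flag, cl_flag) for the lowest p-value.
    """
    g_a = np.asarray(lof_a, dtype=int)
    expr_b = np.asarray(expr_b, dtype=float)
    levels = np.round(np.arange(1, 11) * 0.05, 2)  # 0.05, 0.10, ..., 0.50
    best = None
    for t in np.quantile(expr_b, levels):
        g_b = (expr_b < t).astype(int)  # 1 = low expression of gene B
        p_value, sl_flag, cl_flag = survlrt(g_a, g_b)
        if best is None or p_value < best[0]:
            best = (float(p_value), float(t), sl_flag, cl_flag)
    return best
```

For instance, with a stub that returns a smaller p-value the more patients are labeled as low expressors, the scan selects the largest threshold, q(0.5). Because the minimum over ten thresholds is reported, the returned p-value is optimistic relative to a single pre-specified threshold.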

Test application criteria To perform iSurvLRT for a given gene pair, gene A must have LoF alterations in at least 20 patients and the expression of gene B must be measured. Additionally, for each genotype, at least 5 patients with that genotype must be classified as deceased.

SL evidence criteria Gene pairs with an adjusted iSurvLRT p-value less than 0.05 and with both the SL flag and the CL flag equal to 1 are considered to have significant evidence for clinically relevant SL according to this test.

Synthetic partner drug dependency (SPDD)

The SPDD test assesses the susceptibility of cancer cell lines with a gene A LoF alteration to drugs targeting gene B. To this end, cell lines are grouped into either a wild-type (WT) group or a gene A LoF group (cell lines without or with a LoF alteration in gene A, respectively). Potential drug sensitivity is assessed using the drug targets from the GDSC and PRISM data and a one-sided Wilcoxon rank-sum test comparing the WT and LoF groups on ln(IC50) values (the natural logarithm of the fitted half maximal inhibitory concentration) for the GDSC datasets and on median-collapsed log fold change profiles for PRISM. This test is performed separately for each drug, and the drug with the best result (the lowest p-value) is reported.
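The per-drug loop of SPDD might look as follows. It assumes a hypothetical dictionary mapping each drug name to a per-cell-line response array, with lower values meaning higher sensitivity for both ln(IC50) and log fold change, and with cell lines not screened with a given drug marked as NaN and skipped.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def spdd_test(response_by_drug, lof_a, min_altered: int = 20):
    """Best drug for gene B by a one-sided Wilcoxon rank-sum test (LoF group < WT group).

    response_by_drug: drug name -> per-cell-line responses (ln(IC50) for GDSC,
    log fold change for PRISM); NaN marks cell lines not screened with the drug.
    Returns (drug, p_value) for the lowest p-value, or None if the test does not apply.
    """
    lof_a = np.asarray(lof_a, dtype=bool)
    if lof_a.sum() < min_altered:
        return None
    best = None
    for drug, response in response_by_drug.items():
        response = np.asarray(response, dtype=float)
        screened = ~np.isnan(response)
        lof_vals = response[screened & lof_a]
        wt_vals = response[screened & ~lof_a]
        if lof_vals.size == 0 or wt_vals.size == 0:
            continue  # drug not tested in one of the groups
        p_value = mannwhitneyu(lof_vals, wt_vals, alternative="less").pvalue
        if best is None or p_value < best[1]:
            best = (drug, float(p_value))
    return best
```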

Test application criteria To perform SPDD test for a given gene pair, gene A must have LoF alterations in at least 20 cell lines and gene B must be a known drug target.

SL evidence criteria Gene pairs with SPDD test p-value less than 0.05 are considered significant evidence for SL according to this test.

Synthetic partners shared pathways (SPSP)

SPSP assesses enrichment in shared pathways using a part of the c2 collection from MSigDB, consisting of pathways from the KEGG, PID and Reactome databases. Specifically, a hypergeometric test (calculating the probability of co-occurrence of gene A and gene B in pathways) is used to detect pairs that share a significantly large number of pathways, given the number of pathways each gene is involved in.
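For one gene pair, the SPSP test is a single hypergeometric tail probability. In this sketch, the population is the 1951 pathways evaluated in this study, the pathways containing gene A are the "successes" and the pathways containing gene B are the "draws"; the function and argument names are ours.

```python
from scipy.stats import hypergeom


def spsp_test(n_shared: int, n_pathways_a: int, n_pathways_b: int, n_total: int = 1951) -> float:
    """P(X >= n_shared) for X ~ Hypergeometric(n_total, n_pathways_a, n_pathways_b).

    n_total defaults to the 1951 KEGG, PID and Reactome pathways evaluated here.
    """
    # SciPy parameterization: hypergeom(M=population size, n=successes, N=draws).
    return float(hypergeom.sf(n_shared - 1, n_total, n_pathways_a, n_pathways_b))
```

For example, two genes that each appear in 10 of the 1951 pathways would be expected to share about 0.05 pathways by chance, so sharing 5 yields a very small p-value.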

Test application criteria To perform SPSP test for a given gene pair, both gene A and gene B must be present in at least one of the considered pathways.

SL evidence criteria Gene pairs with SPSP test adjusted p-value less than 0.05 are considered significant evidence for SL according to this test.

slideCell and slidePat: cell line and patient tests R implementation

SL tests on cell line data were implemented in an R package called slideCell. Given dependency scores and alteration information for cancer cell lines, as well as a list of gene pairs, slideCell can be used to generate statistics, p-values and plots for SPID and SPEA tests. The source code for slideCell package is freely available at https://github.com/szczurek-lab/slideCell. SoF, ExprSL, SurvLRT and iSurvLRT tests on patient data were implemented in an R package slidePat, which takes as input gene alteration and expression, as well as patient survival data. The source code for slidePat package is freely available at https://github.com/szczurek-lab/slidePat.

SLIDE-VIP WebApp

The SLIDE-VIP WebApp is an online application for the visualization of test results for 224,169 potential SL gene pairs from this publication. The application has been developed in RStudio⁷⁹, version 1.2.5033, using the Shiny package, version 1.5.0 and R⁸⁰ version 4.0.5. The application is freely available online at slidevip.app.

Data availability

The sources of the data used in the framework are described in the Methods section. The results for the top patient-verified tests are summarized in Supplementary Tables 3 and 4. The results of all performed tests are available at slidevip.app. The slideCell R package is available at https://github.com/szczurek-lab/slideCell. The slidePat R package is available at https://github.com/szczurek-lab/slidePat.

References

Nijman, S. M. Synthetic lethality: General principles, utility and detection using genetic screens in human cells. FEBS Lett. 585, 1–6 (2011).
Article CAS PubMed PubMed Central Google Scholar
Topatana, W. et al. Advances in synthetic lethality for cancer therapy: Cellular mechanism and clinical translation. J. Hematol. Oncol. 13, 1–22 (2020).
Article Google Scholar
Lord, C. J. & Ashworth, A. Parp inhibitors: Synthetic lethality in the clinic. Science 355, 1152–1158 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Hu, J. et al. Targeting mutant p53 for cancer therapy: Direct and indirect strategies. J. Hematol. Oncol. 14, 1–19 (2021).
Article Google Scholar
Yap, T. A. et al. First-in-human trial of the oral ataxia telangiectasia and rad3-related (atr) inhibitor bay 1895344 in patients with advanced solid tumorsatr inhibitor bay 1895344 in advanced solid tumors. Cancer Discov. 11, 80–91 (2021).
Article CAS PubMed Google Scholar
Yap, T. A. et al. A first-in-human phase I study of atr inhibitor m1774 in patients with solid tumors. J. Clin. Oncol. 39, 3153. https://doi.org/10.1200/JCO.2021.39.15_suppl.TPS3153 (2021).
Article Google Scholar
Giglia-Mari, G., Zotter, A. & Vermeulen, W. Dna damage response. Cold Spring Harbor Perspect. Biol. 3, a000745 (2011).
Article Google Scholar
Nair, S. S. & Kumar, R. Chromatin remodeling in cancer: A gateway to regulate gene transcription. Mol. Oncol. 6, 611–619 (2012).
Article CAS PubMed PubMed Central Google Scholar
Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
Article CAS PubMed Google Scholar
Srivatsa, S. et al. Discovery of synthetic lethal interactions from large-scale pan-cancer perturbation screens. bioRxiv 2019, 810374 (2019).
Google Scholar
Shao, C., Westermann, F. & Höfer, T. Synlet: An r package for systemically analyzing synthetic lethal rna interference screen data. bioRxiv 2016, 043570 (2016).
Google Scholar
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
Article CAS PubMed PubMed Central Google Scholar
McDonald, E. R. III. et al. Project drive: A compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep rnai screening. Cell 170, 577–592 (2017).
Article CAS PubMed Google Scholar
Matlak, D. & Szczurek, E. Epistasis in genomic and survival data of cancer patients. PLoS Comput. Biol. 13, e1005626 (2017).
Article ADS PubMed PubMed Central Google Scholar
Sinha, S. et al. Systematic discovery of mutation-specific synthetic lethals by mining pan-cancer human primary tumor data. Nat. Commun. 8, 1–13 (2017).
Article ADS Google Scholar
Jerby-Arnon, L. et al. Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality. Cell 158, 1199–1209 (2014).
Article CAS PubMed Google Scholar
Das, S., Deng, X., Camphausen, K. & Shankavaram, U. Discoversl: An r package for multi-omic data driven prediction of synthetic lethality in cancers. Bioinformatics 35, 701–702 (2019).
Article CAS PubMed Google Scholar
Deng, X., Das, S., Valdez, K. & Camphausen, K. Sl-biodp: Multi-cancer interactive tool for prediction of synthetic lethality and response to cancer treatment. Cancers 11, 1682 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lee, J. S. et al. Harnessing synthetic lethality to predict the response to cancer treatment. Nat. Commun. 9, 1–12 (2018).
ADS Google Scholar
Ku, A. A. et al. Integration of multiple biological contexts reveals principles of synthetic lethality that affect reproducibility. Nat. Commun. 11, 1–12 (2020).
Article ADS Google Scholar
Huang, J., Wu, M., Lu, F., Ou-Yang, L. & Zhu, Z. Predicting synthetic lethal interactions in human cancers using graph regularized self-representative matrix factorization. BMC Bioinform. 20, 1–8 (2019).
Article Google Scholar
De Kegel, B., Quinn, N., Thompson, N. A., Adams, D. J. & Ryan, C. J. Comprehensive prediction of robust synthetic lethality between paralog pairs in cancer cell lines. Cell Syst. 12, 1144–1159 (2021).
Article PubMed Google Scholar
Fisher, R. A. Statistical methods for research workers. In Breakthroughs in Statistics 66–70 (Springer, 1992).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
MathSciNet MATH Google Scholar
Corsello, S. M. et al. Discovering the anticancer potential of non-oncology drugs by systematic viability profiling. Nat. Cancer 1, 235–248 (2020).
Article CAS PubMed PubMed Central Google Scholar
Yang, W. et al. Genomics of drug sensitivity in cancer (gdsc): A resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2012).
Article PubMed PubMed Central Google Scholar
Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Article CAS PubMed PubMed Central Google Scholar
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Wang, J. et al. Synlethdb 2.0: A web-based knowledge graph database on synthetic lethality for novel anticancer drug discovery. Database (2022).
Reinhardt, H. C., Jiang, H., Hemann, M. T. & Yaffe, M. B. Exploiting synthetic lethal interactions for targeted cancer therapy. Cell Cycle 8, 3112–3119 (2009).
Article CAS PubMed Google Scholar
Fang, B. Development of synthetic lethality anticancer therapeutics. J. Med. Chem. 57, 7859–7873. https://doi.org/10.1021/jm500415t (2014).
Article CAS PubMed PubMed Central Google Scholar
Riabinska, A. et al. Therapeutic targeting of a robust non-oncogene addiction to prkdc in atm-defective tumors. Sci. Transl. Med. 5, 18978 (2013).
Article Google Scholar
Clermont, F., Nittner, D. & Marine, J.-C. Igf2: The achilles’ heel of p53-deficiency?. EMBO Mol. Med. 4, 688–690 (2012).
Article CAS PubMed PubMed Central Google Scholar
Haley, V. L. et al. Igf2 pathway dependency of the trp53 developmental and tumour phenotypes. EMBO Mol. Med. 4, 705–718 (2012).
Article CAS PubMed PubMed Central Google Scholar
Lyu, J. et al. Synthetic lethality of rb1 and aurora a is driven by stathmin-mediated disruption of microtubule dynamics. Nat. Commun. 11, 1–16 (2020).
Article ADS Google Scholar
Brough, R. et al. Identification of highly penetrant rb-related synthetic lethal interactions in triple negative breast cancer. Oncogene 37, 5701–5718 (2018).
Article CAS PubMed PubMed Central Google Scholar
Linn, P. et al. Targeting rb1 loss in cancers. Cancers 13, 3737 (2021).
Article CAS PubMed PubMed Central Google Scholar
Zhao, H. et al. Deletions of retinoblastoma 1 (rb1) and its repressing target s phase kinase-associated protein 2 (skp2) are synthetic lethal in mouse embryogenesis. J. Biol. Chem. 291, 10201–10209 (2016).
Article CAS PubMed PubMed Central Google Scholar
Parkhitko, A. A. et al. Cross-species identification of pip5k1-, splicing-and ubiquitin-related pathways as potential targets for rb1-deficient cells. PLoS Genet. 17, e1009354 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. & Simon, R. Identification of potential synthetic lethal genes to p53 using a computational biology approach. BMC Med. Genom. 6, 1–10 (2013).
Article Google Scholar
Shen, J. P. et al. Combinatorial crispr-cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).
Article MathSciNet CAS PubMed PubMed Central Google Scholar
Sasaki, M. & Ogiwara, H. Synthetic lethal therapy based on targeting the vulnerability of swi/snf chromatin remodeling complex-deficient cancers. Cancer Sci. 111, 774–782 (2020).
Article CAS PubMed PubMed Central Google Scholar
Helming, K. C. et al. Arid1b is a specific vulnerability in arid1a-mutant cancers. Nat. Med. 20, 251–254 (2014).
Article CAS PubMed PubMed Central Google Scholar
Kelso, T. W. et al. Chromatin accessibility underlies synthetic lethality of swi/snf subunits in arid1a-mutant cancers. Elife 6, e30506 (2017).
Article PubMed PubMed Central Google Scholar
Ehrenhöfer-Wölfer, K. et al. Smarca2-deficiency confers sensitivity to targeted inhibition of smarca4 in esophageal squamous cell carcinoma cell lines. Sci. Rep. 9, 11661. https://doi.org/10.1038/s41598-019-48152-x (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
de Bono, J. et al. Olaparib for metastatic castration-resistant prostate cancer. N. Engl. J. Med. 382, 2091–2102 (2020).
Article PubMed Google Scholar
Callén, E. et al. Essential role for dna-pkcs in dna double-strand break repair and apoptosis in atm-deficient lymphocytes. Mol. Cell 34, 285–297 (2009).
Article PubMed PubMed Central Google Scholar
Cantley, L. C. The phosphoinositide 3-kinase pathway. Science 296, 1655–1657 (2002).
Article ADS CAS PubMed Google Scholar
Li, J. et al. Pten, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science 275, 1943–1947 (1997).
Article CAS PubMed Google Scholar
Maehama, T. & Dixon, J. E. The tumor suppressor, pten/mmac1, dephosphorylates the lipid second messenger, phosphatidylinositol 3, 4, 5-trisphosphate. J. Biol. Chem. 273, 13375–13378 (1998).
Article CAS PubMed Google Scholar
Wee, S. et al. PTEN-deficient cancers depend on PIK3CB. Proc. Natl. Acad. Sci. USA 105, 13057–13062 (2008).
Ni, J. et al. Functional characterization of an isoform-selective inhibitor of PI3K-p110β as a potential anticancer agent. Cancer Discov. 2, 425–433 (2012).
Chang, A. et al. Recruitment of KMT2C/MLL3 to DNA damage sites mediates DNA damage responses and regulates PARP inhibitor sensitivity in cancer. Cancer Res. 81, 3358–3373 (2021).
Rampias, T. et al. The lysine-specific methyltransferase KMT2C/MLL3 regulates DNA repair components in cancer. EMBO Rep. 20, e46821 (2019).
Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2021).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 76, 7–20 (2013).
Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
Choi, Y., Sims, G. E., Murphy, S., Miller, J. R. & Chan, A. P. Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7, 1–13. https://doi.org/10.1371/journal.pone.0046688 (2012).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Quang, D., Chen, Y. & Xie, X. DANN: A deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31, 761–763 (2015).
Giacomini, C. P. et al. A gene expression signature of genetic instability in colon cancer. Cancer Res. 65, 9200–9205 (2005).
Carter, S. L., Eklund, A. C., Kohane, I. S., Harris, L. N. & Szallasi, Z. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043–1048 (2006).
Pearl, L. H., Schierz, A. C., Ward, S. E., Al-Lazikani, B. & Pearl, F. M. Therapeutic opportunities within the DNA damage response. Nat. Rev. Cancer 15, 166–180 (2015).
Knijnenburg, T. A. et al. Genomic and molecular landscape of DNA damage repair deficiency across The Cancer Genome Atlas. Cell Rep. 23, 239–254 (2018).
Cunningham, C. E. et al. Targeting the CINful genome: Strategies to overcome tumor heterogeneity. Prog. Biophys. Mol. Biol. 147, 77–91 (2019).
Ochoa, D. et al. Open Targets Platform: Supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
Marcotte, R. et al. Functional genomic landscape of human breast cancer drivers, vulnerabilities, and resistance. Cell 164, 293–309 (2016).
McFarland, J. M. et al. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat. Commun. 9, 1–13 (2018).
Goldman, M. J. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 38, 675–678 (2020).
Bolstad, B. preprocessCore: A collection of pre-processing functions. R package version 1.48.0 (2019).
Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27, 29–34 (1999).
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: Integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).
Schaefer, C. F. et al. PID: The Pathway Interaction Database. Nucleic Acids Res. 37, D674–D679 (2009).
Jassal, B. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Thompson, J. M., Nguyen, Q. H., Singh, M. & Razorenova, O. V. Focus: A multifaceted battle against cancer: Approaches to identifying synthetic lethal interactions in cancer. Yale J. Biol. Med. 88, 145 (2015).
RStudio Team. RStudio: Integrated Development Environment for R (RStudio, PBC, 2020).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2021).


Acknowledgements

The work by MM was done during Interdisciplinary doctorate studies using next generation sequencing in personalized medicine (at Postgraduate School of Molecular Medicine, Medical University of Warsaw). The work was funded by the Polish National Science Centre grant OPUS no. 2019/33/B/NZ2/00956 to ESz. This project was co-funded by Merck Healthcare KGaA, Darmstadt, Germany. MM was co-funded by the European Social Fund POWER program (financing agreement no. POWR.03.02.00-00-I041/16-00).

Author information

These authors contributed equally: Magda Markowska and Magdalena A. Budzinska.

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Stefana Banacha 2, 02-097, Warsaw, Poland
Magda Markowska, Magdalena A. Budzinska, Senbai Kang, Ewa Kizling, Krzysztof Koras & Ewa Szczurek
Postgraduate School of Molecular Medicine, Medical University of Warsaw, Zwirki i Wigury 61, 02-091, Warsaw, Poland
Magda Markowska
Ardigen S.A., Podole 76, 30-394, Cracow, Poland
Magdalena A. Budzinska & Krzysztof Kolmus
Translational Medicine, Oncology Bioinformatics, Merck Healthcare KGaA, Frankfurter Strasse 250, 64293, Darmstadt, Germany
Anna Coenen-Stass & Eike Staub


Contributions

M.M., M.B. and E.Sz. conceptualized the initial idea of the framework. M.B. and E.Sz. developed the iSurvLRT methodology. M.M. performed the cell line tests and developed the slideCell package; M.B. performed the patient data tests and implemented the slidePat package; K.Kor. and K.Kol. performed the tests on drug data; and E.K. performed the tests on pathway data. M.M. combined and ranked the results of all the tests. K.Kol. implemented the SLIDE-VIP WebApp. M.M., M.B. and E.K. prepared the figures. A.C.S. and E.St. provided biological interpretation of the results. M.M., M.B. and E.Sz. wrote the initial draft of the manuscript. All authors provided critical feedback, helped shape the research and analysis, and edited, reviewed and approved the manuscript.

Corresponding author

Correspondence to Ewa Szczurek.

Ethics declarations

Competing interests

Merck Healthcare KGaA provides funding for the research group of ESz. MM is partly financed from these research funds. ACS and ESt work at Merck Healthcare KGaA. MB and KKol work at Ardigen S.A. The SLIDE-VIP WebApp was developed by KKol during his work at Ardigen S.A. and was funded by Merck Healthcare KGaA.

Additional information


Supplementary Information

Supplementary Information 1.

Supplementary Table 1.

Supplementary Table 2.

Supplementary Table 3.

Supplementary Table 4.

Supplementary Table 5.

Supplementary Table 6.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Markowska, M., Budzinska, M.A., Coenen-Stass, A. et al. Synthetic lethality prediction in DNA damage repair, chromatin remodeling and the cell cycle using multi-omics data from cell lines and patients. Sci Rep 13, 7049 (2023). https://doi.org/10.1038/s41598-023-34161-4


Received: 22 September 2022
Accepted: 25 April 2023
Published: 29 April 2023
DOI: https://doi.org/10.1038/s41598-023-34161-4
