High-throughput RNAi screen for essential genes and drug synergistic combinations in colorectal cancer

Metastatic colorectal cancer is a leading cause of cancer death. However, current therapy options are limited to chemotherapy, with the addition of anti-EGFR antibodies for patients with RAS wild-type tumours. Novel drug targets, or drug combinations that induce a synergistic response, would be of great benefit to patients. The identification of genes that are essential for cell survival can be undertaken using functional genomics screens. Furthermore, performing such screens in the presence of a targeted agent would allow the identification of combinations that result in a synthetic lethal interaction. Here, we present a dataset containing the results of a large scale RNAi screen (815 genes) to detect essential genes as well as synergistic combinations with targeted therapeutic agents using a panel of 27 colorectal cancer cell lines. These data identify genes that are essential for colorectal cancer cell survival as well as synthetic lethal treatment combinations using novel computational approaches. Moreover, this dataset could be utilised in combination with genomic profiling to identify predictive biomarkers of response.


Background & Summary
Colorectal cancer is a leading cause of cancer mortality in the UK 1,2 , with >15,000 deaths each year. While early diagnosis leads to significantly higher survival, the prognosis for advanced stage disease remains poor. Therapeutic options for advanced colorectal cancer are currently limited to chemotherapy regimens (FOLFOX or FOLFIRI), and the anti-EGFR antibodies cetuximab or panitumumab 3,4 . While anti-EGFR therapy is beneficial in a subset of patients, tumours with activating KRAS, NRAS or BRAF mutations are intrinsically resistant, and exhibit constitutive activation of the MEK/ERK and/or PI3K/AKT signalling pathways 3,5,6 .
Despite the development of targeted agents over the last few decades, effective strategies to treat late-stage colorectal cancer have not emerged. While the concept of inhibiting a single signalling molecule remains attractive, in practice the inhibition of multiple targets often results in a more durable response. Furthermore, through the simultaneous inhibition of multiple signalling pathways, combination therapy may allow synthetic lethal interactions to be harnessed, and overcome intrinsic resistance to targeted agents [7][8][9][10] .
Several recent studies have successfully utilised high-throughput RNAi screening for profiling the essential gene landscape, and the identification of novel drug targets for cancer therapy [11][12][13][14] . Further, both chemical and genomic screens for synthetic lethal combinations have successfully identified targetable cancer-specific sensitivities 8,[15][16][17][18][19] . However, it is important to note that in order to properly address the genomic heterogeneity of cancer, multiple cell lines are required. As is commonly seen, failure to do so results in a dataset that fails to capture the genetic diversity observed in the clinic, and identifies hits that are relevant only to the one cell line used.
In this study, high-throughput functional genomics screens were used to identify and validate synthetic lethal combinations in a panel of colorectal cancer cell lines. An overview of the experimental design is shown in Fig. 1. The screen contained four experimental arms: the DMSO anchor arm, to assess the activity of siRNA alone, and the three 'plus-drug' arms where cells were also treated with one of three anchor drugs targeting EGFR (cetuximab), MEK (trametinib) or PI3K (BYL719). Twenty-seven colorectal cancer cell lines were transfected with a customised siRNA library targeting the human kinome as well as 95 genes commonly mutated in colorectal cancer (Data records 1 and 3). We elected to target kinases in our screen as they are potentially more amenable to drug development in the event of specific vulnerabilities being identified. Additionally, for some kinases chemical probes or indeed drugs may already exist. Following 72 h incubation with the siRNA libraries, cells were assessed for decreased ATP levels, using the CellTiter-Glo assay (Promega), as an indicator of decreased viability. The primary siRNA screen hits were binned according to the screen arm: DMSO arm hits were binned as 'Essential' using a z-score cut-off of < −3, while 'plus-drug' arm hits were ranked according to their synergy score, and the number of cell lines that showed a synergistic response (Data records 1 and 3). Essential genes identified in the primary screen were enriched for genes connected to the cell cycle, and contained many known pan-cancer single gene vulnerabilities [20][21][22][23] (Data records 1 and 3). A total of 37 genes were identified that scored < −3 across three or more different cell lines.
The top 38 drug/gene synergistic combinations were selected for confirmation in a secondary deconvolution screen (4 siRNA sequences per gene) (Data records 2 and 3). siRNAs targeting the 'Essential' genes PLK1 and SF3B1 were also included. Deconvoluted siRNAs targeting 40 genes were rescreened as previously against the DMSO arm and three 'plus-drug' arms (Supplementary Fig. 1). This secondary screen led to the validation of 8 drug/siRNA combinations, which reproduced the original synergistic phenotype with 2 or more individual siRNAs in greater than 75% of the cell lines that showed synergy in the original primary screen (Data records 2 and 3). These strict validation criteria mean that only hits that were confirmed in multiple cell line models with multiple siRNAs are considered 'validated' with high confidence and should be prioritised in future investigations. As a result, the datasets presented here will be of benefit to the fields of cancer biology, therapeutics, and molecular signalling, and further provide the broader high-throughput screening community with a robust method and screening dataset.

Cell lines
A panel of commercially available colorectal cancer cell lines 24,25 was grown in either RPMI or DMEM/F12 medium, supplemented with 10% fetal calf serum and 1% penicillin/streptomycin, and maintained at 37°C in a humidified atmosphere at 5% CO2. Cell lines used in these screens, and the number of cells seeded per well, are listed in Table 1. The number of cells per well was established as part of the Genomics of Drug Sensitivity in Cancer (GDSC) screening project. The dynamic range within the assay was maximised by determining the number of cells required to achieve the greatest intensity value, while also ensuring the growth of untreated cells was unrestricted by factors such as space and nutrients. A minimum of 6 seeding densities were tested under screening conditions using a two-fold dilution series. All cell lines are routinely SNP profiled to detect cross-contamination and STR profiled to confirm their identity with the providing repository.

Assay optimisation
Prior to screening we performed optimisation of assay conditions using a subset of 11 colorectal cancer cell lines. The type and concentration of transfection reagent was established by transfecting cells with Non-targeting siRNA pool #2 or si_PLK1 using six different transfection reagents (DharmaFECT 1, DharmaFECT 2, DharmaFECT 3, DharmaFECT 4 (all from Dharmacon), Lipofectamine RNAiMAX, and Lipofectamine 2000 (both from Invitrogen)) at 4 different concentrations, with 4 different siRNA concentrations. An optimal transfection score (OTS) was calculated for each cell line. This score takes into account any negative effect on cell viability that the transfection reagent may have, as well as rewarding conditions where the positive control siRNA (si_PLK1) is best at reducing cell viability compared to the control. The conditions with the best average OTS across all 11 lines were selected for screening. The concentration of anchor drugs was selected by performing dose response curves for each drug against 16 colorectal cancer cell lines. As the aim was to identify synergistic drug/siRNA combinations, it was important to select an anchor drug concentration that had a minimal effect on cell viability alone. A dose of the cetuximab and BYL719 compounds was therefore selected that inhibited cell viability by less than 20% in more than 75% of cell lines. Colorectal cancer is often driven by activation of the MEK/ERK pathway, and many of the lines on which trametinib was tested showed sensitivity to MEK inhibition. The dose of trametinib was therefore selected to 1) demonstrate effective inhibition of MEK phosphorylation and 2) inhibit cell viability by <30% in more than 50% of the cell lines.
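The anchor-dose criteria above can be expressed as a simple check on single-agent dose-response data. The following is a minimal Python sketch, not the authors' code: the function name, parameter names and example viabilities are hypothetical, and relative viability is assumed to be scaled so that 1.0 equals vehicle-treated cells.

```python
def dose_is_acceptable(viabilities, max_inhibition=0.20, min_fraction_of_lines=0.75):
    """Return True if a candidate dose has minimal single-agent effect.

    viabilities: relative viability of each cell line at this dose
                 (assumed scale: 1.0 = vehicle-treated control).
    A line counts as tolerant if viability is inhibited by less than
    max_inhibition; the dose passes if more than min_fraction_of_lines
    of the lines are tolerant.
    """
    tolerant = [v for v in viabilities if (1.0 - v) < max_inhibition]
    return len(tolerant) / len(viabilities) > min_fraction_of_lines


# Hypothetical viabilities for 16 cell lines at one candidate dose.
mostly_tolerant = [0.9] * 13 + [0.6] * 3   # 13 of 16 lines inhibited by <20%
borderline = [0.9] * 12 + [0.6] * 4        # exactly 75% of lines tolerant

# Criterion described for cetuximab and BYL719: <20% inhibition in >75% of lines.
print(dose_is_acceptable(mostly_tolerant))
print(dose_is_acceptable(borderline))

# Viability criterion described for trametinib: <30% inhibition in >50% of lines.
print(dose_is_acceptable(borderline, max_inhibition=0.30, min_fraction_of_lines=0.50))
```

The trametinib selection also required effective inhibition of MEK phosphorylation, which is a separate experimental readout and is not modelled here.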

High throughput RNA interference screening
The protocol developed for this screen was adapted from previously published methods 26 , and is described below in detail. The primary siRNA screen was performed with biological replicates (see Supplementary Fig. 2 for number of replicates of each line), in 1,536-well plates. This enabled all 815 siRNA SMARTpools (4 siRNAs pooled per gene) to be screened on one assay plate. A wide variety of positive and negative controls were located in specified wells across the plate, as depicted in Supplementary Fig. 1. The key positive controls for siRNA transfection were siRNA SMARTpools targeting PLK1 (GE Dharmacon, #M-003290-01), known to be important for the cell cycle, and a proprietary cell death siRNA control (siTOX) (GE Dharmacon, #D-001500-01). The broad kinase inhibitor staurosporine (2 μM) was also added to select wells of each plate as a positive control for treatment by the anchor compounds. Non-targeting siRNA pool #2 was utilised as a negative control for siRNA transfection (GE Dharmacon, #D-001206-14). Mock transfected (lipid only) wells were also added as a reference. Other controls that were included but not used were an siRNA SMARTpool targeting KIF11 (GE Dharmacon, #M-003317-01), Non-targeting siRNA sequence #2 (GE Dharmacon, #D-001210-02) and siGENOME RISC-Free Control siRNA (GE Dharmacon, #D-001220-01).
The custom SMARTpool siRNA library was designed to target 95 genes commonly mutated in colorectal cancer (Data Record 3). Of these 95 genes, 15 were already present in the kinome library, resulting in 15 duplicates, and overall 794 unique siRNAs (the kinome library also has 6 duplicates). Note that duplicate wells cannot be reported in PubChem (Data Record 1), but are included in the raw data files (Data Record 3).
The siRNA transfer and addition of each anchor drug were performed using an Echo 555 acoustic dispenser (Labcyte Inc.). An XRD-384 (FluidX) automated reagent dispenser was used for all other liquid handling steps.
Specialised consumable reagents that were required to perform the siRNA screens: •  (Table 1). For the siRNA transfection, 70 nl of the siRNA library (2.5 μM stock) was dispensed into each well of 1,536-well plates (Corning, #3893) (using an Echo 555 acoustic dispenser). Lipofectamine RNAiMAX transfection reagent was diluted in OPTI-MEM media (1:50) and incubated for 5 min, before dispensing 1.5 μl into each well of each plate.
Plates were then incubated for 20 min at room temperature, before dispensing 6 μl cells to each well (FluidX XRD-384 multiwell dispenser) (total well volume 7.5 μl). The final siRNA concentration was 23.9 nM.
Column 1 of each plate received media only (no cells) for background luminescence readings. The plate layout was designed so that edge wells were not used, and control wells (positive and negative) were spread across the plate (Supplementary Fig. 1).
Plates were incubated at 37°C in a humidified atmosphere at 5% CO2 in a Cytomat 24C rotating incubator (Thermo Fisher Scientific) to minimise temperature gradients.

Day 2: Drug treatment
After 24 h incubation, 7.5 nl of anchor drug or vehicle (DMSO) alone was added to each well as appropriate. Cells were treated to achieve a final concentration of 5 μg ml⁻¹ (32.9 nM) cetuximab (obtained from the Addenbrooke's Hospital pharmacy), 10 nM trametinib (Selleckchem), or 1 μM BYL719 (Selleckchem). Vehicle treated wells received an equivalent volume of DMSO.

Day 4: Viability assay
To measure cellular ATP levels, the CellTiter-Glo assay (Promega) was used. 2.5 μl of CellTiter-Glo reagent was added to each well and incubated for 10 min. Luminescence was then measured using a Molecular Devices Paradigm plate reader. Changes in ATP levels were used as an indicator of overall cell viability.

Data analysis
The analysis of screen data entailed background correction, normalisation and scoring steps. Analysis was performed using custom R scripts, and is detailed below. Preliminary analysis of the raw data (Data Record 3) uncovered a relatively consistent diagonal viability gradient across each plate. The raw luminescence intensity readings for each plate of the primary screen were corrected for these position bias effects using a loess normalization approach 27,28 . This method was chosen as it performed better than the B-score method 29 , which we found to overfit the data and significantly increased the kurtosis of the dataset. Note that no correction method was required for the secondary screen data.
For each plate the luminescence intensity readings were background corrected by subtracting the mean value of blank wells. This removes any background noise that may result from the cell medium. Each well was then normalized to the mean of DMSO treated Non-targeting siRNA pool #2 negative control wells (24 wells) on that plate to obtain a relative viability score (Data records 1 and 3). Note that viability values were capped at a maximum of 1 in order to obtain meaningful Bliss additivity score values (below).
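The background correction, negative-control normalisation and capping steps described above can be sketched as follows. This is an illustrative Python example rather than the authors' R implementation; the function and argument names are hypothetical, the luminescence values are invented, and the position-bias (loess) correction is assumed to have been applied already.

```python
from statistics import mean


def relative_viability(raw_wells, blank_wells, negative_control_wells, cap=1.0):
    """Convert raw luminescence readings from one plate to relative viability.

    1. Subtract the mean of the blank (media only) wells as background.
    2. Divide by the mean background-corrected negative control reading
       (DMSO-treated Non-targeting siRNA pool #2 wells in the text).
    3. Cap values at `cap` so that Bliss additivity scores stay meaningful.
    """
    background = mean(blank_wells)
    negative_mean = mean(v - background for v in negative_control_wells)
    return [min((v - background) / negative_mean, cap) for v in raw_wells]


# Invented readings: two blank wells, two negative control wells, three test wells.
scores = relative_viability(
    raw_wells=[550, 1900, 100],
    blank_wells=[100, 100],
    negative_control_wells=[1100, 900],
)
print(scores)
```

In this toy example the second well reads above the negative control mean, so its relative viability is capped at 1.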
For quality control purposes, we calculated two Strictly Standardized Mean Difference (SSMD) 30-32 values for each plate, using the two positive siRNA controls (siPLK1 and siTOX), and passed plates if either SSMD value was greater than 3 (ref. 30). Using plates that passed the SSMD threshold, the biological replicate plates for each cell line were then averaged.
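A plate-level QC check of this kind can be sketched in Python as below. The sketch assumes the standard two-group SSMD estimate (difference in means divided by the square root of the summed variances), uses sample variances, and takes the difference as negative control minus positive control so that a stronger positive control gives a larger positive value. These choices, the function names and the example values are assumptions for illustration and are not taken from the authors' scripts.

```python
from math import sqrt
from statistics import mean, variance


def ssmd(negative_controls, positive_controls):
    """SSMD between negative and positive control wells on one plate."""
    difference = mean(negative_controls) - mean(positive_controls)
    return difference / sqrt(variance(negative_controls) + variance(positive_controls))


def plate_passes(negative, si_plk1, si_tox, threshold=3.0):
    """Pass the plate if either positive control gives an SSMD above the threshold."""
    return max(ssmd(negative, si_plk1), ssmd(negative, si_tox)) > threshold


# Invented relative viabilities for one plate.
negative = [1.0, 1.1, 0.9]
si_plk1 = [0.2, 0.3, 0.1]   # strong viability effect
si_tox = [0.9, 1.0, 0.8]    # this hypothetical line barely responds to siTOX

print(round(ssmd(negative, si_plk1), 2), round(ssmd(negative, si_tox), 2))
print(plate_passes(negative, si_plk1, si_tox))
```

Taking the higher of the two SSMD values means a plate is not failed simply because a cell line is insensitive to one of the two positive controls.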
In order to identify hits in the DMSO anchor arm (i.e., effect of siRNA alone) z-scores were calculated on a per-cell line basis 29 . Z-score normalisation is used to scale the results to a standard normal distribution, using the mean and standard deviation of the experimental wells. This approach ensured that variation in siRNA transfection efficiency across the panel did not affect our ability to select important viability genes in each cell line.
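As an illustration of per-cell-line z-scoring, the Python sketch below scales the DMSO-arm relative viabilities of one cell line and reports genes below the cut-off of −3 used in this study. The use of the sample standard deviation, the function names, the placeholder gene labels and the values are assumptions made for the example.

```python
from statistics import mean, stdev


def z_scores(viability_by_gene):
    """Z-score each gene against all experimental wells of one cell line."""
    values = list(viability_by_gene.values())
    mu, sigma = mean(values), stdev(values)
    return {gene: (v - mu) / sigma for gene, v in viability_by_gene.items()}


def essential_hits(viability_by_gene, cutoff=-3.0):
    """Genes whose z-score falls below the cut-off in this cell line."""
    scores = z_scores(viability_by_gene)
    return sorted(gene for gene, z in scores.items() if z < cutoff)


# Invented data: 20 placeholder genes with little effect plus one strong hit.
viability = {f"GENE{i}": (0.95 if i % 2 == 0 else 1.05) for i in range(20)}
viability["PLK1"] = 0.1

print(essential_hits(viability))
```

Because the mean and standard deviation are computed within each cell line, a line with globally weaker knockdown is judged against its own distribution rather than against the panel.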
To identify synergistic drug/siRNA combination hits, we calculated the Bliss additivity score 33 for each drug/siRNA combination across each cell line as follows (where V is the observed relative viability):

$$\text{Bliss additivity} = V_{\text{siRNA}} \times V_{\text{drug}}$$

We then calculated the synergy score for each combination, by computing the difference between the expected Bliss additivity and the observed viability of the combination as follows:

$$\text{Synergy score} = \text{Bliss additivity} - V_{\text{siRNA+drug}}$$

… overall viability. Many siRNAs that ranked highly with one anchor drug were also synergy hits with a second anchor, and so we designed the secondary screen so that all 38 siRNAs were rescreened against all three 'plus-drug' arms. The secondary screen utilised four siRNA sequences per gene, assayed separately (i.e., deconvoluted). The secondary screen data were then analysed using the same synergy score metric as previously, so that a synergy score was calculated for each individual siRNA duplex (Data records 2 and 3). For each siRNA duplex we determined whether the drug/siRNA combination reproduced a synergistic phenotype. The following threshold was used:

$$\text{Synergy}_{\text{primary}} > 0.15 \;\;\text{AND}\;\; \text{Synergy}_{\text{duplex}} > \text{Synergy}_{\text{primary}} - 0.05 \;\;\Rightarrow\;\; \text{reproduced} = \text{TRUE}$$

The number of siRNA duplexes per cell line that scored as reproduced was then tallied for each combination.
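The synergy calculation can be illustrated with a short Python sketch. It assumes the usual viability form of Bliss independence, in which the expected viability of the combination is the product of the relative viabilities of siRNA alone and drug alone, and it takes the synergy score as expected minus observed viability, as the text describes. The function names and example values are hypothetical.

```python
def bliss_expected(v_sirna, v_drug):
    """Expected relative viability if siRNA and drug act independently (assumed form)."""
    return v_sirna * v_drug


def synergy_score(v_sirna, v_drug, v_combination):
    """Positive values mean the combination kills more cells than expected."""
    return bliss_expected(v_sirna, v_drug) - v_combination


def reproduced(synergy_primary, synergy_duplex, hit_threshold=0.15, tolerance=0.05):
    """Threshold used to decide whether a single duplex reproduces a primary hit."""
    return synergy_primary > hit_threshold and synergy_duplex > synergy_primary - tolerance


# Invented relative viabilities: siRNA alone, drug alone, and the combination.
primary = synergy_score(v_sirna=0.8, v_drug=0.9, v_combination=0.4)
print(round(primary, 2))

# One duplex scoring close to the primary pool, and one scoring well below it.
print(reproduced(primary, 0.30), reproduced(primary, 0.20))
```

Capping relative viability at 1, as described in the normalisation step, keeps the expected value from exceeding either single-agent viability.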

Data Records
Data record 1
Primary siRNA screen data for all 27 cell lines are available at PubChem (Data Citation 1 to Data Citation 27). Assay ID accession numbers are provided in Table 2 (available online only). Screen-wide normalised data (negative control normalisation and z-score normalisation, where appropriate) are provided, as well as synergy scores for drug/siRNA combinations and the results of consequent binning strategies. The PubChem activity score indicates whether an siRNA was 'active' and binned as 'Essential' (designated 2, i.e., a screen hit in the DMSO arm) or 'inactive' (designated 1, i.e., not a screen hit). Samples are defined by siRNA catalogue number (Dharmacon) and Entrez Gene ID.

Data record 2
Secondary deconvolution siRNA screen data for 23 cell lines are available at PubChem (Data Citation 28 to Data Citation 50). Assay ID accession numbers are provided in Table 2 (available online only). Screen-wide normalised data (negative control normalisation) are provided, as well as synergy scores for drug/siRNA combinations and the results of consequent binning strategies. The PubChem activity score indicates whether an siRNA was 'active' and binned as 'Synergy' (designated 2, i.e., a screen hit in any 'plus-drug' arm) or 'inactive' (designated 1, i.e., not a screen hit). Samples are defined by siRNA catalogue number (Dharmacon) and Entrez Gene ID.

Data record 3
Raw data for both the Primary siRNA screen and the Secondary deconvolution screen are available at Figshare (Data Citation 51). Data for all assay plates are provided, including those with SSMD <3 that failed our QC threshold. Details of genes targeted by the custom siRNA library used in the Primary siRNA screen are also included, as are details of the siRNA/drug combinations selected for rescreening in the secondary deconvolution screen.

Technical Validation
Control performance and plate QC
The performance of positive and negative siRNA controls was quantified using the Strictly Standardised Mean Difference (SSMD) [30][31][32] (Table 3), a statistical measure of the dynamic range between positive and negative controls, encompassing the mean and standard deviation of each control. Generally, a desirable SSMD for an RNAi screen is ≥3 in the context of a strong control [30][31][32] . We calculated two SSMD scores for each assay plate, using each of the positive siRNA controls (siTOX and siPLK1), compared with the negative control Non-targeting siRNA pool #2 (NTPool#2). While most cell lines responded equally well to both siTOX and siPLK1, there were some cell lines that did not respond to siTOX while siPLK1 had a large viability effect, and vice versa. Therefore, for each assay plate the higher of the two SSMD scores was used to pass or fail the plate. Plates that passed the quality control criteria had an average NTPool#2/siPLK1 SSMD of 4.23 and median of 4.09, and an average NTPool#2/siTOX SSMD of 4.12 and median of 4.16 (Table 3, Fig. 2), indicating very good data in the context of a strong positive control 27 .
Assay plates for four cell lines consistently failed (Supplementary Fig. 2); this could be attributed to very low siRNA transfection efficiency or, in the case of MDST8, a significant unexplained viability decrease in the presence of the Non-targeting siRNA controls. These cell lines were therefore excluded from the primary screen. Of the 416 total plates set up for the primary screen, a total of 310 passed our QC threshold (74.5%). In the secondary deconvolution screen 83 plates were assayed, of which 71 passed the QC threshold (85.5%). Two cell lines (HCC-56 and SNU-407) could not be rescreened in the secondary screen due to technical reasons.

Biological reproducibility across screening experiments
To assess the biological replicate plate reproducibility, the Pearson correlation coefficient was calculated for each set of duplicate plates per cell line (using all plates from DMSO and synergy arms). All cell lines were highly reproducible with an overall median correlation between biological replicates of 0.77 (standard deviation: 0.13) (Fig. 3).
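Replicate reproducibility of this kind can be computed as shown in the Python sketch below, which calculates the Pearson correlation for each pair of replicate plates and then summarises the pairs with the median. The function name and the plate values are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length plate readouts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented relative viabilities for the same wells on replicate plates.
replicate_pairs = [
    ([0.2, 0.9, 1.0, 0.5], [0.25, 0.85, 1.0, 0.55]),
    ([0.3, 0.8, 0.9, 0.6], [0.5, 0.7, 1.0, 0.4]),
    ([0.1, 1.0, 0.7, 0.4], [0.2, 0.9, 0.8, 0.3]),
]

correlations = [pearson(a, b) for a, b in replicate_pairs]
print([round(r, 2) for r in correlations])
print(round(median(correlations), 2))
```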

Primary screen identified known candidates
The primary siRNA screen data (DMSO arm) returned a list of genes scored as essential (z-score < −3) for each cell line (Data record 1). Summarising this gene list showed that, as expected, PLK1 (polo like kinase 1) was identified as essential (z-score < −3) in 24 of the 27 cell lines, with a further 2 lines scoring < −2.9. Other genes that were scored as essential in many of the cell lines include the cell cycle kinases AURKA (aurora kinase A) and WEE1, and SF3B1 (mRNA splicing factor 3b subunit 1) (Fig. 4a).
Cancer cells that harbour mutations in the KRAS gene are expected to be highly dependent on KRAS expression compared to KRAS wild-type lines. siRNA-mediated knockdown of KRAS was found to be lethal (z-score < −3) for 7 of 14 KRAS mutant lines compared to 1 of 13 KRAS wild-type cell lines (mean z-score KRAS mutant = −3.12, mean KRAS wild-type = −0.37; P = 0.0002) (Fig. 4b). While we focused on identifying essential gene phenotypes that were observed across multiple cell lines, our dataset also allows the identification of vulnerabilities that are restricted to a single cell line. One example is the sensitivity of NCI-H716 to siRNA knockdown of FGFR2 (z-score = −4.31). Medico et al. 34 recently reported that NCI-H716 has an amplification of FGFR2 and is sensitive to an FGFR inhibitor as a result.
The analysis of the primary screen (synergy arms) synergy scores showed that the combination of siRNA targeting PIK3CA with the anchor drug trametinib was identified as having a synergistic phenotype (Synergy score > 0.15) in 7 cell lines (Data record 1), in keeping with the role of PI3K signalling in resistance to MEK1/2 (MAP2K1/2) inhibition 35 . While the reverse combination (BYL719 anchor drug with siRNA targeting MAP2K1) was not identified as synergistic, this is likely due to redundancy between MAP2K1 and MAP2K2. These results confirmed the ability of the screening methodology to identify

Number of combinations validated in the deconvolution screen
The deconvolution screen was designed primarily to confirm the top synergistic combinations. We did, however, also deconvolute siRNAs targeting the top essential genes PLK1 and SF3B1. The results showed that individual siRNAs often produced stronger viability phenotypes than those observed using siRNA SMARTpools in the primary screen (Data record 2). In the Primary screen the effective concentration of each siRNA sequence in a SMARTpool is ~6 nM. However, in the Secondary screen each individual siRNA sequence was at 23.9 nM (4× higher concentration). While this may contribute to stronger viability phenotypes in the Secondary screen, part of the rationale for using siRNA SMARTpools is that by combining different siRNA sequences together the effective concentration of each siRNA sequence can be lower as the different siRNAs can act in concert. The results of deconvoluted siRNAs validated the role of PLK1 (22/23 cell lines, where 2-4 of 4 siRNAs caused relative viability <0.5) and SF3B1 (16/23 cell lines, where 2-4 of 4 siRNAs caused relative viability <0.5) as essential genes in colorectal cancer. Of note is the fact that several genes (e.g., GUCY2D, CAMK2N1) that were ranked high in the primary screen (DMSO arm) list of essential genes were included in the secondary screen due to their role in synergistic combinations. Neither of these two genes was convincingly validated as a single gene vulnerability (GUCY2D: 1/23 cell lines, and CAMK2N1: 1/23 cell lines) (Data record 2), despite a recent report which also identified these genes as essential genes in a pan-cancer siRNA screen 11 . This highlights the importance of a secondary deconvolution stage in any siRNA screening campaign.
The secondary screen validated a large proportion of synergistic drug/siRNA combinations. Table 4 summarises the top drug/gene combinations validating in the deconvolution screen. Over 75% of synthetic lethal combinations tested reproduced the primary screen synergy effect with at least two siRNA duplexes in at least one cell line. Fourteen combinations were highly validated with at least 2 siRNA duplexes, in three or more cell lines. Eight combinations were validated with at least 2 siRNA duplexes, in >75% of lines that showed the phenotype in the primary screen. This included siRNA targeting PIK3CA in combination with the anchor drug trametinib. Overall, the average synergy score for each combination across all cell lines in the secondary screen was well correlated with the average synergy score for that combination in the primary screen (Fig. 5). Mechanistic characterisation of novel synergistic combinations will be the subject of further publications.
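The strictest validation tier described above (at least 2 siRNA duplexes reproducing the phenotype in more than 75% of the cell lines that were primary hits) can be sketched in Python as follows. The sketch reuses the published thresholds (primary synergy score > 0.15, duplex score within 0.05 of the primary score); the function name, the cell line labels and the scores are hypothetical and this is not the authors' code.

```python
def combination_validated(primary_by_line, duplex_by_line, min_duplexes=2,
                          min_fraction=0.75, hit_threshold=0.15, tolerance=0.05):
    """Decide whether one drug/siRNA combination meets the strictest tier.

    primary_by_line: primary screen synergy score per cell line.
    duplex_by_line:  list of secondary screen synergy scores (one per
                     siRNA duplex) per cell line.
    """
    primary_hits = [line for line, score in primary_by_line.items()
                    if score > hit_threshold]
    if not primary_hits:
        return False
    confirmed = 0
    for line in primary_hits:
        cutoff = primary_by_line[line] - tolerance
        reproduced = sum(score > cutoff for score in duplex_by_line.get(line, []))
        if reproduced >= min_duplexes:
            confirmed += 1
    return confirmed / len(primary_hits) > min_fraction


# Invented scores for five placeholder cell lines; LINE_E is not a primary hit.
primary = {"LINE_A": 0.30, "LINE_B": 0.20, "LINE_C": 0.25, "LINE_D": 0.40, "LINE_E": 0.05}
duplexes = {
    "LINE_A": [0.28, 0.31, 0.10, 0.00],
    "LINE_B": [0.16, 0.20, 0.30, 0.10],
    "LINE_C": [0.50, 0.50, 0.00, 0.00],
    "LINE_D": [0.40, 0.36, 0.10, 0.10],
}

print(combination_validated(primary, duplexes))

# If only one duplex reproduces in LINE_D, 3 of 4 primary hits confirm (not >75%).
weaker = dict(duplexes, LINE_D=[0.10, 0.10, 0.10, 0.36])
print(combination_validated(primary, weaker))
```

The looser tiers in the text (at least one cell line, or three or more cell lines) would replace the final fraction test with a count of confirmed lines.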

Usage Notes
All siRNA screening data (Data records 1, 2 and 3) are provided so that users are able to investigate changes in viability and synergistic phenotypes by applying their own normalisation strategies and thresholds. This study focussed on identifying essential genes across a heterogeneous panel of colorectal cancer cell lines. Genes that were scored as 'Essential' by siRNA knockdown alone (DMSO arm) can be investigated for their role in cell survival, with proof of principle being the lethality of PLK1 knockdown. The dependence of KRAS mutant cell lines on expression of KRAS further demonstrates that the dataset can reveal associations between gene essentiality and genomic biomarkers.
While the aim of this study was to identify drug/siRNA combinations that showed a synergistic response, investigators may also want to analyse the overall cell viability measurements in order to identify effective combinations that yield high cell death. In the clinic these combinations could also have potential benefit for patients.