Detection of functional protein domains by unbiased genome-wide forward genetic screening

Establishing genetic and chemo-genetic interactions has played key roles in elucidating mechanisms by which certain chemicals perturb cellular functions. In contrast to gene disruption/depletion strategies to identify mechanisms of drug resistance, searching for point-mutational genetic suppressors that can identify separation- or gain-of-function mutations has been limited. Here, by demonstrating its utility in identifying chemical-genetic suppressors of sensitivity to the DNA topoisomerase I poison camptothecin or the poly(ADP-ribose) polymerase inhibitor olaparib, we detail an approach allowing systematic, large-scale detection of spontaneous or chemically-induced suppressor mutations in yeast or haploid mammalian cells in a short timeframe, and with potential applications in other haploid systems. In addition to applications in molecular biology research, this protocol can be used to identify drug targets and predict drug-resistance mechanisms. Mapping suppressor mutations on the primary or tertiary structures of protein suppressor hits provides insights into functionally relevant protein domains. Importantly, we show that olaparib resistance is linked to missense mutations in the DNA binding regions of PARP1, but not in its catalytic domain. This provides experimental support to the concept of PARP1 trapping on DNA as the prime source of toxicity to PARP inhibitors, and points to a novel olaparib resistance mechanism with potential therapeutic implications.


INTRODUCTION
In model organisms, genetic screens have long been used to characterize gene functions, to define gene networks, and to identify the mechanism-of-action of drugs (1)(2)(3)(4). The genetic relationships identified by such screens have been shown to involve positive and negative feedbacks, backups and cross-talks that would have been extremely difficult to discover using other approaches (5). Currently, the large majority of reported screens in model organisms and in mammalian-cell systems have used gene-deletion libraries and/or methodologies to inactivate gene functions, such as short-interfering RNA, CRISPR-Cas9 or transposon-mediated mutagenesis (6,7). While powerful, such approaches usually identify loss-of-function phenotypes, and only rarely uncover separation-of-function or gain-offunction mutations. This limitation is significant because such separation-or gain-offunction mutations -which can arise spontaneously or via the action of genotoxic agentscan dramatically affect cell functions or cellular response to chemicals, and can have profound impacts on human health and disease (8,9). Suppressor screens, either based on lethal genetic deficiencies and/or the use of drugs, have also facilitated the characterization of functionally relevant protein domains and sites of post-translational protein modification through the identification of relevant single nucleotide DNA variants (SNV)s (10).
In their simplest experimental setup, suppressor screens based on point-mutagenesis rely on four tools: (i) a genetically amenable organism or cell; (ii) a selectable phenotype; (iii) a method to create a library of mutants; and (iv) a method to identify mutations driving the suppressor phenotype amongst all the mutations in the library. Reflecting their relative amenability, these screens have mostly been carried out in microorganisms, either bacteria or yeasts, both of which benefit from the ability to survive in a stable haploid state. Despite not being strictly essential for such studies, a haploid state greatly improves the chances of identifying loss-of-function or separation-of-function recessive alleles, which would be masked in a diploid cell state (11). While the first three tools mentioned above are often amenable to a researcher, the lack of fast and efficient methods to bridge the knowledgegap between phenotype and genotype has discouraged the widespread implementation of suppressor screens based on point-mutagenesis. Indeed, until recently, recessive suppressor alleles could only be identified by labor-intensive methods involving genetic mapping and cloning in yeast, whereas the natural diploid state of mammalian cells largely precluded straightforward SNV suppressor screens in such systems.
Here, we describe an approach to overcome the above limitations that is based on sequencing genomic DNA extracted from various independent suppressor clones, followed by bioinformatic analysis. With small adaptations, this method can be applied to both the budding yeast Saccharomyces cerevisiae and other haploid model organisms, as well as to haploid mammalian cells (Figure 1). To highlight the utility of this approach, we describe its application to study resistance to the anti-cancer drugs camptothecin or olaparib, leading to the identification of various mutations in yeast TOP1 and in mouse Parp1, respectively. Importantly, we establish that drug target identification and mechanisms of drug resistance can be unveiled without a priori knowledge of the drug target.
Furthermore, if a sufficient number of suppressors is screened, this method also allows identification of functional protein domains required to drive drug sensitivity and resistance.

MATERIALS AND METHODS
Yeast suppressors of camptothecin sensitivity. S. cerevisiae strains used were derived from W303. All gene deletions were introduced by using one-step gene disruption, and were confirmed by PCR and whole-genome sequencing. Full genotypes of strains are described in Supplementary Table 1. Standard growth conditions (1% yeast extract, 2% peptone, 2% glucose, 40 mg/l adenine) were used. Strains YFP1001 and YFP1073 were mutagenized by adding 4.5% ethyl methane sulfonate (EMS) to liquid cultures in logarithmic growth-phase, pelleted by centrifugation and then resuspended in 50 mM Kphosphate buffer for 10 minutes, followed by EMS inactivation with 1 volume of 10% sodium thiosulfate. Suppressors were obtained by plating each strain on 10 YPD plates supplemented with 5 µg/ml of camptothecin (approximately 10 7 cells per plate). Resistant colonies were picked after 2-3 days of growth at 30°C and isolated by streaking on YPD plates. Suppression was confirmed by retesting camptothecin sensitivity of the isolated strains. Confirmed suppressors were processed for DNA extraction shortly thereafter, in parallel with 2-3 colonies of the initial strain ( Figure 1A).
Mouse embryonic stem cell suppressors of olaparib sensitivity. Haploid mouse AN3-12 embryonic stem cells (mESCs) (12,13) were used for all the experiments and were free from mycoplasma. Cells were grown in DMEM high glucose (Sigma) supplemented with glutamine, fetal bovine serum, streptomycin, penicillin, non-essential amino acids, sodium pyruvate, 2-mercaptoethanol and Leukemia inhibitory factor (LIF). All plates and flasks were gelatinized before cell seeding.
Cell sorting for DNA content was performed on mESCs by using a MoFlo flow sorter (Beckman Coulter) after staining with 15 μg/ml Hoechst 33342 (Invitrogen). The 1n peak was purified to enrich for haploid mESCs.
Mutagenesis with EMS was performed as described previously (14) with the following adjustments: after cell sorting, haploid-enriched cells were grown in DMEM plus LIF for overnight EMS treatment. After EMS treatment, cells were cultured for five passages in DMEM plus LIF and plated into 6-well plates at a density of 5 × 10 5 cells per well. Cells were then treated with 6 μM of olaparib (AZD2281; Stratech Scientific Ltd.) for 6 days, supplying new medium with olaparib daily. Cells were then grown for another four days without olaparib until mESC colonies could be isolated. manufacturers protocol from there. Genomic DNA was eluted from the columns using 200μl distilled water. A second elution was performed if the yield of the genomic DNA obtained was lower than 2 μg. Genomic DNA was stored at -20°C short-term before sequencing.
Extracted DNA was tested for total volume, concentration and total amount by using gel electrophoresis and the Quant-iTTM PicoGreen® dsDNA Assay Kit (ThermoFisher Scientific). Genomic DNA -500 µg (yeast) or 1-3 μg (mouse) -was fragmented to an average size of 100-400bp (mouse) or 400-600bp (yeast) by using a Covaris E210 or Analysis of DNA sequence data to identify suppressor mutations ( Figure 1C) Alignment of DNA sequencing data. Sequencing reads were aligned to the appropriate reference genome using BWA aln (v0.5.9-r16) (15). The S. cerevisiae S288c assembly (R64-1-1) from the Saccharomyces Genome Database was obtained from the Ensembl genome browser. For mouse samples, the Mus musculus GRCm38 (mm10) was used. Where appropriate, all lanes from the same library were merged into a single BAM file, and PCR duplicates were marked by using Picard Tools (Picard version 1.128). The quality of the sequencing data post-alignment was assessed by using SAMTools stats and samtools flagstats (1.1+htslib-1.1), plot-bamstats, bamcheck and plot-bamcheck (16). Variant prioritization. Variants were prioritized by their Ensembl VEP (17) predicted consequence: we retained variants predicted to cause a frameshift, a premature stop codon, a missense mutation, a lost start/stop codon, a synonymous mutation, an in-frame insertion or deletion, and in case of mouse data those annotated to affect splice donor/acceptor bases. Genes were prioritized by ranking them by the number of distinct mutations identified in each gene.

Analysis of olaparib resistant cell lines
Molecular modeling. Molecular models were generated by using pymol. Crystal structure data were obtained from RCSB Protein Data Bank (PDB). The codes for PARP1 structures were 4DQY (human PARP1 without Zn2 or BRCT domain) and 3ODC (human PARP1 Zn2). PARrylation assay. Cells were treated with 6μM Olaparib overnight, followed 1mM H 2 O 2 for 10 minutes in the dark, washed with ice-cold PBS and collected. Cells were lysed in Laemmli buffer (120mM TrisHCl pH6.8, 4% SDS, 20% glycerol) and lysates were separated on 4-20% Tris-Glycine gradient gel followed by transfer onto PVDF membrane. Membranes were immunoblotted with the appropriate antibodies. Experiments were repeated at least twice.

Identification of TOP1 mutations conferring camptothecin resistance.
To demonstrate the utility of the procedure described above, we sought to identify mutations imparting resistance to camptothecin, a DNA topoisomerase 1 inhibitor (27)(28)(29).
Genomic DNA sequencing of the resistant clones highlighted TOP1 as the gene carrying the largest number of unique mutations in our dataset, as expected for it being the drug target.
The second most mutated gene -PDR1 -carried 11 unique mutations, 10 of which did not co-occur with mutations in TOP1, whereas all the mutations found in the third most mutated gene (GLT1) co-occurred with mutations in either TOP1 or PDR1 (Figure 2A and data not shown). Globally, out of the 251 yeast strains sequenced, 191 contained one or more mutation in TOP1 (Figure 2B, light yellow). Furthermore, by manual inspection, we found that 27 additional strains carried mutations in TOP1 (Figure 2B, dark yellow); the inability to automatically detect these mutations was caused by the fact that these strains were either not pure clones, or they carried large (>25bp) deletions in TOP1 ( Figure 2B and Supplementary Figure 1A). To the list of TOP1-mutated suppressors strains, we added another 38 suppressors bearing TOP1 mutations that we had identified in previous, published screens (31,35), bringing the total number of TOP1 mutants analyzed to 256.
Missense, nonsense and frameshift TOP1 mutations were roughly equally represented in the non-mutagenized samples. However, where samples had been mutagenized with EMS the vast majority of mutations were nonsense or missense base substitutions (Figure 2C).
In the few cases in which the same suppressor clone contained missense and nonsense mutations in TOP1, the suppressive effect was attributed to the gained STOP codon.
When the positional distribution of each mutation type was plotted, nonsense and frameshift mutations were shown to be quite evenly distributed along the length of the TOP1 open reading frame (Figure 2D and 2E). The prediction is that such mutations either result in null alleles -as the prematurely-terminated messenger RNA (mRNA) would be degraded by nonsense-mediated decay mechanisms (36) -or would give rise to an unstable protein or a truncated version that could retain partial activity. Since the Y727 residue is essential for the catalytic activity of Top1, truncation before this residue is predicted to produce a non-functional protein (37,38). As might be expected, the distribution of nonsense mutations loosely correlated with them arising from codons in the open reading frame that only required one nucleotide change to change them to a STOP codon (Supplementary Figure 1B). Notably, the observed enrichment of frameshifts near the 5' end of the TOP1 transcript was localized to an 8-nucleotide homopolymeric adenine tract that is presumably particularly susceptible to mutagenesis (Supplementary Figure   1C).
In striking contrast to the situation with nonsense or frameshift mutation, missense mutations were localized to specific regions of the TOP1 protein-coding sequence, overlapping with known functional domains of Top1. Indeed, the vast majority of mutations identified localized within three distinct regions of the larger DNA binding and catalytic domain, while a minority was located in the smaller C-terminal domain, essential for catalysis ( Figure 2F).
Functional consequences of the amino acid residue changes induced by missense mutations were assessed by using PROVEAN and PredicProt (24,25). These tools use chemical properties of amino acid residues and phylogenetic conservation to predict whether or not a particular substitution is likely to be functionally tolerated by the protein analyzed. Both these methods suggested that the vast majority of the TOP1 mutations we identified in camptothecin resistant strains were likely to produce deleterious effects (PROVEAN score < -2.5; PredictProtein score >50) ( Figure 3A). Notably, missense mutations located in the C-terminal domain of Top1 affected both conserved and non-conserved residues and were primarily positioned in the vicinity of the catalytic residue Y727, although three substitutions were closer to the C-terminus of the protein (Figure 3B).  Figure 3D; the Lip2 domain also contains an active-site residue, R420). Remaining mutations clustered between amino acid residues 500 and 600, which encompass the end of the DNA binding/catalytic domain and the base of the coiled-coil linker domain. In this region two other active site residues (R517 and H558) are located ( Figure 3D).
Collectively, these results showed that even with no a priori knowledge, our approach for identifying suppressor strains and associated mutations would have identified Top1 as the likely target of camptothecin and would have highlighted the critical Top1 domains functionally relevant for Top1 activity and drug hypersensitivity.

Identification of mouse Parp1 mutations conferring olaparib resistance
Based on a similar approach to that described above, we recently identified genes whose mutation in haploid mammalian cells causes resistance to the anti-metabolite drug 6thioguanine (41). To further highlight the wider applicability of our approach in mammalian cell systems, we carried out a screen to identify mutations that allow haploid mouse cells to survive in the presence of the anti-cancer agent olaparib, a potent smallmolecule inhibitor of the DNA-repair protein PARP1 (Poly ADP-ribose polymerase 1) (42,43). Thus, wild-type, haploid mouse embryonic stem cells (mESCs) were mutagenized by using EMS, and mutant libraries were screened for resistance to olaparib (Figure 1A).
Forty-five olaparib-resistant clones were isolated and subjected to whole-exome sequencing.
Analysis of ensuing sequence data for putative, acquired mutations, revealed Parp1 as the most mutated gene in the dataset with 25 different mutations detected (Figure 4A,   Supplementary Table 3). Globally, 40 out of the 45 clones harbored Parp1 mutations ( Figure 4B, Supplementary Table 3). Further manual examination of the aligned sequencing data from the five remaining clones revealed that four of these also likely Of the Parp1 mutations we detected, more than half led to premature termination codons, splice acceptor/donor, or frameshift mutations, which would likely lead to the production of aberrant mRNAs subject to nonsense-mediated decay and/or the generation of unstable, truncated PARP1 protein. As we previously noted for premature-termination mutations in yeast TOP1, these mutations did not cluster in any particular domains of the Parp1 open reading frame (Figure 4C). Furthermore, similar to what we observed in yeast, EMS treatment resulted in an overrepresentation of single nucleotide variants, compared to frameshift mutations ( Figure 4C).
Strikingly, missense mutations detected in Parp1 were more frequent in the N-terminal part of the protein, where the DNA-binding domains reside, while no such mutations were observed in the catalytic domain ( Figure 4C). Analysis of PARP1 protein levels indicated that other than G400R and A610V, which resulted in complete loss or marked reduction of the PARP1 protein product, no other of the identified missense mutations impacted on PARP protein stability (Figure 5A, Supplementary Figure 3). Computational predictions for the likely consequences of these remaining missense mutations on protein structure and function suggested that all of them were likely to be functionally deleterious ( Figure   4D). Due to the fact that all these missense mutations localized within domains known to be involved in DNA binding, we examined their locations relative to the DNA-protein interface as defined by previously published PARP1 structures (26,44) (Figure 5B).
Notably, most of the missense mutations affecting residues in the DNA binding domains clustered at the DNA-protein interface, and did so in proximity to residues that make key DNA contacts (26).
Without any a priori knowledge about how olaparib causes cell toxicity, the above data would have suggested that such toxicity is largely driven by a mechanism connected to PARP1 DNA binding. To test this idea, we assessed the missense mutations identified in the DNA binding domains of PARP1 for their potential effects on the ability of PARP1 to bind a double-stranded DNA oligonucleotide. Significantly, this analysis revealed that all the point mutants that did not reduce PARP1 levels showed reduced levels of DNA binding when compared to the wild-type PARP1 protein ( Figure 5C, Supplementary Figure 3).
Consistent with PARP1 DNA binding triggering its auto-modification by poly ADP-ribose, we found that the PARP1 S568F mutation, which impairs DNA binding, did not exhibit evidence of parylation when cells were treated with hydrogen peroxide (Figure 5D). These findings were therefore in accord with the fact that toxicities of PARP inhibitors such as olaparib are linked to their ability to trap PARP1 on DNA by blocking its catalytic activity (42).
The last clone without an assigned suppressor mutation (C1) may also carry one or more mutations in the non-exonic regions of the Parp1 gene, or epigenetic modifications altering PARP1 expression, since we could not detect the presence of PARP1 protein in this clone ( Figure 5A). Taken together, these results are consistent with a model in which olaparib resistance can arise either from loss of PARP1 or from its decreased ability to bind DNA ( Figure 5E).

DISCUSSION
Various approaches have been described for systematic identification of genetic and chemo-genetic interactions. Until recently, this search has been largely conducted using approaches based on gene inactivation, either in arrayed or pooled assays. While these approaches have played crucial roles in determining gene-gene and gene-drug interactions, their limited power of resolution does not in general provide information regarding the functional protein domains relevant for the identified interaction. While transposon-based mutagenesis has recently been shown to provide some information at the domain level, this approach is only applicable to loss-of-function mutations, and is biased towards C-terminal domains of proteins (45). In contrast, SNV based approaches can provide a higher level of resolution, and in many cases produce unanticipated results (10,(46)(47)(48). Lack of rapid and facile procedures to bridge the phenotype-to-genotype gap has until recently, however, precluded the use of these techniques on a high-throughput scale. The approach we have described allows the identification of SNV driving drug resistance or resistance to essentially any selective growth condition in a systematic and unbiased way (other than any bias imposed by the mutagenic agent of choice). Importantly, this approach can equally be applied to yeast and to more complex eukaryotes, bringing the power of high-resolution genetic screens to mammalian systems. While we acknowledge that we have carried out our screens with strong selectable cell-viability phenotypes, we envisage applicability in more complex scenarios, for example involving FACS-based selection or cell migration, motility or attachment assays. Highlighting this potential, our results show how, with no previous knowledge, Top1 and PARP1 would have been identified as the most likely targets for the drugs camptothecin and olaparib, respectively.
Toxicity to PARP inhibitors was initially linked to the involvement of PARP1 in the repair of single-strand DNA breaks (49,50), but more recent data challenged this view (51). The fact that loss of PARP1 drives resistance to PARP inhibitors in wild type genetic backgrounds (52) indeed suggests that inhibition of PARP1 catalytic activity -and not the accumulation of unrepaired DNA lesions -is the major effector of toxicity in such genetic backgrounds.
Indeed, recent findings suggest that PARP1 trapping onto DNA, caused by inhibition of its catalytic activity, is the main cause of toxicity (42). Our data further support this model, as all the suppressor clone variants which we identified that did not result in loss of PARP1 protein negatively affected its binding to DNA. Preventing PARP1 binding to DNA thus appears to be sufficient to circumvent the toxicity of PARP inhibitors, and the fact that we identified no mutants specifically defective in catalytic function reinforces the idea of trapped PARP1 as the main cytotoxic lesion for olaparib in wild-type mammalian cells. This may not only be important for our understanding how PARP inhibitors function but also for mechanisms of intrinsic or evolved tumor resistance towards such clinical agents in patients.
As we have exemplified by our analyses of TOP1 and PARP1, the level of detail on critical functional domains and residues increases with the number of samples sequenced. Because of their genome size, screens based in mammalian systems require greater sequencing power than screens conducted in simpler organisms such as yeast. Moreover, as compared to yeasts, the more complex genome architecture in mammalian systems -where there is more intergenic DNA, a larger number of genes and an abundance of intronic sequencesincreases the chances of isolating variants affecting protein levels, rather than protein function. One solution to bypass such issues will be to run two-tiered screens, initially using whole exome sequencing on a subset of suppressors in order to identify top gene hits driving resistance, and then using targeted exome sequencing to test the rest of the samples, either through analysis of various individual clones or bulk sequencing of the resistant population. In addition, we can envision alternative scenarios where a gene identified in an initial screen could be marked/tagged in a way to allow selection of mutations that affect protein function but not protein levels. This approach can also be combined with CRISPR-Cas9-mediated in vivo targeted mutagenesis, via a library of gRNAs directed towards the exonic regions of the gene (Christopher Lord, personal communication). We anticipate that such developments, along with expected further increases in sequencing throughput and associated cost reductions, will pave the way for hitherto unprecedented genetic analyses on comprehensive and systematic scales.

Data Availability
Yeast DNA sequencing data are available from European Nucleotide Archive (ENA) under the accession code PRJEB2977, and mouse DNA sequencing data are available under accession code PRJEB13612. Access codes for specific samples are detailed in Supplementary  Figure 3.

Code Availability
Custom code used to analyse sequencing data and to draw figures is available in the    PredictProtein (y-axis; a score above 50 is considered deleterious) and PROVEAN (x-axis; a score below -2.5 is considered deleterious). Datapoints are coloured by the domain the missense mutations occur in: Light blue, dark blue and green are the three Zinc fingers, respectively, purple denotes the BRCT domain, while red indicates the WGR domain.