Abstract

We developed a systematic approach to map human genetic networks by combinatorial CRISPR–Cas9 perturbations coupled to robust analysis of growth kinetics. We targeted all pairs of 73 cancer genes with dual guide RNAs in three cell lines, comprising 141,912 tests of interaction. Numerous therapeutically relevant interactions were identified, and these patterns replicated with combinatorial drugs at 75% precision. From these results, we anticipate that cellular context will be critical to synthetic-lethal therapies.

Main

Simultaneous mutation of two genes can produce a phenotype that is unexpected in light of each mutation's individual effect1. This phenomenon, known as genetic interaction, identifies an underlying functional relationship between the genes, such as contributions to the same protein complex or pathway2. Mapping these functional relationships in a systematic fashion has broad applicability for advancing fundamental understanding of biological systems3,4,5. Genetic interactions also have implications for therapeutic development, for instance, in cancers in which negative or 'synthetic-lethal' interactions via simultaneous disruption of both genes cause cell killing6. The feasibility of this approach has been demonstrated with the recent approval of the drug olaparib, an inhibitor of PARP1/2, specifically for tumors with loss-of-function mutations in BRCA1 or BRCA2. However, further applications of synthetic-lethal cancer therapy have been limited by poor understanding of the important genetic interactions in cancer cells and how these vary from one cancer type to another or from patient to patient7,8.

To enable systematic mapping of genetic-interaction networks, we developed a CRISPR–Cas9 screening methodology for targeting single genes and pairs of genes in a high-throughput format. In the CRISPR–Cas9 system, a guide RNA (gRNA), in complex with the Cas9 protein, targets genomic sequences homologous to the gRNA9,10. Targeting new genomic elements entails modifying the gRNA sequence, thus enabling many targeted genome-editing and regulation capabilities9. Notably, Cas9 also enables easy multiplex targeting via delivery of multiple gRNAs per cell11. Here, we combined multiplex targeting with array-based oligonucleotide synthesis11,12,13,14 to create dual-gRNA libraries covering up to 10^5 defined gene pairs (Fig. 1a and Supplementary Figs. 1 and 2a–d). In these libraries, each construct bears two gRNAs, each of which is designed to target either a gene or a scrambled nontargeting sequence absent from the genome. Thus, all combinations of gene–gene (double-gene perturbation) and gene–scramble (single-gene perturbation) are exhaustively assayed for effects on cell growth. Notably, in our approach, both spacers for a dual-gRNA construct are directly specified during oligonucleotide synthesis, thereby enabling the library constituents to be exactly defined to facilitate custom gRNA pairing. By enabling determination and comparison of single-gene- and dual-gene-perturbation effects in the same assay, this approach allows for the systematic quantification of genetic interactions in humans.

Figure 1: Experimental and analytical framework for identification of genetic interactions with combinatorial CRISPR knockout.

(a) Schematic of overall experimental approach: array-based oligonucleotide synthesis is used to create dual-gRNA libraries containing all gene–gene (double-gene perturbation; C2) and gene–scramble (single-gene perturbation; C1) combinations, which can then be assayed for effects on cell growth. (b) Schematic of computational analysis workflow: CRISPR screens are run as two independent replicate experiments, cells are harvested at four time points, and gRNA frequencies are determined by high-throughput sequencing. All gRNAs below a threshold (red dash) are excluded from further analysis. Fitness is determined from a fit of log relative abundance over time; probes are subsequently ranked on the basis of absolute fitness and weighted, and then a numerical Bayesian method is used to test for the presence of a genetic interaction.

We conducted genetic-interaction screens by transducing the dual-gRNA lentiviral library into a population of cells stably expressing Cas9, maintaining these cells in exponential growth over the course of four weeks, then sampling the relative changes in gRNAs at multiple time points: days 3, 14, 21 and 28 post-transduction (Online Methods). To robustly quantify gene fitness and genetic interactions, we developed a computational analysis framework that integrates all samples across the multiple days of the experiment. This method (i) detects and removes gRNA constructs with insufficient read coverage; (ii) fits growth curves to the measured log2 abundances of each construct over time, the slopes of which reflect fitness; and (iii) integrates data from the multiple gRNA constructs to derive a robust fitness value for disruption of each gene, fg, and gene pair, fg,g. Finally (iv), a genetic-interaction score, πgg, is calculated as the difference between the observed and the expected fitness of the double-gene knockout (Fig. 1b and Supplementary Fig. 3a–d). Significant departures from expected (π < −3σ or π > 3σ) are called as negative or positive genetic interactions, respectively. A negative interaction indicates slower-than-expected growth, thus suggesting synthetic sickness or lethality, whereas a positive interaction indicates faster-than-expected growth, thus suggesting epistasis.
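The calling rule in step (iv) can be illustrated with a minimal sketch. It assumes that the expected double-knockout fitness is the sum of the two single-gene fitness values and that σ is supplied by the caller; the function name and the toy numbers are hypothetical and are not taken from the analysis pipeline.

```python
import numpy as np

def call_interactions(f_pair, f_a, f_b, sigma, n_sigma=3.0):
    """Score each gene pair as pi = observed - expected and label it.

    Assumption: the expected fitness is additive, f_a + f_b.
    """
    f_pair, f_a, f_b = (np.asarray(v, dtype=float) for v in (f_pair, f_a, f_b))
    pi = f_pair - (f_a + f_b)
    calls = np.full(pi.shape, "none", dtype=object)
    calls[pi < -n_sigma * sigma] = "negative"   # slower than expected
    calls[pi > n_sigma * sigma] = "positive"    # faster than expected
    return pi, calls

# Toy values for three gene pairs (illustrative only).
pi, calls = call_interactions(
    f_pair=[-0.50, -0.11, 0.30],
    f_a=[-0.10, -0.05, -0.05],
    f_b=[-0.05, -0.05, -0.10],
    sigma=0.05,
)
print(list(calls))
```

With these toy inputs, the first pair falls below −3σ and the third above 3σ, so they are labeled negative and positive, respectively.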

Using this method, we evaluated all pairwise gene-knockout combinations among a panel of 73 genes divided between tumor-suppressor genes (TSGs) and cancer-relevant drug targets (DT), a subset of which were also verified oncogenes15 (Fig. 2a and Supplementary Table 1). Experiments were performed in three cell lines: HeLa, a human papilloma virus–positive cervical cancer cell line; A549, a KRAS G12S-mutant lung cancer cell line; and 293T, an SV40 large T antigen–transformed embryonic kidney cell line. With nine gRNA pairs per combination, the library comprised 23,652 double-gene-knockout constructs and 657 single-gene-knockout constructs; testing two replicates in each cell line yielded a total of 141,912 unique tests of interaction (Supplementary Table 2a). Measurements of gene fitness (fg) were well correlated between biological replicates in the same cell line (HeLa, Pearson r = 0.96, two-tailed P = 4.2 × 10^−40; A549, Pearson r = 0.94, P = 1.2 × 10^−37; and 293T, Pearson r = 0.97, P = 1.5 × 10^−44), as were the π scores for significant genetic interactions (HeLa, r = 0.81, P = 4.7 × 10^−18; A549, r = 0.65, P = 2.9 × 10^−8; and 293T, r = 0.79, P = 4.7 × 10^−4; Supplementary Fig. 4a–f and Supplementary Table 2b).

Figure 2: Genetic interactions in HeLa, A549 and 293T cancer cells.

(a) Genes selected for study included tumor-suppressor genes (TSG) and cancer-relevant drug targets (DT), which included many oncogenes. (b) Scatter plot of fitness of single-gene knockout (KO) in HeLa versus A549. (c) Scatter plot of interaction scores in HeLa versus A549, generated with the smoothScatter R function with default settings. The density of gene pairs at each x,y location is represented by darkness of blue shading; single gene pairs in low-density regions are marked by black dots. (d) Proportional Venn diagram summarizing the number of synthetic-lethal interactions per cell line and the number conserved between each cell-line pair. (e) Combined synthetic-lethal network for all three cell lines. Circles, TSGs; squares, DTs. Node colors indicate the single-gene-knockout fitness effect: red, positive fitness effect; blue, negative fitness effect. Thick black borders around nodes indicate that the protein product of the gene is the target of an FDA-approved drug. The line colors indicate the cell lines in which the interaction was identified: blue, HeLa; red, A549; green, 293T; black, multiple cell lines.

Moreover, we observed a significant correlation between the total number of genetic interactions identified for a gene and its single-gene fitness (HeLa, r = 0.77, two-tailed P = 3.4 × 10^−10; A549, r = 0.45, P = 0.0018; and 293T, r = 0.77, P = 9.0 × 10^−10; Supplementary Fig. 4g), thus suggesting that genes that are network 'hubs' may be more functionally important than genes with fewer interactions. Such a relationship has previously been observed in model organisms but has not previously been reported in humans5.

We next moved from comparison between replicates to comparison among the three cancer cell lines. First, we found a lower but significant correlation of the single-gene fitness scores across pairs of cell lines (Fig. 2b, Supplementary Fig. 4h,i and Supplementary Table 3a). Differences in these fitness scores recapitulated known biological differences, including the large positive growth effect of TP53 knockout in A549 but not HeLa or 293T, in which TP53 is already inactivated by viral proteins. Gene fitness scores did not significantly correlate with gene expression in any of the three cell lines (Supplementary Fig. 5a–c). Genes with very low or no expression had fitness scores very near the average for that cell line, in agreement with a neutral growth effect.

Second, we found that the genetic interactions identified from these data were different among cell lines (Fig. 2c and Supplementary Fig. 4j,k). A total of 152 synthetic-lethal (negative) genetic interactions were identified in HeLa, A549, or 293T cells (false discovery rate of 0.3; Fig. 2d, Supplementary Fig. 6a–c and Supplementary Table 3b,c). Of these, 16 (10.5%) were identified in multiple cell lines, and no interactions were common to all three cell lines. The remaining 136 interactions were 'unique' to a cell line (HeLa, 38 of 52; A549, 43 of 57; and 293T, 55 of 59; Fig. 2e and Supplementary Fig. 6d–f). Additionally, there were eight positive genetic interactions (epistasis) identified in HeLa, two in 293T and none in A549. Among all of these discoveries, we found that 28 interactions had previously been identified, including the therapeutically relevant interactions BRCA1–PARP1 (ref. 6) and PTEN–MTOR16.

We next sought to validate these findings, particularly the discrepancies across cell lines. We selected eight pairs of DT genes for which a synthetic-lethal genetic interaction had been identified in only HeLa or A549 cells. Rather than simply reproducing the dual CRISPR knockout experiment (gene–gene interaction), we sought to examine the viability of cells exposed to drugs inhibiting the corresponding gene products (drug–drug interaction), evaluating whether the interaction could be identified by an independent technology at the protein level and whether it was also accessible therapeutically. Drug–drug assays validated six of eight interactions during testing in the cell line for which the interaction had been first observed by dual CRISPR (75% precision or positive predictive value). In contrast, for gene pairs tested in a cell line for which an interaction had not been implicated by dual CRISPR knockout, only two of eight pairs showed an interaction in drug–drug assays (75% negative predictive value, Supplementary Figs. 7a–g and 8a–j and Supplementary Table 4). Thus, the differences in genetic interaction across cell lines, as identified by systematic CRISPR screens, were largely reproduced as drug–drug interactions in small-scale assays.

In the future, by allowing for genetic-interaction mapping directly in eukaryotic cells, our combinatorial CRISPR–Cas9 technology may pave the way for systematic determination of cancer pathways, with twofold applications: improving understanding of how networks of genes influence tumorigenesis and aiding in the development of precision therapeutics via new druggable synthetic-lethal interactions. Recognizing that there may be great diversity in genetic interactions among different tumors, it will be important to perform these studies across a large number of samples; such a broad approach should be enabled by the high-throughput method presented here. We also note the importance of gRNA efficacy and anticipate that improvements in gRNA design17,18,19 to increase the editing rate and decrease false negatives, as well as use of gRNAs that specifically target functional protein domains20, will be critical to further scale these experiments and improve consistency. The variability of Cas9 expression between individual cells, and from one cell line to another, may also affect perturbation efficiency. We believe that integrating results from complementary perturbation strategies such as CRISPR inhibition and activation, as well as proteomic and chemogenetic studies, should enable the generation of more comprehensive interaction maps7. Finally, this experimental and analytic framework is not unique to cancer cells and can readily be applied to systematically map the genetic architecture of complex biological systems and diseases in any eukaryotic system amenable to lentiviral transduction and growth in culture.

Methods

The protocol for the dual gRNA library cloning can be found in ref. 21.

Dual-gRNA-library cloning.

Preparation of the dual-gRNA library involved a two-step cloning process whereby each synthesized oligonucleotide was assembled progressively with promoters and 3′-gRNA scaffolds21 (Supplementary Fig. 1). This multistep protocol is critical, because array-based oligonucleotides from commercial vendors have a maximum length of 300 bp, whereas a dual-gRNA cassette is 1,000 bp in size; thus, additional steps of cloning are needed to reconstitute the full sequence. We optimized the library efficacy by eliminating large repeat sequences in the dual-gRNA vectors, because such repeats could potentially compromise both viral production and sequencing quality. Toward this goal, we chose nonhomologous polymerase III promoters (hU6 and mU6), on the basis of their comparable activity22. We also explored mutagenized gRNA scaffold sequences to further increase sequence diversity while maintaining the primary hairpin loops in the gRNA scaffold (via G–C versus A–U interactions)23,24,25. Experiments showed that although engineered versions 2 and 3 were active, the wild-type scaffold and version 4 showed the most consistent activity (Supplementary Fig. 2a,b) and were therefore used for all subsequent studies. We also confirmed that the two gRNA positions in the construct were equally functional (hU6 gRNA and mU6 gRNA), and thus the dual-gRNA libraries did not need to include both positions for each gRNA (Supplementary Fig. 2c). Additionally, we confirmed that the two gRNAs were simultaneously active by targeting both EGFR and mCherry. There was a moderate decrease in activity when two guides were expressed in a dual-gRNA format, but each guide remained equally functional in both positions and with both gRNA scaffolds (Supplementary Fig. 2d).

To construct combinatorial libraries that exhaustively interrogated the network of genetic interactions among a panel of genes, our approach was to design three gRNAs against each gene. Additionally, three gRNAs were designed as nontargeting controls. Dual-gRNA lentiviral constructs were then synthesized for all pairwise gRNA combinations between genes (double perturbations) and between genes and scrambled sequences (single perturbations). This format resulted in nine pairwise gRNA constructs per gene pair. The first step was to assemble the paired gRNAs into a backbone vector, and in the next step, a fragment including both the first gRNA scaffold and a mouse U6 promoter was inserted between the paired gRNAs.
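The combinatorial design described above can be enumerated in a few lines. The sketch below uses placeholder gene and gRNA names (hypothetical, not the actual library identifiers) and assumes, as stated for this library, that only one orientation of each gRNA pair is needed.

```python
from itertools import combinations, product

def build_library(genes, guides_per_gene=3, null_name="NULL"):
    """Return (double_ko, single_ko) lists of (gRNA, gRNA) name pairs."""
    guides = {g: [f"{g}_g{i + 1}" for i in range(guides_per_gene)] for g in genes}
    nulls = [f"{null_name}_g{i + 1}" for i in range(guides_per_gene)]
    # Every gRNA of gene A paired with every gRNA of gene B: 3 x 3 = 9 per gene pair.
    double_ko = [pair for a, b in combinations(genes, 2)
                 for pair in product(guides[a], guides[b])]
    # Every gRNA of a gene paired with every nontargeting gRNA.
    single_ko = [pair for g in genes for pair in product(guides[g], nulls)]
    return double_ko, single_ko

genes = [f"GENE{i:02d}" for i in range(1, 74)]  # 73 placeholder gene names
double_ko, single_ko = build_library(genes)
print(len(double_ko), len(single_ko))
```

For a 73-gene panel this enumeration gives 23,652 double-knockout and 657 single-knockout constructs, matching the library sizes reported in the text.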

Step I: paired-gRNA cloning. The pooled oligonucleotide libraries were synthesized by CustomArray. Full-length oligonucleotides with dual-gRNA spacers (i.e., 20-bp sequences used for targeting desired genes) were amplified by PCR with Kapa Hifi (Kapa Biosystems). PCR reactions were set up according to the manufacturer's protocol, with 1 μL of synthesized oligonucleotide template (typically 20 ng), an annealing temperature of 55 °C and an extension time of 15 s. The numbers of cycles were tested to ensure that they fell within the linear phase of amplification; 28 cycles were used in this experiment. The primer sequences were OLS_gRNA-SP_F, TATATATCTTGTGGAAAGGACGAAACACCG; OLS_gRNA-SP_R, CTTATTTTAACTTGCTATTTCTAGCTCT.

To obtain high-yield coverage of the PCR products, ten repeats of 50-μL PCR reactions were performed for each library. The 144-bp amplicons were separated via 2% agarose gel electrophoresis and purified with a QIAquick Gel Extraction Kit (Qiagen). Subsequently, the gRNA-LGP vector (Addgene no. 52963) was digested with BsmBI (NEB) via the following reaction at 55 °C for 3 h: gRNA-LGP vector, 4 μg; buffer 3.1, 5 μL; 10× BSA, 5 μL; BsmBI, 3 μL; H2O up to 50 μL.

After digestion, the vector was treated with 2 μL of calf intestinal alkaline phosphatase (NEB) at 37 °C for 30 min, then purified with a QIAquick PCR Purification Kit (Qiagen). To assemble the paired gRNAs into the vector, ten Gibson assembly reactions were performed as follows: linearized gRNA-LGP vector, 200 ng; dual-gRNA inserts, 36 ng (molar ratio 1:10); 2× Gibson Assembly Master Mix (NEB), 10 μL; H2O up to 20 μL.

After incubation at 50 °C for 1 h, the product was purified with a QIAquick PCR Purification Kit (Qiagen) and then transformed into One Shot Stbl3 chemically competent Escherichia coli (Invitrogen). Twenty parallel transformations were performed to ensure adequate library representation. A small fraction (20–100 μL) of cultures was spread on carbenicillin (50 μg/mL) LB plates to calculate the library coverage, and the rest of the cultures were amplified overnight in 150 mL LB medium; 100× library coverage was ensured. The plasmid DNA was then extracted with a HiSpeed Plasmid Maxi Kit (Qiagen), and 20 independent clones were picked and Sanger-sequenced to estimate the overall quality of the library.

Step II: insertion of the gRNA scaffold and the mouse U6 promoter. The step 1 library plasmids were digested with BsmBI (NEB), in the following reaction at 55 °C for 3 h: step 1 library, 4 μg; buffer 3.1, 5 μL; 10× BSA, 5 μL; BsmBI, 3 μL; H2O up to 50 μL.

After digestion, the linearized plasmids were treated with 2 μL of calf intestinal alkaline phosphatase (NEB) at 37 °C for 30 min, and cut plasmids were gel-purified via 0.6% agarose gel electrophoresis and QIAquick gel extraction (Qiagen).

Concurrently, the step 2 inserts, synthesized commercially (Integrated DNA Technologies) and cloned into a TOPO vector, were digested with BsmBI (NEB), in the following reaction at 55 °C for 3 h: purified step 2 insert PCR product, 0.8 μg; buffer 3.1, 5 μL; 10× BSA, 5 μL; BsmBI, 3 μL; H2O up to 50 μL.

The sequence of the step 2 insert, with the left gRNA scaffold underlined and mU6 promoters in bold, was TATGAGGACGAATCTCCCGCTTATACGTCTCTGTTTCAGAGCTAT  GCTGGAAACTGCATAGCAAGTTGAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG  AGTCGGTGCTTTTTTGTACTGAGTCGCCCAGTCTCAGATAGATCCGACGCCGCCAT  CTCTAGGCCCGCGCCGGCCCCCTCGCACAGACTTGTGGGAGAAGCTCGGCTACTCCCC  TGCCCCGGTTAATTTGCATATAATATTTCCTAGTAACTATAGAGGCTTAATGTGCGATAA  AAGACAGATAATCTGTTCTTTTTAATACTAGCTACATTTTACATGATAGGCTTGGATTTCTA  TAAGAGATACAAATACTAAATTATTATTTTAAAAAACAGCACAAAAGGAAACTCACCCTAA  CTGTAAAGTAATTGTGTGTTTTGAGACTATAAATATCCCTTGGAGAAAAGCCTTGTTT GAGAGACGGTACAAGCACACGTTTGTCAAGACC.

Subsequently, the following ligation reaction was set up, involving overnight incubation at 16 °C and subsequent heat inactivation at 65 °C for 10 min: 10× T4 DNA ligase buffer, 2 μL; step 1 library, digested, 100 ng; step 2 insert, digested, 100 ng; T4 DNA ligase (high concentration), 1 μL; H2O up to 20 μL.

Four microliters of the reaction was transformed into 100 μL of ElectroMAX Stbl4 competent cells (Invitrogen) according to the manufacturer's protocol, with an Eppendorf Electroporator. A small fraction (1–10 μL) of cultures was spread on carbenicillin (50 μg/mL) LB plates to calculate the library coverage, and the remainder was plated on ten 15-cm LB–carbenicillin plates and grown overnight at 37 °C for amplification. Two transformations were required to obtain 100× library coverage. The plasmid DNA was extracted with a HiSpeed Plasmid Maxi Kit (Qiagen). Library diversity was determined by deep sequencing.

NGS library preparation.

Harvested cell pellets were stored at −80 °C until extraction of genomic DNA with a DNeasy Blood and Tissue Kit (Qiagen). The dual-gRNA cassette was amplified and prepared for deep sequencing through two steps of PCR. The first step was performed as ten separate 50-μL reactions with 2 μg input genomic DNA per reaction (total of 20 μg for each sample) with Kapa Hifi. The PCR primers were as follows: NGS_dual-gRNA_SP_Lib_F, ACACTCTTTCCCTACACGACGCTCTTCCGATCT TATATATCTTGTGGAAAGGACGAAACACCG; NGS_dual-gRNA_SP_Lib_R, GACTGGAGTTCAGACGTGTGCTCTTCCGATCT CCTTATTTTAACTTGCTATTTCTAGCTCTA.

The thermocycling parameters were: 95 °C for 30 s; 21–26 cycles of (98 °C for 15 s; 55 °C for 15 s; and 72 °C for 45 s); and 72 °C for 5 min. The numbers of cycles were tested to ensure that they fell within the linear phase of amplification. Amplicons (600 bp) of ten reactions for each sample were pooled, size-selected and purified with Agencourt AMPure XP beads at a 0.8 ratio, then further purified with a QIAquick PCR Purification Kit (Qiagen). The second step of PCR was performed with four separate 50-μL reactions with 5 ng of first-step PCR product per reaction (total of 20 ng for each sample), and NEBNext Multiplex Oligos for Illumina (New England Biolabs) were used to attach Illumina adaptors and indexes. The thermocycling parameters were: 95 °C for 30 s; 7 or 8 cycles of (98 °C for 15 s; 72 °C for 45 s); and 72 °C for 5 min. The amplicons from these four reactions for each sample were pooled, size-selected and purified twice with Agencourt AMPure XP beads at a 0.8 ratio. The purified second-step PCR library was quantified by real-time PCR with Illumina Library Quantification (Kapa Biosystems) and used for downstream sequencing on an Illumina HiSeq rapid-run platform.

Viral production and Cas9 cloning.

HEK293T cells were maintained in DMEM supplemented with 10% FBS. To produce lentivirus particles, HEK293T cells were seeded in 15-cm tissue culture dishes 1 d before transfection and were 70–80% confluent at the time of transfection. Before transfection, the culture medium was changed to prewarmed DMEM supplemented with 10% FBS. For each 15-cm dish, 36 μL of Lipofectamine 3000 (Life Technologies) was diluted in 1.2 mL OptiMEM (Life Technologies). Separately, 3 μg pMD2.G (Addgene no. 12259), 12 μg of pCMV delta R8.2 (Addgene no. 12263), 9 μg of lentiviral vector and 48 μL of P3000 reagent were diluted in 1.2 mL OptiMEM. After incubation for 5 min, the Lipofectamine 3000 mixture and DNA mixture were combined and incubated at room temperature for 30 min. The mixture was then added dropwise to HEK293T cells. Viral particles were harvested 48 h and 72 h after transfection, further concentrated with Centricon Plus-20 centrifugal ultrafilters with a cutoff 100,000 NMWL (Millipore) to a final volume of 450 μL, divided into aliquots and frozen at −80 °C.

For screening assays, the CRISPR Cas9 nuclease was stably integrated into the human AAVS1 sites in HeLa, 293T and A549 cell lines. Cas9 cell lines were obtained from GeneCopoeia, tested for mycoplasma contamination and expanded and frozen in multiple aliquots so that subsequent experiments could be performed with low (<5) passage numbers. The cells were grown in DMEM supplemented with 10% FBS and hygromycin to select for cells that had integrated Cas9. Nearly 100% killing was observed in cells without the Cas9 vector after 120 h of exposure.

Design of gene constructs.

A panel of 73 genes comprising 17 validated oncogenes, 30 validated tumor-suppressor genes and 26 cancer-relevant DTs was selected for study (Fig. 2a and Supplementary Table 1). Priority was given to the genes most frequently mutated in human cancer15 and genes that were targets of FDA-approved drugs. Three unique 20-bp gRNAs were designed for each target gene. A large number of gRNAs were designed to target the earliest exon of each gene and/or constitutive exons, as previously reported26. Poly(T) sequences (i.e., those with more than two consecutive Ts) were avoided and, to prevent off-target editing, gRNAs were used only if they would require at least three substitutions to match any other sequence in the genome. After filtering of all gRNA designs for the aforementioned criteria, three gRNAs were selected: one targeting the earliest exon and two targeting the earliest constitutive exons. Dual-gRNA constructs were synthesized for all pairwise gRNA combinations between genes. In addition, 12 gRNAs were designed to be 'nontargeters' that should not target any specific site in the genome. Three of these were randomly selected and paired with all targeting gRNAs to provide single-knockout constructs. In addition, pairs of nontargeting gRNAs were included as negative controls. In total, this process resulted in 23,652 double-gene-knockout constructs and 657 single-gene-knockout constructs.
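The two sequence filters described above (no poly(T) runs; at least three substitutions from any other genomic sequence) can be sketched as follows. This is an illustration only: the function names are hypothetical, and `other_sites` stands in for a genome-wide search, which in practice would require a dedicated alignment tool rather than a Python list.

```python
import re

def has_poly_t(spacer, max_run=2):
    """True if the spacer contains more than max_run consecutive Ts."""
    return re.search("T{%d,}" % (max_run + 1), spacer.upper()) is not None

def min_substitutions(spacer, other_sites):
    """Smallest number of substitutions separating the spacer from any other site."""
    return min((sum(a != b for a, b in zip(spacer, site)) for site in other_sites),
               default=len(spacer))

def passes_filters(spacer, other_sites, min_subs=3):
    return (not has_poly_t(spacer)
            and min_substitutions(spacer, other_sites) >= min_subs)

# Toy 20-nt sequences (hypothetical, not from the library).
spacer = "ACGTACGTACGTACGTACGA"
near_site = "ACGTACGTACGTACGTACCC"   # differs from the spacer at 2 positions
far_site = "TGCATGCATGCATGCATGCA"
print(passes_filters(spacer, [far_site]), passes_filters(spacer, [near_site]))
```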

Competitive-growth experiments.

The pooled library of double-gRNA constructs was packaged into lentiviruses, and each cell line was infected at an MOI of 0.1–0.4 to ensure that each cell had zero or one double-gRNA construct. Experiments were performed in three cell lines: HeLa, a human papilloma virus–positive cervical cancer cell line; A549, a KRAS G12S-mutant lung cancer cell line; and 293T, an SV40 large T antigen–transformed embryonic kidney cell line. To maintain adequate representation of all library elements (>200 fold), each screen was started with 10^7 cells. To accommodate this large number of cells, 500 cm² bioassay plates (Corning) were used. Puromycin selection was started 2 d after transduction and was maintained throughout the course of the experiment to eliminate cells without gRNAs. The puromycin selection dose was 5 μg/mL. After transduction, cells containing integrated gRNAs were maintained in exponential growth by harvesting and removing a fraction of the cells approximately every 2–4 d. A minimum of 5 × 10^6 cells were maintained in culture for all cell lines at each passage. DNA was extracted from cells harvested at 3-, 14-, 21- and 28-d time points after transduction with a Blood and Cell Culture DNA Mini Kit (Qiagen) according to the manufacturer's protocols. To assess the frequency of gRNAs before and after selection, integrated DNA encoding the gRNA sequence was PCR-amplified and prepared for HiSeq rapid-run sequencing, according to the manufacturer's protocols. Standard Illumina sequencing primers were used for library preparation, and sequencing was conducted to generate 75-bp reads in a paired-end fashion. After sequencing, data quality was assessed with FastQC.

Processing of paired-end reads.

Analysis was performed with a software pipeline constructed from Python, R and Jupyter Notebooks21. FASTQ files were trimmed of scaffold sequence with cutadapt27, after which trimmed reads with unexpected lengths <19 or >21 bp were discarded. The remaining reads were truncated to 19 bases from the appropriate end, and reverse reads were reverse-complemented. Both reads in a pair were checked for sequence matches against gRNA sequences used in the library, and one mismatch was allowed anywhere in a read. Read pairs that matched a known construct were aggregated to compute the total counts for that construct in the relevant sample, which was used for subsequent analysis.
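The matching and counting steps can be sketched as below. This is not the published pipeline: the function names are hypothetical, the choice of which end to keep when truncating to 19 bases is an assumption (the text says only 'the appropriate end'), and the reference gRNAs are assumed to be stored as 19-base sequences.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def truncate(seq, n=19, keep="3prime"):
    """Keep n bases from one end (which end is correct is an assumption here)."""
    return seq[-n:] if keep == "3prime" else seq[:n]

def match_guide(read, guides, max_mismatch=1):
    """Return the single guide within max_mismatch substitutions of read, else None."""
    hits = [name for name, ref in guides.items()
            if len(ref) == len(read)
            and sum(a != b for a, b in zip(read, ref)) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None

def count_constructs(read_pairs, guides):
    """Tally (guide, guide) constructs from scaffold-trimmed (forward, reverse) reads."""
    counts = Counter()
    for fwd, rev in read_pairs:
        if not (19 <= len(fwd) <= 21 and 19 <= len(rev) <= 21):
            continue  # discard reads with unexpected trimmed length
        first = match_guide(truncate(fwd), guides)
        second = match_guide(truncate(reverse_complement(rev)), guides)
        if first and second:
            counts[(first, second)] += 1
    return counts

# Toy 19-base references and reads (hypothetical sequences).
guides = {"A_g1": "ACGTACGTACGTACGTACG", "B_g1": "TTGCATTGCATTGCATTGC"}
rev_read = reverse_complement("CTTGCATTGCATTGCATTGC")
pairs = [
    ("GACGTACGTACGTACGTACG", rev_read),  # exact match
    ("GACGTACGTACGAACGTACG", rev_read),  # one mismatch in the forward read
    ("GACGTACG", rev_read),              # too short, discarded
]
counts = count_constructs(pairs, guides)
print(counts)
```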

Estimation of fitness of each construct.

We assume that each subpopulation of cells expressing a particular construct $c$ grows exponentially. In the continuous limit,

$$N_c(t) = N_c(0)\, 2^{(f_c + f_0)\,t} \tag{1}$$

where $N_c(t)$ is the number of cells in the population expressing construct $c$ at time $t$; $f_c$ is the fitness of construct $c$ measured in units of cell doublings per day (d$^{-1}$); and $f_0$ is the fitness of cells expressing a double-null (control) construct. Pooled sequencing does not measure $N_c$ directly but estimates the relative abundance, $x_c$, of each construct in the population, expressed on a log2 scale:

$$x_c(t) \equiv \log_2 \frac{N_c(t)}{\sum_{c'} N_{c'}(t)} \tag{2}$$

Combining equations (1) and (2) yields:

$$x_c(t) = a_c + f_c t - \log_2 \sum_{c'} 2^{\,a_{c'} + f_{c'} t} \tag{3}$$

which has linear (in time) and nonlinear components. Here $a_c \equiv x_c(0)$ is the initial condition. Notably, although the same construct is expected to have the same fitness in replicate experiments, it may have different initial conditions. The nonlinear term reflects effective interaction, whereby the relative frequency of one construct is modulated by the growth of other constructs. Thus, a particular $x_c$ may possibly decrease even when its fitness, $f_c$, is positive. Because we are working with relative frequencies, there is no need to 'normalize' the raw counts in any way. By definition, log2 relative frequencies satisfy the constraint $\sum_c 2^{x_c(t)} = 1$ at all times.
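A short numerical sketch can make the nonlinear term concrete. It assumes the log2 relative frequency takes the form x_c(t) = a_c + f_c·t − log2 Σ_c′ 2^(a_c′ + f_c′·t), which follows from exponential growth and the definition of relative abundance; the three constructs and their parameters are invented for illustration.

```python
import numpy as np

def log2_relative_abundance(a, f, t):
    """Log2 relative frequency of every construct at every time point.

    a: log2 initial frequencies, f: fitness (doublings per day), t: days.
    Returns an array of shape (constructs, times).
    """
    a, f, t = (np.asarray(v, dtype=float) for v in (a, f, t))
    growth = a[:, None] + f[:, None] * t[None, :]
    return growth - np.log2(np.sum(2.0 ** growth, axis=0))

days = np.array([3.0, 14.0, 21.0, 28.0])
a = np.log2([0.5, 0.3, 0.2])        # initial frequencies sum to 1
f = np.array([0.10, 0.05, -0.05])   # the second construct has positive fitness
x = log2_relative_abundance(a, f, days)
print(np.round(2.0 ** x[1], 3))     # its relative frequency over time
```

In this toy population the second construct has positive fitness, yet its relative frequency falls over time because the first construct grows faster. Adding the same constant to every fitness value leaves all relative frequencies unchanged, which is why fitness is determined only up to an additive constant.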

Experimentally measured log2 relative frequencies $X_c(t)$ deviate from the expected values $x_c(t)$. The parameters of the model are found by minimizing the sum of squares

$$E = \sum_c \sum_t \left[ X_c(t) - x_c(t) \right]^2$$

which is subject to the constraint $\sum_c 2^{a_c} = 1$. Because $E$ is invariant under the substitution $f_c \to f_c + \delta$, where $\delta$ is an arbitrary constant, the single-gene fitness is determined up to an overall additive constant, which can be fixed by setting the mean null-probe fitness to zero. Formally, one must find the minimum of the function $\mathcal{L} = E - \lambda \left( \sum_c 2^{a_c} - 1 \right)$, where $\lambda$ is the Lagrange multiplier. In other words, the following system of nonlinear equations must be solved:

$$\frac{\partial \mathcal{L}}{\partial a_c} = 0, \qquad \frac{\partial \mathcal{L}}{\partial f_c} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = 0$$

An analytical solution does not exist; however, an excellent approximation exists when the number of constructs is large, $\sum_c 1 \gg 1$, in which case the solution is:

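$$f_c = \frac{\mathrm{Cov}(X_c, t)}{\mathrm{Var}(t)} + \delta$$

and

$$a_c = \bar{X}_c - f_c\, \bar{t} - \log_2 \sum_{c'} 2^{\,\bar{X}_{c'} - f_{c'} \bar{t}}$$

where the bars indicate means over time points, and the covariance and variance are likewise taken over time points. The $a_c$ values do not depend on the choice of $\delta$.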
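The large-library approximation can be sketched numerically. The code below assumes that the approximate solution is the ordinary least-squares slope of each construct's log2 frequency against time, shifted so that the mean null-construct fitness is zero, with intercepts renormalized so that the initial frequencies sum to one. The function name, `null_idx` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_constructs(X, t, null_idx):
    """Per-construct fit of log2 frequencies X (constructs x times) against time t.

    Returns (f, a): fitness with mean null fitness set to zero, and
    log2 initial frequencies normalized so that sum(2**a) == 1.
    """
    X, t = np.asarray(X, dtype=float), np.asarray(t, dtype=float)
    t_c = t - t.mean()
    # Cov(X_c, t) / Var(t) for every construct at once.
    slope = (X - X.mean(axis=1, keepdims=True)) @ t_c / (t_c @ t_c)
    f = slope - slope[null_idx].mean()     # fixes the additive constant delta
    b = X.mean(axis=1) - f * t.mean()      # unnormalized intercepts
    a = b - np.log2(np.sum(2.0 ** b))      # enforce the normalization constraint
    return f, a

# Noise-free synthetic data generated from the growth model (illustrative only).
rng = np.random.default_rng(0)
n = 50
f_true = rng.normal(0.0, 0.05, n)
f_true[:3] = 0.0                           # constructs 0-2 play the role of nulls
a_true = rng.normal(0.0, 1.0, n)
a_true -= np.log2(np.sum(2.0 ** a_true))
t = np.array([3.0, 14.0, 21.0, 28.0])
g = a_true[:, None] + f_true[:, None] * t
X = g - np.log2(np.sum(2.0 ** g, axis=0))
f_hat, a_hat = fit_constructs(X, t, null_idx=[0, 1, 2])
print(np.max(np.abs(f_hat - f_true)))
```

For noise-free data generated from the model, the nonlinear term contributes the same offset to every slope, so subtracting the null-construct mean recovers the fitness values.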

To avoid fitting to spurious data, we use only data points above a certain threshold of raw sequencing reads. The threshold depends primarily on (i) the size of the sample (number of cells) collected at a given time in relation to the size of the viral library and (ii) the depth of sequencing. Notably, the leftmost peaks in the histograms (Supplementary Fig. 3a) contain severely undersampled constructs with zero counts. Their x coordinates correspond to a pseudocount of one, which was introduced only for visualization purposes; it is arbitrary and therefore should not be used for fitting the model. Likewise, finite but very low counts are considered missing data. We set a threshold for every time point (red lines in Supplementary Fig. 3a). Notably, the right tails of these histograms move to the right over time, as the fastest-growing subpopulations become progressively larger fractions of the cells sampled. Relatedly, the peaks of zero counts become taller as smaller subpopulations are outcompeted by faster-growing subpopulations and become undersampled.
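Treating sub-threshold counts as missing data can be sketched as follows. The function names, the toy counts and the thresholds are hypothetical; the actual thresholds were chosen per time point from the count histograms.

```python
import numpy as np

def masked_log2_frequencies(counts, thresholds):
    """Convert raw counts (constructs x times) to log2 relative frequencies.

    Entries at or below the per-time-point threshold become NaN (missing);
    no pseudocount is added.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)                    # all reads at each time point
    keep = counts > np.asarray(thresholds, dtype=float)
    X = np.full(counts.shape, np.nan)
    X[keep] = np.log2((counts / totals)[keep])
    return X

def slope_ignoring_missing(x, t):
    """Least-squares slope of x against t using only non-missing points."""
    x, t = np.asarray(x, dtype=float), np.asarray(t, dtype=float)
    ok = ~np.isnan(x)
    if ok.sum() < 2:
        return float("nan")
    t_c = t[ok] - t[ok].mean()
    return float(t_c @ (x[ok] - x[ok].mean()) / (t_c @ t_c))

days = np.array([3.0, 14.0, 21.0, 28.0])
counts = np.array([[120, 240, 480, 960],
                   [80, 0, 2, 40]])                # second construct is undersampled
X = masked_log2_frequencies(counts, thresholds=[10, 10, 10, 10])
print(X[1], slope_ignoring_missing(X[1], days))
```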

Estimation of gRNA fitness and gRNA–gRNA interactions.

After the $f_c$ values are known, the gRNA-level fitness and gRNA-level interactions are determined as follows. Because each construct contains two gRNA probes, $p$ and $p'$, we write:

$$f_c = f_p + f_{p'} + \pi_{pp'} \tag{4}$$

where $\pi_{pp'}$ is the gRNA-level interaction. Because there are $n = 74$ 'genes' in the gRNA panel (73 genes and one null 'gene'), each represented by three distinct probes, there are $3^2 n(n - 1)/2 = 24{,}309$ constructs in total. Each gRNA is effectively replicated $3(n - 1) = 219$ times, because it appears in as many constructs. The gRNA-level π scores are as unique as the construct fitness, $f_c$. The $f_p$ values are found by robust fitting of equation (4). The gRNA-level π scores are the residuals of the robust fit.
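One way to carry out such a fit is sketched below. The text does not specify the robust estimator, so this example substitutes iteratively reweighted least squares with Huber weights as a stand-in. The function name, the tuning constant `k` and the toy data are assumptions, not part of the published pipeline.

```python
import numpy as np
from itertools import combinations

def robust_probe_fit(pairs, f_c, n_probes, n_iter=20, k=1.345):
    """Fit f_c ~ f_p + f_p' by Huber-weighted IRLS (an assumed choice of loss).

    pairs: (constructs x 2) probe indices; f_c: construct fitness.
    Returns (f_p, pi), where pi holds the residuals of the fit.
    """
    pairs = np.asarray(pairs)
    f_c = np.asarray(f_c, dtype=float)
    rows = np.arange(len(pairs))
    A = np.zeros((len(pairs), n_probes))   # one row per construct, two ones per row
    A[rows, pairs[:, 0]] = 1.0
    A[rows, pairs[:, 1]] = 1.0
    w = np.ones(len(pairs))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        f_p, *_ = np.linalg.lstsq(A * sw[:, None], f_c * sw, rcond=None)
        resid = f_c - A @ f_p
        mad = np.median(np.abs(resid - np.median(resid)))
        scale = 1.4826 * mad + 1e-12       # robust scale estimate
        w = 1.0 / np.maximum(np.abs(resid) / (k * scale), 1.0)
    return f_p, resid

# Toy panel: 8 probes, all 28 pairs, one construct with a strong interaction.
pairs = list(combinations(range(8), 2))
f_true = np.linspace(-0.2, 0.1, 8)
f_c = np.array([f_true[i] + f_true[j] for i, j in pairs])
f_c[0] -= 0.5                              # interaction for the pair (0, 1)
f_p, pi = robust_probe_fit(pairs, f_c, n_probes=8)
print(np.round(pi[:3], 3))
```

Down-weighting large residuals keeps a few strongly interacting constructs from distorting the probe-level fitness estimates, so the interaction remains in the residual rather than being absorbed into the fitted fitness.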

A negative interaction indicates slower-than-expected growth, thus suggesting synthetic sickness or lethality, whereas a positive interaction indicates faster-than-expected growth, thus suggesting epistasis28. Genes with very low or no expression had fitness scores very near the average for that cell line, thus suggesting a neutral growth effect. In agreement with results from prior competitive growth CRISPR knockout screens29, the average fitness effect of all genes was slightly negative in all three cell lines.

We constructed a replicate plot of gRNA-level fitness (Supplementary Fig. 3c), in which each gene was represented by three gRNA probes. We highlighted three 'genes': the null gene as well as two genes with large positive and negative gRNA-probe fitness. The origin was set to the center of mass of the null probes on the basis of the choice of δ. Reassuringly, the null probes clustered closely together. For almost all genes, one of the three probes had almost zero fitness effect, and the median probe split the difference. To avoid diluting the signal with underperforming probes, we rank the probes as r(p) ∈ {0, 1, 2} in ascending order of |fp|. The ranks define weights for averaging as follows: the gene-level fitness values are calculated as the weighted means of probe-level fitness values, with weights given by the squares of probe ranks, r²(p), and the gene-level interactions are calculated as the weighted means of gRNA-level interactions, with weights given by the products of gRNA-probe ranks, r(p)r(p′). The means are over gRNA probes that represent the pair of interacting genes. The weights are designed so that probes with rank 0 do not contribute to the means, and the 'best' gRNA probes have the highest weights. Thus, each gene-level fitness is determined by two gRNA probes, and each gene-level interaction is determined by four gRNA-probe pairs. This heuristic may not be appropriate for other probe designs. For instance, if all three probes were performing well, it might be appropriate to choose equal weights.
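The rank-weighted averaging can be written out in a few lines. This is a sketch under the weighting rules stated above (squared ranks for fitness, products of ranks for interactions); the probe names and numerical values are invented, and ties in |fp| are broken by input order, which is an assumption.

```python
def probe_ranks(probe_fitness):
    """Rank the probes of one gene 0, 1, 2 in ascending order of |f_p|."""
    ordered = sorted(probe_fitness, key=lambda p: abs(probe_fitness[p]))
    return {p: rank for rank, p in enumerate(ordered)}


def gene_fitness(probe_fitness):
    """Weighted mean of probe fitness with weights r(p)**2."""
    ranks = probe_ranks(probe_fitness)
    total = sum(r ** 2 for r in ranks.values())
    return sum(ranks[p] ** 2 * f for p, f in probe_fitness.items()) / total


def gene_interaction(pi, fitness_a, fitness_b):
    """Weighted mean of gRNA-level interactions with weights r(p) * r(p')."""
    ra, rb = probe_ranks(fitness_a), probe_ranks(fitness_b)
    num = sum(ra[p] * rb[q] * pi[(p, q)] for p in ra for q in rb)
    den = sum(ra[p] * rb[q] for p in ra for q in rb)
    return num / den


gene_a = {"a1": -0.02, "a2": -0.4, "a3": -0.8}  # a1 is the weakest probe (rank 0)
gene_b = {"b1": 0.0, "b2": -0.2, "b3": -0.6}    # b1 is the weakest probe (rank 0)

# Hypothetical gRNA-level interactions; pairs involving a1 or b1 get weight 0
pi = {(p, q): 0.9 for p in gene_a for q in gene_b}
pi.update({("a2", "b2"): -0.1, ("a2", "b3"): -0.2,
           ("a3", "b2"): -0.3, ("a3", "b3"): -0.5})

f_a = gene_fitness(gene_a)
pi_ab = gene_interaction(pi, gene_a, gene_b)
```

Only the two highest-ranked probes of each gene carry weight, so the gene-level interaction depends on four probe pairs, with the pair of best probes weighted four times as heavily as the pair of median probes.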

Example fits are shown in Supplementary Figure 3b. In the top panels, the fitted fc agreed well between replicates, but only after undersampled points had been removed. The bottom panels show examples when fc does not agree well between replicates despite no obvious undersampling. These cases come in two types: those in which the measured data have large variance (bottom left) and those in which the data have clear but disagreeing trends in both replicates (bottom right). In the latter case, it is not known whether there is a real biological difference between replicated experiments or whether this is just a random ordering of four data points with large variances into apparent trends; therefore, we take this variance at face value and incorporate it into a model that borrows power from both replicates. In this model, we do not look for fc separately for each replicate. Instead, we find a single optimal fc from nc data points (nc = 2nt minus any number of points below the threshold). We assume that fc does not change across experiments (although the initial conditions ac may be different in each replicate). Each fc is associated with a raw P value calculated from the t statistic

$$t_c = \frac{f_c}{\mathrm{SE}(f_c)}$$

where SE(fc) is the standard error of fc. The number of degrees of freedom, nc − 2, can be between 1 and 2nt − 2, depending on the number of data points used for fitting. The raw P values, Pc, are transformed into posterior probabilities, PPc, according to the theory of Storey30,31 (http://github.com/jdstorey/qvalue), which connects P values with Bayesian posterior probabilities in the context of the two-groups model. We find that approximately 2/3 of the posterior probabilities are zero; hence, approximately 2/3 of the fc values in equation (3) are likely to be truly zero. We avoid fitting to noise by designing a numerical Bayesian ensemble of experiments. In each member of the ensemble, we assign a fitness value to construct c, which is either 0 with probability (1 − PPc) or a Gaussian-distributed random number with mean fc and an s.d. that includes both the experimental variance from sampling and counting and possible biological variance. We typically created 10³ samples, calculated gene-level quantities fg and πgg′ for each ensemble member and reported ensemble means and other statistics. We believe that the above sampling procedure is a reasonable data-driven solution to the bias-versus-variance problem. We calculate the z scores by dividing raw values of πgg′ by the s.d. of all interactions in a given experiment. We consider an interaction to be a candidate for further validation if it has a large absolute z score, typically |z| > 3. We define the false discovery rate, FDR(π), as the ratio of the expected number of interactions more extreme than π in the null model to the observed number of such interactions32, as has been adopted by other authors33. The null model is obtained from the Bayesian ensemble by mean-centering of the marginal distribution of every πgg′ (ref. 34). This null ensemble preserves correlations between gene pairs but is devoid of signal.
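The ensemble construction and the z-score normalization can be sketched as follows. This is an illustration only: the per-construct s.d. is passed in as an input rather than derived, the posterior probabilities are taken as given (the qvalue-based calculation is not reproduced), the gene-level averaging step is omitted, and all names and numbers are hypothetical.

```python
import random
import statistics


def sample_ensemble_member(constructs, rng):
    """One ensemble member: 0 with probability 1 - PP_c, else Gaussian around f_c.

    constructs: dict mapping a construct name to (f_c, sd_c, pp_c).
    """
    member = {}
    for name, (f_c, sd_c, pp_c) in constructs.items():
        if rng.random() < pp_c:
            member[name] = rng.gauss(f_c, sd_c)
        else:
            member[name] = 0.0
    return member


def ensemble_means(constructs, n_samples=1000, seed=0):
    """Average each construct's sampled fitness over the ensemble."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in constructs}
    for _ in range(n_samples):
        for name, value in sample_ensemble_member(constructs, rng).items():
            totals[name] += value
    return {name: total / n_samples for name, total in totals.items()}


def z_scores(pi_values):
    """Divide each interaction by the s.d. of all interactions."""
    sd = statistics.pstdev(list(pi_values.values()))
    return {pair: value / sd for pair, value in pi_values.items()}


constructs = {
    "confident": (-0.5, 0.0, 1.0),  # PP = 1, no spread: always sampled as f_c
    "noise": (-0.5, 0.1, 0.0),      # PP = 0: always sampled as zero
    "uncertain": (-0.5, 0.1, 0.5),  # shrunk toward zero on average
}
means = ensemble_means(constructs, n_samples=100)
z = z_scores({("A", "B"): 1.0, ("A", "C"): -1.0, ("B", "C"): 1.0, ("B", "D"): -1.0})
```

Constructs with low posterior probability are pulled toward zero in the ensemble mean, which is how the procedure trades a small bias for reduced variance.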

Replicate correlation.

To assess the variance between the two biological replicates for each cell line, single-gene fitness (f) and genetic interaction (π) were separately calculated for each replicate. Standard Pearson correlation was used to compare single-gene fitness, f, between replicates 1 and 2 (Supplementary Fig. 4a–c). Given that genetic interaction is rare, the true value of most π scores is zero; hence, most measured values are driven entirely by noise. Therefore, correlation analysis was performed over only the gene pairs with significant positive or negative interactions in at least one replicate (|z| > 3), as has previously been proposed35. Additionally, the linear fit was constrained to pass through the origin (Supplementary Fig. 4d–f). For the calculation of genetic-interaction scores for each cell line, the analysis pipeline combined the data from both biological replicates.
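A small sketch of this comparison follows, assuming z scores for each replicate are stored in dictionaries keyed by gene pair. The gene-pair labels and z values are invented, and the least-squares slope through the origin is one plausible reading of the constrained fit.

```python
import math


def significant_pairs(z1, z2, cutoff=3.0):
    """Gene pairs with |z| above the cutoff in at least one replicate."""
    return [k for k in z1 if abs(z1[k]) > cutoff or abs(z2[k]) > cutoff]


def pearson(x, y):
    """Standard Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def slope_through_origin(x, y):
    """Least-squares slope of y = b * x with no intercept term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rep1 = {("A", "B"): -4.0, ("A", "C"): 0.5, ("B", "C"): 3.5, ("C", "D"): -1.0}
rep2 = {("A", "B"): -3.0, ("A", "C"): -0.2, ("B", "C"): 2.5, ("C", "D"): -3.5}

kept = significant_pairs(rep1, rep2)
x = [rep1[k] for k in kept]
y = [rep2[k] for k in kept]
r = pearson(x, y)
slope = slope_through_origin(x, y)
```

The pair ("A", "C") is near zero in both replicates and is dropped before the correlation is computed, which keeps noise-dominated pairs from diluting the estimate.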

Gene expression analysis.

RNA-seq data for the HeLa, A549 and 293T cell lines were obtained from the ENCODE project (GSE30567)36. The reported reads per kilobase of transcript per million mapped reads (RPKM) values represent the average of two separate experiments.

Drug–drug interaction testing.

We selected eight pairs of DT genes for which a synthetic-lethal genetic interaction had been identified in only HeLa or A549 cells. Rather than simply reproducing the dual CRISPR knockout experiment (gene–gene interaction), we sought to examine the viability of cells exposed to drugs inhibiting the corresponding gene products (drug–drug interaction), evaluating whether the interaction could be detected by an independent technology at the protein level and whether it was also accessible therapeutically. Interactions were prioritized for validation testing by identification of the interactions with the most-negative z scores in either HeLa or A549 cells for which we could obtain specific chemical inhibitors of the gene products. In certain cases, multiple drugs were tested for each gene to identify the drug that best recapitulated the gene-knockout phenotype. HeLa or A549 cells were seeded in clear 96-well plates and allowed to attach overnight. The next day, drugs or solvent controls (DMSO for all compounds except hydroxyurea, which was dissolved in H2O) were added, and the cells were allowed to grow for 72 h in the presence of drug. Six replicates (individual wells, treated with drug by manual pipetting) were performed for each dose. After 72 h, 20 μL of 10× resazurin (450 μM) was added to each well, and fluorescence was read on an Infinite F200 plate reader (Tecan) at an excitation wavelength of 565 nm and an emission wavelength of 590 nm. Each drug was initially run by itself to establish its single-drug dose–response curve in each cell line. The drug hydroxyurea had a flat dose–response curve, such that doses well in excess of the reported IC50 for in vitro inhibition of RRM2 had minimal effects on cell viability; here, a fixed dose expected to achieve near-maximal inhibition of target, as previously reported37, was chosen for use in combination experiments. 
For the other combinations in which both drugs showed single-agent toxicity, the second drug was tested at a fixed dose that inhibited growth by 20% (IC20), as determined by its single-agent dose–response curve (Supplementary Fig. 7). For two of the seven compounds, the IC20 dose differed between HeLa and A549 (Supplementary Table 4). To test for interaction between genes A and B, a dose–response curve was established for drug 1 (inhibitor of gene A) in the presence or absence of drug 2 (inhibitor of gene B) at a fixed dose. Raw fluorescence values were normalized to values for either DMSO solvent wells (dose–response curve in the absence of drug 2) or drug 2 alone (dose–response curve in the presence of drug 2). Because the single-agent activity of drug 2 was normalized to zero (i.e., defined as 100% normalized viability), the dose–response curves with and without drug 2 would be the same if there were only an additive effect. To assess synergistic effects, which would suggest a synthetic-lethal relationship between gene A and gene B, a four-parameter nonlinear regression was used to fit a curve to each drug38. The IC50 of drug 1 alone was compared with the IC50 in the presence of drug 2 with the sum-of-squares F test in the software package GraphPad Prism (GraphPad Software; Supplementary Fig. 8a–h).
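The normalization logic can be made concrete with a short sketch. It assumes a common four-parameter logistic form for the dose–response curve (the exact parameterization used in GraphPad Prism is not stated here), and the fluorescence readings are invented. Curve fitting and the sum-of-squares F test are not reproduced.

```python
def four_parameter_logistic(dose, bottom, top, ic50, hill):
    """Viability at a given dose under a four-parameter logistic model."""
    return bottom + (top - bottom) / (1.0 + (dose / ic50) ** hill)


def normalize(raw, reference):
    """Express raw fluorescence as a percentage of the mean reference signal."""
    ref_mean = sum(reference) / len(reference)
    return [100.0 * value / ref_mean for value in raw]


# Hypothetical raw fluorescence values
dmso_wells = [1000.0, 1000.0]      # solvent control
drug2_alone_wells = [800.0, 800.0]  # drug 2 at its IC20

# Drug 1 at two doses, without and with drug 2, in a case with no interaction
drug1_only = [1000.0, 500.0]
drug1_plus_drug2 = [800.0, 400.0]

curve_without = normalize(drug1_only, dmso_wells)
curve_with = normalize(drug1_plus_drug2, drug2_alone_wells)

# By construction, viability at dose == IC50 is midway between top and bottom
midpoint = four_parameter_logistic(dose=2.0, bottom=0.0, top=100.0, ic50=2.0, hill=1.5)
```

In this toy case the two normalized curves coincide, which is the no-interaction outcome described above; a leftward shift of the IC50 in the presence of drug 2 would instead point toward synergy.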

Data availability.

The authors declare that all data supporting the findings of this study are available within the paper and its supplementary information files. Additionally, networks and visualizations have been deposited in NDEx 2.0 as follows: 293T, http://www.ndexbio.org/#/newNetwork/199f9bb1-c3eb-11e6-8e29-06603eb7f303/; A549, http://www.ndexbio.org/#/newNetwork/ec8bdae3-c3c9-11e6-8e29-06603eb7f303/; HeLa, http://www.ndexbio.org/#/newNetwork/e50ee3c2-c3d4-11e6-8e29-06603eb7f303/. Source code for the analysis pipeline is available at http://ideker.ucsd.edu/papers/rsasik2017/.

References

  1. Proc. Natl. Acad. Sci. USA 105, 3461–3466 (2008).
  2. Trends Genet. 8, 312–316 (1992).
  3. Nature 446, 806–810 (2007).
  4. Science 330, 1385–1389 (2010).
  5. Science 353, aaf1420 (2016).
  6. Annu. Rev. Med. 66, 455–470 (2015).
  7. Mol. Cell 58, 690–698 (2015).
  8. Mol. Cell 63, 514–525 (2016).
  9. Nat. Methods 10, 957–963 (2013).
  10. Cell 159, 647–661 (2014).
  11. Proc. Natl. Acad. Sci. USA 113, 2544–2549 (2016).
  12. Cell 152, 909–922 (2013).
  13. Nat. Methods 8, 341–346 (2011).
  14. Nat. Methods 10, 427–431 (2013).
  15. Science 339, 1546–1558 (2013).
  16. Proc. Natl. Acad. Sci. USA 98, 10314–10319 (2001).
  17. Nat. Methods 12, 823–826 (2015).
  18. Nat. Biotechnol. 34, 184–191 (2016).
  19. Genome Res. 25, 1147–1157 (2015).
  20. Nat. Biotechnol. 33, 661–667 (2015).
  21. Combinatorial CRISPR-Cas9 knockout screen. Protocol Exchange (2017).
  22. Nucleic Acids Res. 42, e147 (2014).
  23. Mol. Cell 56, 333–339 (2014).
  24. Genome Biol. 16, 280 (2015).
  25. Nat. Biotechnol. 31, 833–838 (2013).
  26. Nat. Methods 11, 783–784 (2014).
  27. EMBnet.journal 17, 10–12 (2011).
  28. Annu. Rev. Genomics Hum. Genet. 14, 111–133 (2013).
  29. Cell 163, 1515–1526 (2015).
  30. Ann. Stat. 31, 2013–2035 (2003).
  31. qvalue: Q-value estimation for false discovery rate control. R package version 2.4.2 (2015).
  32. J. R. Stat. Soc. B 57, 289–300 (1995).
  33. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).
  34. J. Stat. Plan. Inference 125, 85–100 (2004).
  35. Nat. Methods 7, 1017–1024 (2010).
  36. ENCODE Project Consortium. Nature 489, 57–74 (2012).
  37. Clin. Infect. Dis. 30 (Suppl. 2), S193–S197 (2000).
  38. Am. J. Physiol. Gastrointest. Liver Physiol. 235, G97–G102 (1978).

Acknowledgements

This work was generously supported by the following sources: UC San Diego School of Engineering and School of Medicine institutional funds (P.M., T.I., R.S. and A.B.), the Burroughs Wellcome Fund (1013926 to P.M.), the March of Dimes Foundation (5-FY15-450 to P.M.), the Sidney Kimmel Foundation (SKF-16-150 to P.M.), the California Institute for Regenerative Medicine (T.I. and J.P.S.), the National Cancer Institute (R21 CA199292 to N.E.L. and L30 CA171000 to J.P.S.), the National Institute for Environmental Health Sciences (R01 ES014811 to T.I.), the National Institute for General Medical Sciences (T32 GM008806 to J.L., R01 GM084279 to T.I. and N.K. and P50 GM085764 to T.I.), the UC San Diego Clinical and Translational Research Institute Grant (UL1TR001442 to R.S. and A.B.) and the Novo Nordisk Foundation Center for Biosustainability at DTU (NNF16CC0021858 to N.E.L.).

Author information

Author notes

    • John Paul Shen, Dongxin Zhao & Roman Sasik

    These authors contributed equally to this work.

Affiliations

  1. Department of Medicine, Division of Genetics, University of California, San Diego, La Jolla, California, USA.

    • John Paul Shen, Ana Bojorquez-Gomez, Katherine Licon, Kristin Klepper, Daniel Pekin, Alex N Beckett, Kyle Salinas Sanchez, Jason F Kreisberg & Trey Ideker

  2. Moores UCSD Cancer Center, La Jolla, California, USA.

    • John Paul Shen, Trey Ideker & Prashant Mali

  3. The Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, California, USA.

    • John Paul Shen, Dongxin Zhao, Dan Du, Assen Roguev, Jason F Kreisberg, Nevan Krogan, Lei Qi, Trey Ideker & Prashant Mali

  4. Department of Bioengineering, University of California, San Diego, La Jolla, California, USA.

    • Dongxin Zhao, Chih-Chung Kuo & Prashant Mali

  5. Center for Computational Biology & Bioinformatics, University of California, San Diego, La Jolla, California, USA.

    • Roman Sasik, Amanda Birmingham, Aaron N Chang & Trey Ideker

  6. Bioinformatics & Systems Biology Program, University of California, San Diego, La Jolla, California, USA.

    • Jens Luebeck & Alex Thomas

  7. Novo Nordisk Center for Biosustainability at the University of California, San Diego, La Jolla, California, USA.

    • Alex Thomas, Chih-Chung Kuo & Nathan E Lewis

  8. Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Department of Cell and Tissue Biology, University of California, San Francisco, San Francisco, California, USA.

    • Dan Du

  9. Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, California, USA.

    • Assen Roguev & Nevan Krogan

  10. Department of Pediatrics, University of California, San Diego, La Jolla, California, USA.

    • Nathan E Lewis

  11. Department of Bioengineering, Stanford University, Stanford, California, USA.

    • Lei Qi

Contributions

J.P.S., D.Z., T.I. and P.M. conceived and supervised all experiments and wrote the paper. R.S. and J.L. performed computational analysis and wrote the paper. A.B., C.-C.K., A.T., N.L. and A.C. performed computational analysis. J.P.S., D.Z., A.B.G., K.L., K.K., D.P., A.N.B., K.S.S. and P.M. performed experiments. D.D., A.R., N.K. and L.Q. provided technical advice. J.K. provided technical advice and wrote the paper.

Competing interests

J.P.S., D.Z., R.S., T.I. and P.M. have filed a patent based on the described methods and findings; otherwise, the authors declare no competing financial interests.

Corresponding authors

Correspondence to Trey Ideker or Prashant Mali.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Figures 1–8

Excel files

  1. Supplementary Table 1: Annotated list of the 73 genes assayed in this study
  2. Supplementary Table 2: Raw and processed data from CRISPR competitive growth assay
  3. Supplementary Table 3: List of hit genetic interactions from CRISPR assay
  4. Supplementary Table 4: Summary of drug-drug validation experiments

About this article

DOI

https://doi.org/10.1038/nmeth.4225
