Understanding the functional effects of DNA sequence variants is of critical importance for studies of basic biology, evolution, and medical genetics; however, measuring these effects in a high-throughput manner is a major challenge. One promising avenue is precise editing with the CRISPR–Cas9 system, which allows for generation of DNA double-strand breaks (DSBs) at genomic sites matching the targeting sequence of a guide RNA (gRNA). Recent studies have used CRISPR libraries to generate many frameshift mutations genome wide through faulty repair of CRISPR-directed breaks by nonhomologous end joining (NHEJ)¹. Here, we developed a CRISPR-library-based approach for highly efficient and precise genome-wide variant engineering. We used our method to examine the functional consequences of premature-termination codons (PTCs) at different locations within all annotated essential genes in yeast. We found that most PTCs were highly deleterious unless they occurred close to the 3′ end of the gene and did not affect an annotated protein domain. Unexpectedly, we discovered that some putatively essential genes are dispensable, whereas others have large dispensable regions. This approach can be used to profile the effects of large classes of variants in a high-throughput manner.
Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR–Cas9. Nat. Rev. Genet. 16, 299–311 (2015).
DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. 41, 4336–4343 (2013).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Stenson, P. D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9 (2014).
Teo, S.-H. & Jackson, S. P. Identification of Saccharomyces cerevisiae DNA ligase IV: involvement in DNA double-strand break repair. EMBO J. 16, 4788–4795 (1997).
Valencia, M. et al. NEJ1 controls non-homologous end joining in Saccharomyces cerevisiae. Nature 414, 666–669 (2001).
Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002).
Kastenmayer, J. P. et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 16, 365–373 (2006).
Fisk, D. G. et al. Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast 23, 857–865 (2006).
He, F. & Jacobson, A. Identification of a novel component of the nonsense-mediated mRNA decay pathway by use of an interacting protein screen. Genes Dev. 9, 437–454 (1995).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Scannell, D. et al. The awesome power of yeast evolutionary genetics: new genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3 (Bethesda) 1, 11–25 (2011).
Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2015).
Forsberg, H. & Ljungdahl, P. O. Genetic and biochemical analysis of the yeast plasma membrane Ssy1p-Ptr3p-Ssy5p sensor of extracellular amino acids. Mol. Cell. Biol. 21, 814–826 (2001).
Klasson, H., Fink, G. R. & Ljungdahl, P. O. Ssy1p and Ptr3p are plasma membrane components of a yeast system that senses extracellular amino acids. Mol. Cell. Biol. 19, 5405–5416 (1999).
Forsberg, H., Hammar, M., Andréasson, C., Molinér, A. & Ljungdahl, P. O. Suppressors of ssy1 and ptr3 null mutations define novel amino acid sensor-independent genes in Saccharomyces cerevisiae. Genetics 158, 973–988 (2001).
Ljungdahl, P. O., Gimeno, C. J., Styles, C. A. & Fink, G. R. SHR3: a novel component of the secretory pathway specifically required for localization of amino acid permeases in yeast. Cell 71, 463–478 (1992).
Kern, L., de Montigny, J., Jund, R. & Lacroute, F. The FUR1 gene of Saccharomyces cerevisiae: cloning, structure and expression of wild-type and mutant alleles. Gene 88, 149–157 (1990).
Koren, A., Ben-Aroya, S., Steinlauf, R. & Kupiec, M. Pitfalls of the synthetic lethality screen in Saccharomyces cerevisiae: an improved design. Curr. Genet. 43, 62–69 (2003).
Gebert, N. et al. Dual function of Sdh3 in the respiratory chain and TIM22 protein translocase of the mitochondrial inner membrane. Mol. Cell 44, 811–818 (2011).
Wu, N. Y., Chung, C. S. & Cheng, S. C. Role of Cwc24 in the first catalytic step of splicing and fidelity of 5′ splice site selection. Mol. Cell. Biol. 37, e00580–16 (2016).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
Corbett, M. A. et al. A novel X-linked trichothiodystrophy associated with a nonsense mutation in RNF113A. J. Med. Genet. 52, 269–274 (2015).
Matangkasombut, O., Buratowski, R. M., Swilling, N. W. & Buratowski, S. Bromodomain factor 1 corresponds to a missing piece of yeast TFIID. Genes Dev. 14, 951–962 (2000).
Volanakis, A. et al. Spliceosome-mediated decay (SMD) regulates expression of nonintronic genes in budding yeast. Genes Dev. 27, 2025–2038 (2013).
Spelbrink, R. G. & Nothwehr, S. F. The yeast GRD20 gene is required for protein sorting in the trans-Golgi network/endosomal system and for polarization of the actin cytoskeleton. Mol. Biol. Cell 10, 4263–4281 (1999).
Decker, C. J., Teixeira, D. & Parker, R. Edc3p and a glutamine/asparagine-rich domain of Lsm4p function in processing body assembly in Saccharomyces cerevisiae. J. Cell Biol. 179, 437–449 (2007).
TerBush, D. R., Maurice, T., Roth, D. & Novick, P. The Exocyst is a multiprotein complex required for exocytosis in Saccharomyces cerevisiae. EMBO J. 15, 6483–6494 (1996).
Michel, A. H. et al. Functional mapping of yeast genomes by saturated transposition. eLife 6, e23570 (2017).
Dowell, R. D. et al. Genotype to phenotype: a complex problem. Science 328, 469 (2010).
Decourty, L. et al. Long open reading frame transcripts escape nonsense-mediated mRNA decay in yeast. Cell Rep. 6, 593–598 (2014).
Nagy, E. & Maquat, L. E. A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance. Trends Biochem. Sci. 23, 198–199 (1998).
Garst, A. D. et al. Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat. Biotechnol. 35, 48–55 (2017).
Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).
Chu, V. T. et al. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543–548 (2015).
Lin, S., Staahl, B. T., Alla, R. K. & Doudna, J. A. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife 3, e04766 (2014).
Song, J. et al. RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency. Nat. Commun. 7, 10548 (2016).
Bao, Z. et al. Homology-integrated CRISPR-Cas (HI-CRISPR) system for one-step multigene disruption in Saccharomyces cerevisiae. ACS Synth. Biol. 4, 585–594 (2015).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Becker, D. M. & Lundblad, V. Introduction of DNA into yeast cells. Curr. Protoc. Mol. Biol. 27, 13.7 (2001).
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina paired-end read merger. Bioinformatics 30, 614–620 (2014).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239 (2013).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
Searle, S. R., Casella, G. & McCulloch, C. E. Variance Components (Wiley, Hoboken, NJ, USA, 2006).
Nielsen, S., Yuzenkova, Y. & Zenkin, N. Mechanism of eukaryotic RNA polymerase III transcription termination. Science 340, 1577–1580 (2013).
Alexa, A. & Rahnenfuhrer, J. topGO: Enrichment Analysis for Gene Ontology https://doi.org/doi:10.18129/B9.bioc.topGO (2016).
Grant, B. J., Rodrigues, A. P. C., ElSawy, K. M., McCammon, J. A. & Caves, L. S. D. Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics 22, 2695–2696 (2006).
Liu, G. et al. Gene essentiality is a quantitative property linked to cellular evolvability. Cell 163, 1388–1399 (2015).
Wootton, J. C. & Federhen, S. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 266, 554–571 (1996).
Fox, J. & Weisberg, S. An R Companion to Applied Regression (Sage, Thousand Oaks, CA, USA, 2011).
Tjur, T. Coefficients of determination in logistic regression models: a new proposal: the coefficient of discrimination. Am. Stat. 63, 366–372 (2009).
Muggeo, V. M. R. segmented: an R Package to fit regression models with broken-line relationships. R News 8, 20–25 (2008).
Curran, K. A. et al. Short synthetic terminators for improved heterologous gene expression in yeast. ACS Synth. Biol. 4, 824–832 (2015).
We thank members of the laboratory of L.K., and F. Albert, M. P. Hughes, and J. Rine for helpful discussions; R. Cheung and E. Pham for technical assistance; and G. Church (Harvard Medical School) for plasmids. Funding was provided by the Howard Hughes Medical Institute and NIH grants R01 GM102308 (L.K.) and F32 GM116318 (M.J.S.).
Integrated supplementary information
Further details on the BstEII and SphI cloning sites are shown in Supplementary Figure 12.
Histograms of the positions of chosen PTCs, represented either as (a) the number of codons from gene ends, or (b) the fraction of the ORF length.
Supplementary Figure 3 Edit-directing plasmid effects are replicable and depend on the repair template.
a, Scatterplot of PTC tolerance scores in nej1∆ cells calculated independently for each replicate experiment (n = 3,132 independently targeted PTCs that were observed in both replicates, Pearson's r = 0.6, P < 2 × 10⁻¹⁶). b, Scatterplot of gene PTC tolerance scores calculated independently for each replicate experiment (n = 1,140 independently targeted genes, Pearson's r = 0.7, P < 2 × 10⁻¹⁶). c, Repair templates introducing essential gene PTCs were less tolerated than repair templates introducing dubious ORF PTCs. The experiment included 72 cases of pairs of edit-directing plasmids that both used the same gRNA. In each pair, one plasmid introduced a PTC in an essential gene, and the other introduced a PTC in a dubious ORF overlapping the same essential gene; the latter had the effect of introducing a synonymous or nonsynonymous substitution in the essential gene. Lines correspond to effects of the same gRNA targeting either a PTC to an essential gene or an edit to an overlapping dubious ORF. The edit-directing plasmids targeting PTCs to essential genes were less tolerated than their partners targeting PTCs to the overlapping dubious ORFs (Student's two-tailed paired t-test, n = 72, t = 6.5, P = 8 × 10⁻⁹). The outer boxplots show the distribution of edit tolerance scores. The centerline of each box corresponds to the data's median value; the top and bottom of the box span from the first quartile to the third quartile of the data; and the whiskers reach to either the data's most extreme values or 1.5 times the interquartile range.
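The boxplot convention described in this legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the plotting code used for the figure: the function name `boxplot_summary` is hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since the legend does not state which quantile definition was used.

```python
import statistics


def boxplot_summary(values):
    """Return median, quartiles, and whisker ends for a list of numbers.

    Whiskers stop at the most extreme data point that lies within
    1.5 x IQR of the box; points beyond that would be drawn as outliers.
    The quartile method is an assumption, not taken from the source.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(v for v in values if v >= low_fence),
        "upper_whisker": max(v for v in values if v <= high_fence),
    }


if __name__ == "__main__":
    # Made-up scores; the value 30 lies beyond the upper fence.
    print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

With made-up data containing one extreme value, the upper whisker ends at the largest point inside the fence rather than at the extreme value itself.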
a, Tolerance score for each tested PTC in essential genes and dubious ORFs in nej1∆ nmd2∆ cells, with overlaid boxplots (n = 8,346 PTCs in essential genes and 695 PTCs in dubious ORFs). The centerline of each box corresponds to the data's median value; the top and bottom of the box span from the first quartile to the third quartile of the data; and the whiskers reach to either the data's most extreme values or 1.5 times the interquartile range. P < 2 × 10⁻¹⁶, two-sided Wilcoxon rank test. b, Scatterplot of PTC tolerance score versus distance in codons from the 3′ ends of essential genes in nej1∆ nmd2∆ cells (top) and in nej1∆ NMD2 cells (bottom; same as Fig. 1d). As in Fig. 1d, the thick blue line shows a two-segment regression fit, and the 95% confidence interval for the boundary between the segments is shown by the vertical blue lines. The segmented regression was fit on PTC tolerance scores for n = 7,583 PTCs and n = 7,561 PTCs that were within 500 codons of the 3′ end of a gene, for the top and bottom panels, respectively.
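A two-segment (broken-line) regression of the kind described in panel b can be illustrated with a simple grid search over candidate breakpoints. This sketch is a stand-in technique, not the iterative algorithm of the `segmented` R package cited in the reference list, and it does not compute the confidence interval for the breakpoint. The function names and the toy data are hypothetical.

```python
def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_segments(xs, ys):
    """Fit y = b0 + b1*x + b2*max(x - bp, 0) and return (sse, bp, [b0, b1, b2]).

    Candidate breakpoints are the interior observed x values; the one with
    the smallest sum of squared errors is kept.
    """
    best = None
    for bp in sorted(set(xs))[2:-2]:
        rows = [[1.0, x, max(x - bp, 0.0)] for x in xs]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        coef = _solve(xtx, xty)
        sse = sum(
            (y - sum(c * v for c, v in zip(coef, r))) ** 2
            for r, y in zip(rows, ys)
        )
        if best is None or sse < best[0]:
            best = (sse, bp, coef)
    return best


if __name__ == "__main__":
    # Toy data: the score rises with x up to x = 8 and is flat afterwards.
    xs = list(range(21))
    ys = [1.0 + 0.2 * x - 0.2 * max(x - 8, 0) for x in xs]
    sse, bp, coef = fit_two_segments(xs, ys)
    print(bp, [round(c, 3) for c in coef])
```

The hinge term `max(x - bp, 0)` keeps the two segments joined at the breakpoint, so the fitted line is continuous, as in a broken-line model.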
For each PTC, the fraction of amino acid residues perfectly conserved downstream of the PTC among S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus v. uvarum¹² was calculated. PTCs were then binned by whether their affected sequence was more or less conserved than the median truncated sequence, as well as by whether or not they disrupted a Pfam-annotated domain. Scatterplots show the PTC tolerance scores versus the distance in codons from the 3′ end of essential genes, as in Fig. 1d, with the blue line showing a two-segment regression fit on PTCs within 500 codons of the 3′ end of the gene. a, PTCs affecting less conserved sequence and disrupting an annotated protein domain. n = 2,228 PTCs. b, PTCs affecting more conserved sequence and disrupting an annotated protein domain. n = 3,233 PTCs. c, PTCs affecting less conserved sequence and not disrupting an annotated protein domain. n = 1,754 PTCs. d, PTCs affecting more conserved sequence and not disrupting an annotated protein domain. n = 854 PTCs.
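The conservation measure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a set of already-aligned protein sequences with the S. cerevisiae sequence first, gaps are written as `-`, the residue at the PTC position itself is counted as lost, and a column counts as conserved only if every species carries the same residue. The function name and toy sequences are hypothetical; the legend does not specify how gaps or the PTC codon itself were handled.

```python
def fraction_conserved_downstream(aligned_seqs, ptc_codon):
    """Fraction of reference residues from the PTC onward that are identical in all species.

    aligned_seqs: equal-length aligned protein strings, reference sequence first.
    ptc_codon: 1-based codon position of the PTC in the reference protein.
    """
    reference = aligned_seqs[0]
    if any(len(seq) != len(reference) for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    conserved = 0
    total = 0
    for column in zip(*aligned_seqs):
        if column[0] == "-":
            continue  # no reference residue in this column (assumption: skip it)
        ref_pos += 1
        if ref_pos < ptc_codon:
            continue  # residue is upstream of the PTC and is retained
        total += 1
        if all(residue == column[0] for residue in column):
            conserved += 1
    return conserved / total if total else 0.0


if __name__ == "__main__":
    toy_alignment = ["MKTAYL", "MKTAYL", "MKSAYI"]  # made-up sequences
    print(fraction_conserved_downstream(toy_alignment, ptc_codon=3))
```

PTCs could then be split into "more conserved" and "less conserved" bins by comparing each value with the median across all PTCs.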
a, Histogram of the number of genes tolerating a given number of tested PTCs. For each gene, the number of tolerated PTCs was determined by a Hidden Markov Model analysis. b, The 3′-most tested PTC in POB3, at position 551, had a tolerance score of −1.2. It was also called deleterious by the HMM analysis. We confirmed that POB3 does not tolerate a deletion spanning its terminal two codons (n = 6 tetrads analyzed), and found that it tolerates deletion of its last codon (n = 8 tetrads analyzed). Tetrad dissections were done as in Fig. 2, from a diploid yeast heterozygous for a truncation mutation of interest. c, The 3′-most tested PTC in PCNA, at position 251, had a tolerance score of −0.59, and was called deleterious by the HMM analysis. We confirmed that PCNA does not tolerate a deletion spanning its terminal eight codons, and discovered that terminal deletions of five or more codons were lethal. n = 6 tetrads were analyzed for each truncation mutant.
Data are shown as dot plots overlaid with box-and-whisker plots. The centerline of each box corresponds to the data's median value; the top and bottom of the box span from the first quartile to the third quartile of the data; and the whiskers reach to either the data's most extreme values or 1.5 times the interquartile range. See also Supplementary Table 4. a, Gene tolerance scores for genes in the biological process category of “RNA splicing, via spliceosome,” compared to the remaining tested genes (n = 69 and 965, respectively; Kolmogorov-Smirnov test, Bonferroni corrected P = 0.0017). b, Gene tolerance scores for genes in the molecular function category of “catalytic activity,” compared to the remaining tested genes (n = 477 and 557, respectively; Kolmogorov-Smirnov test, Bonferroni corrected P = 0.0024).
a, RNA-seq and ribosome footprinting read depth by position (Albert, F. W., Muzzey, D., Weissman, J. S. & Kruglyak, L. Genetic influences on translation in yeast. PLoS Genet. 10, e1004692; 2014) in the vicinity of YJR012C and GPI14. We mark the position of YJR012C(M76), which we propose to be the actual start of YJR012C. b, Alignment of the annotated YJR012C protein sequence from related yeast species. The proposed start position at M76 is highlighted with a red box. The 48 codons of S. mikatae YJR012C not shown include two additional stop codons. c, Tetrad dissections of a yjr012c(76-207Δ)/YJR012C diploid strain, as in Fig. 2. n = 12 tetrads were analyzed. d, The positions of UTR5, HYP2, and the TATA box of HYP2 (left) (Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature 483, 295–301; 2012). Tetrad dissections of a utr5(34-166Δ)/UTR5 diploid strain; n = 4 tetrads were analyzed (right).
Tetrad dissections of an mmf1Δ/MMF1 strain were done as in Fig. 2, with pictures taken after four days of growth. Top two panels show tetrads dissected on rich medium; bottom two panels show tetrads dissected on defined (CSM) medium. Left two panels show tetrads dissected on 2% glucose medium; right two panels show tetrads dissected on 2% galactose medium. The original annotation of essential genes was done in rich glucose medium (top left), whereas our PTC tolerance experiment was done in defined galactose medium (bottom right). n = 6 tetrads were analyzed for each condition tested.
Supplementary Figure 10 Comparison of gene PTC tolerance and SATAY-determined gene-transposon tolerance
Transposon tolerance was determined as the logarithm of the number of transposons tolerated per kilobase of gene length. Black points correspond to essential genes; red points correspond to dubious ORFs.
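The transposon tolerance measure defined above can be written out as a short calculation. This is a sketch under assumptions: the base of the logarithm is not stated in the legend, so base 10 is assumed here, and the handling of genes with zero insertions is not specified, so the sketch raises an error unless a pseudocount is supplied. The function name, the `pseudocount` parameter, and the example numbers are hypothetical.

```python
import math


def transposon_tolerance(n_transposons, gene_length_bp, pseudocount=0.0):
    """Log of the number of tolerated transposons per kilobase of gene length.

    Base-10 logarithm and the optional pseudocount are assumptions,
    not details taken from the source.
    """
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    density_per_kb = (n_transposons + pseudocount) / (gene_length_bp / 1000.0)
    if density_per_kb <= 0:
        raise ValueError("zero insertions: supply a pseudocount to take the log")
    return math.log10(density_per_kb)


if __name__ == "__main__":
    # Made-up example: 20 insertions in a 2,000-bp gene is 10 per kb.
    print(transposon_tolerance(20, 2000))
```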
Supplementary Figure 11 Predicted effects of human NMD on PTCs, for human genes that are homologs of essential yeast genes
For human genes, we plot the fraction of PTCs escaping NMD regulation according to the 50-bp rule (Nagy, E. & Maquat, L. E. A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance. Trends Biochem. Sci. 23, 198–199; 1998) versus distance from gene ends. Note that the majority of PTCs within the last 27 codons are predicted to escape NMD regulation.
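The 50-bp rule used for this prediction can be sketched as a simple position check: a PTC is predicted to trigger NMD when it lies more than about 50 nucleotides upstream of the last exon-exon junction, and to escape otherwise. The sketch below is illustrative only. The function names, the coordinate convention (positions measured along the spliced transcript), and the treatment of the exact boundary are assumptions rather than details from the source.

```python
def escapes_nmd(ptc_nt, last_junction_nt, window_nt=50):
    """Predict whether a PTC escapes NMD under the 50-bp rule.

    ptc_nt: transcript coordinate of the PTC.
    last_junction_nt: transcript coordinate of the last exon-exon junction,
        or None for a transcript without introns.
    """
    if last_junction_nt is None:
        return True  # no downstream junction to mark the transcript for decay
    # PTCs in the last exon give a negative distance and therefore escape.
    return (last_junction_nt - ptc_nt) <= window_nt


def fraction_escaping(ptc_positions, last_junction_nt, window_nt=50):
    """Fraction of the given PTC positions predicted to escape NMD."""
    if not ptc_positions:
        raise ValueError("no PTC positions given")
    escaped = sum(escapes_nmd(p, last_junction_nt, window_nt) for p in ptc_positions)
    return escaped / len(ptc_positions)


if __name__ == "__main__":
    # Made-up transcript with its last junction at position 200.
    print(fraction_escaping([100, 160, 250, 120], last_junction_nt=200))
```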
Sequence numbering corresponds to the nucleotide position in the gRNA sequence. (a) The end of the SNR52 promoter sequence and start of the gRNA. The upper sequence is what was used for S. cerevisiae sgRNA expression by DiCarlo et al.²; the lower sequence shows the modifications, in red, that incorporated a BstEII cloning site. (b) The first gRNA hairpin. The left sequence is what DiCarlo et al. used; the right sequence shows the modifications, in red, that incorporated an SphI cloning site.
(a) The number of uniquely barcoded edit-directing plasmids tracked per PTC in nej1Δ cells. (b) The number of uniquely barcoded edit-directing plasmids tracked per PTC in nej1Δ nmd2Δ cells. (c) The number of uniquely barcoded edit-directing plasmids tracked per PTC, combined across nej1Δ and nej1Δ nmd2Δ cells.
We fit a generalized linear model to the read count data for each tracked barcoded plasmid. This histogram shows the resulting slopes (thetas) for the barcoded plasmids. The vertical red line demarcates our “persisting” versus “depleted” binarization threshold of −0.025.
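The binarization step described here can be illustrated with a short sketch. It assumes the per-barcode slopes have already been estimated (the form of the generalized linear model is not given in this legend, so no fitting is attempted), and it assumes that slopes above the threshold are called "persisting" and slopes at or below it "depleted". The function names and example slopes are hypothetical.

```python
from collections import Counter

# Threshold stated in the legend.
PERSIST_THRESHOLD = -0.025


def classify_barcode(theta, threshold=PERSIST_THRESHOLD):
    """Label one barcoded plasmid from its fitted slope (theta).

    Treating a slope exactly equal to the threshold as depleted is an assumption.
    """
    return "persisting" if theta > threshold else "depleted"


def summarize(thetas, threshold=PERSIST_THRESHOLD):
    """Count persisting and depleted barcodes in a collection of slopes."""
    return Counter(classify_barcode(t, threshold) for t in thetas)


if __name__ == "__main__":
    example_slopes = [-0.30, -0.02, 0.01, -0.026]  # made-up values
    print(summarize(example_slopes))
```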
Supplementary Figures 1–14, Supplementary Notes 1 and 2, and Supplementary Tables 1, 2 and 5–10
Effects were determined using a generalized linear mixed model (n = 84,284 barcoded variant-engineering plasmids). Coefficients were obtained from the glmer function. Type III analysis-of-variance tables were computed for the fixed-effect terms in the model with the Anova() function in the car R package. Likelihood ratio chi-square values and P values for the fixed-effect terms in the model were also computed using this function. Tjur's D was used to calculate a pseudo-R² statistic for overall model fit (Tjur's D = 0.39).
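Tjur's D, the coefficient of discrimination, is the difference between the mean fitted probability among observations with outcome 1 and the mean fitted probability among observations with outcome 0. A minimal sketch of that calculation follows; it assumes binary outcomes coded as 0 and 1 and fitted probabilities already obtained from a model. The function name and the example values are hypothetical and are not data from the study.

```python
from statistics import mean


def tjur_d(outcomes, fitted_probs):
    """Tjur's coefficient of discrimination for a binary-outcome model.

    outcomes: iterable of 0/1 observed outcomes.
    fitted_probs: model-predicted probabilities of outcome 1, same order.
    """
    pairs = list(zip(outcomes, fitted_probs))
    probs_when_one = [p for y, p in pairs if y == 1]
    probs_when_zero = [p for y, p in pairs if y == 0]
    if not probs_when_one or not probs_when_zero:
        raise ValueError("both outcome classes must be present")
    return mean(probs_when_one) - mean(probs_when_zero)


if __name__ == "__main__":
    # Made-up outcomes and fitted probabilities.
    print(tjur_d([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.4]))
```

A value of 0 means the model assigns the same average probability to both classes, and a value of 1 means perfect separation.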
Supplementary Table 4: Gene Ontology (GO) enrichment results for gene tolerance of PTCs (n = 1,034 genes)
Significance was determined with a non-parametric Kolmogorov-Smirnov test.
PTC tolerance scores for each directed PTC (n = 10,971 PTCs).
Gene PTC tolerance scores for each targeted gene (n = 1,140 genes).