High-throughput functional evaluation of human cancer-associated mutations using base editors

Kim, Younggwang; Lee, Seungho; Cho, Soohyuk; Park, Jinman; Chae, Dongwoo; Park, Taeyoung; Minna, John D.; Kim, Hyongbum Henry

doi:10.1038/s41587-022-01276-4

Article
Published: 11 April 2022

Younggwang Kim^1,2,na1,
Seungho Lee^1,na1,
Soohyuk Cho^1,2,
Jinman Park^1,2,
Dongwoo Chae^1,
Taeyoung Park^3,
John D. Minna^4 &
Hyongbum Henry Kim^1,2,5,6,7,8 (ORCID: orcid.org/0000-0002-4693-738X)

Nature Biotechnology volume 40, pages 874–884 (2022)

Abstract

Comprehensive phenotypic characterization of the many mutations found in cancer tissues is one of the biggest challenges in cancer genomics. In this study, we evaluated the functional effects of 29,060 cancer-related transition mutations that result in protein variants on the survival and proliferation of non-tumorigenic lung cells using cytosine and adenine base editors and single guide RNA (sgRNA) libraries. By monitoring base editing efficiencies and outcomes using surrogate target sequences paired with sgRNA-encoding sequences on the lentiviral delivery construct, we identified sgRNAs that induced a single primary protein variant per sgRNA, which enabled us to link those mutations to the cellular phenotypes caused by base editing. The functions of the vast majority of the protein variants (28,458 variants, 98%) were classified as neutral or likely neutral; only 18 (0.06%) and 157 (0.5%) variants caused outgrowing and likely outgrowing phenotypes, respectively. We expect that our approach can be extended to more variants of unknown significance and other tumor types.

**Fig. 1: Base editor-directed generation of cancer-associated transition mutations.**

**Fig. 2: Functional classification of cancer-associated transition mutations.**

**Fig. 3: High-throughput classifications are reproducible at different scales.**

**Fig. 4: Individual validation of sgRNAs and their associated base-edited variants supports the high accuracy of high-throughput functional classifications.**

**Fig. 5: Base editor-directed investigation of mutations related to resistance to an EGFR tyrosine kinase inhibitor.**

Data availability

We have submitted the deep sequencing data from this study to the National Center for Biotechnology Information’s Sequence Read Archive under accession number PRJNA667758. We have provided the datasets used in this study as Supplementary Tables 2–4 and at deepcrispr.info/BEvariants.

Code availability

The custom Python scripts used for the generation of the MAGeCK input file using UMIs are available on GitHub (https://github.com/oreolic/CancerLibrary).
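
As a rough illustration of what a UMI-aware MAGeCK input might look like, the sketch below tallies reads per (sgRNA, UMI) pair and writes a tab-separated count table whose first two columns are the sgRNA and gene identifiers, as in MAGeCK's count-table layout. Treating each sgRNA-UMI pair as its own row and placing the sgRNA identifier in the `Gene` column is an assumption made for illustration, as are the function name `build_mageck_table` and the sample labels; the repository above holds the scripts actually used.

```python
import csv
import io
from collections import Counter


def build_mageck_table(reads_by_sample):
    """Build a MAGeCK-style count table from per-read (sgRNA, UMI) pairs.

    reads_by_sample maps a sample label to an iterable of (sgrna_id, umi)
    tuples, one tuple per sequencing read. Hypothetical input layout.
    """
    samples = list(reads_by_sample)
    # Count reads for every (sgRNA, UMI) pair within each sample.
    counts = {s: Counter(reads_by_sample[s]) for s in samples}
    # Union of all pairs seen in any sample, in a stable order.
    keys = sorted(set().union(*(c.keys() for c in counts.values())))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sgRNA", "Gene"] + samples)
    for sgrna_id, umi in keys:
        # Assumption: the row ID is sgRNA_UMI and the sgRNA fills the Gene column,
        # so UMIs of one sgRNA are grouped together downstream.
        row_counts = [counts[s][(sgrna_id, umi)] for s in samples]
        writer.writerow([f"{sgrna_id}_{umi}", sgrna_id] + row_counts)
    return out.getvalue()


table = build_mageck_table({
    "day0": [("sg1", "AAC"), ("sg1", "AAC"), ("sg2", "GGT")],
    "day10": [("sg1", "AAC")],
})
```

Pairs absent from a sample receive a count of zero, so every row has one value per sample.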

References

McLendon, R. et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
Campbell, P. J. et al. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).
Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719 (2009).
Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).
Kotler, E. et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol. Cell 71, 178–190 (2018).
Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
Brenan, L. et al. Phenotypic characterization of a comprehensive set of MAPK1/ERK2 missense mutants. Cell Rep. 17, 1171–1183 (2016).
Ahler, E. et al. A combined approach reveals a regulatory mechanism coupling Src’s kinase activity, localization, and phosphotransferase-independent functions. Mol. Cell 74, 393–408 (2019).
Matreyek, K. A. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).
Starita, L. M. et al. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103, 498–508 (2018).
Chiasson, M. A. et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife 9, e58026 (2020).
Kim, H. & Kim, J. S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 15, 321–334 (2014).
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Kim, H. S. et al. Systematic identification of molecular subtype-selective vulnerabilities in non-small-cell lung cancer. Cell 155, 552–566 (2013).
Bamford, S. et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br. J. Cancer 91, 355–358 (2004).
Hanna, R. E. et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080 (2021).
Kuscu, C. et al. CRISPR-STOP: gene silencing through base-editing-induced nonsense mutations. Nat. Methods 14, 710–712 (2017).
Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).
Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).
Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).
Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
Michlits, G. et al. CRISPR-UMI: single-cell lineage tracing of pooled CRISPR–Cas9 screens. Nat. Methods 14, 1191–1197 (2017).
Schmierer, B. et al. CRISPR/Cas9 screening using unique molecular identifiers. Mol. Syst. Biol. 13, 945 (2017).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
Carter, H. et al. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667 (2009).
Ng, P. C. & Henikoff, S. Predicting deleterious amino acid substitutions. Genome Res. 11, 863–874 (2001).
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Miosge, L. A. et al. Comparison of predicted and actual consequences of missense mutations. Proc. Natl Acad. Sci. USA 112, E5189–E5198 (2015).
Sun, S. et al. An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res. 26, 670–680 (2016).
Chen, H. et al. Comprehensive assessment of computational algorithms in predicting cancer driver mutations. Genome Biol. 21, 43 (2020).
Markusic, D., Oude-Elferink, R., Das, A. T., Berkhout, B. & Seppen, J. Comparison of single regulated lentiviral vectors with rtTA expression driven by an autoregulatory loop or a constitutive promoter. Nucleic Acids Res. 33, e63 (2005).
Yi, S. A. et al. HPV-mediated nuclear export of HP1γ drives cervical tumorigenesis by downregulation of p53. Cell Death Differ. 27, 2537–2551 (2020).
Eekels, J. J. M. et al. A competitive cell growth assay for the detection of subtle effects of gene transduction on cell proliferation. Gene Ther. 19, 1058–1064 (2012).
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000).
Sequist, L. V. et al. Genotypic and histological evolution of lung cancers acquiring resistance to EGFR inhibitors. Sci. Transl. Med. 3, 75ra26 (2011).
Ganesan, P. et al. Epidermal growth factor receptor P753S mutation in cutaneous squamous cell carcinoma responsive to cetuximab-based therapy. J. Clin. Oncol. 34, e34–e37 (2016).
Stabile, L. P. et al. Combined targeting of the estrogen receptor and the epidermal growth factor receptor in non-small cell lung cancer shows enhanced antiproliferative effects. Cancer Res. 65, 1459–1470 (2005).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Chen, Y. et al. PHLDA1, another PHLDA family protein that inhibits Akt. Cancer Sci. 109, 3532–3542 (2018).
Nagai, M. A. Pleckstrin homology-like domain, family A, member 1 (PHLDA1) and cancer. Biomed. Rep. 4, 275–281 (2016).
Botti, E. et al. Developmental factor IRF6 exhibits tumor suppressor activity in squamous cell carcinomas. Proc. Natl Acad. Sci. USA 108, 13710–13715 (2011).
Jobling, R. et al. Monozygotic twins with variable expression of Van der Woude syndrome. Am. J. Med. Genet. A 155A, 2008–2010 (2011).
Stupack, D. G. Caspase-8 as a therapeutic target in cancer. Cancer Lett. 332, 133–140 (2013).
Jia, D. et al. Crebbp loss drives small cell lung cancer and increases sensitivity to HDAC inhibition. Cancer Discov. 8, 1422–1437 (2018).
Pasqualucci, L. et al. Inactivating mutations of acetyltransferase genes in B-cell lymphoma. Nature 471, 189–195 (2011).
Cuella-Martin, R. et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097 (2021).
Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. https://doi.org/10.1038/s41587-021-01172-3 (2022).
Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376 (2017).
Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).
Li, X. et al. Base editing with a Cpf1–cytidine deaminase fusion. Nat. Biotechnol. 36, 324–327 (2018).
Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).
Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).
Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275–278 (2019).
Thuronyi, B. W. et al. Continuous evolution of base editors with expanded target compatibility and improved activity. Nat. Biotechnol. 37, 1070–1079 (2019).
Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
Gaudelli, N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. 38, 892–900 (2020).
Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35–40 (2021).
Hanson, G. & Coller, J. Codon optimality, bias and usage in translation and mRNA decay. Nat. Rev. Mol. Cell Biol. 19, 20–30 (2018).
Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
Meier, J. A., Zhang, F. & Sanjana, N. E. GUIDES: sgRNA design for loss-of-function screens. Nat. Methods 14, 831–832 (2017).
Ramirez, R. D. et al. Immortalization of human bronchial epithelial cells in the absence of viral oncoproteins. Cancer Res. 64, 9027–9034 (2004).
Ellis, B. L., Potts, P. R. & Porteus, M. H. Creating higher titer lentivirus with caffeine. Hum. Gene Ther. 22, 93–100 (2011).
Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR–Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Billon, P. et al. CRISPR-mediated base editing enables efficient disruption of eukaryotic genes through induction of STOP codons. Mol. Cell 67, 1068–1079 (2017).
Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
Hart, T., Brown, K. R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Kandasamy, K. et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 11, R3 (2010).
Wang, G. & Fersht, A. R. Mechanism of initiation of aggregation of p53 revealed by Φ-value analysis. Proc. Natl Acad. Sci. USA 112, 2437–2442 (2015).
Zhao, D. et al. Combinatorial CRISPR–Cas9 metabolic screens reveal critical redox control points dependent on the KEAP1–NRF2 regulatory axis. Mol. Cell 69, 699–708 (2018).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Zhu, S. et al. Guide RNAs with embedded barcodes boost CRISPR-pooled screens. Genome Biol. 20, 20 (2019).
Xu, P. et al. Genome-wide interrogation of gene functions through base editor screens empowered by barcoded sgRNAs. Nat. Biotechnol. 39, 1403–1413 (2021).

Acknowledgements

We thank J. W. Choi for assisting with computational analysis. This work was supported, in part, by the National Research Foundation of Korea (grants 2017R1A2B3004198 (H.H.K.), 2017M3A9B4062403 (H.H.K.) and 2018R1A5A2025079 (H.H.K.)); the Brain Korea 21 Plus Project (Yonsei University College of Medicine); the Yonsei Signature Research Cluster Program of 2021-22-0014 (H.H.K.); a grant of the MD-PhD/Medical Scientist Training Program (S.L.) through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea; Lung Cancer SPORE P50 (CA070907; J.D.M.); and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grant HI21C1314 (H.H.K.)).

Author information

These authors contributed equally: Younggwang Kim, Seungho Lee.

Authors and Affiliations

Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
Younggwang Kim, Seungho Lee, Soohyuk Cho, Jinman Park, Dongwoo Chae & Hyongbum Henry Kim
Graduate School of Medical Science, Brain Korea 21 Plus Project for Medical Sciences, Yonsei University College of Medicine, Seoul, Republic of Korea
Younggwang Kim, Soohyuk Cho, Jinman Park & Hyongbum Henry Kim
Department of Applied Statistics, Yonsei University, Seoul, Republic of Korea
Taeyoung Park
Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, TX, USA
John D. Minna
Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim
Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea
Hyongbum Henry Kim
Yonsei-IBS Institute, Yonsei University, Seoul, Republic of Korea
Hyongbum Henry Kim
Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim

Contributions

Y.K., S.L. and H.H.K. conceived and designed the study. Y.K. and S.L. performed most of the experiments. J.P. critically contributed to computational analysis. S.C. critically assisted in the wet experiments. Y.K. and S.L. analyzed the data based on comments of H.H.K. J.D.M. generated and provided HBEC30KT-shTP53 cells (P cells). D.C. and T.P. contributed to the mathematical analysis (Supplementary Note 2). Y.K. and H.H.K. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent application based on this work, in which Y.K., S.L. and H.H.K. are listed as inventors. J.D.M. receives licensing fees from the National Institutes of Health and the University of Texas Southwestern Medical Center for distributing human cell lines. All the other authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Exon transcript profiles of P cells.

a, Expression of TP53 mRNA in P cells and HBEC30KT cells. FPKM, fragments per kilobase of transcript per million. Boxplots are represented for n = 3 biologically independent samples as follows: center line of box indicating the median, box limits indicating the upper and lower quartile; whiskers show the 1.5 times interquartile range. b, Gene set enrichment analysis (GSEA) of exon transcript profiles of HBEC30KT, P cells, and HCC4017, a lung cancer cell line. The single sample GSEA score (ssGSEA score) represents the degree to which the genes in a particular gene set are up- or down-regulated within the sample. RNA expression data were retrieved from Kim et al.³⁴.
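
For readers unfamiliar with the FPKM unit used in panel a, the following sketch applies the standard definition: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments in the sample. The function name and the example numbers are illustrative only and are not taken from the study's data.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Illustrative numbers: 500 fragments on a 2,000-bp transcript
# in a library of 10 million mapped fragments.
example = fpkm(500, 2_000, 10_000_000)
```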

Extended Data Fig. 2 Generation of libraries C and A.

a, The process of selecting sgRNA-target pairs for the generation of libraries C and A. SNVs, single nucleotide variants; sgRNA, single guide RNA. b, Generation of lentiviral libraries of sgRNA-encoding and target sequence pairs with unique molecular identifiers (UMIs). Oligonucleotides containing a 20-nt guide sequence, and the corresponding target sequence were synthesized and cloned into the pLenti-gRNA-puro vector to create plasmid library 1. The plasmids were then digested with BsmBI restriction enzyme and ligated with fragments containing the sgRNA scaffold sequences and UMIs to create plasmid library 2. Lentiviral libraries generated from plasmid library 2 were then transduced into cells expressing cytosine base editor (CBE) or adenine base editor (ABE) in a doxycycline-inducible manner.

Extended Data Fig. 3 Base editing efficiencies and indel frequencies at integrated target sequences.

Base editing efficiencies measured at each position in the indicated region for target nucleotide Cs (a) or As (b) in integrated surrogate target sequences. Position 1 is the 5′ end of the target sequence and position 20 is immediately upstream of the NGG PAM. The numbers of analyzed target sequences (n) are as follows: n = 5,865 (position −4), 5,393 (position −3), 5,782 (position −2), 5,815 (position −1), 5,292 (position 1), 5,614 (position 2), 5,697, 6,394, 10,586, 9,382, 8,837, 5,421, 6,130, 5,339, 5,541, 5,796, 5,058, 5,723, 5,955, 5,348, 5,779, 5,437, 4,884, 5,502 (position 20) for ABE (a); n = 19,475 (position −4), 20,753 (position −3), 20,110 (position −2), 19,425 (position −1), 19,984 (position 1), 20,004 (position 2), 17,873, 24,870, 35,421, 33,186, 32,807, 19,895, 19,195, 20,227, 19,549, 18,986, 20,367, 18,793, 18,361, 20,478, 19,605, 20,975, 21,542, 22,952 (position 20) for CBE (b). Boxplots are represented as follows: center line of box indicating the median, box limits indicating the upper and lower quartile; whiskers show the 10th and 90th percentiles. Outliers are shown using dots. c, Indel frequencies measured 10 days after the transduction of sgRNA-target pairs. The number of analyzed target sequences is indicated at the top of each dataset: n = 62,000 (Library C, Replicate 1), 77,201 (Library C, Replicate 2), 21,617 (Library A, Replicate 1) and 20,913 (Library A, Replicate 2). Boxplots are represented as follows: center white dot of box indicating the median, box limits indicating the upper and lower quartile; the distributions of indel frequencies are represented with kernel densities. d, Nonsynonymous base editing efficiencies at the integrated target sequences of synonymous control sgRNAs and other sgRNAs in the given datasets. The numbers of synonymous and other sgRNAs are as follows: 431 and 21,055 (Library A, Replicate 1), 413 and 20,372 (Library A, Replicate 2), 2,272 and 59,390 (Library C, Replicate 1), and 2,795 and 73,691 (Library C, Replicate 2), respectively. Boxplots are represented as follows: center line of box indicating the median, box limits indicating the upper and lower quartile; whiskers show the 1.5 times interquartile range.
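
The position numbering in this caption (−4 to −1 upstream of the target, then 1 to 20 up to the PAM, with no position 0) can be made concrete with a short sketch. It assumes a surrogate window of 4 upstream nucleotides followed by the 20-nt target sequence, considers only reads of the same length as the reference (so reads with indels are ignored), and counts C-to-T conversions by default. The function names and the toy sequence are hypothetical and are not the study's pipeline.

```python
def position_label(index, upstream=4):
    """Map a 0-based window index to the caption's numbering (no position 0)."""
    offset = index - upstream
    return offset if offset < 0 else offset + 1


def editing_efficiency(reference, reads, target="C", edited="T", upstream=4):
    """Fraction of reads carrying the edited base at each target-base position."""
    # Simplification: keep only reads aligned base-for-base with the reference.
    usable = [r for r in reads if len(r) == len(reference)]
    efficiencies = {}
    if not usable:
        return efficiencies
    for i, base in enumerate(reference):
        if base != target:
            continue
        hits = sum(1 for r in usable if r[i] == edited)
        efficiencies[position_label(i, upstream)] = hits / len(usable)
    return efficiencies


# Toy window: 4 upstream nt + 20-nt target with a single C at position 5.
reference = "ACGT" + "GGGGCGGGGGGGGGGGGGGG"
edited_read = reference[:8] + "T" + reference[9:]
reads = [edited_read, edited_read, edited_read, reference]
efficiency_by_position = editing_efficiency(reference, reads)
```

For an adenine base editor the same function could be called with `target="A"` and `edited="G"`.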

Extended Data Fig. 4 Performance of high-throughput evaluations.

a, Distribution of median normalized log fold changes (LFCs) of 338 sgRNAs targeting essential genes depending on the nonsynonymous base editing efficiencies determined at the integrated target sequences in library C2. NT, nontargeting sgRNAs. The number of sgRNAs n = 359 (NT), 5 (<20%), 10 (20–40%), 55 (40–60%), 268 (>60%). Boxplots are represented as follows: center line of box indicating the median, box limits indicating the upper and lower quartile; whiskers show the 1.5 times interquartile range. Comparisons with NT used Student’s t-test (NS, not significant; *P = 1.5 × 10⁻⁴; **P = 2.2 × 10⁻³²). b,c, Receiver operating characteristic-area under the curve (ROC-AUC) analysis of LFCs for sgRNAs predicted to induce stop codons in common essential genes versus nontargeting sgRNAs in library C2 (b) and library C (c) at increasing thresholds of nonsynonymous base editing efficiencies. AUC values are indicated in parentheses. d, ROC-AUC analysis of LFCs for sgRNAs predicted to induce stop codons in common essential genes versus nontargeting controls at increasing thresholds of the number of UMIs in each sgRNA in library C. The AUC for each UMI cutoff is shown in parentheses. e, Correlations between median LFCs of UMIs for sgRNAs and LFCs of UMI CPM (counts per million) for the same sgRNAs in library C2. Red dots indicate sgRNAs predicted to induce nonsense mutations in selected common essential genes. The number of sgRNAs n = 3,229 (merged), 2,913 (other sgRNAs, blue dots), 217 (sgRNAs targeting essential genes, red dots), 99 (nontargeting sgRNAs, black dots). Pearson correlation coefficients (r) are shown.
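
The two quantities named in this caption, a per-sgRNA median LFC across UMIs and a ROC-AUC separating essential-gene sgRNAs from nontargeting controls, can be sketched as follows. This is a minimal illustration under stated assumptions: counts are taken to be already normalized, the LFC uses log2 with a pseudocount of 1, and the AUC is computed as the probability that an essential-gene sgRNA has a lower LFC than a nontargeting one (ties count half). None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import math
from statistics import median


def umi_lfc(before, after, pseudocount=1.0):
    """log2 fold change of one UMI between two time points (assumed definition)."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def sgrna_median_lfc(umi_counts):
    """Median LFC over the UMIs of one sgRNA; umi_counts holds (before, after) pairs."""
    return median(umi_lfc(b, a) for b, a in umi_counts)


def roc_auc(essential_lfcs, nontargeting_lfcs):
    """Rank-based AUC: chance that an essential-gene sgRNA is more depleted."""
    wins = 0.0
    for e in essential_lfcs:
        for n in nontargeting_lfcs:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_lfcs) * len(nontargeting_lfcs))


# Toy values, not study data.
median_lfc = sgrna_median_lfc([(3, 1), (7, 1), (1, 1)])
auc = roc_auc([-2.0, -1.0], [0.0, -1.0])
```

An AUC of 0.5 would indicate no separation between the two groups, and 1.0 would indicate that every essential-gene sgRNA is more depleted than every control.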

Extended Data Fig. 5 Design of small libraries and reproducibility of base editing efficiencies using these libraries.

a,b, Design of small libraries C1, C2, and A1 (a) and C3 and A2 (b). UMIs, unique molecular identifiers. c, Correlations between nonsynonymous base editing efficiencies at the integrated target sequences of biological replicates. The color of each dot was determined by the number of neighboring dots (that is, dots within a distance that is three times the radius of the dot). The base editing efficiencies were determined ten days after the initial transduction of each library into P-C or P-A cells. Only sgRNAs with more than 100 raw read counts in each replicate were included. Pearson correlation coefficients (r) are shown. The number of sgRNAs n = 3,181 (library C1), 3,063 (library C2), and 1,520 (library A1).
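
The replicate comparison in panel c combines a read-count filter with a Pearson correlation. A minimal sketch of that combination is shown below, assuming each sgRNA is stored as a dictionary with hypothetical keys for raw reads and editing efficiency in two replicates; the density-based dot coloring is not reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(records, min_reads=100):
    """Keep sgRNAs with more than min_reads raw reads in both replicates, then correlate."""
    kept = [
        r for r in records
        if r["reads_rep1"] > min_reads and r["reads_rep2"] > min_reads
    ]
    r_value = pearson_r(
        [r["eff_rep1"] for r in kept],
        [r["eff_rep2"] for r in kept],
    )
    return r_value, len(kept)


# Toy records, not study data; the last one fails the read-count filter.
records = [
    {"reads_rep1": 500, "reads_rep2": 450, "eff_rep1": 10.0, "eff_rep2": 20.0},
    {"reads_rep1": 300, "reads_rep2": 350, "eff_rep1": 20.0, "eff_rep2": 40.0},
    {"reads_rep1": 150, "reads_rep2": 101, "eff_rep1": 30.0, "eff_rep2": 60.0},
    {"reads_rep1": 100, "reads_rep2": 900, "eff_rep1": 90.0, "eff_rep2": 5.0},
]
r_value, n_kept = replicate_correlation(records)
```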

Extended Data Fig. 6 The number of protein variants generated by an sgRNA.

a, The proportion of sgRNAs that induce a primary protein variant. The numbers of sgRNAs are indicated in parentheses. b, The number of significant (frequency > 10%) protein variants generated by sgRNAs that induce multiple protein variants without a primary protein variant. The numbers of protein variants are indicated in parentheses.
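
The 10% frequency cutoff in panel b can be illustrated with a small sketch. It assumes that a variant's frequency is its read count divided by the total reads analyzed for that sgRNA's surrogate target, which the caption does not specify. The criterion for a primary protein variant is not given in this section, so it is not modeled here. Variant labels and counts are invented for illustration.

```python
def significant_variants(variant_reads, total_reads, threshold=0.10):
    """Return variants whose read frequency exceeds the threshold, sorted by name."""
    if total_reads <= 0:
        return []
    return sorted(
        variant
        for variant, reads in variant_reads.items()
        if reads / total_reads > threshold
    )


# Toy outcome counts for one sgRNA (hypothetical variant labels).
outcomes = {"A10V": 30, "A10T": 15, "G12S": 5, "L14F": 10}
significant = significant_variants(outcomes, total_reads=100)
```

A variant at exactly 10% is excluded because the caption specifies a frequency greater than 10%.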

Extended Data Fig. 7 Association between computationally predicted functions of variants and measured functions of variants.

a, The scores from driver detection algorithms (CTAT-cancer and CHASM) for 4,143 protein variants. The number of variants n = 15 (depleting), 39 (likely depleting), 864 (possibly depleting), 2,141 (neutral), 1,056 (possibly outgrowing), 25 (likely outgrowing), and 3 (outgrowing). b, The scores from algorithms that predict the functional effects of variants (SIFT and PolyPhen-2) for 3,899 protein variants. The number of variants n = 12 (depleting), 38 (likely depleting), 807 (possibly depleting), 2,009 (neutral), 1,008 (possibly outgrowing), 22 (likely outgrowing), and 3 (outgrowing). c,d, Distribution of SIFT scores (c) and PolyPhen-2 scores (d) for missense variants in common essential genes according to the LFC in library C. The number of variants n = 10 (<−0.4), 65 (−0.4~0), 82 (0~0.4). Boxplots are represented as follows: center line of box, median; box limits, upper and lower quartiles; whiskers, 1.5 times the interquartile range.

Extended Data Fig. 8 Allele frequency tracking after transduction of sgRNA-encoding sequences into P-C or P-A cells.

sgRNA-encoding lentivirus was transduced into P-C and P-A cells at day 0, and doxycycline was added to induce expression of CBE and ABE, respectively, and maintained until day 10, after which doxycycline was removed. The functional classification results obtained from the high-throughput experiments and those from these individual experiments are shown in red and green, respectively, at the top of each graph. The mean values of two independent samples are indicated.

Extended Data Fig. 9 The results of competitive proliferation assays.

a, An example of the flow cytometry gating strategy used in the competitive proliferation assays. b, Mean relative enrichment values ± standard deviation of three replicates. Student’s t-test was performed under the null hypothesis that the proportions of sgRNA-transduced and nontargeting sgRNA-transduced cells would be the same. Two nontargeting sgRNAs were used as controls, and the mean of their relative enrichment values served as the control value.

Extended Data Fig. 10 Notable gene groups associated with outgrowing/likely outgrowing and depleting/likely depleting sgRNAs and variants.

a, (Left panel) The fraction of functionally classified sgRNAs (top) targeting cancer gene census (CGC)⁵ genes and primary protein variants (bottom) encoded by CGC genes in the outgrowing and likely outgrowing groups. Results from all libraries except library eC were combined. P-values from two-sided Fisher’s exact test are shown. The numbers of sgRNAs or variants either targeting or encoded by CGC genes among all sgRNAs or variants in each group are shown on the x-axes. (Right panel) Detailed distribution of sgRNAs predicted to introduce mutations in CGC genes (top) and variants generated in CGC genes (bottom). The number of sgRNAs or variants corresponding to each gene is specified in parentheses. b, The fraction of functionally classified sgRNAs (left) targeting DepMap common essential genes (CEGs) and protein variants (right) encoded by CEGs in the depleting and likely depleting groups. Results from all libraries except library eC were combined. P-values from two-sided Fisher’s exact test are shown. The numbers of sgRNAs or variants either targeting or encoded by CEGs among all sgRNAs or variants in each group are shown on the x-axes.
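
The two-sided Fisher’s exact test named in this legend can be sketched in a few lines of standard-library Python. This is a generic illustration of the test, not the authors' code; the function name is hypothetical, the 2 × 2 counts are invented, and the two-sided P value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts: rows = outgrowing group vs all other sgRNAs,
# columns = targeting a CGC gene vs not targeting a CGC gene.
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided P = {p:.4f}")
```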

Supplementary information

Supplementary Information

Supplementary Notes 1 and 2 and Supplementary Figs. 1–7

Reporting Summary

Supplementary Table

Supplementary Table 1. Composition of sgRNA-encoding libraries C, A, C1, C2, C3, A1, A2, dA and eC. Barcode sequences used for sorting, sgRNA sequences and target sequences, including neighboring sequences (5′-neighboring sequence (4 bp) + target sequence (20 bp + 3-bp PAM = 23 bp) + 3′-neighboring sequence (3 bp) = 30 bp of genomic DNA sequence). Information about intended mutations and DeepCBE or DeepABE efficiency scores are also included (provided as a separate Excel file).

Supplementary Table 2. The results of MAGeCK analyses. RPM of four replicate^UMI, LFCs, median LFCs (mLFCs), positive or negative MAGeCK RRA P values and LFCs of UMI CPM are shown for each sgRNA (provided as a separate Excel file).

Supplementary Table 3. Functional classifications of sgRNAs and protein variants. a, Functional classification of sgRNAs based on the proliferation and survival (sheet 1). b, Functional classification of sgRNAs in library eC (sheet 2). c, Base editing outcomes and allele frequencies at the integrated target sequences (dependency on EGF signaling) (sheet 3). d, Potential classification of sgRNAs with low base editing efficiencies (sheet 4) (provided as a separate Excel file).

Supplementary Table 4. Results of allele frequency tracking after delivery of an individual sgRNA for 20 selected sgRNAs. After lentiviral transduction of the specified individual sgRNA, protein variant frequencies were calculated from DNA sequence analysis. Endogenous DNA sequence variants encoding the same amino acid change were combined into one protein variant (provided as a separate Excel file).

Supplementary Table 5. Oligonucleotides used in this study (provided in a separate Excel file).
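
The 30-bp target layout described for Supplementary Table 1 (4-bp 5′-neighboring sequence + 20-bp target + 3-bp PAM + 3-bp 3′-neighboring sequence) can be made concrete with a short sketch. The function name, the dictionary keys and the example sequence are hypothetical and are not taken from the supplementary file.

```python
def split_target_context(seq30):
    """Split a 30-nt genomic context into its four parts (4 + 20 + 3 + 3 nt)."""
    if len(seq30) != 30:
        raise ValueError(f"expected 30 nt, got {len(seq30)}")
    return {
        "five_prime_neighbor": seq30[0:4],     # 4 bp upstream of the target
        "target": seq30[4:24],                 # 20-bp target sequence
        "pam": seq30[24:27],                   # 3-bp PAM
        "three_prime_neighbor": seq30[27:30],  # 3 bp downstream of the PAM
    }


# Hypothetical sequence assembled from its parts so the lengths are easy to check
example = "ACGT" + "GATTACAGATTACAGATTAC" + "TGG" + "CAT"
parts = split_target_context(example)
print(parts)
```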

About this article

Cite this article

Kim, Y., Lee, S., Cho, S. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat Biotechnol 40, 874–884 (2022). https://doi.org/10.1038/s41587-022-01276-4

Received: 18 May 2021
Accepted: 10 March 2022
Published: 11 April 2022
Issue Date: June 2022
DOI: https://doi.org/10.1038/s41587-022-01276-4
