Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity in a manner consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
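The intuition behind activity normalization can be sketched with a deliberately simplified model; this is an illustrative assumption, not BEAN's Bayesian network. Suppose that only the fraction π_g of cells receiving guide g that actually carry the variant shift the phenotype, by an amount μ. The expected guide-level shift is then y_g ≈ π_g·μ, and pooling guides that install the same variant gives the least-squares estimate μ̂ = Σ_g π_g·y_g / Σ_g π_g². The function name `normalized_effect` is hypothetical.

```python
def normalized_effect(shifts, edit_rates):
    """Least-squares estimate of a per-variant effect mu under y_g = pi_g * mu.

    shifts: observed phenotype shift per guide (e.g. a log fold change)
    edit_rates: reporter-measured editing rate per guide, in [0, 1]
    """
    if len(shifts) != len(edit_rates):
        raise ValueError("shifts and edit_rates must have the same length")
    # Denominator of the least-squares solution: sum of squared editing rates.
    denom = sum(p * p for p in edit_rates)
    if denom == 0:
        raise ValueError("at least one guide must have a nonzero editing rate")
    # Guides are weighted by their editing rate, so poorly editing guides
    # contribute little to the pooled estimate.
    return sum(p * y for p, y in zip(edit_rates, shifts)) / denom
```

For two guides with reporter editing rates 0.8 and 0.2 and observed shifts 0.8 and 0.2, `normalized_effect([0.8, 0.2], [0.8, 0.2])` recovers an effect of 1.0, whereas naively averaging the raw shifts gives 0.5 and understates the variant's impact.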
Data availability
The processed data used in this study have been deposited at Zenodo (https://doi.org/10.5281/zenodo.10139794), and primary sequencing data are available at the Sequence Read Archive under accession PRJNA1042659. Controlled access, patient-level data from the UKB may be requested at https://ams.ukbiobank.ac.uk/ams/. Source data are provided with this paper.
Code availability
BEAN source code is available at https://github.com/pinellolab/crispr-bean. The scripts used to generate the figures and analyses presented in the study have been deposited at https://github.com/pinellolab/bean_manuscript and Zenodo (ref. 110). Version 0.2.9 of ‘bean’, used for the analyses presented in this paper, has been deposited at Zenodo (ref. 74).
References
Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).
Araya, C. L. & Fowler, D. M. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol. 29, 435–442 (2011).
Myers, R. M., Tilly, K. & Maniatis, T. Fine structure genetic analysis of a β-globin promoter. Science 232, 613–618 (1986).
Inoue, F. & Ahituv, N. Decoding enhancers using massively parallel reporter assays. Genomics 106, 159–164 (2015).
Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Prim. 2, 9 (2022).
Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
Hanna, R. E. et al. Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080 (2021).
Morris, J. A. et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023).
Martin-Rufino, J. D. et al. Massively parallel base editing to map variant effects in human hematopoiesis. Cell 186, 2456–2474 (2023).
Cuella-Martin, R. et al. Functional interrogation of DNA damage response variants with base editing screens. Cell 184, 1081–1097 (2021).
Pablo, J. L. B. et al. Scanning mutagenesis of the voltage-gated sodium channel NaV1.2 using base editing. Cell Rep. 42, 112563 (2023).
Coelho, M. A. et al. Base editing screens map mutations affecting interferon-γ signaling in cancer. Cancer Cell 41, 288–303 (2023).
Cheng, L. et al. Single-nucleotide-level mapping of DNA regulatory elements that control fetal hemoglobin expression. Nat. Genet. 53, 869–880 (2021).
Sánchez-Rivera, F. J. et al. Base editing sensor libraries for high-throughput engineering and functional analysis of cancer-associated single nucleotide variants. Nat. Biotechnol. 40, 862–873 (2022).
Kim, Y. et al. High-throughput functional evaluation of human cancer-associated mutations using base editors. Nat. Biotechnol. 40, 874–884 (2022).
Kweon, J. et al. A CRISPR-based base-editing screen for the functional assessment of BRCA1 variants. Oncogene 39, 30–35 (2020).
Huang, C., Li, G., Wu, J., Liang, J. & Wang, X. Identification of pathogenic variants in cancer genes using base editing screens with editing efficiency correction. Genome Biol. 22, 80 (2021).
Sangree, A. K. et al. Benchmarking of SpCas9 variants enables deeper base editor screens of BRCA1 and BCL2. Nat. Commun. 13, 1318 (2022).
Lue, N. Z. et al. Base editor scanning charts the DNMT3A activity landscape. Nat. Chem. Biol. 19, 176–186 (2023).
Després, P. C., Dubé, A. K., Seki, M., Yachie, N. & Landry, C. R. Perturbing proteomes at single residue resolution using base editing. Nat. Commun. 11, 1871 (2020).
Garcia, E. M. et al. Base editor scanning reveals activating mutations of DNMT3A. ACS Chem. Biol. 18, 2030–2038 (2023).
Lue, N. Z. & Liau, B. B. Base editor screens for in situ mutational scanning at scale. Mol. Cell 83, 2167–2187 (2023).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Bouhairie, V. E. & Goldberg, A. C. Familial hypercholesterolemia. Cardiol. Clin. 33, 169–179 (2015).
Brown, M. S. & Goldstein, J. L. How LDL receptors influence cholesterol and atherosclerosis. Sci. Am. 251, 58–66 (1984).
Mundal, L. J. et al. Impact of age on excess risk of coronary heart disease in patients with familial hypercholesterolaemia. Heart 104, 1600–1607 (2018).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Hamilton, M. C. et al. Systematic elucidation of genetic mechanisms underlying cholesterol uptake. Cell Genom. 3, 100304 (2023).
Spady, D. K. Hepatic clearance of plasma low density lipoproteins. Semin. Liver Dis. 12, 373–385 (1992).
Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).
Park, H., Shin, J., Choi, H., Cho, B. & Kim, J. Valproic acid significantly improves CRISPR/Cas9-mediated gene editing. Cells 9, 1447 (2020).
Shin, H. R. et al. Small-molecule inhibitors of histone deacetylase improve CRISPR-based adenine base editing. Nucleic Acids Res. 49, 2390–2399 (2021).
Yang, C. et al. HMGN1 enhances CRISPR-directed dual-function A-to-G and C-to-G base editing. Nat. Commun. 14, 2430 (2023).
Arbab, M. et al. Base editing rescue of spinal muscular atrophy in cells and in mice. Science 380, eadg6518 (2023).
Schep, R. et al. Impact of chromatin context on Cas9-induced DNA double-strand break repair pathway balance. Mol. Cell 81, 2216–2230 (2021).
Ding, X. et al. Improving CRISPR–Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J. 2, 51–63 (2019).
Liu, G., Yin, K., Zhang, Q., Gao, C. & Qiu, J.-L. Modulating chromatin accessibility by transactivation and targeting proximal dsgRNAs enhances Cas9 editing efficiency in vivo. Genome Biol. 20, 145 (2019).
Maeder, M. L. et al. CRISPR RNA-guided activation of endogenous human genes. Nat. Methods 10, 977–979 (2013).
Perez-Pinera, P. et al. RNA-guided gene activation by CRISPR–Cas9-based transcription factors. Nat. Methods 10, 973–976 (2013).
Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Li, W. et al. Quality control, modeling, and visualization of CRISPR screens with MAGeCK-VISPR. Genome Biol. 16, 281 (2015).
Jeong, H.-H., Kim, S. Y., Rousseaux, M. W. C., Zoghbi, H. Y. & Liu, Z. Beta-binomial modeling of CRISPR pooled screen data identifies target genes with greater sensitivity and fewer false negatives. Genome Res. 29, 999–1008 (2019).
Daley, T. P. et al. CRISPhieRmix: a hierarchical mixture model for CRISPR pooled screens. Genome Biol. 19, 159 (2018).
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘Sum of Single Effects’ model. PLoS Genet. 18, e1010299 (2022).
Tehranchi, A. et al. Fine-mapping cis-regulatory variants in diverse human populations. eLife 8, e39595 (2019).
Tehranchi, A. K. et al. Pooled ChIP–seq links variation in transcription factor binding to complex disease risk. Cell 165, 730–741 (2016).
Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at bioRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Biasella, F., Plössl, K., Karl, C., Weber, B. H. F. & Friedrich, U. Altered protein function caused by AMD-associated variant rs704 links vitronectin to disease pathology. Invest. Ophthalmol. Vis. Sci. 61, 2 (2020).
Yao, Q. et al. Motif-Raptor: a cell type-specific and transcription factor centric approach for post-GWAS prioritization of causal regulators. Bioinformatics 37, 2103–2111 (2021).
Jing, Z., Liu, Y., Dong, M., Hu, S. & Huang, S. Identification of the DNA binding element of the human ZNF333 protein. J. Biochem. Mol. Biol. 37, 663–670 (2004).
Witzgall, R., O’Leary, E., Leaf, A., Onaldi, D. & Bonventre, J. V. The Krüppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proc. Natl Acad. Sci. USA 91, 4514–4518 (1994).
Fass, D., Blacklow, S., Kim, P. S. & Berger, J. M. Molecular basis of familial hypercholesterolaemia from structure of LDL receptor module. Nature 388, 691–693 (1997).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Yu, T., Fife, J. D., Adzhubey, I., Sherwood, R. & Cassa, C. A. Joint estimation and imputation of variant functional effects using high throughput assay data. Preprint at medRxiv https://doi.org/10.1101/2023.01.06.23284280 (2023).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, 2016).
Clarke, S. L. et al. Coronary artery disease risk of familial hypercholesterolemia genetic variants independent of clinically observed longitudinal cholesterol exposure. Circ. Genom. Precis. Med. 15, e003501 (2022).
Zhou, Y., Pan, Q., Pires, D. E. V., Rodrigues, C. H. M. & Ascher, D. B. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res. 51, W122–W128 (2023).
Jubb, H. C. et al. Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures. J. Mol. Biol. 429, 365–371 (2017).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H. & Zehfus, M. H. Hydrophobicity of amino acid residues in globular proteins. Science 229, 834–838 (1985).
Ryu, J. & Pinello, L. pinellolab/crispr-bean: v0.2.9. Zenodo https://doi.org/10.5281/zenodo.10191493 (2023).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2019).
Domanski, M. J. et al. Time course of LDL cholesterol exposure and cardiovascular disease event risk. J. Am. Coll. Cardiol. 76, 1507–1516 (2020).
Duncan, M. S., Vasan, R. S. & Xanthakis, V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and all-cause mortality: observations from the Framingham Study over 35 years. J. Am. Heart Assoc. 8, e011433 (2019).
Mundal, L. & Retterstøl, K. A systematic review of current studies in patients with familial hypercholesterolemia by use of national familial hypercholesterolemia registries. Curr. Opin. Lipidol. 27, 388–397 (2016).
Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023).
Klimentidis, Y. C. et al. Phenotypic and genetic characterization of lower LDL cholesterol and increased type 2 diabetes risk in the UK Biobank. Diabetes 69, 2194–2205 (2020).
Oommen, D., Kizhakkedath, P., Jawabri, A. A., Varghese, D. S. & Ali, B. R. Proteostasis regulation in the endoplasmic reticulum: an emerging theme in the molecular pathology and therapeutic management of familial hypercholesterolemia. Front. Genet. 11, 570355 (2020).
Wheeler, T. J., Clements, J. & Finn, R. D. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7 (2014).
Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference (eds Van der Walt, S. & Millman, J.) https://doi.org/10.25080/majora-92bf1922-011 (SciPy, 2010).
McWilliam, H. et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res. 41, W597–W600 (2013).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
Goujon, M. et al. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699 (2010).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
The PyMOL Molecular Graphics System v.1.8 (Schrödinger, 2015).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Webb, B. & Sali, A. Comparative protein structure modeling using MODELLER. Curr. Protoc. Protein Sci. 86, 2.9.1–2.9.37 (2016).
Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Ryu, J. K., Tognon, M. & Li, Z. pinellolab/bean_manuscript: v1.0.2. Zenodo https://doi.org/10.5281/zenodo.10775808 (2024).
Acknowledgements
We thank G. Losyev, A. James, Q. Qin, C. Smith, L. Blaine, K. Clement, Z. Patel, S. Yang and H. Boen for technical assistance. Funding for this work was obtained from UM1HG012010 (R.I.S. and L.P.), 1R01HL164409 (C.A.C., R.I.S. and L.P.), 1R01GM143249 (R.I.S.), R01HG010372 (C.A.C. and T.Y.), the American Cancer Society (R.I.S.), the American Heart Association (R.I.S.), the National Organization for Rare Diseases (R.I.S.), 1R35HG010717-01 (L.P.), the National Health and Medical Research Council of Australia (GNT1174405; D.B.A. and Y.Z.), and the Victorian Government’s Operational Infrastructure Support Program (Y.Z. and D.B.A.). We are indebted to the UKB and its participants (UKB application 41250 and IRB protocol 2020P002093).
Author information
Contributions
R.I.S. conceived the experimental design, and J.R. and L.P. conceptualized BEAN. S.B. collected screen data. J.R. developed BEAN, and M.J., M.I.L. and L.P. advised on design and implementation of BEAN. J.R. and T.Y. processed and analyzed data. T.Y. performed BE-Hive and FUSE analysis. M.F., Q.V.P. and R.I.S. performed downstream characterization of LDL-C GWAS variants. T.Y., L.B., V.B. and C.A.C. obtained and analyzed UKB data. Y.Z. led structural analysis of LDLR variants with J.R. and D.B.A. J.R. and Z.L. benchmarked classification performance. M.T. and L.P. performed analysis of variant impact on transcription factor binding. G.L. advised on library design. J.R. and R.I.S. drafted the manuscript. R.I.S., L.P. and C.A.C. provided guidance and supervised this project. All the authors wrote and approved the final manuscript.
Ethics declarations
Competing interests
L.P. has financial interests in Edilytics and SeQure Dx. L.P.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict-of-interest policies. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Andrew Wood and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Base editor editing preference profile and context specificity.
Deamination motif and PAM-dependent editing preference of AID-BE5-SpRY from 7,294 gRNAs and AID-BE5-Cas9NG from 7,299 gRNAs with more than 9 read counts in any replicate of bulk samples. a) Context specificity of AID-BE5-SpRY is represented as a sequence logo. The height of each base represents the relative editing efficiency associated with that base. b) Mean editing efficiency of AID-BE5-SpRY by protospacer position and PAM sequence. c) Context specificity of AID-BE5-Cas9NG is represented as a sequence logo. The height of each base represents the relative editing efficiency associated with that base. d) Mean editing efficiency of AID-BE5-Cas9NG by protospacer position and PAM sequence.
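The per-position, per-PAM summaries in panels b and d amount to a grouped mean over filtered gRNAs. The sketch below assumes a hypothetical input layout (one record per gRNA holding its PAM, per-replicate read counts and a map from protospacer position to editing rate) and reads the read-count filter as "more than 9 reads in at least one replicate"; both are assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean


def mean_efficiency_by_position_and_pam(guides, min_reads=10):
    """Average per-position editing rates, grouped by (position, PAM).

    guides: iterable of dicts with hypothetical keys
      'pam'        -> PAM sequence, e.g. 'NGG'
      'reads'      -> list of read counts, one per replicate
      'edit_rates' -> {protospacer_position: editing rate}
    A gRNA is kept only if at least one replicate has >= min_reads reads
    (min_reads=10 corresponds to "more than 9 read counts").
    """
    collected = defaultdict(list)
    for g in guides:
        if max(g["reads"]) < min_reads:
            continue  # too few reads in every replicate: skip this gRNA
        for pos, rate in g["edit_rates"].items():
            collected[(pos, g["pam"])].append(rate)
    return {key: mean(vals) for key, vals in collected.items()}
```

Grouping on the (position, PAM) pair yields one cell per heatmap tile; a gRNA that fails the read filter contributes to no tile.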
Extended Data Fig. 2 Nucleotide-level editing comparison of reporter and endogenous locus.
a) Scatterplots comparing per-nucleotide editing efficiencies between the reporter and endogenous target sites. All edits introduced by each of 49 gRNAs across four loci in 3 experimental replicates are plotted. Points are colored by the identity of the nucleotide edit and the gRNA. b) The same plot colored by gRNA strand. R; Pearson correlation coefficient. n; number of plotted editing rates.
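A comparison like panel a reduces to computing an editing rate (edited reads divided by total reads) for each guide, position and nucleotide edit at both the reporter and the endogenous site, matching the pairs, and correlating them. This is a minimal sketch; the dictionary layout keyed by (guide, position, edit) and the function names are hypothetical.

```python
from math import sqrt


def editing_rate(edited_reads, total_reads):
    """Fraction of reads carrying a given nucleotide edit."""
    return edited_reads / total_reads if total_reads else float("nan")


def paired_rates(reporter, endogenous):
    """Match (guide, position, edit) keys present in both count tables.

    Each table maps a key to an (edited_reads, total_reads) tuple.
    """
    keys = sorted(reporter.keys() & endogenous.keys())
    x = [editing_rate(*reporter[k]) for k in keys]
    y = [editing_rate(*endogenous[k]) for k in keys]
    return x, y


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Only edits observed at both sites are compared, so the number of plotted points n equals the number of shared keys.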
Extended Data Fig. 3 BEAN plate diagrams.
Plate diagrams of a) BEAN, b) BEAN-Reporter and c) BEAN-Uniform. Xb and all parameters with superscript b are not used for benchmark analyses.
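A plate diagram encodes a generative story. The toy forward simulation below illustrates that kind of story for a sorting screen; it is an assumption for illustration, not the BEAN model drawn in the figure. For one guide, each cell is marked as edited with probability equal to the guide's editing rate, receives a phenotype drawn from N(μ, 1) if edited or N(0, 1) otherwise, and is counted in the sorting bin defined by fixed phenotype cutoffs. The function name, the cutoffs and the unit variance are hypothetical choices.

```python
import random
from bisect import bisect_right


def simulate_sorted_counts(n_cells, edit_rate, mu, cutoffs, seed=0):
    """Count cells per sorting bin for one guide under a two-component mixture.

    cutoffs: sorted phenotype thresholds; len(cutoffs) + 1 bin counts are returned.
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    counts = [0] * (len(cutoffs) + 1)
    for _ in range(n_cells):
        edited = rng.random() < edit_rate  # did this cell receive the edit?
        phenotype = rng.gauss(mu if edited else 0.0, 1.0)
        # bisect_right gives the index of the bin the phenotype falls into.
        counts[bisect_right(cutoffs, phenotype)] += 1
    return counts
```

Inference runs this story in reverse: given observed bin counts and a per-guide editing rate, it asks which μ best explains the shift toward the upper or lower bins.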
Extended Data Fig. 4 LDL-C GWAS library classification task benchmark.
a) AUPRC for classifying positive splicing control variants against negative control variants. Metrics for all 5 replicates are shown as markers, and metrics of 15 two-replicate subsamples among the 5 replicates are shown as box plots. Box plots were drawn as described in the statistical note of the Methods section. b) Precision-recall curve for classifying all positive control splice sites against negative controls using all replicates without failing samples. c) Precision-recall curve for classifying splice sites of LDLR/MYLIP against negative controls using all replicates without failing samples. d) Precision-recall curve for classifying all positive control splice sites against negative controls for two-replicate subsamples of the data. Mean precision at each recall value across 15 subsample runs is plotted as a solid line. e) Precision-recall curve for classifying splice sites of LDLR/MYLIP against negative controls for two-replicate subsamples of the data. Mean precision at each recall value across 15 subsample runs is plotted as a solid line.
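The AUPRC in panel a summarizes a precision-recall curve like those in panels b–e. A minimal step-wise version (average precision) is sketched below, assuming controls are scored so that larger values mean stronger evidence of an effect, for example an absolute z-score. Ties are broken by input order, a simplification; scikit-learn's `average_precision_score` instead groups tied scores into a single threshold.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: 1 for positive controls (e.g. splice-site variants), 0 for negatives
    scores: higher means more confidently positive (e.g. |z-score|)
    Implements AP = sum_n (R_n - R_{n-1}) * P_n over the ranking by descending score.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            # Recall rises by 1/n_pos at each positive; weight it by precision here.
            ap += (tp / rank) / n_pos
    return ap
```

A perfect ranking gives 1.0; ranking the controls [1, 0, 1, 0] in that order gives (1 + 2/3) / 2 ≈ 0.833, because the second positive is reached at precision 2/3.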
Extended Data Fig. 5 Comparison of inferred effect sizes of individually transfected LDL-C GWAS library gRNAs.
Scatterplot and Pearson correlation coefficients (R) of effect size estimates and the log fold change (LFC) of fluorescence signal following individual transfection of 22 gRNAs. R; Spearman correlation coefficient.
Extended Data Fig. 6 BEAN accurately estimates variant effect confidence from per-variant evidence in input data.
a–c) Scatterplots of 2,182 LDLR tiling library variants comparing a) nnorm and effective edit rates, b) effective edit rates and BEAN σμ, and c) nnorm and BEAN σμ. d) Histogram of effective edit rates of 76 LDLR tiling library variants with UKB LDL-C levels. Quartile bin cutoffs used to categorize variants are shown as dotted lines. e) Scatterplots of BEAN z-scores and statin-adjusted UKB LDL-C measurements for variants in each effective edit rate quartile bin. r and ρ show the Pearson and Spearman correlation coefficients, respectively.
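Panels d and e can be sketched as two steps: assign each variant to a quartile bin of its effective edit rate, then compute Pearson r and Spearman ρ (the Pearson correlation of ranks, with ties given their mean rank) between z-scores and LDL-C within each bin. This is a minimal sketch; the function names, the `statistics.quantiles` default ('exclusive') cut method and the three-variant minimum per bin are assumptions.

```python
from math import sqrt
from statistics import quantiles


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def quartile_bins(edit_rates):
    """Assign each variant to bin 0-3 by the quartiles of its effective edit rate."""
    cuts = quantiles(edit_rates, n=4)  # three cut points, 'exclusive' method
    return [sum(r > c for c in cuts) for r in edit_rates]


def per_bin_correlations(edit_rates, z_scores, ldl):
    """(Pearson r, Spearman rho) between z-scores and LDL-C within each bin."""
    bins = quartile_bins(edit_rates)
    result = {}
    for b in sorted(set(bins)):
        zs = [z for z, bb in zip(z_scores, bins) if bb == b]
        ys = [y for y, bb in zip(ldl, bins) if bb == b]
        if len(zs) >= 3:  # skip bins too small for a meaningful correlation
            result[b] = (pearson(zs, ys), spearman(zs, ys))
    return result
```

Stratifying by effective edit rate makes it visible whether agreement with UKB LDL-C depends on how efficiently a variant was installed.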
Extended Data Fig. 7 LDLR tiling library classification task benchmark.
a) AUPRC for classifying Pathogenic/Likely Pathogenic from Benign/Likely Benign variants when using the 4 replicates without failing samples and the six two-replicate combinations among these replicates. Bounds and the center of the boxes are the interquartile ranges. Box plots were drawn as described in the statistical note of the Methods section. b–e) Precision-recall curves for classifying b, d) Pathogenic/Likely Pathogenic and c, e) Pathogenic from Benign/Likely Benign variants. Top panels (b, c) show classification using the 4 replicates without failing samples. Bottom panels (d, e) show classification using the six two-replicate combinations among these 4 replicates.
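The two-replicate subsamples enumerate every unordered pair of replicates, so 4 replicates give C(4, 2) = 6 combinations. The sketch below enumerates them and summarizes a per-run metric such as AUPRC by its quartiles and interquartile range, the quantities a box plot typically draws. The helper names and the choice of the 'inclusive' quantile method are assumptions.

```python
from itertools import combinations
from statistics import quantiles


def replicate_subsamples(replicates, k=2):
    """All unordered k-replicate subsets, e.g. 4 replicates -> 6 pairs for k=2."""
    return list(combinations(replicates, k))


def box_summary(values):
    """Quartiles and interquartile range of a metric across subsample runs."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}
```

Each pair from `replicate_subsamples` would be analyzed separately, and the resulting metrics passed to `box_summary` to describe the spread expected from a smaller, two-replicate screen.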
Extended Data Fig. 8 Comparison of functional impact and conservation within conserved LDLR domains.
Repeat domain alignments shown with BEAN z-scores for a) LDLR class A repeat domain, b) LDLR class B repeat domain and c) EGF-like domains, aligned with the Pfam profile HMM logo by Skylign, where the height of each position shows its information content and letter heights show the total height scaled by the relative frequencies of the letters at that position. For a), the conserved cysteine residue position is highlighted, and for b–c), consensus positions from the Clustal Omega alignment output are highlighted in grey.
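The logo description above (column height equal to information content, letters scaled by relative frequency) can be made concrete for one alignment column. In this sketch, information content is computed as log2 of the alphabet size minus the Shannon entropy of the observed residue frequencies. Skylign builds HMM logos from profile HMM probabilities rather than raw counts, so this frequency-based version is only an approximation; the function name is hypothetical.

```python
from collections import Counter
from math import log2


def logo_column(residues, alphabet_size=20):
    """Information content and letter heights for one alignment column.

    IC = log2(alphabet_size) - H, where H is the Shannon entropy (bits) of the
    observed residue frequencies; each letter's height = frequency * IC.
    Gaps and small-sample corrections are ignored in this sketch.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    freqs = {aa: c / total for aa, c in counts.items()}
    entropy = -sum(f * log2(f) for f in freqs.values())
    ic = log2(alphabet_size) - entropy
    return ic, {aa: f * ic for aa, f in freqs.items()}
```

A fully conserved cysteine column reaches the maximum of log2(20) ≈ 4.32 bits, which is why invariant positions stand out in the logo; for DNA logos, `alphabet_size=4` caps the height at 2 bits.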
Extended Data Fig. 9 Expanded LDLR missense variant pathogenicity estimates with FUSE.
a) Scatterplot of mean statin-adjusted LDL-C level for all considered UKB variants against imputed BEAN-FUSE score. b) Predictions for unobserved variants from an XGBoost model trained on observed UKB variants and their mean statin-adjusted LDL-C levels. c–d) Correlation coefficients and root mean squared error (RMSE) between predicted and true UKB mean statin-adjusted LDL-C levels for XGBoost models using the FUSE score, the PhastCons PhyloP conservation score, or both as input. c) Boxplot of metrics for prediction of observed variants with 10-fold cross-validation (n = 10). d) Barplot of metrics for prediction of unobserved variants with the model trained on observed variants (n = 1). r, ρ; Pearson and Spearman correlation coefficients. RMSE; root mean squared error.
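The cross-validated evaluation in panel c follows a standard loop: split variants into 10 folds, fit on nine, predict the held-out fold and record the RMSE. To keep the example dependency-free, an ordinary least-squares line stands in for the XGBoost model, which is a deliberate substitution. The function names are hypothetical.

```python
import random
from math import sqrt


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares y = a + b*x (a stand-in for XGBoost)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        raise ValueError("training inputs are constant; slope is undefined")
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b


def rmse(pred, true):
    """Root mean squared error."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def cross_validated_rmse(x, y, k=10, seed=0):
    """One RMSE per held-out fold, as summarized by a box plot (n = k)."""
    scores = []
    for fold in kfold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([a + b * x[i] for i in fold], [y[i] for i in fold]))
    return scores
```

Swapping `fit_line` for a gradient-boosted model changes only the fitting step; the fold construction and per-fold RMSE stay the same.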
Extended Data Fig. 10 Local atomic interaction in wild type and mutated structure for selected variants in LDLR class B repeat domain.
a–k, Residues interacting with the variant position are shown. Variant positions and interacting residues are colored by reference amino acid and atomic element (O: red, N: blue, S: yellow). Ref AA; reference amino acid.
Supplementary information
Supplementary Information
Supplementary Figs. 1–20 and Notes 1–8.
Supplementary Tables
Supplementary Tables 1–11.
Supplementary Data 1
Annotated base editor plasmid sequences.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 9
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ryu, J., Barkal, S., Yu, T. et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 56, 925–937 (2024). https://doi.org/10.1038/s41588-024-01726-6
DOI: https://doi.org/10.1038/s41588-024-01726-6