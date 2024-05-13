Cell culture and genomic DNA extraction

Human osteosarcoma U2OS cells (American Type Culture Collection (ATCC)), human embryonic kidney cells (HEK293, ATCC) and HepG2 cells (a gift from Julian König’s laboratory) were cultured in DMEM (Gibco, 41965062) supplemented with 10% FBS (PAN-Biotech, P40-37500), 100 U ml−1 penicillin–streptomycin and 2 mM L-glutamine. K562-Cas9 cells (GeneCopoeia, SL552) were cultured in RPMI1640 medium (Gibco, 11875093) supplemented with 10% FBS (PAN-Biotech, P40-37500), 100 U ml−1 penicillin–streptomycin and 2 mM L-glutamine and kept under selection with hygromycin. HeLa Kyoto cells were infected with viral particles from LentiCas9-Blast (Addgene, 5292), and stable clones expressing Cas9 were maintained in DMEM supplemented with 10% FBS, 100 U ml−1 penicillin–streptomycin, 2 mM L-glutamine and 7 μg ml−1 blasticidin. Immortalized B cells from GIAB donors Chinese son (GM24631, Coriell), Chinese father (GM24694, Coriell), Chinese mother (GM24695, Coriell), Ashkenazi Jewish son (GM24385, Coriell) and Ashkenazi Jewish mother (GM24143, Coriell) were maintained in RPMI 1640 medium (Gibco, 11875093) supplemented with 15% FBS (PAN-Biotech, P40-37500), 100 U ml−1 penicillin–streptomycin and 2 mM L-glutamine. All cell lines were maintained in a humidified incubator at 37 °C supplemented with 5% CO2.

The gDNA of cells was extracted using a Qiagen Blood & Tissue Kit (Qiagen, 69506) following the manufacturer’s instructions and eluted in nuclease-free water.

gDNA of GIAB (refs. 30,31) individuals was purchased from Coriell: female Utah/Mormon (NA12878), Ashkenazi Jewish son (NA24385), Ashkenazi Jewish father (NA24149), Ashkenazi Jewish mother (NA24143), Chinese son (NA24631), Chinese father (NA24694) and Chinese mother (NA24695).

Expression and purification of homemade Tn5

Expression and purification of hyperactive Tn5 (E54K, L372P) were performed as described previously (ref. 50) with the following modifications: Tn5 was expressed as an N-terminal His6–GST fusion followed by a 3C protease cleavage site. GSH affinity purification was used to capture the fusion protein, which was subsequently cleaved using recombinant 3C protease.

Tn5 loading and BreakTag linker preparation

Tn5-B adapter was prepared by mixing 100 µM Tn5ME-B and 100 µM Tn5MErev51 (Supplementary Table 7) resuspended in annealing buffer (50 mM NaCl, 40 mM Tris, pH 8) at a 1:1 ratio. The oligos were annealed in a thermocycler programmed as follows:

Step    Temperature    Time
1       95 °C          5 min
2       65 °C          −0.1 °C s−1
3       65 °C          5 min
4       4 °C           −0.1 °C s−1
5       4 °C           Hold

Tn5 was loaded with pre-annealed Tn5-B adapter for 1 h at room temperature with agitation (300 r.p.m.) in a thermoshaker.

The BreakTag linker was prepared by combining 10 µM BreakTag_fwd and 10 µM BreakTag_rev oligos (Supplementary Table 7) in T4 polynucleotide kinase buffer (New England Biolabs (NEB), M0201S). The oligos were annealed in a thermocycler programmed as follows:

Step    Temperature        Time
1       95 °C              5 min
2       Cool to 25 °C      −0.1 °C s−1
3       25 °C              Hold

In vitro digestion of gDNA with Cas9 RNPs

RNPs were assembled by mixing Cas9 and sgRNA at equimolar ratios in NEB 3.1 buffer (NEB, B72030), followed by incubation at 37 °C for 10 min. For HiPlex BreakTag, pools were mixed with the nuclease at a 2:1 ratio. An input of 500 ng of gDNA was mixed with each RNP at a final concentration of 90 nM and incubated at 37 °C for 1 h in a thermocycler with the lid set at 37 °C. The reaction was terminated by adding RNase A (Thermo Fisher Scientific, 10753721) and proteinase K (NEB, P8107) at final concentrations of 0.8 µg µl−1 and 0.2 µg µl−1, respectively, at 37 °C for 20 min, followed by incubation at 55 °C for 20 min. Nuclease-digested gDNA was purified with DNA AMPure XP beads (1.2× volumes, Beckman Coulter, A63881).

HiPlex sgRNA production

Sequences for HiPlex1 (ref. 7) and HiPlex2 (ref. 10) pools (Supplementary Table 1) were bioinformatically split into 10 pools. Each pool contained 150 gRNAs for HiPlex1 and 140 gRNAs for HiPlex2, modified as follows: the last nucleotide at the 5′ end of the gRNA sequence (position 20) was replaced with a G for efficient T7 transcription. A T7 promoter sequence 5′-GGATCCTAATACGACTCACTATAG-3′ was added at the 5′ end of the protospacer, and a SpCas9 scaffold sequence 5′-GTTTTAGAGCTAGAA-3′ was added at the 3′ end. The sequences were ordered as DNA oPools (Integrated DNA Technologies (IDT)) and reconstituted in nuclease-free water at 100 µM. In-house production of sgRNAs was performed using the HighYield T7 sgRNA Synthesis Kit (SpCas9) (Jena Bioscience, RNT-105) following the manufacturer’s instructions. In brief, each pool (1 µM) was used for an assembly PCR reaction using three primers: T7fwd_sRNA: 5′-GGATCCTAATACGACTCACTATAG-3′, T7rev_sgRNA: 5′-AAAAAAGCACCGACTCGG-3′ and SpCas9_scaffold: 5′-AAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC-3′. To increase complexity and avoid PCR bias, we performed three separate PCR reactions for each pool, which were then combined before IVT. The expected size of the assembled DNA template was confirmed on an agarose gel and used directly for T7 IVT. Three IVT reactions per pool were performed for increased yield and were incubated for 90 min at 37 °C. IVT products were purified using 2× volumes of Agencourt RNAClean XP magnetic beads (Beckman Coulter, A66514) and resuspended in nuclease-free water. RNA concentration was estimated using Qubit RNA Broad Range (Invitrogen, Q10211).
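The oligo design rule described above (5′-most protospacer nucleotide replaced with G, T7 promoter prepended, scaffold overlap appended) can be illustrated with a minimal Python sketch. The two flanking sequences are taken from the text; the function name `design_template_oligo` and the input validation are assumptions made for illustration and are not part of the published workflow.

```python
# Flanking sequences as given in the text.
T7_PROMOTER = "GGATCCTAATACGACTCACTATAG"
SCAFFOLD_OVERLAP = "GTTTTAGAGCTAGAA"


def design_template_oligo(protospacer: str) -> str:
    """Build a DNA template oligo for T7 transcription of one sgRNA.

    Hypothetical helper: assumes a 20-nt protospacer written 5'->3',
    so that the 5'-most base (position 20 counted from the PAM) is index 0.
    """
    protospacer = protospacer.upper()
    if len(protospacer) != 20 or set(protospacer) - set("ACGT"):
        raise ValueError("expected a 20-nt protospacer of A/C/G/T")
    # Replace the 5'-most nucleotide with G for efficient T7 transcription.
    modified = "G" + protospacer[1:]
    return T7_PROMOTER + modified + SCAFFOLD_OVERLAP


oligo = design_template_oligo("ACGTACGTACGTACGTACGT")
print(oligo)
```

The scaffold overlap appended here is the reverse complement of the 3′ end of the SpCas9_scaffold primer, which is what lets the assembly PCR extend the template into a full sgRNA cassette.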

BreakTag procedure and sequencing

DNA DSB ends of nuclease-digested gDNA were repaired and 3′ adenylated using the NEBNext Ultra II End Repair/dA-Tailing Module (NEB, E7546) according to the manufacturer’s instructions with the following modification: the total volume of the reaction was halved by using half the volume of the reagents. Labeling of DSB ends by ligation with the BreakTag linker was performed using the NEBNext Ultra II Ligation Module (NEB, E7595) according to the manufacturer’s instructions with the following modifications: the total volume of the reaction was halved by using half the volume of the reagents, and the USER enzyme digestion step was omitted. The BreakTag linker was used at a final concentration of 50 nM per sample. Labeled DNA was size selected two times using 0.7× volumes of DNA AMPure XP beads (Beckman Coulter, A63987) and eluted in nuclease-free water. Tagmentation with in-house Tn5 was performed in freshly prepared 10 mM Tris-HCl (pH 7.5) buffer containing 10 mM MgCl2 and 25% N,N-dimethylformamide (DMF, Sigma-Aldrich, 227056). Tagmentation reactions were assembled using 100–200 ng of DSB-labeled DNA as input. Single-handle hyperactive Tn5 was used at a final concentration of 1.25 ng µl−1 per reaction. Tn5 was loaded with the Tn5ME-B oligonucleotide for 1 h at room temperature (Supplementary Table 7). The tagmentation mix was then incubated at 55 °C for 5 min in a pre-heated thermocycler followed by termination with 0.2% SDS at room temperature for 5 min. Libraries were amplified with NEBNext Ultra II Q5 Master Mix (NEB, M0544) in a thermocycler programmed as follows:

Step    Temperature    Time      Note
1       72 °C          5 min     Gap-filling reaction
2       98 °C          30 s
3       98 °C          10 s      14 loops (steps 3–5)
4       63 °C          30 s
5       72 °C          60 s
6       72 °C          5 min
7       12 °C          Hold

Amplified and barcoded samples were size selected by performing two consecutive double-sided selections (0.5× volume right-tail followed by 0.35× volume left-tail; final volume 0.85×) using DNA AMPure XP beads (Beckman Coulter, A63987). Libraries were quantified using a Qubit dsDNA High Sensitivity Assay Kit or a sparQ Universal Library Quant Kit (QuantaBio, 95210-100), and fragment size distribution was assessed on a Bioanalyzer High Sensitivity DNA chip. Libraries were pooled and sequenced on a NextSeq 500/550 platform with NextSeq 500/550 High Output Kit v2 chemistry for SE 1 × 75 bp sequencing or NovaSeq PE 2 × 150 bp with a 15% PhiX spike-in.

BreakTag data analysis with BreakInspectoR

Initial pre-processing was done in a Linux cluster using the BreakTag NGSpipe2go pipeline (https://github.com/roukoslab/breaktag). The pipeline processes raw reads as they are output by the sequencer and generates a BED file with coordinates containing DSBs. Raw reads (single-end or paired-end) were first scanned, and those not containing the expected 8-nt UMI followed by the 8-nt sample barcode in the 5′ end of read 1 were discarded. Valid reads were aligned to the human reference genome version hg38 downloaded from UCSC with timestamp of 15 January 2014, 21:14, using the ‘mem’ command in BWA (version 0.7.17-r1188) (ref. 52) with a seed length of 19 and default scoring/penalty values for mismatches, gaps and read clipping. Reads mapped with a minimum quality score Q = 60 were retained to ensure that we worked only with uniquely mapping reads. A final de-duplication step was performed in which spatially consecutive reads mapping within a window of 30 nt, with UMIs differing by up to two mismatches, were considered close PCR duplicates, and only one was kept. The resulting reads were aggregated per position and reported as a BED file.
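The de-duplication rule (reads within 30 nt of each other whose UMIs differ by at most two mismatches are collapsed) can be sketched in Python as follows. This is only an illustration of the stated rule, not the code used in the NGSpipe2go pipeline; the `Read` record, the greedy "keep the first read seen" strategy and the requirement that duplicates share chromosome and strand are assumptions.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    chrom: str
    pos: int      # position of the read end
    strand: str
    umi: str


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def deduplicate(reads: List[Read], window: int = 30,
                max_umi_mismatches: int = 2) -> List[Read]:
    """Greedy sketch: keep a read unless an already kept read on the same
    chromosome and strand lies within `window` nt and has a similar UMI."""
    kept: List[Read] = []
    for read in sorted(reads, key=lambda r: (r.chrom, r.strand, r.pos)):
        duplicate = False
        # Walk back through kept reads while they are still inside the window.
        for prev in reversed(kept):
            if (prev.chrom, prev.strand) != (read.chrom, read.strand):
                break
            if read.pos - prev.pos > window:
                break
            if hamming(prev.umi, read.umi) <= max_umi_mismatches:
                duplicate = True
                break
        if not duplicate:
            kept.append(read)
    return kept


reads = [
    Read("chr1", 100, "+", "AAAAAAAA"),
    Read("chr1", 110, "+", "AAAAAATT"),  # 10 nt away, 2 UMI mismatches
    Read("chr1", 120, "+", "CCCCCCCC"),  # nearby but distinct UMI
    Read("chr1", 200, "+", "AAAAAAAA"),  # same UMI but outside the window
]
print(len(deduplicate(reads)))
```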

Subsequent analysis was done using the BreakInspectoR package in R (https://github.com/roukoslab/breakinspectoR), which performs a guided search toward putative on-targets/off-targets. Starting from the previously generated BED files, BreakInspectoR identifies stacks of read ends near a PAM as candidate loci for containing a DSB, and it calculates a P value and a false discovery rate for each site identified, considering also the signal found in a non-targeted library. For HiPlex libraries, this process was sequentially repeated for all sgRNAs included in the pool. BreakInspectoR may identify ambiguous targets for sgRNAs in the pool that are separated by a Hamming distance of seven substitutions or less. Any ambiguous targets were removed from the list of all targets for a HiPlex library as necessary. The identification of sites required the function ‘breakinspectoR()’ to search for stacks of at least three read ends at a distance of 3 nt from an ‘NGG’ PAM, which is preceded by a protospacer sequence that differs by seven mismatches at most from the sgRNA sequence. Only breaks identified in standard chromosomes were retained. For the ‘PAM usage’ analysis (Fig. 1g), we called ‘breakinspectoR()’ with the same parameters but allowing any PAM (‘NNN’). RNA and DNA bulges in the off-targets nominated with BreakInspectoR were not excluded from the analysis.
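The two sequence criteria in this paragraph (a candidate site needs an ‘NGG’ PAM and at most seven protospacer mismatches; sgRNAs in a pool within a Hamming distance of seven can yield ambiguous targets) can be made concrete with a short Python sketch. BreakInspectoR itself is an R package; the function names below are hypothetical and only mirror the stated thresholds.

```python
from itertools import combinations
from typing import List, Tuple


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def is_candidate_site(protospacer: str, pam: str, sgrna: str,
                      max_mismatches: int = 7) -> bool:
    """True if the PAM is NGG and the protospacer is close enough to the sgRNA."""
    has_ngg = len(pam) == 3 and pam[1:].upper() == "GG"
    return has_ngg and mismatches(protospacer, sgrna) <= max_mismatches


def ambiguous_pairs(sgrnas: List[str],
                    max_distance: int = 7) -> List[Tuple[str, str]]:
    """Pairs of pooled sgRNAs similar enough that one site could match both."""
    return [(a, b) for a, b in combinations(sgrnas, 2)
            if mismatches(a, b) <= max_distance]


g1 = "A" * 20
g2 = "A" * 13 + "C" * 7   # seven substitutions away from g1
g3 = "C" * 20
print(ambiguous_pairs([g1, g2, g3]))
```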

Blunt rate estimation

For each site identified by BreakInspectoR, we analyzed the scission profile using the ‘scission_profile_analysis()’ function. This function analyzes the signal in the PAM-proximal side and returns a table in the form of a ‘data.frame’ attached as metadata columns of a ‘GRanges’ object (ref. 53). The table extends the coordinates of the original DSB with the signal found around the position at which the enzyme is expected to cut, a P value and a false discovery rate that assess the significance of the signal found outside the expected cut site compared to the non-target library and the classification of a site according to its preference for forming blunt or staggered breaks. We performed the analysis by using the function to look in a region between [−3, +3] nucleotides upstream/downstream of the expected cut site; for Cas9, this was 3 nt upstream (toward the 5′ end) from the PAM. To avoid sites that could mislead the analysis, we focused only on sites with an ‘NGG’ PAM, for which, in principle, expected cut sites are readily identified. Finally, from the table generated by ‘scission_profile_analysis()’, we could calculate the blunt rate for a site. We did this in two ways: (1) as a fraction of the signal found in the expected cut site (3 nt upstream of the PAM, that is, position 17 of the protospacer) and the total amount of signal in the region [−3, +3] around the cut site and (2) as a log2 ratio of the signal in the expected cut site versus the signal in the region [−3, +3] around the cut site after excluding the signal in the cut site.
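As a worked illustration of the two blunt-rate definitions, the following Python sketch computes both from per-position read-end counts. The input layout (a dictionary keyed by offset from the expected cut site, −3 to +3), the function name and the optional `pseudocount` argument are assumptions; how the R package handles zero counts is not described in this section, so the sketch simply returns `None` when a value is undefined.

```python
import math
from typing import Dict, Optional, Tuple


def blunt_rate(signal: Dict[int, float],
               pseudocount: float = 0.0
               ) -> Tuple[Optional[float], Optional[float]]:
    """Return (fraction, log2 ratio) for one site.

    `signal` maps offsets -3..+3 (0 = expected cut site) to read-end counts.
    `pseudocount` is a hypothetical knob added to both terms of the ratio.
    """
    at_cut = signal.get(0, 0.0)
    total = sum(signal.get(offset, 0.0) for offset in range(-3, 4))
    flanking = total - at_cut

    # (1) fraction of the signal at the expected cut site
    fraction = at_cut / total if total > 0 else None

    # (2) log2 ratio of cut-site signal versus flanking signal
    num = at_cut + pseudocount
    den = flanking + pseudocount
    log2_ratio = math.log2(num / den) if num > 0 and den > 0 else None
    return fraction, log2_ratio


site = {-3: 0, -2: 1, -1: 2, 0: 12, 1: 1, 2: 0, 3: 0}
print(blunt_rate(site))
```

In this example, 12 of 16 read ends fall on the expected cut site, so the fraction is 0.75 and the ratio is log2(12/4) = log2(3).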

Machine learning model for the prediction of blunt rates

We trained a machine learning model to predict scission profiles using the XGBoost flavor of the Gradient Boosting Machine algorithm implemented in the H2O.ai framework (Extended Data Fig. 4a). The software was installed in the Bioconductor R container release version 3.15 (ref. 54) (bioconductor/bioconductor_docker:RELEASE_3_15). We tuned the hyperparameters of the algorithm to use 1,000 trees of unlimited depth, DART as the booster algorithm (ref. 55) and five folds for K-fold cross-validation with automatic fold assignment of instances.

Because the number and scission profiles of the identified targets differ greatly among sgRNA constructs, we used only a subset of the total identified targets as training instances. We selected only highly covered sites with at least 16 raw reads in the PAM-proximal side and accounted for specific biases. We limited the number of targets selected per sgRNA to 100 to avoid biases toward highly promiscuous sgRNA sequences and additionally sampled staggered targets with a probability K^−1, where K is the ratio between the number of staggered (blunt reads < 20%) and blunt (blunt reads > 80%) targets for a specific sgRNA, to pick more from the pool of staggered targets and compensate for their under-representation in the total set of identified targets. This resulted in a final set of 18,759 ‘instances’ in the training set.

The ‘response’ variable to be predicted was the log2 ratio between the number of raw reads mapped in the PAM-proximal side exactly at position 17 of the protospacer (the expected cut site) and the sum of raw reads mapped in the PAM-proximal side found in positions 14–16 and 18–20 of the protospacer. A pseudocount was added to both the numerator and denominator of this fraction to avoid a division by 0.

We reflected in the ‘predictor’ variables both the on-target/off-target protospacer sequence and the actual gRNA sequence, along with the mismatches between the two. We performed one-hot encoding by constructing a 4 × 4 matrix for each of the 20 positions of the protospacer, each row representing one of the possible nucleotides (A, C, G, T) to occupy that position in the targeted protospacer, and in each column the same for the sgRNA sequence. The matrix was filled with ‘0’ with the exception of the cell representing the nucleotide in the protospacer (row) and the sgRNA (column) for that position, which would contain ‘1’. Each matrix was converted into a vector of length 16 by concatenating the column vectors, and, finally, the 20 vectors were concatenated into one large vector of length 320 with the final representation of the one-hot encoding. In addition, we included a predictor variable representing the number of mismatches between the targeted protospacer and the sgRNA sequence in the first 10 positions of the protospacer and a second variable representing the mismatches in the last 10 positions of the protospacer. In total, we used 322 variables to represent each training instance. Sequence motifs related to the scission profile were produced with the ggseqlogo package in R (ref. 56).
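The encoding described above can be sketched in Python as follows. The sketch follows the text (4 × 4 matrix per position, rows for the protospacer nucleotide, columns for the sgRNA nucleotide, columns concatenated, plus two mismatch counts), but it is not the authors' R code; the function name, the 5′→3′ ordering of positions and the exact order of entries within the 322-element vector are assumptions.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # row/column order assumed from the text (A, C, G, T)


def encode_pair(protospacer: str, sgrna: str) -> List[int]:
    """Encode a 20-nt protospacer/sgRNA pair as 322 predictor values."""
    protospacer, sgrna = protospacer.upper(), sgrna.upper()
    if len(protospacer) != 20 or len(sgrna) != 20:
        raise ValueError("both sequences must be 20 nt long")

    features: List[int] = []
    for p, g in zip(protospacer, sgrna):
        row = NUCLEOTIDES.index(p)   # nucleotide in the targeted protospacer
        col = NUCLEOTIDES.index(g)   # nucleotide in the sgRNA
        block = [0] * 16
        # Concatenating the column vectors of a 4 x 4 matrix is a
        # column-major flattening, so cell (row, col) lands at col * 4 + row.
        block[col * 4 + row] = 1
        features.extend(block)

    mismatch = [p != g for p, g in zip(protospacer, sgrna)]
    features.append(sum(mismatch[:10]))   # mismatches in the first 10 positions
    features.append(sum(mismatch[10:]))   # mismatches in the last 10 positions
    return features


vector = encode_pair("C" + "A" * 19, "A" * 20)
print(len(vector), vector[:4], vector[-2:])
```

Encoding the pair jointly, rather than each sequence separately, gives every possible protospacer/sgRNA nucleotide combination at each position its own feature, so a specific mismatch type at a specific position can be learned directly.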

Selection of SNP-containing sites in GIAB genomes

We downloaded the VCF file containing the single-nucleotide variants (SNVs) called in GIAB31 (Supplementary Table 9). We filtered the files to retain SNPs only and retrieved the 20 bp of sequence context around those sites. We retained two subsets of 394,585 and 395,392 putative CRISPR–Cas9 target sites that contain an ‘NGG’ PAM preceded by a protospacer containing at positions 17 or 18 (respectively) a SNP found in at least one of the GIAB samples. We then used the reduced machine learning model, which uses only the last 10 positions of the protospacer, to predict the expected blunt rate of those putative target sites for the reference allele sequence targeted with an sgRNA matching the reference sequence and also for the mutated allele targeted with an sgRNA containing the mutation. The top 150 sites with the lowest blunt rates (75 in sense and 75 in antisense strands) and targets with the highest predicted changes were selected for HiPlex BreakTag sgRNA pool generation. For greater statistical power, we selected sites for which the alternative allele is found in three or four donors.

GIAB SNP analysis

We used the ‘scission_profile_analysis()’ function in BreakInspectoR to obtain the scission profile of the 300 sites picked from the previously selected SNP-containing sites in GIAB genomes. We calculated the blunt rate as the fraction of the BreakTag signal in the expected cut site (position 17 of the protospacer) with respect to the total signal in the region [−3, +3] around the cut site, obtaining an approximation for the number of blunt breaks compared to the total number of breaks as captured by BreakTag. For the visualizations comparing the blunt rate and the genotype, we selected highly covered sites with at least 16 raw reads in the PAM-proximal side and reference and alternative genotype information in at least one sample for each genotype.

1000G database SNP analysis

The full set of biallelic SNVs and indels called by Lowy-Gallego et al.57 from phase three of the 1000 Genomes Project was downloaded from the EBI’s FTP server (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz) with the timestamp of 12 March 2019, 16:06. We further processed the file to keep only the SNPs that were called in at least 10% of the samples used in this call set (n = 5,248). The positions of the SNPs were cross-referenced with a table of all 11,431,163 putative CRISPR–Cas9 targets on exons annotated in the Ensembl version 98 database58 that have an NGG PAM. We shortlisted two subsets of 18,961 and 18,883 putative target sites with a SNP at positions 17 or 18 (respectively) of the protospacer sequence. We then used the reduced machine learning model, which uses only the last 10 positions of the protospacer, to predict the expected blunt rate of those putative target sites for the reference allele sequence targeted with an sgRNA matching the reference sequence and also for the mutated allele targeted with an sgRNA containing the mutation.

Prediction of blunt rates of gRNAs targeting pathogenic deletions

The full set of variants annotated in ClinVar as of April 2023, comprising a total of 2,122,310 variants, was downloaded from the National Institutes of Health FTP server (https://ftp.ncbi.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz). Only variants that were 1-nt deletions, located in standard chromosomes, overlapping an exon annotated in TxDb.Hsapiens.UCSC.hg38.knownGene (data package made from resources at UCSC on 16:50:30 + 0000, Thursday, 7 April 2022) and annotated in ClinVar as ‘Pathogenic’ or ‘Likely_pathogenic’, were considered (31,010 variants). We focused on a subset of 8,705 deletions that had an NGG motif directly adjacent to them in either strand and up to 4 nt upstream. Those sites were candidates for being cut by Cas9 in a staggered manner, which could potentially induce a templated +1 insertion as the repair outcome, correcting the frameshift in the pathogenic allele and potentially recovering the original protein sequence. We calculated that a total of 4,999 of those deletions would recover the original protein sequence with a templated +1 insertion. Next, we designed ‘in silico’ the gRNA sequences that would target the regions containing the deletions, and we estimated the blunt rate using the previously described XGBoost models for SpCas9 and LZ3 trained with the HiPlex library. Those sites predicted to be cut in a highly staggered manner (log2 blunt rate < −2) in which a templated insertion would recover the original protein were finally reported as pathogenic variants potentially treatable with a CRISPR–Cas9 therapy.

Construction of gRNA-target pair lentiviral libraries

Using our XGBoost models for SpCas9, we predicted the blunt rate of human genome sites and selected 150 sites predicted to be cut mostly blunt and 150 sites predicted to be cut mostly staggered. For the ‘ALT’ and ‘REF’ libraries, all gRNAs used in the HiPlex3 dataset were used. The cloning strategy of gRNA-target pair lentiviral libraries was adapted from Allen et al. (ref. 10). In brief, a scaffoldless lentiviral expression vector, pKLV2-U6(BbsI)-PKGpuro2ABFP-W, was generated by removing the improved gRNA SpCas9 scaffold from pKLV2-U6gRNA5(BbsI)-PKGpuro2ABFP-W (ref. 34) (gift from Kosuke Yusa, Addgene plasmid no. 67974). The deletion was generated by amplifying two fragments, one spanning the 5′ end of the AmpR cassette to the U6 promoter and the other spanning the PGK promoter to the 3′ end of the AmpR cassette, followed by Gibson assembly. The empty vector was transformed into Stbl3 chemically competent cells; single colonies were picked; and scaffold deletion was confirmed via Sanger sequencing.

For the library cloning step, we generated a 170-nt oligonucleotide pool (IDT) encoding the gRNA and a portion of the allele sequence containing 79 nucleotides with the target sequence + PAM in the center for the four individual libraries (Extended Data Fig. 5a). The oligonucleotide pool was amplified with primers compatible with the scaffold used, and a Gibson assembly was used to fuse the amplified pool to a 193-nt Ultramer duplex (IDT) encoding the improved version of the gRNA scaffold and a spacer sequence (ref. 10). Three separate Gibson assembly reactions were performed per pool at a 1:1 molar ratio, followed by an incubation for 1 h at 50 °C, and subsequently pooled for column-based purification (Monarch PCR & DNA Cleanup Kit, NEB, T1030S). Linear DNA was removed by treating the samples with Plasmid-Safe ATP-Dependent DNase (Epicentre). The intermediate circular insert and scaffoldless vector were linearized with a FastDigest BpiI (IIs class) kit (Thermo Fisher Scientific, FD1014) for 30 min and ligated in triplicate per pool (T4 DNA ligase, NEB, M0202). The replicates were pooled and transformed into Stbl3 chemically competent cells.

Transduction of gRNA-target lentiviral pools

For lentiviral packaging, the gRNA-target libraries were independently co-transfected with the two packaging plasmids, and the supernatants were pooled and concentrated 50–100-fold. Packaging and transduction were performed as described previously (ref. 59). In brief, we produced the viruses by co-transfection of 293T cells with each of the four library pools and two helper plasmids, psPAX2 and pMD2.G, encoding the lentiviral gag-pol genes and the VSV-G envelope, respectively. We harvested the lentiviral vector-containing supernatant twice, at approximately 42 h and 66 h after transfection, and concentrated it by using Lenti-X Concentrator (Takara, 631232). We plated 300,000 cells in a well of a six-well plate and transduced with the vector supernatants and 4 μg ml−1 polybrene in a total volume of 2 ml. After 48 h, the transduced cells were removed from the six-well plate, and one fifth of the cells were tested for BFP expression by flow cytometry (BD Canto), whereas the rest were plated in 10-cm2 tissue culture dishes for selection with puromycin (1 μg ml−1). Cells were kept under puromycin selection for 5 d. On the last day, cells were collected and tested for BFP expression, and gDNA was isolated using the Qiagen Blood & Tissue Kit (Qiagen, 69506).

gRNA-target pair amplicon sequencing library preparation

The region containing the gRNA sequence and 79-nt portion of the allele was amplified using the Fwd_pool and Rev_pool primers (Supplementary Table 13) with NEBNext Ultra II Q5 Master Mix (NEB, M0544) with the following program: 98 °C for 60 s, 24 loops of 98 °C for 10 s and 72 °C for 30 s, followed by a final extension at 72 °C for 2 min. The PCR product was purified using 0.9× volumes of DNA AMPure XP beads (Beckman Coulter, A63987) and eluted in nuclease-free water. The cleanup product was used for a second PCR round with indexed primers (Supplementary Table 13) with the following conditions: 98 °C for 60 s, 13 loops of 98 °C for 10 s, 67 °C for 10 s and 72 °C for 20 s, followed by a final extension at 72 °C for 2 min. The indexed libraries were pooled, and the band corresponding to the amplicon size (464 bp) was excised from a 2% agarose gel, purified and sequenced in paired-end mode (2 × 150 bp) in a NextSeq 2000 sequencer with 40% PhiX spike-in.

Analysis of gRNA-target repair outcomes

The first read in the pair was used solely to estimate the abundance of each gRNA, as it reads into the gRNA portion of the construct. The second read in the pair, which reads into the target sequence, was reverse complemented with the fastx_toolkit (http://hannonlab.cshl.edu/fastx_toolkit). The first 57 bases were then removed and only the following 79 nt were kept using Trimmomatic (ref. 60) with options SE HEADCROP:57 CROP:79, which retains only the 79-nt-long portion of the read containing the actual amplicon of the targeted sequence. Processed reads from technical replicates were merged in a single FASTQ file, and indels were called using CRISPResso2 (ref. 61) in pooled mode (CRISPRessoPooled), restricting the analysis to regions with at least 100 aligned reads and ignoring substitutions other than indels. gRNAs with detected activity in wild-type (WT) cells not expressing Cas9 that had been reported in the CRISPResso2 analysis with at least 100 edited reads were excluded from the analysis. For the rest, we extracted from the CRISPResso2 analysis output the length of the indel, the frequency of the most common +1 insertion over all edited sequences and the inserted nucleotide.
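The effect of the read 2 processing (reverse complement, then HEADCROP:57 and CROP:79) on a single sequence can be sketched in plain Python. This only mimics the coordinate arithmetic of the two tools on the base string; the function names are hypothetical, and a real FASTQ workflow would also need to reverse and trim the quality string in the same way.

```python
# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_target(read2: str, headcrop: int = 57, crop: int = 79) -> str:
    """Reverse complement read 2, drop the first `headcrop` bases and keep
    at most the next `crop` bases (the 79-nt target portion)."""
    oriented = reverse_complement(read2)
    return oriented[headcrop:][:crop]


# Toy 150-nt construct: 57-nt prefix, 79-nt target, 14-nt suffix.
target = "ACGT" * 19 + "ACG"
construct = "T" * 57 + target + "G" * 14
read2 = reverse_complement(construct)   # read 2 is sequenced from the other strand
print(extract_target(read2) == target)
```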

Nucleofection of RNP complexes into lymphoblastoid cells

For the preparation of RNP complexes, sgRNAs targeting SNP-containing loci (Supplementary Table 8) were generated in-house using the HighYield T7 sgRNA Synthesis Kit (SpCas9) (Jena Bioscience, RNT-105). Two hundred picomolar sgRNA was mixed with 100 pM Alt-R S.p. Cas9-GFP V3 (IDT, 10008100) and incubated at room temperature for 10 min. A total of 5 × 10^5 cells per reaction were resuspended in SF Cell Line 4D-Nucleofector solution (Lonza, V4XC-2032) and nucleofected in a 4D-NucleoFector system using the pulse code DN-100. Nucleofected cells were transferred to a plate containing culture medium and kept in a humidified incubator at 37 °C supplemented with 5% CO2 for 3 d before gDNA was extracted for indel analysis.

Amplicon sequencing and editing analysis using CRISPResso2

The gDNA of lymphoblastoid cells nucleofected with RNPs was extracted 3 d after CRISPR delivery. Approximately 100 ng of gDNA from each sample was used for locus amplification using the primers listed in Supplementary Table 8. Amplicon libraries were generated as described previously (ref. 62) with the following modifications: a first round of amplification using NEBNext Ultra II Q5 Master Mix (M0544) was performed with 33 cycles. The amplified DNA was purified using a 1× volume of DNA AMPure XP beads (Beckman Coulter, A63987), and the entire purified product was used for a second round of PCR with primers containing p5 and p7 sequences for Illumina sequencing (Supplementary Table 8). Amplicons were pooled and sequenced in a MiniSeq sequencer in single-read mode and 150 cycles.

Indel analysis was performed in a local Linux cluster using CRISPResso2 in pooled format (ref. 61) with the following parameters: --amplicon_min_alignment_score 50 --quantification_window_size 10 --quantification_window_center -3 --exclude_bp_from_left 0 --exclude_bp_from_right 0 --ignore_substitutions --plot_window_size 20 --min_frequency_alleles_around_cut_to_plot 0.

Cas9 variant cloning, expression and purification

The pET-Cas9-NLS-6×His expression vectors for Cas9 variants were generated by using Gibson assembly. As a PCR template for the expression vector backbone, pET WT Cas9-NLS-6×His was used (ref. 63; Addgene plasmid no. 62933). The PCR templates for the Cas9 variants were pX165-LZ3 Cas9 (Addgene plasmid no. 140561), pX165-evoCas9 (Addgene plasmid no. 140569), pX165-xCas9 (Addgene plasmid no. 140568), pX165-HypaCas9 (Addgene plasmid no. 140567) and pX165-SniperCas9 (Addgene plasmid no. 140560).

The pET expression vectors were transformed into Escherichia coli BL21 (DE3) CodonPlus (Agilent) and grown at 37 °C and 140 r.p.m. until an optical density at 600 nm (OD600) value of 0.5 was achieved. Cultures were cooled to 18 °C on ice, and protein expression was induced using IPTG at a final concentration of 0.5 mM and incubated for a further 21 h at 18 °C and 140 r.p.m. Cells were harvested by centrifugation (4,000g, 15 min), resuspended in ice-cold lysis buffer (30 mM Tris-HCl, 500 mM NaCl, 10 mM imidazole, 1 mM MgCl2, 1 mM TCEP, 5% glycerol, 1× complete protease inhibitor, 100 U ml−1 benzonase, pH 8.0) and lysed by high-pressure homogenization at 28 kpsi (Constant Systems CF1 Cell Disruptor). Lysates were cleared by centrifugation (40,000g, 30 min, 4 °C), and the cleared lysate was applied to a HisTrap FF 5-ml column (Cytiva), using an automated chromatography system (Bio-Rad, NGC Quest Plus; used for all chromatography steps). The column was washed with 20 CV wash buffer (30 mM Tris-HCl, 500 mM NaCl, 10 mM imidazole, 5% glycerol), and the Cas9 variants were eluted from the Ni–NTA column by applying a linear gradient of 10–500 mM imidazole (containing 30 mM Tris-HCl, 500 mM NaCl, 5% glycerol). The eluted proteins were diluted 1:10 in a low-salt buffer (25 mM Na–HEPES, pH 7.2, 100 mM NaCl, 5% glycerol), applied to a HiTrap Heparin 5-ml column (Cytiva) and eluted by applying a linear NaCl gradient from 100 mM to 1,000 mM. Elution fractions containing the Cas9 variants were pooled and concentrated using Amicon Ultra-15 spin concentrators (Merck). Concentrated proteins were applied to a gel filtration column (Superdex 200 16/60 pg, Cytiva, 40 mM Na–HEPES, pH 7.4, 400 mM NaCl, 10% glycerol). Peak fractions containing the Cas9 variants were pooled, concentrated to 6.4 g L−1 and diluted 1:2 with 86% glycerol to a final concentration of 3.2 g L−1 (20 µM). HiFiCas9 was purchased from IDT (no. 1081060).

