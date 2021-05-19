Cell culture, transfection and harvest

Hepa1-6 (ATCC CRL-1830) and HEK293T (ATCC CRL-3216) cells were maintained in DMEM plus GlutaMAX (Thermo Fisher Scientific), and HepG2 (ATCC HB-8065) cells were maintained in EMEM (Gibco). All media were supplemented with 10% (vol/vol) FBS and 1× penicillin–streptomycin (Thermo Fisher Scientific). Cells were maintained at 37 °C and 5% CO2 at a confluency below 90% and, for transfection, were seeded on 48-well cell culture plates (Greiner). Hepa1-6 and HEK293T cells were transfected 12–16 h after seeding, at approximately 70% confluency, using 1.5 µl of Lipofectamine 2000 (Thermo Fisher Scientific) with 750 ng of base editor plasmid (Addgene no. 112101) and 250 ng of sgRNA plasmid (Addgene no. 52963). HepG2 cells were transfected using the Neon Transfection System (Invitrogen) following the manufacturer’s instructions. Briefly, 1.2 × 10⁶ cells were resuspended in electroporation buffer R (Invitrogen) together with 2.4 µg of ABEmax plasmid DNA (MHp27) and 0.9 µg of hPCSK9-gRNA (gRNA-6) plasmid DNA and electroporated with the following program: 1,230 V, 20 ms and three pulses. Transfection efficiency was checked by fluorescence microscopy of GFP-positive cells, and transfected cells were enriched by puromycin selection (2.5 µg ml−1). Cells were expanded until they reached confluency in a six-well plate. After detachment with TrypLE Express Enzyme (Thermo Fisher Scientific) for Hepa1-6 and HEK293T cells or Trypsin-EDTA for HepG2 cells, cells were washed twice in PBS and distributed for DNA lysis, RNA isolation or protein harvest.

Genomic DNA amplification, Sanger sequencing and BEAT analysis

For genotyping, 30 µl of the cell suspension in PBS was lysed directly by adding 10 µl of 4× lysis buffer (10 mM Tris-HCl at pH 8, 2% Triton X-100, 1 mM EDTA, 1% freshly added proteinase K) and incubating at 60 °C for 60 min and then at 95 °C for 10 min. Target sites were amplified by polymerase chain reaction (PCR) using GoTaq G2 Hot Start Green Master Mix (Promega) and the respective primer pair (Supplementary Table 5). Amplification products were purified using Agencourt AMPure XP beads (Beckman Coulter) and Sanger sequenced with the respective in-sequence primers (Supplementary Table 5). Editing efficiency was determined by BEAT analysis (ref. 52).

RNA isolation and RT–qPCR

RNA was isolated using the RNeasy Kit (Qiagen) and reverse transcribed into cDNA using the GoScript Reverse Transcriptase Kit (Promega). qPCR was performed using GoTaq qPCR Master Mix (Promega) with primers specific for mouse or human PCSK9, with mouse B2M or human β-actin as housekeeping genes (Supplementary Table 5), on a QuantStudio 5 Real-Time PCR System (Thermo Fisher Scientific) or a 7900HT Fast Real-Time PCR System (Applied Biosystems). Fold changes were calculated using the ΔΔCt method.
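As a worked illustration of the ΔΔCt calculation, the sketch below computes a fold change from four Ct values. The Ct values are hypothetical, replicate averaging is omitted, and the formula assumes roughly 100% amplification efficiency for both the target and housekeeping primer pairs, which the text does not state.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ΔΔCt method.

    Assumes ~100% amplification efficiency for target and reference primers.
    """
    # ΔCt: normalize the target gene (for example PCSK9) to the housekeeping gene
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    # ΔΔCt: compare the treated sample with the control (calibrator) sample
    ddct = dct_treated - dct_control
    return 2 ** (-ddct)


# Hypothetical Ct values: the target appears one cycle later after treatment
print(fold_change_ddct(25.0, 18.0, 24.0, 18.0))  # 0.5
```

A one-cycle delay in the target relative to the housekeeping gene therefore corresponds to a halving of relative expression.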

Western blot

Harvested Hepa1-6 cells were lysed in RIPA buffer containing protease inhibitor and PhosSTOP phosphatase inhibitor (Sigma-Aldrich). Protein concentration was determined with the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific), and equal amounts of protein were separated by SDS–PAGE and transferred to nitrocellulose membranes (Sigma-Aldrich). Membranes were incubated with goat anti-mouse Pcsk9 (1:10,000, cat. no. AF3985-SP, R&D Systems) and rabbit anti-β-actin (1:3,000, cat. no. 4970S, Cell Signaling) or rabbit anti-GAPDH (1:5,000, cat. no. 4970, Abcam). HRP- or IRDye-conjugated secondary antibodies (donkey anti-goat: LI-COR cat. no. 926-32214; anti-rabbit: LI-COR cat. no. 926-68073; Cell Signaling cat. no. 7074; Promega cat. no. V8051) were used, and signals were detected by enhanced chemiluminescence (ECL substrate, Thermo Fisher Scientific) or by fluorescence imaging on a LI-COR system.

ELISAs

Human and NHP PCSK9 levels were determined using the Human Proprotein Convertase 9/PCSK9 Quantikine ELISA Kit (R&D Systems, cat. no. DPC900), and mouse Pcsk9 levels were determined using the Mouse Proprotein Convertase 9/PCSK9 Quantikine ELISA Kit (R&D Systems, cat. no. MPC900), according to the manufacturer’s instructions. Anti-Cas9 and anti-TadA antibodies were detected with an in-house direct ELISA. In short, 10 ng of Cas9 or TadA diluted in 1× ELISA Coating Buffer (Bio-Rad, cat. no. BUF030A) was immobilized on 96-well polystyrene MaxiSorp plates (Thermo Fisher Scientific, cat. no. 439454) for 2 h at room temperature. After washing in 1× ELISA Wash Buffer (Bio-Rad, cat. no. BUF031C), wells were blocked for 30 min in ELISA BSA blocking solution (Bio-Rad, cat. no. BUF032C). For Cas9 detection, a mouse anti-Cas9 mAb (7A9-3A3; clone no. 14697T, Cell Signaling, cat. no. 14697) was used as a positive control and to generate the standard curve. Samples were diluted in Tris-buffered saline with Tween (TBS-T), 1:20,000 for mouse plasma and 1:2,000 for NHP serum, and incubated for 2 h at room temperature. Bound antibodies were detected with HRP-linked goat anti-mouse (SouthernBiotech, cat. no. 1030-05) or mouse anti-monkey (SouthernBiotech, cat. no. 4700-05) secondary antibodies, developed using 1-Step Turbo TMB-ELISA Substrate Solution (Thermo Fisher Scientific, cat. no. 34022) and stopped after 20 min with Stop Solution for TMB Substrates (Thermo Fisher Scientific, cat. no. N301). Absorbance was measured at 450 nm, and the background absorbance at 540 nm was subtracted for quantification. As an additional background control, wells coated with 5% BSA were analyzed in parallel.
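The quantification steps can be sketched as follows, under assumptions the text does not spell out: the BSA-coated well is subtracted as an extra background, concentrations are read from the standard curve by linear interpolation between standard points (no 4PL or other curve fit), and the result is multiplied by the sample dilution. All OD readings and standard concentrations below are hypothetical.

```python
from bisect import bisect_left


def net_od(od450, od540, bsa_od450=0.0, bsa_od540=0.0):
    """Reference-corrected OD of a sample well minus that of a BSA-coated well (assumed)."""
    sample = od450 - od540              # subtract the 540-nm background reading
    bsa_background = bsa_od450 - bsa_od540
    return sample - bsa_background


def interpolate_concentration(od, standards):
    """Linearly interpolate a concentration from (concentration, OD) standard points.

    Assumes OD rises monotonically with concentration. Values outside the
    standard range return None instead of being extrapolated.
    """
    points = sorted(standards, key=lambda p: p[1])   # order by OD
    ods = [od_value for _, od_value in points]
    if od < ods[0] or od > ods[-1]:
        return None
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0]
    (c_low, od_low), (c_high, od_high) = points[i - 1], points[i]
    return c_low + (od - od_low) * (c_high - c_low) / (od_high - od_low)


# Hypothetical standard curve (arbitrary concentration units) and one mouse plasma well
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
signal = net_od(0.40, 0.05, bsa_od450=0.08, bsa_od540=0.03)
conc_in_well = interpolate_concentration(signal, standards)
if conc_in_well is not None:
    print(conc_in_well * 20_000)  # back-calculated for the 1:20,000 plasma dilution
```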

Protein production

His6-MBP-TEV-TadA-TadA* was expressed overnight at 18 °C in Escherichia coli Rosetta 2 (DE3) cells (Novagen) after induction of T7 RNA polymerase with IPTG. Cells were resuspended in lysis buffer (20 mM HEPES-KOH pH 7.5, 200 mM KCl, 10 mM imidazole) supplemented with protease inhibitors and lysed using a Maximator High Pressure Homogenizer Type HPL6. Clarified lysate was loaded on a 15-ml Ni-NTA Superflow column (Qiagen), and protein was eluted with 20 mM HEPES-KOH pH 7.5, 200 mM KCl, 200 mM imidazole. In the second step, TadA was further purified by gradient elution from a 10-ml HiTrap Heparin HP column (GE Healthcare) equilibrated in 20 mM HEPES-KOH pH 7.5, 100 mM KCl, 1 mM DTT. Protein-containing fractions were pooled, and the affinity tag was removed by TEV protease cleavage overnight at 4 °C. Uncleaved TadA was removed by a reverse nickel-affinity chromatography step, and the flowthrough containing untagged TadA was applied to a Superdex 200 16/600 column (GE Healthcare) and eluted in 20 mM HEPES-KOH pH 7.5, 200 mM KCl. Purified fractions were concentrated, flash frozen in liquid nitrogen and stored at −80 °C.

Cloning

The AAV constructs used in this work were generated from pLV302 and pLV312.3 (Addgene plasmid nos. 119943 and 119944) by exchanging regions of interest using NEBuilder HiFi DNA Assembly Master Mix (NEB no. E2621). Amino acid sequences are listed in Supplementary Note 1. PCR was performed using Q5 High-Fidelity DNA Polymerase (New England Biolabs). pCMV_ABEmax_P2A_GFP was a gift from David Liu (Addgene plasmid no. 112101). lentiGuide-Puro was a gift from Feng Zhang (Addgene plasmid no. 52963). The coding sequence of ABEmax was cloned behind a T7 promoter into the mRNA production plasmid for mRNA production and into the pET His6 LIC cloning vector (2Bc-T, Addgene plasmid no. 37236) for protein production.

AAV vector production

All pseudotyped AAV8 vectors were produced by the Viral Vector Facility of the Neuroscience Center Zurich and purified by ultracentrifugation and diafiltration. Physical titers (vector genomes ml−1) were determined using a Qubit 3.0 Fluorometer. The identity of the packaged genome of each AAV vector was confirmed by Sanger sequencing.

Animal studies

Mouse experiments were performed in accordance with protocols approved by the Kantonales Veterinäramt Zürich. Mice were housed in a pathogen-free animal facility at the Institute of Molecular Health Sciences at ETH Zurich and kept in a temperature- and humidity-controlled room on a 12-h light/dark cycle. For long-term studies in mice with a sensitized background, conditional Trp53F2-10/F2-10 knockout mice (ref. 53) were mated with albumin (Alb)-Cre transgenic mice (ref. 54). Mice were fasted for 3–4 h before blood was collected from the inferior vena cava, which preceded liver perfusion. At 5 weeks of age, mice were injected with LNPs at 1–3 mg kg−1 total RNA or with 1 × 10¹² AAV vector genomes per mouse, in injection volumes of 120–150 µl. Only male C57BL/6J mice were used; for the Alb-Cre × Trp53flox/flox cohort, both male and female animals were used (untreated group: 17 males and 12 females; AAV-only group: 16 males and two females; AAV-treated group: 16 males and nine females). Studies involving NHPs were conducted at a facility accredited by the Association for Assessment and Accreditation of Laboratory Animal Care International, operating in accordance with the principles of the U.S. Food and Drug Administration’s Good Laboratory Practice and the Guide for the Care and Use of Laboratory Animals from the Institute of Laboratory Animal Resources (2011). All protocols were reviewed and approved by the Acuitas animal care and use committee. Male Macaca fascicularis (approximately 2 years of age) were housed in a temperature- and humidity-controlled room on a 12-h light/dark cycle. Animals received a 60-min intravenous infusion via the cephalic vein of 0.75 mg kg−1 or 1.5 mg kg−1 total RNA formulated in LNP and diluted in 0.9% sodium chloride USP, at a dose volume of 5 ml kg−1. Animals were fasted for 4 h before serum collection for ELISA and clinical chemistry.
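The dose arithmetic can be sketched as follows, assuming the 1 µg µl−1 total-RNA stock concentration stated for the LNP formulation. The body weights (20 g per mouse, 3 kg per macaque) and the 150-µl final volume, chosen from the 120–150 µl range, are hypothetical. The text does not describe how individual doses were prepared.

```python
def mouse_injection_volumes_ul(body_weight_g, dose_mg_per_kg, stock_ug_per_ul=1.0,
                               final_volume_ul=150.0):
    """Return (µl of LNP stock, µl of diluent) for one mouse.

    1 mg/kg equals 1 µg/g, so the total-RNA dose in µg is dose × body weight in g.
    """
    dose_ug = dose_mg_per_kg * body_weight_g
    stock_ul = dose_ug / stock_ug_per_ul
    if stock_ul > final_volume_ul:
        raise ValueError("dose does not fit into the chosen injection volume")
    return stock_ul, final_volume_ul - stock_ul


def nhp_infusion(body_weight_kg, dose_mg_per_kg, volume_ml_per_kg=5.0):
    """Return (total RNA in mg, infusion volume in ml, RNA concentration in mg/ml)."""
    total_mg = dose_mg_per_kg * body_weight_kg
    volume_ml = volume_ml_per_kg * body_weight_kg
    return total_mg, volume_ml, total_mg / volume_ml


print(mouse_injection_volumes_ul(20, 3))  # (60.0, 90.0)
print(nhp_infusion(3.0, 1.5))             # 4.5 mg in 15 ml, about 0.3 mg/ml
```

Because the infusion volume scales with body weight, the RNA concentration of the infusate depends only on the dose level: 0.15 mg ml−1 for 0.75 mg kg−1 and 0.3 mg ml−1 for 1.5 mg kg−1 at 5 ml kg−1.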

RNA synthesis and LNP encapsulation

Heavily modified sgRNA (P1) for mouse studies was synthesized on Synthego’s CRISPRevolution platform by solid-phase phosphoramidite chemistry. After synthesis, a series of post-processing steps and purification, oligonucleotides were quantified by ultraviolet (UV) absorption, and their identity and quality were confirmed using an Agilent 1290 Infinity II liquid chromatography system coupled with an Agilent 6530B quadrupole time-of-flight mass spectrometer (Agilent Technologies) operated in negative ion mode. Chemically ultra-heavily modified sgRNAs (P2) were ordered from Agilent Technologies. Heavily modified sgRNAs for large-scale production for NHP studies were synthesized on an ÄKTA Oligopilot Plus 100 oligonucleotide synthesizer at a 112-µmol scale. After synthesis and deprotection, the oligonucleotide was subjected to solid-phase extraction using an ÄKTA Explorer FPLC system. This material was then further purified and assessed for quality using an Agilent 1200 HPLC System. Selected HPLC fractions were combined and processed by tangential flow filtration using the Pall Minimate EVO TFF system. Final product quantity was determined by UV absorption, and its identity and quality were confirmed using an Agilent 1290 Infinity II liquid chromatography system coupled with an Agilent 6530B quadrupole time-of-flight mass spectrometer (Agilent Technologies) operated in negative ion mode.

The coding sequence of ABEmax was cloned into the mRNA production plasmid. mRNA production and LNP encapsulation were performed as described (ref. 55). Briefly, mRNAs were transcribed to contain 101-nucleotide poly(A) tails. To generate modified nucleoside-containing mRNA, N1-methylpseudouridine-5′-triphosphate (m1Ψ; TriLink no. N-1081) was used instead of UTP. In vitro transcribed mRNAs were capped co-transcriptionally using the trinucleotide cap1 analog CleanCap (TriLink no. N-7413). mRNA was purified with cellulose (Sigma-Aldrich, no. 11363-250G) as described (ref. 56). All mRNAs were analyzed by agarose gel electrophoresis and stored frozen at −20 °C. Cellulose-purified m1Ψ-containing mRNA and the synthesized sgRNA were encapsulated together in LNPs. LNPs were formulated as described previously (ref. 57). In short, an ethanolic solution of 1,2-distearoyl-sn-glycero-3-phosphocholine, cholesterol, a PEG lipid and an ionizable cationic lipid was rapidly mixed with an aqueous solution (pH 4) containing SpCas9-ABEmax mRNA and sgRNA (1:1 weight ratio) using an in-line mixer. The ionizable lipid and LNP composition are described in U.S. patent application US 2016/0376224 A1 (2016); the ionizable lipid (pKa in the 6.0–6.5 range) belongs to the lipid class defined by the structure shown in Supplementary Fig. 4a. The resulting LNP formulation was dialyzed overnight against 1× PBS, sterile filtered through a 0.2-µm filter and stored at −80 °C at a concentration of 1 µg µl−1 total RNA. LNPs had an average hydrodynamic diameter of 67–71 nm with a polydispersity index of 0.02–0.06, as determined by dynamic light scattering (Malvern Nano ZS Zetasizer), and a mode size of 67–75 nm, as determined by nanoparticle tracking analysis (Malvern Panalytical NanoSight NS300). Encapsulation efficiencies of SpCas9-ABEmax mRNA and sgRNA in the LNP were both 96%, as measured by the Quant-iT RiboGreen Assay (Life Technologies).
Acuitas will provide the LNP used in this work to any academic investigator who would like to test it.
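RiboGreen-based encapsulation efficiency is commonly calculated by comparing the dye signal of intact LNPs, which reports only accessible (unencapsulated) RNA, with the signal after detergent lysis, which reports total RNA. The sketch below shows that standard calculation with hypothetical readings; the text does not give the assay details, so the detergent step and the conversion of fluorescence to concentration are assumptions.

```python
def encapsulation_efficiency(total_rna, free_rna):
    """Percentage of RNA protected inside LNPs.

    total_rna: RNA quantified after detergent lysis of the LNPs (all RNA).
    free_rna: RNA quantified in intact LNPs (dye-accessible, unencapsulated RNA).
    Both values must use the same units, for example ng/ml read from a standard curve.
    """
    if total_rna <= 0:
        raise ValueError("total RNA must be positive")
    if not 0 <= free_rna <= total_rna:
        raise ValueError("free RNA must lie between 0 and the total RNA")
    return 100.0 * (total_rna - free_rna) / total_rna


# Hypothetical readings in ng/ml
print(encapsulation_efficiency(total_rna=1000.0, free_rna=40.0))  # 96.0
```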

Primary hepatocyte isolation

Mice were euthanized with CO2 and immediately perfused with Hank’s balanced salt solution (Thermo Fisher Scientific) plus 0.5 mM EDTA via the inferior vena cava, followed by an incision in the portal vein. During this step, one liver lobe was tied off with a thread to prevent its perfusion, so that it could be collected as whole-liver tissue for embedding and lysates. After blanching of the liver, mice were perfused with digestion medium (low-glucose DMEM plus 1× penicillin–streptomycin (Thermo Fisher Scientific), 15 mM HEPES and freshly added Liberase (Roche)) for 5 min. Livers were excised into cold isolation medium (low-glucose DMEM supplemented with 10% (vol/vol) FBS, 1× penicillin–streptomycin (Thermo Fisher Scientific) and GlutaMAX (Thermo Fisher Scientific)) and gently dissociated, and the resulting cell suspension was passed through a 100-µm filter. The suspension was centrifuged at 50g for 2 min and washed 2–3 times with isolation medium until the supernatant was clear. Primary hepatocytes were then pelleted for DNA or RNA isolation.

Genomic DNA isolation and HTS

Genomic DNA from mouse tissues was isolated using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer’s protocol or lysed directly with 10 µl of 4× lysis buffer (10 mM Tris-HCl pH 8, 2% Triton X-100, 1 mM EDTA, 1% freshly added proteinase K) and incubated at 60 °C for 60 min and 95 °C for 10 min. Target sites were amplified for 26 cycles using GoTaq G2 Hot Start Green Master Mix (Promega) or NEBNext High-Fidelity 2× PCR Master Mix (New England Biolabs) and the respective primer pair (Supplementary Table 6). The PCR product was purified with 0.8× Agencourt AMPure XP beads (Beckman Coulter) and amplified for another six cycles with primers containing sequencing adaptors. The products were gel purified and quantified with the dsDNA HS Assay Kit (Thermo Fisher Scientific) on a Qubit 3.0 fluorometer. Samples were sequenced on an Illumina MiSeq. After demultiplexing, samples were analyzed using CRISPResso2 (ref. 58).
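As a simplified illustration of the quantity reported from the amplicon data, the sketch below computes the fraction of reads carrying an A-to-G conversion at one adenine of the amplicon. It is not CRISPResso2’s algorithm: it assumes reads that are already aligned to the amplicon without indels, and the sequences are toy examples.

```python
def a_to_g_fraction(reads, reference, position):
    """Fraction of informative reads carrying G where the reference amplicon has A.

    Simplifying assumptions (not CRISPResso2's method): every read is already
    aligned to the amplicon without indels and shares its coordinates; reads that
    are too short or have an N at `position` are treated as uninformative.
    """
    if reference[position] != "A":
        raise ValueError("the reference base at this position is not an A")
    bases = [read[position] for read in reads
             if len(read) > position and read[position] != "N"]
    if not bases:
        return 0.0
    return sum(base == "G" for base in bases) / len(bases)


# Toy amplicon and reads; position 2 is the target adenine
reference = "CCAGT"
reads = ["CCAGT", "CCGGT", "CCGGT", "CCNGT"]
print(a_to_g_fraction(reads, reference, 2))  # 0.6666666666666666
```

If the protospacer lies on the reverse strand, the same edit appears as T-to-C in forward-strand reads, so the reads and reference would need to be reverse complemented first.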

Clinical chemistry and cytokine and inflammatory biomarkers

Total cholesterol, triglyceride, high-density lipoprotein (HDL), AST and ALT levels in all mouse samples were measured as routine parameters at the Division of Clinical Chemistry and Biochemistry at the University Children’s Hospital Zurich using an Alinity ci-series analyzer. LDL levels were calculated using the Friedewald formula. NHP serum was subjected to a full clinical chemistry panel, including ALT, AST, total bilirubin, LDL cholesterol, HDL cholesterol and total cholesterol; approximately 1 ml of blood was taken from the femoral vein, processed to serum and analyzed on a Beckman Coulter analyzer. For cytokine and inflammatory biomarker analyses, approximately 0.8 ml of blood was processed to serum, and a panel of ten cytokines and inflammatory biomarkers (IFN-α2a, IL-18, IL-1RA, IL-1β, IL-6, IP-10, MCP-1, MIP-1α, MIP-1β and TNF-α) was evaluated using U-PLEX Biomarker Group 1 (NHP) assay kits (Meso Scale Diagnostics).
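The Friedewald estimate used for mouse LDL is LDL = TC − HDL − TG/k, with k = 5 for concentrations in mg/dl and k = 2.2 for mmol/l. The sketch below implements it. The text states neither the units used nor whether a triglyceride cutoff was applied, so the 400 mg/dl (4.5 mmol/l) guard is an assumption based on the formula’s usual validity range.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, units="mg/dl"):
    """Estimate LDL cholesterol as TC - HDL - TG/k (Friedewald formula).

    k = 5 for mg/dl and 2.2 for mmol/l. The triglyceride guard reflects the
    formula's usual validity range and is an assumption, not part of the methods.
    """
    if units == "mg/dl":
        k, tg_limit = 5.0, 400.0
    elif units == "mmol/l":
        k, tg_limit = 2.2, 4.5
    else:
        raise ValueError(f"unsupported units: {units}")
    if triglycerides > tg_limit:
        raise ValueError("triglycerides too high for a reliable Friedewald estimate")
    return total_chol - hdl - triglycerides / k


print(friedewald_ldl(200.0, 50.0, 150.0))  # 120.0
print(friedewald_ldl(5.0, 1.2, 1.1, units="mmol/l"))
```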

Tissue cryosections

Mouse livers were perfused with Hank’s buffer, and one lobe was tied off before further perfusion. This lobe was fixed in 4% paraformaldehyde (PFA) at 4 °C overnight, transferred to a 30% sucrose solution overnight at 4 °C and embedded in OCT compound in cryomolds (Tissue-Tek). Frozen tissues were cut into 7-µm sections at −20 °C and mounted directly on Superfrost Plus slides (Thermo Fisher Scientific). Cryosections were counterstained with DAPI (Thermo Fisher Scientific) and mounted in VECTASHIELD mounting medium (Vector Labs). Two frozen sections were analyzed per mouse per tissue.

Single-molecule fluorescence in situ hybridization

The ABEmax probe library was designed using Stellaris FISH Probe Designer software (Biosearch Technologies) (Supplementary Table 7) and coupled to Cy5 as described (ref. 59). Livers were fixed in 4% PFA in PBS for 3 h and subsequently agitated in 30% sucrose and 4% PFA in PBS overnight at 4 °C. Fixed tissues were embedded in Tissue-Tek OCT Compound (Sakura, 4583). Sections (8 µm thick) were cut onto poly-L-lysine-coated coverslips, air dried for 5 min, fixed for 15 min in 4% PFA and permeabilized overnight in 70% EtOH. Liver sections were hybridized with single-molecule fluorescence in situ hybridization (smFISH) probe sets according to a previously published protocol (ref. 60). DAPI (Sigma-Aldrich) was used as a nuclear counterstain. smFISH imaging was performed on a Leica THUNDER 3D live cell imaging system with the following THUNDER computational clearing settings: Feature Scale (nm): 350; Strength (%): 98; Deconvolution settings: Auto; and Optimization: High.

Histology and staining

Tissues were fixed in 4% PFA at 4 °C overnight and dehydrated the next day before paraffin embedding. Paraffin blocks were cut into 5-µm-thick sections, which were deparaffinized with xylene and rehydrated. Sections were stained with hematoxylin and eosin (H&E) or Sirius Red and examined for histopathological changes.

Microscopy

Mouse tissue was imaged using a Zeiss Apotome. Imaging conditions and intensity scales were matched across all images. Images were acquired with Zeiss ZEN 2 software and analyzed with Fiji ImageJ software (v1.51n) (ref. 61).

Guide-dependent off-target prediction and analysis

For CIRCLE-seq and CHANGE-seq, sgRNA functionality was first tested by in vitro digestion of the Sanger sequencing amplicon described above. Libraries were prepared as previously described (refs. 40,41). Data were processed using version 1.1 of the CIRCLE-seq analysis pipeline (https://github.com/tsailabSJ/circleseq) with the following parameters: ‘window_size: 3; mapq_threshold: 50; start_threshold: 1; gap_threshold: 3; mismatch_threshold: 6; merged_analysis: True, variant_analysis: True’. The nominated sites were deep sequenced with a coverage of at least 10,000 reads per site. For highly repetitive sequences, reads were further processed with cutadapt (v3.1) (ref. 62) to extract the amplicon excluding the protospacer region. Where this was not possible because the region was too similar to another site in the genome, off-target editing could not be determined. For iGUIDE, libraries were prepared as previously described (ref. 46).
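To make the mismatch_threshold parameter concrete, the sketch below counts mismatches between a guide target (protospacer plus PAM, with N matching any base) and an equal-length genomic site, and keeps sites with at most six mismatches. This is a gapless simplification for illustration only: the CIRCLE-seq pipeline also tolerates gaps (gap_threshold) and uses its own alignment logic, the inclusive comparison is an assumption, and the sequences are toy examples.

```python
def count_mismatches(target, site):
    """Count mismatches between a guide target (protospacer + PAM) and a site.

    'N' in the target matches any base. Gapless comparison only (simplification).
    """
    if len(target) != len(site):
        raise ValueError("target and site must have the same length")
    return sum(t != "N" and t != s for t, s in zip(target.upper(), site.upper()))


def within_mismatch_threshold(target, site, mismatch_threshold=6):
    """Keep a site if it has at most `mismatch_threshold` mismatches (assumed inclusive)."""
    return count_mismatches(target, site) <= mismatch_threshold


# Toy 10-nt protospacer followed by an NGG PAM
target = "ACGTACGTACNGG"
print(count_mismatches(target, "ACGTACGTACAGG"))  # 0
print(count_mismatches(target, "TCGTACGAACTGG"))  # 2
```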

RNA-seq experiments and data analysis

RNA libraries were prepared using the TruSeq Stranded Total RNA Kit (Illumina) with ribosomal RNA (rRNA) depletion. RNA-seq libraries were sequenced on an Illumina NovaSeq at the Functional Genomics Center Zurich, yielding on average more than 160 million paired-end (PE) reads per sample. Quality control, pre-processing and alignment of RNA-seq reads: The quality of Illumina PE RNA-seq reads was evaluated using FastQC version 0.11.7 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Using FastQ Screen version 0.11.1 (https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/), reads were screened for potential contamination (genomic DNA, rRNA and mycoplasma) against a custom database including UniVec (https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/), RefSeq mRNA sequences, selected genome sequences (human, mouse, Arabidopsis, bacteria, viruses, PhiX, lambda and mycoplasma) (https://www.ncbi.nlm.nih.gov/refseq/) and SILVA rRNA sequences (https://www.arb-silva.de/). Illumina PE reads were pre-processed using fastp version 0.20.0 to trim sequencing adaptors and low-quality ends (average quality below 20 within a 4-nt window). Flexbar version 3.0.3 was used to remove the first six bases of each read, which showed priming bias introduced by the library preparation protocol (ref. 63). Before alignment, PE reads longer than 50 bp were trimmed to 50 bp to avoid overlapping read ends, which can inflate allele frequency estimates and variant calls. Quality-controlled reads (average quality ≥20 and read length ≥20) were aligned to the reference genomes (mouse reference genome: GRCm38.p5, Ensembl release 91; human reference genome: GRCh38.p10, Ensembl release 91) using STAR version 2.7.0e in two-pass mode. PCR duplicates were marked using Picard version 2.9.0.
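The per-read trimming and quality-control rules above can be sketched for a single read as follows. This is not the fastp or Flexbar implementation: it assumes Phred+33 quality strings, assumes that adapter and low-quality-end trimming has already been done, and ignores the paired-end bookkeeping that keeps or discards both mates together.

```python
def trim_read(seq, qual, head_trim=6, max_len=50, min_len=20, min_mean_q=20):
    """Trim one read and apply the length and mean-quality criteria.

    Removes the first `head_trim` bases (priming bias), truncates to `max_len`
    bases and returns (seq, qual), or None if the read is shorter than `min_len`
    or its mean Phred quality is below `min_mean_q`. Assumes Phred+33 encoding.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    seq = seq[head_trim:head_trim + max_len]
    qual = qual[head_trim:head_trim + max_len]
    if len(seq) < min_len:
        return None
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
    if mean_q < min_mean_q:
        return None
    return seq, qual


read = "N" * 6 + "ACGT" * 15   # 66 nt, hypothetical
quality = "#" * 6 + "I" * 60   # Q2 for the first six bases, Q40 afterwards
trimmed = trim_read(read, quality)
print(len(trimmed[0]))         # 50
```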
Read alignments were evaluated for multiple aspects of RNA-seq quality, such as sequence quality, genomic DNA and rRNA contamination, GC/PCR/sequence bias, sequencing depth, strand specificity, coverage uniformity and read distribution over the genome annotation, using R scripts in ezRun (https://github.com/uzh/ezRun/) developed at the Functional Genomics Center Zurich. RNA sequence variant calling and filtering: Variant calling from RNA-seq reads was performed according to the Genome Analysis Toolkit (GATK) Best Practices (https://gatkforums.broadinstitute.org/gatk/discussion/3891/calling-variants-in-rnaseq). In detail, the GATK (v4.1.2.0) tool SplitNCigarReads was applied to post-process the read alignments. Variants were then called using HaplotypeCaller (GATK v4.1.2.0) on the PCR-deduplicated, post-processed aligned reads. Variant loci in base editor overexpression experiments were filtered to exclude sites without high-confidence reference genotype calls in the control experiment: for a given SNV, the read coverage in the control experiment had to be above the 90th percentile of the read coverage across all SNVs in the corresponding overexpression experiment; only loci at which at least 99% of reads in the control experiment contained the reference allele were kept; and only sites with at least ten reads in the treated sample were considered. Quantification of gene expression: Transcript expression was calculated using kallisto (v0.44.0).
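The three filters applied to RNA-seq variant loci can be sketched as below. The `Site` record and its field names are hypothetical, and the percentile method (Python’s inclusive quantile interpolation) is an assumption, since the text does not specify how the 90th percentile was computed.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-SNV record."""
    name: str
    treated_depth: int      # total reads at the SNV in the base editor overexpression sample
    control_depth: int      # total reads at the same locus in the control sample
    control_ref_reads: int  # control reads that carry the reference allele


def coverage_threshold(sites):
    """90th percentile of treated-sample coverage across all SNVs (inclusive method assumed)."""
    depths = [s.treated_depth for s in sites]
    return statistics.quantiles(depths, n=10, method="inclusive")[8]


def filter_sites(sites, min_ref_fraction=0.99, min_treated_reads=10):
    threshold = coverage_threshold(sites)
    kept = []
    for s in sites:
        if s.control_depth <= threshold:
            continue  # control coverage must exceed the 90th percentile
        if s.control_ref_reads / s.control_depth < min_ref_fraction:
            continue  # control must be confidently reference at this locus
        if s.treated_depth < min_treated_reads:
            continue  # too few reads in the treated sample
        kept.append(s)
    return kept
```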

WGS and data analysis

After on-target editing was confirmed, DNA was isolated using the QIAamp DNA Mini Kit (Qiagen) or the Quick-DNA Microprep Kit (Zymo Research) according to the manufacturer’s instructions. DNA concentrations were determined using the Qubit dsDNA HS Kit (Invitrogen). WGS was performed at a mean coverage of 30× on an Illumina NovaSeq. Read alignment, variant calling and variant filtering: Sequence reads were mapped to the mouse reference genome GRCm38 using the Burrows–Wheeler Aligner version 0.7.5 (ref. 64) with settings ‘bwa mem -c 100 -M’. Duplicate reads were marked using Sambamba version 0.4.732, and reads were realigned per donor using GATK IndelRealigner version 2.7.2. Raw variants were multisample-called using GATK HaplotypeCaller version 3.4-46 (ref. 65) and GATK-Queue version 3.4-46 with default settings and the additional option ‘EMIT_ALL_CONFIDENT_SITES’. The quality of variant and reference positions was evaluated using GATK VariantFiltration version 3.4-46 with options ‘-snpFilterName LowQualityDepth -snpFilterExpression “QD < 2.0” -snpFilterName MappingQuality -snpFilterExpression “MQ < 40.0” -snpFilterName StrandBias -snpFilterExpression “FS > 60.0” -snpFilterName HaplotypeScoreHigh -snpFilterExpression “HaplotypeScore > 13.0” -snpFilterName MQRankSumLow -snpFilterExpression “MQRankSum < -12.5” -snpFilterName ReadPosRankSumLow -snpFilterExpression “ReadPosRankSum < -8.0” -cluster 3 -window 35’. The full pipeline description and settings are available at https://github.com/UMCUGenetics/IAP. To obtain high-quality somatic mutation catalogs, we applied post-processing filters as described (ref. 51).
Briefly, we considered variants on autosomal chromosomes that showed no evidence in a paired control sample (genomic DNA isolated from untreated tissue of the same mouse); passed VariantFiltration with a GATK phred-scaled quality score ≥100 for base substitutions and ≥250 for indels; had a base coverage of at least 20× in both the clonal and the paired control sample; had a mapping quality ≥60; and did not overlap with single-nucleotide polymorphisms in the Single Nucleotide Polymorphism Database (dbSNP) version 142. We additionally filtered out base substitutions with a GATK genotype quality (GQ) score lower than 99 in the clonal sample or lower than 10 in the paired control sample. For indels, we filtered out variants with a GQ score lower than 99 in either the clonal or the paired control sample, as well as indels within 100 bp of a variant called in the control sample. In addition, for both SNVs and indels, we considered only variants with a variant allele frequency (VAF) of 0.2 or higher in the clones to exclude mutations accumulated in vitro (refs. 51,66). The scripts are available at https://github.com/ToolsVanBox/SNVFI and https://github.com/ToolsVanBox/INDELFI. Because the cells are karyotypically unstable, and to allow a fair comparison of mutation numbers in the subsequent analysis, only mutations from regions considered diploid (1.5 < ratio < 2.5 in the Control-FREEC (ref. 67) output when the samples were treated as diploid) and callable were included. The absolute number of mutations was corrected for the length of the genomic regions included. Mutational profile and signature analysis: The numbers of the six substitution types (C > A, C > G, C > T, T > A, T > C and T > G) or of the 96 trinucleotide mutation types (six substitution types with their 5′ and 3′ flanking bases) were reported, and the frequencies of the 96 trinucleotide mutation types were plotted for every mouse using an in-house-developed R package (ref. 68).
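The per-variant somatic filters listed above can be expressed as a single predicate, sketched below. The `Call` record and its field names are hypothetical, chromosome names are assumed to be Ensembl-style (‘1’ to ‘19’), “evidence in the paired control” is approximated as any supporting read, and the indel proximity filter and the diploid/callable-region restriction are omitted. The published filters are implemented in the SNVFI and INDELFI scripts linked above, not in this code.

```python
from dataclasses import dataclass

# Mouse autosomes; Ensembl-style names ("1" ... "19") are an assumption
AUTOSOMES = {str(i) for i in range(1, 20)}


@dataclass
class Call:
    """Hypothetical per-variant record combining clone and paired-control metrics."""
    chrom: str
    is_indel: bool
    qual: float             # GATK phred-scaled QUAL
    passed_filters: bool    # True if VariantFiltration reported PASS
    mapq: float
    in_dbsnp: bool          # overlaps dbSNP v142
    clone_depth: int
    control_depth: int
    clone_gq: int
    control_gq: int
    clone_vaf: float
    control_alt_reads: int  # reads supporting the variant in the paired control


def passes_somatic_filters(c: Call) -> bool:
    if c.chrom not in AUTOSOMES or not c.passed_filters:
        return False
    if c.control_alt_reads > 0 or c.in_dbsnp:  # any control evidence or known SNP
        return False
    if c.qual < (250 if c.is_indel else 100):
        return False
    if c.clone_depth < 20 or c.control_depth < 20 or c.mapq < 60:
        return False
    # GQ >= 99 in the clone; control GQ >= 10 for SNVs but >= 99 for indels
    if c.clone_gq < 99 or c.control_gq < (99 if c.is_indel else 10):
        return False
    return c.clone_vaf >= 0.2  # exclude mutations accumulated in vitro
```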
For the normalized absolute numbers and relative proportions of the six substitution types, samples were grouped by the injected agent, and the mean and standard deviation were calculated and plotted for each group. To illustrate potential TadA activity in the samples, the identified TadA motif (ref. 39) was used as a TadA signature for cosine similarity comparison. The 96-trinucleotide frequencies were pooled from the two signatures and normalized to sum to 1. The 96-trinucleotide TadA signature was derived under the assumption that other substitutions do not contribute to it. For the three control mouse samples, the 96-trinucleotide mutational profile was constructed, normalized by the total number of SNVs and multiplied by the median number of SNVs (428 SNVs) to make the samples comparable. To mimic TadA activity on the mutational profile, 10, 25, 50 or 100 additional SNVs were distributed over the 96 trinucleotide mutation types according to the TadA signature. Decimal values were rounded, and the resulting counts were added to the control profiles. For all samples and the TadA signature-added controls, cosine similarity with the TadA signature was calculated using MutationalPatterns (ref. 68) in R. To estimate the variant detection sensitivity of our method, we identified germline variants and counted how many of them were detected in the clones. To exclude potential artifacts, the direct output of the IAP pipeline was further filtered with the following criteria: located in a diploid and CALLABLE region; passed VariantFiltration with a GATK phred-scaled quality score ≥100; GATK GQ score equal to 99; base coverage of at least 20× in all clones and bulk samples; no overlap with variants in our blacklists (available upon reasonable request); and present as a heterozygous variant (VAF ≥ 0.3) in the three bulk samples.
Our filtering yielded 86 heterozygous variants, and any position with VAF < 0.3 was counted as absent in the clones. Global maximum-likelihood estimates and confidence intervals for both groups of mice were calculated using the dNdScv package (ref. 69) and plotted using ggplot2 (ref. 70) in R. Called SNVs were compared among groups using the online tool of the Van de Peer lab (http://bioinformatics.psb.ugent.be/webtools/Venn/) provided by VIB/UGent. Comparisons of more than six groups were analyzed with this tool and subsequently visualized using Adobe Illustrator.
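The profile scaling, signature spike-in and cosine similarity steps can be sketched in Python as follows; the analysis itself used MutationalPatterns in R. Vectors stand for the 96 trinucleotide channels, although any equal length works. Rounding the summed count per channel is one reading of “decimal values were rounded” and is an assumption; note that Python’s round() rounds halves to the nearest even integer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length mutation count vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scale_profile(profile, target_total=428):
    """Rescale a count profile so that it sums to target_total SNVs (median = 428)."""
    total = sum(profile)
    return [count / total * target_total for count in profile]


def add_signature(profile, signature, n_snvs):
    """Spread n_snvs extra SNVs over the channels in proportion to `signature`.

    `signature` is assumed to sum to 1. Summed counts are rounded per channel,
    so the final total can differ slightly from the nominal value.
    """
    return [round(count + weight * n_snvs) for count, weight in zip(profile, signature)]
```

For a control profile and the TadA signature, evaluating `cosine_similarity(add_signature(scale_profile(control), tada_signature, n), tada_signature)` for n in 10, 25, 50 and 100 mirrors the structure of the spike-in comparison; `control` and `tada_signature` are placeholder names here.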

Statistical analyses

A priori power calculations to determine sample sizes for animal experiments were performed using G*Power71. Statistical analyses were performed using GraphPad Prism 8.0.0 for macOS. Sample sizes and the statistical tests used are described in the figure legends. P < 0.05 was considered statistically significant.
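For orientation, the sketch below gives the normal-approximation sample size per group for a two-sided, two-sample comparison of means. The effect size, α and power values are hypothetical because the text does not report them, and G*Power’s exact t-test calculation (noncentral t distribution) typically returns somewhat larger groups, for example 17 rather than 16 per group for d = 1.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


print(n_per_group(1.0))  # 16
```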

