ATM germline variants in a young adult with chronic lymphocytic leukemia: 8 years of genomic evolution

Chronic lymphocytic leukemia (CLL) is commonly diagnosed in the elderly, with a median age at diagnosis of ~70 years. However, CLL can also be detected in adolescents and young adults (AYA). According to different studies, 0.85–3.7% of patients with CLL are diagnosed as AYA, and 3% of these patients had a first-degree relative with CLL [1]. Families with multiple individuals affected with CLL and other related B-cell tumors have been described, with contradictory findings regarding their potential early age at diagnosis [2]. Despite these observations, our knowledge about the molecular profile and predisposing factors in AYA CLL is scarce [3, 4]. Comprehensive studies have dissected the (epi)genomic and transcriptomic landscape of CLL [5]. Approximately 9–18% of CLL harbor del(11q), which occurs in younger patients with bulky disease and poor survival. These deletions are frequently associated with germline and acquired mutations of ATM [6]. Patients with the inherited disorder ataxia telangiectasia have biallelic alterations of the ATM gene and increased susceptibility to lymphoid malignancies [7]. Rare, protein-coding germline ATM variants are associated with CLL in adults [8]. However, ATM mutations are uncommon in familial CLL [9]. Here, we describe an 18-year-old woman diagnosed with CLL whose family history included a younger brother with B-cell acute lymphoblastic leukemia (B-ALL) and other family members carrying germline ATM mutations. A combination of whole-genome and single-cell characterization of this CLL at diagnosis and during the course of the disease provided an opportunity to understand the genomic profile of AYA CLL and the sequence of events driving its evolution. An 18-year-old female was diagnosed with CLL, Binet-Rai stage AI, at another institution, during the workup of a lymphocytosis detected in a routine blood test.
She had a past medical history of anxiety-depressive syndrome during childhood and chronic headache, but no neurological symptoms were reported. The patient had a younger brother who was diagnosed with B-ALL at 3 years of age and was in complete remission 13 years later, and an older sister with epilepsy. Her parents were both healthy. At the time of CLL diagnosis, the patient was asymptomatic with a normal physical exam. Her white blood cell count (WBC) was 9.08 × 10⁹/L, with 75% lymphocytes. Hemoglobin and platelet count were normal. Peripheral blood smear showed small atypical lymphocytes consistent with CLL, whose phenotype was CD5, CD23, CD43, CD200, CD10, CD20 and CD22 weakly positive with weak kappa light chain restriction. Fluorescence in situ hybridization (FISH) analysis for ATM (11q22), D12Z3 (cen 12), DLEU (13q14.3), LAMP1 (13q34), and TP53 (17p13) was normal. One year after diagnosis, the patient received two cycles of rituximab plus fludarabine and cyclophosphamide (FCR) due to progressive disease, achieving a complete remission. The patient was then referred to our hospital. Physical examination was normal, without evidence of lymphadenopathy or splenomegaly. WBC count was 2 × 10⁹/L with 10% lymphocytes, hemoglobin was 117 g/L, and platelet count was normal. Watchful waiting was recommended. Five years later, the CLL progressed with increased lymphocytosis, inguinal, axillary, and laterocervical lymphadenopathy (2–3 cm), and splenomegaly of 4 cm below the costal margin. At that time, the karyotype was 46,XX,del(13)(q12q21)[6]/46,XX[10] and a heterozygous del(13q14.3) was detected by FISH in 92% of nuclei. FISH for ATM, D12Z3, and TP53 was normal and no TP53 mutations were observed. Sequencing of the IGHV genes showed a clonal rearrangement of IGHV3-21 with 100% homology to the germline, not belonging to any major stereotyped subset (Supplementary Tables 1, 2).
Due to CLL progression, ibrutinib 420 mg per day was started and the patient achieved a partial response. However, after 20 months, ibrutinib had to be discontinued due to severe diarrhea, and acalabrutinib 100 mg every 12 h was started. Progression of CLL was observed after 13 months of treatment, and rituximab and venetoclax were initiated (Fig. 1A). The patient was included in the CLL program of the International Cancer Genome Consortium, and the whole genomes of the germline and tumor sample at diagnosis were sequenced [5]. No somatically acquired driver alterations were detected, but three germline ATM mutations were identified, including a pathogenic 28-base frameshift deletion (p.N3003Dfs*6) and two missense single nucleotide variants (p.K2204M and p.Y1961C). Although the p.K2204M missense variant has not been identified in previous studies, p.Y1961C has been reported in a CLL patient and its modeling showed reduced ATM kinase activity [10]. Based on this result, we studied the segregation of these mutations in the family members by Sanger sequencing. The mother harbored the frameshift deletion, while the father and the sister carried the two missense variants. Both the patient and her brother with B-ALL inherited all three variants (Fig. 1B, Supplementary Tables 3, 4). A milder ataxia telangiectasia phenotype, in which the disease progresses at a slower pace, has been observed in patients with reduced levels of ATM kinase activity [11]. At the time of last follow-up, the two siblings (28 and 16 years old) had not developed neurological symptoms. To better understand the contribution of somatic alterations during the evolution of the disease, whole-genome sequencing (WGS) was performed at 3 additional time points over a period of 8 years and complemented with single-cell DNA-sequencing (Fig. 1A, Supplementary Table 1).
Using a longitudinal, sample-aware mutation calling pipeline that increases sensitivity, we identified 689 genome-wide and 7 non-synonymous variants in the WGS at diagnosis, increasing to 1779 genome-wide and 18 non-synonymous variants in the latest sample analyzed. Among them, four mutations were found in CLL driver genes over the course of the

3 instrument (Illumina) aiming at a mean coverage of 30x. Primary data analysis, image analysis, base calling, and quality scoring of the run were processed using the manufacturer's software.
A sample-based summary can be found in Supplementary Table 1.
Structural variants (SV): SV were detected using BRASS (v6.0.5) (12), SvABA, and DELLY2 (v0.8.1) (13). We filtered out variants called by BRASS with MAPQ<90, and those called by SvABA or DELLY2 with MAPQ<60. Finally, SV identified by at least two programs and passing caller-specific filters for at least one program were kept. All SV were visually inspected using the Integrative Genomics Viewer (IGV) (14). Similar to SNV and indels, we recovered SV identified in any of the samples if they were detected by any program, disregarding all filters, and/or if they were seen by visual inspection using IGV.
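The consensus rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `SVCall` record, the shared `sv_id` (which presumes that calls from different programs were already matched to the same event upstream), and the `passes_filter` flag are hypothetical names introduced only for illustration. The rescue step across samples and the IGV review are not modeled.

```python
from dataclasses import dataclass

# Minimum MAPQ per caller, as stated in the text.
MIN_MAPQ = {"BRASS": 90, "SvABA": 60, "DELLY2": 60}


@dataclass(frozen=True)
class SVCall:
    sv_id: str           # hypothetical ID shared by calls describing the same event
    caller: str          # "BRASS", "SvABA", or "DELLY2"
    mapq: int
    passes_filter: bool  # hypothetical flag: call passed the caller-specific filters


def consensus_svs(calls):
    """Return IDs of SV seen by >=2 programs and passing filters in >=1 program."""
    by_sv = {}
    for call in calls:
        # Drop calls below the caller-specific MAPQ threshold.
        if call.mapq < MIN_MAPQ[call.caller]:
            continue
        by_sv.setdefault(call.sv_id, []).append(call)

    kept = []
    for sv_id, group in by_sv.items():
        callers = {c.caller for c in group}
        if len(callers) >= 2 and any(c.passes_filter for c in group):
            kept.append(sv_id)
    return sorted(kept)
```

In this sketch a BRASS call with MAPQ 80 is discarded before the two-program rule is evaluated, so an event supported only by that call and a single SvABA call would not be kept.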
Immunoglobulin gene rearrangements, stereotypy, and IGHV mutational status: IgCaller (17) was used to analyze immunoglobulin gene rearrangements (heavy and light chain rearrangements as well as class switch recombination) from WGS. Curated sequences obtained from IgCaller were used as input to IMGT/V-QUEST (18) to annotate the genes, functionality, and IGHV mutational status based on current guidelines (19). The ARResT/AssignSubsets online tool (20) was used to analyze stereotypy.
Mutational signatures: SNV were used to identify the mutational processes active during the course of the disease. SNV were classified into 96 substitution classes considering the base substitution and its 5' and 3' flanking bases. COSMIC mutational signatures (v3) known to be found in CLL were considered (SBS1, SBS5, SBS8 and SBS9) (1,21,22). We measured their contribution using a fitting approach (MutationalPatterns, v1.12.0), iteratively removing the least-contributing signature if its removal decreased the cosine similarity between the original and reconstructed 96-class profile by less than 0.01, as previously described (22).
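As an illustration of the 96-class scheme, the sketch below assigns an SNV to its class from the reference trinucleotide and the alternate base. It follows the common convention of reporting substitutions relative to the pyrimidine (C or T) of the mutated base pair. The function names are hypothetical and this is not the MutationalPatterns implementation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_snv(context, alt):
    """Return the substitution class, e.g. 'A[C>T]G'.

    context: reference trinucleotide (5' base, mutated base, 3' base).
    alt: alternate base on the same strand as `context`.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or context[1] == alt:
        raise ValueError("expected a trinucleotide and a different alternate base")
    if context[1] in "AG":
        # Purine reference: switch to the opposite strand so that the
        # mutated base is reported as a pyrimidine.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_classes():
    """Enumerate the 6 substitutions x 4 x 4 flanking bases = 96 classes."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref in "CT"
        for alt in "ACGT" if alt != ref
        for five in "ACGT"
        for three in "ACGT"
    ]
```

Counting SNV per class yields the 96-class profile that the fitting step compares with its reconstruction from the selected signatures.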
Subclonal architecture and clonal evolution: SNV were used to assess the subclonal architecture and evolution of the tumor. SNV were clustered using a Bayesian method (10,23-25). First, a Markov chain Monte Carlo (MCMC) sampler for a Dirichlet process mixture model was used to infer putative subclones (assignment of mutations to subclones, and estimation of the subclone frequencies in each sample) from the SNV read counts, copy number states, and tumor purities. The MCMC sampler was run for 10000 iterations, discarding the first 5000. Clusters with fewer than 50 mutations were excluded. The phylogenetic tree of the subclones was identified following the "pigeonhole principle" (25), allowing a tolerated error of 0.001. Clusters not assigned in the reconstructed tree were not considered. The length of each branch in the tree is proportional to the number of mutations assigned to the corresponding subclone. The TimeScape R package (v1.6.0) was used to plot the fish plots.
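The "pigeonhole principle" can be sketched as a consistency check on a candidate tree. This is a simplified reading of the rule, not the exact procedure of reference 25: in every sample, the cancer cell fractions (CCF) of the children of a subclone may not add up to more than the CCF of that subclone, within the tolerated error. The function name and input layout are assumptions made for illustration.

```python
def tree_is_consistent(parent_of, ccf, tol=0.001):
    """Check a candidate subclone tree against the pigeonhole rule.

    parent_of: dict mapping each non-root cluster to its parent cluster.
    ccf: dict mapping each cluster to a list of CCF values, one per sample.
    tol: tolerated error (0.001 in the text).
    """
    # Group clusters by parent so sibling CCFs can be summed.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    for parent, kids in children.items():
        for i, parent_ccf in enumerate(ccf[parent]):
            # Siblings occupy disjoint sets of cells within the parent.
            if sum(ccf[kid][i] for kid in kids) > parent_ccf + tol:
                return False
    return True
```

For example, two subclones with CCF of 0.6 and 0.5 in the same sample cannot both be direct children of the clonal cluster, since together they would exceed 100% of the tumor cells. One must be nested within the other, provided the nested cluster does not exceed its parent in any sample.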

Single-cell DNA-seq (scDNA-seq)
Sample preparation: scDNA-seq was performed for 3 different time points using a commercial gene panel (Tapestri single-cell DNA CLL panel from Mission Bio) covering 32 CLL driver genes, on the Tapestri Platform from Mission Bio. Sample and library preparation were performed following the manufacturer's recommendations. Sequencing of all libraries was carried out on an Illumina NovaSeq 6000 S1 sequencer to obtain approximately 1300 reads/cell.

Data analysis: The Tapestri Pipeline (V1, Mission Bio) was used to analyze the data. In short, adaptor sequences were trimmed, reads were aligned to the reference genome (hg19) using BWA, barcodes were corrected and reads were assigned to the corresponding cell barcode, and genotype calling was performed using the Genome Analysis Toolkit (GATK, v.37). Tapestri Insights (v2.2, Mission Bio) was used to analyze the output files (loom format) jointly. Genotypes with quality <30, read depth <10, or allele frequency <20% were marked as missing. Variants genotyped in less than 50% of the cells or mutated in less than 1% of the cells were not considered. Cells with less than 50% of genotypes present were removed.
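The genotype, variant, and cell filters can be summarized in a short Python sketch. It is not the Tapestri Insights implementation and rests on three labeled assumptions: the allele-frequency threshold is applied only to mutated calls (otherwise every wild-type call would be masked), the 1% mutated-cell threshold uses all cells as the denominator, and the cell filter is evaluated on the variants that survive the variant filter. Function names are hypothetical.

```python
MISSING = 3  # code for a missing genotype; 0 = wild type, 1 = heterozygous, 2 = homozygous


def mask_genotype(call, gq, dp, af):
    """Return the genotype call, or MISSING if it fails a quality threshold.

    gq: genotype quality; dp: read depth; af: alternate allele frequency in percent.
    """
    if gq < 30 or dp < 10:
        return MISSING
    # Assumption: the allele-frequency threshold applies to mutated calls only.
    if call in (1, 2) and af < 20:
        return MISSING
    return call


def filter_matrix(matrix):
    """Apply the variant and cell filters to matrix[cell][variant] of masked calls.

    Returns (kept_cell_indices, kept_variant_indices).
    """
    n_cells = len(matrix)
    n_vars = len(matrix[0]) if matrix else 0

    kept_vars = []
    for v in range(n_vars):
        column = [row[v] for row in matrix]
        genotyped = sum(g != MISSING for g in column)
        mutated = sum(g in (1, 2) for g in column)
        # Keep variants genotyped in >=50% of cells and mutated in >=1% of cells.
        if genotyped >= 0.5 * n_cells and mutated >= 0.01 * n_cells:
            kept_vars.append(v)

    # Keep cells with >=50% of the retained genotypes present.
    kept_cells = [
        c for c, row in enumerate(matrix)
        if kept_vars
        and sum(row[v] != MISSING for v in kept_vars) >= 0.5 * len(kept_vars)
    ]
    return kept_cells, kept_vars
```

Masking is applied first so that low-quality genotypes count as missing when the per-variant and per-cell completeness is computed.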
After applying all these filters, a mean of 5948 cells per sample was obtained. Variants detected in bulk WGS were included as a white-list in Tapestri Insights. Variants at low frequency (1-10% of cells) in all scDNA-seq samples and not present in COSMIC were black-listed to remove potential artifacts from library preparation and/or sequencing. Only coding and splice site mutations (SNV and indels) were analyzed. Genotypes of the detected mutations were exported and used as input to ∞SCITE (https://github.com/cbg-ethz/infSCITE) (26), encoded as follows: zero for wild type, one and two for heterozygous and homozygous mutation, respectively, and three for missing genotypes. ∞SCITE was used to infer the mutation tree and assign cells to subclones. Cells assigned to more than one subclone or genotyped as wild type for all mutations were not considered. As previously described (27), ∞SCITE was run using a global sequencing error rate (false positive rate) of 1%, following Mission Bio's recommendation, an estimated rate of non-mutated sites identified as homozygous mutations of 0%, and an estimated allele dropout rate (false negative rate) specific to each sample. Germline single-nucleotide polymorphisms in gnomAD with a population frequency >1% and identified as mutated in at least 75% of cells, with a variant allele frequency per read count between 47% and 53%, were used to estimate the rates of mutated allele and normal allele dropout.
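One plausible way to turn the selected germline SNPs into dropout estimates is sketched below. The text does not give the formula, so the calculation is an assumption: at a truly heterozygous site, a cell genotyped as wild type is taken to have lost the mutated allele, and a cell genotyped as homozygous is taken to have lost the normal allele. Genotypes use the same 0/1/2/3 encoding as the ∞SCITE input, and the function name is hypothetical.

```python
def estimate_dropout(genotypes_per_snp):
    """Estimate allele dropout rates from heterozygous germline SNPs.

    genotypes_per_snp: one list per selected SNP holding per-cell codes
    (0 wild type, 1 heterozygous, 2 homozygous, 3 missing).
    Returns the fraction of called genotypes consistent with dropout of
    the mutated allele and of the normal allele, respectively.
    """
    wild_type = homozygous = called = 0
    for cells in genotypes_per_snp:
        for genotype in cells:
            if genotype == 3:
                continue  # missing genotypes carry no information
            called += 1
            wild_type += genotype == 0
            homozygous += genotype == 2
    if called == 0:
        raise ValueError("no called genotypes available")
    return {
        "mutated_allele_dropout": wild_type / called,
        "normal_allele_dropout": homozygous / called,
    }
```

Pooling all selected SNPs within a sample gives one pair of rates per sample, matching the sample-specific dropout rate described above.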

Supplementary Tables
Supplementary Tables are placed in the Supplementary Tables Excel file. Supplementary