Telomere fusions associate with coding sequence and copy number alterations in CLL

Short dysfunctional telomeres are detected prior to clinical progression in chronic lymphocytic leukaemia (CLL) and result in chromosomal fusions that propagate genome instability, driving disease progression. To investigate the impact of telomere dysfunction on the CLL genome, we performed a large-scale molecular characterisation of telomere fusion events in CLL B-cells. A cohort of 276 CLL patient samples was selected for analysis based on short telomere length (TL) profiles, with the majority (97%, n=269) having mean TL within the previously-defined fusogenic range in CLL [1]. Patient samples were screened for the presence of telomere fusions using a single-molecule telomere fusion assay [2] modified to include the 5p telomere (Supplementary Figure 1). Telomere fusions were detected in 72% (198/276) of the samples, which were subsequently arbitrarily stratified by fusion frequency (Supplementary Table 1). Fusions were detected for all telomeres assayed, including the 5p telomere, for which fusions were present in 23% (40/177) of patient samples (Supplementary Figure 2, Supplementary Table 2).

A 5p-specific primer (5p8: 5ʹ-CCTCTACTAACCTTTAAGGCTGTG-3ʹ) was designed in the 5p sub-telomeric sequence to target this chromosome end, which lies distal to TERT. The 5p-specific radiolabelled probe for Southern blotting was the gel-purified PCR product of the fusion primer 5p8 with primer 5p6 (5ʹ-CGTAGAGGAGGGTGGAACCTC-3ʹ). For each CLL patient sample, 100 ng of gDNA was used per reaction and 10 replicate telomere fusion PCR reactions were performed. Of the 276 CLL patient samples in the cohort, the first 33 (15 from the UHW and 18 from the LRF CLL4 clinical trial) were screened with the radiolabelled probes combined in the following sequence: 5p+17p, XpYp+16p and 21q. The remaining 243 patient samples were initially screened with the 5p probe alone to facilitate identification of 5p telomere fusions.
Fusion frequency was calculated by dividing the total number of fusion events detected for each sample by the total number of diploid genomes used as input in the original PCR reactions.
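The fusion frequency calculation can be written out as a short function. This is a minimal sketch, not the authors' code: converting 100 ng of gDNA into diploid genome equivalents assumes roughly 6.6 pg of DNA per diploid human genome (a common approximation, not a value given here), the denominator is assumed to sum the input of all 10 replicate reactions, and the function names are hypothetical.

```python
# Minimal sketch of a per-sample telomere fusion frequency calculation.
# ASSUMPTIONS (not stated in the source): about 6.6 pg of DNA per diploid human
# genome, and a denominator summed over all replicate PCR reactions.

PG_PER_DIPLOID_GENOME = 6.6  # assumed conversion factor, in picograms


def diploid_genomes(ng_gdna: float, pg_per_genome: float = PG_PER_DIPLOID_GENOME) -> float:
    """Convert an input DNA mass (ng) into an approximate number of diploid genomes."""
    return ng_gdna * 1000.0 / pg_per_genome


def fusion_frequency(total_fusions: int,
                     ng_per_reaction: float = 100.0,
                     n_reactions: int = 10,
                     pg_per_genome: float = PG_PER_DIPLOID_GENOME) -> float:
    """Fusion events detected per diploid genome screened across all reactions."""
    if n_reactions <= 0 or ng_per_reaction <= 0:
        raise ValueError("need a positive number of reactions and a positive DNA input")
    total_genomes = n_reactions * diploid_genomes(ng_per_reaction, pg_per_genome)
    return total_fusions / total_genomes


if __name__ == "__main__":
    # Hypothetical sample: 12 fusions detected across 10 replicate 100 ng reactions.
    print(f"{fusion_frequency(12):.2e} fusions per diploid genome")
```

Under these assumptions, 10 reactions of 100 ng correspond to roughly 151,500 diploid genomes, so the hypothetical 12 fusions give a frequency of about 7.9e-05 per diploid genome.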

Sequencing telomere fusion amplicons
For 9 CLL patient samples, 200-300 telomere fusion PCR reactions were performed, generating 600-900 fusion amplicons per sample. The pooled fusion amplicons for each sample were purified using Agencourt AMPure XP beads, and purification was verified by Southern blotting. Fusion amplicons were subjected to paired-end 100 bp (PE100) sequencing on an Illumina HiSeq4000 at the Oxford Genomics Centre (sample CLL1) and at the Beijing Genomics Institute, Hong Kong (samples CLL2-9).

Characterisation of intra- and inter-chromosomal telomere fusions
The mapping approaches used to characterise telomere fusions and rare genomic recombination events were based on a previously developed pipeline (7), adapted to include the 5p sub-telomere, and are outlined in Supplementary Figure 2. The human reference genome used was hg19 (GRCh37). Identified telomere fusion sequences were visualised using the Broad Institute Integrative Genomics Viewer (IGV) (8) and underwent a second round of alignment validation using the Basic Local Alignment Search Tool (BLAST) to assess mapping accuracy. Only unambiguously aligned events were included in subsequent analyses; 68% of these contained the fusion junction within a single read.
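How reads containing the fusion junction were recognised is not specified here. One common approach, sketched below purely as an assumption, treats an alignment that is soft-clipped by a minimum number of bases as a split read spanning the junction; an event then counts as junction-resolved if any of its supporting reads qualifies. The 20 bp threshold and all function names are hypothetical; CIGAR parsing follows the SAM specification.

```python
import re

# Hypothetical rule (not from the source): an alignment soft-clipped by at least
# MIN_CLIP bases at either end is treated as a split read spanning the fusion junction.
MIN_CLIP = 20  # assumed threshold in bases

# SAM CIGAR operations: a length followed by one of M, I, D, N, S, H, P, =, X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths of a SAM CIGAR string."""
    # Hard clips (H) can only occur outermost, so drop them before inspecting the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    lead = ops[0][0] if ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def read_spans_junction(cigar: str, min_clip: int = MIN_CLIP) -> bool:
    """True if either end of the alignment is soft-clipped by >= min_clip bases."""
    return max(soft_clips(cigar)) >= min_clip


def fraction_junction_resolved(events: list[list[str]], min_clip: int = MIN_CLIP) -> float:
    """Fraction of events with at least one supporting read that spans the junction.

    Each event is given as the list of CIGAR strings of its supporting reads.
    """
    if not events:
        return 0.0
    resolved = sum(any(read_spans_junction(c, min_clip) for c in reads) for reads in events)
    return resolved / len(events)
```

For example, of the toy events `[["60M40S", "100M"], ["100M", "100M"], ["5H30S70M", "100M"]]`, two contain a qualifying soft clip, so the resolved fraction is 2/3.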
Intra-chromosomal telomere fusions were defined as paired-end sequence reads in which both reads mapped to the same sub-telomeric sequence in the same 5ʹ-3ʹ orientation towards the telomere repeat array. Inter-chromosomal telomere fusions were defined as paired-end sequence reads in which at least one read mapped to a defined sub-telomeric sequence and the other mapped to a locus on a different chromosome. Each validated telomere fusion junction was examined individually to measure microhomology (sequence identity at the junction) and the extent of deletion on the contributing chromatids. Gene sets enriched among the genomic loci involved in telomere fusions were analysed using Gene Set Enrichment Analysis (GSEA, v5.2) and the Molecular Signatures Database (MSigDB) (9).
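The read-pair definitions and junction measurements above can be sketched as follows. This is a simplified illustration, not the published pipeline (7): the `Mapping` record, the strand convention (p-arm telomeres at the start of each reference chromosome and q-arm telomeres at the end, as in hg19), the assumption that both parental sequences are already placed in read coordinates, the telomere boundary coordinate used for deletion size, and all function names are hypothetical. Microhomology is taken as the stretch of the fused read that matches both parental sequences; a gap between the two matching segments is reported as inserted bases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """Alignment of one read of a pair (hypothetical, simplified record)."""
    chrom: str                         # e.g. "chr17"
    pos: int                           # 1-based leftmost position
    strand: str                        # "+" or "-" relative to the reference
    subtelomere: Optional[str] = None  # e.g. "17p" if inside a defined sub-telomere


def towards_telomere(m: Mapping) -> bool:
    """Does the read point 5'->3' towards the telomere of its sub-telomere?

    p-arm telomeres lie at the start of the reference chromosome, so a read heading
    towards them is on the reverse strand; q-arm telomeres lie at the end ("+").
    """
    return m.strand == ("-" if m.subtelomere.endswith("p") else "+")


def classify_pair(r1: Mapping, r2: Mapping) -> str:
    """Label a read pair as 'intra', 'inter' or 'unclassified'."""
    # Intra-chromosomal: both mates in the same sub-telomere, both facing the telomere.
    if (r1.subtelomere is not None and r1.subtelomere == r2.subtelomere
            and towards_telomere(r1) and towards_telomere(r2)):
        return "intra"
    # Inter-chromosomal: one mate in a defined sub-telomere, the other on another chromosome.
    for anchor, mate in ((r1, r2), (r2, r1)):
        if anchor.subtelomere is not None and mate.chrom != anchor.chrom:
            return "inter"
    return "unclassified"


def junction_microhomology(read: str, ref_a: str, ref_b: str) -> tuple[int, int]:
    """Return (microhomology_bp, inserted_bp) at a fusion junction.

    ref_a and ref_b are the parental sequences placed in read coordinates
    (ref_a[i] and ref_b[i] correspond to read[i]); the read starts in parent A
    and ends in parent B.
    """
    if not (len(read) == len(ref_a) == len(ref_b)):
        raise ValueError("sequences must be pre-aligned to the same length")
    a_end = 0  # length of the read prefix that matches parent A
    while a_end < len(read) and read[a_end] == ref_a[a_end]:
        a_end += 1
    b_start = len(read)  # start of the read suffix that matches parent B
    while b_start > 0 and read[b_start - 1] == ref_b[b_start - 1]:
        b_start -= 1
    # Overlap of the two matching segments = microhomology; a gap = inserted bases.
    return max(0, a_end - b_start), max(0, b_start - a_end)


def subtelomeric_deletion(breakpoint: int, telomere_boundary: int) -> int:
    """Bases lost between a chromatid's fusion breakpoint and its telomere boundary."""
    return abs(telomere_boundary - breakpoint)
```

For example, the toy read `ACGTACGCATTTTT` with parents `ACGTACGCACCCCC` and `GGGGGGGCATTTTT` returns 3 bp of microhomology (`GCA`) and no inserted bases.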

Whole Genome Sequencing (WGS)
WGS was undertaken at BGI Technologies on an Illumina HiSeq2000, using 30 µg of gDNA from CD19+ CLL B cells for 60x coverage of the tumour genome and 2 µg of gDNA from CD3+ CLL T cells for 30x coverage of the control. Somatic SNVs were called using MuTect (10) and SomaticSniper (11) with default settings, and the intersection of the two call sets was taken to reduce the false-positive rate for low-frequency alleles. The variant allele frequency (VAF) distribution was analysed in diploid regions of the genome identified using cn.mops (12).
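The caller intersection and diploid-region VAF filtering described above can be illustrated with plain Python data structures. This is a hedged sketch, not the authors' code: variants are reduced to hypothetical `(chrom, pos, ref, alt)` tuples with tumour read counts, diploid intervals are assumed to have been exported beforehand from cn.mops, and no real MuTect, SomaticSniper or cn.mops interface is called.

```python
# Minimal sketch: consensus somatic SNVs (MuTect AND SomaticSniper) and their
# variant allele frequencies (VAFs) inside diploid regions. Input structures are
# hypothetical; parsing of the callers' output files and cn.mops segments is not shown.

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def consensus_calls(mutect_calls: list[Variant], sniper_calls: list[Variant]) -> set[Variant]:
    """Keep only SNVs reported by both callers (intersection on chrom/pos/ref/alt)."""
    return set(mutect_calls) & set(sniper_calls)


def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele frequency = alt / (alt + ref); 0.0 when there is no coverage."""
    depth = alt_reads + ref_reads
    return alt_reads / depth if depth else 0.0


def in_regions(chrom: str, pos: int, regions: dict) -> bool:
    """True if pos falls in any closed interval [start, end] listed for chrom."""
    return any(start <= pos <= end for start, end in regions.get(chrom, ()))


def diploid_vafs(calls, counts: dict, diploid_regions: dict) -> list[float]:
    """Sorted VAFs of consensus SNVs lying in copy-number-2 (diploid) regions.

    counts maps each Variant to (alt_reads, ref_reads) in the tumour sample;
    diploid_regions maps chrom -> list of (start, end) intervals, assumed to be
    taken from cn.mops segments with copy number 2.
    """
    return sorted(vaf(*counts[v]) for v in calls
                  if v in counts and in_regions(v[0], v[1], diploid_regions))


if __name__ == "__main__":
    # Toy, hypothetical call sets and read counts.
    mutect = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
    sniper = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")]
    calls = consensus_calls(mutect, sniper)
    counts = {("chr1", 100, "A", "G"): (30, 30), ("chr2", 50, "G", "A"): (10, 30)}
    print(diploid_vafs(calls, counts, {"chr1": [(1, 1000)]}))  # [0.5]
```

Restricting the VAF distribution to diploid regions means that, in a pure tumour sample, a clonal heterozygous SNV is expected near a VAF of 0.5 without any copy-number correction.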

Statistics
All statistical analyses, including one- and two-tailed t-tests, ANOVA and chi-square tests, were performed using GraphPad software (including Prism 6).

Availability of data and material
The sequencing data are deposited in NCBI under BioProject ID PRJNA459488 and are available at http://www.ncbi.nlm.nih.gov/bioproject/459488.

Pipeline for the detection of telomere fusion events using bioinformatics tools, followed by manual curation and downstream analysis for each CLL patient sample. Sequencing data of telomere fusion amplicons from 9 CLL patient samples were obtained from BGI Tech following HiSeq4000 paired-end sequencing. Data handling and QC were performed, followed by intra- and inter-chromosomal mapping. Finally, manual curation and downstream analysis were performed.

CD8A (CD8a molecule): Expressed in cytotoxic T lymphocytes, but aberrant expression has been reported at low frequency in CLL B cells from patients, where it carries an adverse prognostic impact (15).

RORA (RAR-related orphan receptor A): Involved in lymphocyte development and inflammatory responses, and found to be overexpressed in CLL, among other cancers (16-19). RORA is also a very large common fragile site (CFS) gene (within FRA15A, 15q22.2) that is susceptible to genomic instability and inactivated in many tumours (20).

TESPA1 (thymocyte expressed, positive selection associated 1): Expressed in T and B lymphocytes; regulates inositol 1,4,5-trisphosphate receptor (IP3R)-mediated, calcium-dependent activation of signalling pathways and plays an important role in modulating immune function (21).

DMD (dystrophin): Expressed at low but stable levels in B cells and upregulated in unmutated CLL cases, which are associated with shorter survival (22,23).

NOX-related damage, such as that mediated by reactive oxygen species (ROS), has been associated with the initiation and progression of haematopoietic malignancies (25).

FTO (fat mass and obesity-associated): The lipid metabolism gene FTO has been implicated in cancer cell metabolism (26). Unlike normal B cells, CLL cells have an altered lipid metabolism: similarly to adipocytes and myocytes, they store lipids in vacuoles, produce energy from free fatty acids (FFA) and express genes related to lipid metabolism (22,27,28).

NTF3 (neurotrophin 3): B cells are a source of neurotrophins that provide protective autoimmunity in the damaged nervous system; they express the neurotrophins NGF and BDNF but do not appear to express NTF3 or trkB (29).

EVI5 (ecotropic viral integration site 5): Regulator of cell cycle progression and cytokinesis. It has been suggested to prevent exhaustion of pre-leukaemic stem cells in Runx1-deficient mice and to cooperate with the BCL6 (B cell lymphoma 6) transcription factor in B- and T-cell lymphomas (30,31). In addition, deletion of 1p22 encompassing EVI5 has been identified in over 20% of patients with multiple myeloma (MM), and low expression of this gene is associated with worse prognosis in early-stage patients (32).