Single-cell genetics were used to interrogate clonal complexity and the sequence of mutational events in STIL-TAL1+ T-ALL. Single-cell multicolour FISH was used to demonstrate that the earliest detectable leukaemia subclone contained the STIL-TAL1 fusion and copy number loss of 9p21.3 (CDKN2A/CDKN2B locus), with other copy number alterations including loss of PTEN occurring as secondary subclonal events. In three cases, multiplex qPCR and phylogenetic analysis were used to produce branching evolutionary trees recapitulating the snapshot history of T-ALL evolution in this leukaemia subtype, which confirmed that mutations in key T-ALL drivers, including NOTCH1 and PTEN, were subclonal and reiterative in distinct subclones. Xenografting confirmed that self-renewing or propagating cells were genetically diverse. These data suggest that the STIL-TAL1 fusion is a likely founder or truncal event. Therapies targeting the TAL1 auto-regulatory complex are worthy of further investigation in T-ALL.
Single-cell genetic studies in haematopoietic [1,2,3] and other cancers [4, 5] have revealed substantial intraclonal complexity. In general, this diversity reflects evolutionary phylogenies in which derivative subclones branch off from founder precursors. This architectural diversity has important implications for the reservoirs of cells involved in disease progression and resistance to drug therapy. Bioinformatic derivation of evolutionary trees can reveal the most likely sequence of genetic events and distinguish mutations that are present in all cancer cells (truncal or founder events) from those that are secondary and subclonally distributed [7,8,9]. This in turn carries implications for minimal residual disease (MRD) monitoring and targeted therapy.
Few such studies have been performed to date in T-ALL, although comparative genetic profiling of diagnostic, xenograft and relapse samples confirms clonal complexity [10, 11]. T-ALL is biologically diverse, reflecting different levels of differentiation arrest within the thymus and distinctive genetic lesions. We elected to study a single, common subtype of T-ALL, namely cases with the STIL-TAL1 fusion. We used multicolour FISH and single-cell multiplex quantitative PCR (qPCR) to determine the phylogenetic architecture of diagnostic samples and to infer the order of genetic events, comparing the STIL-TAL1 fusion, which we postulated to be a founder lesion, with other common genetic lesions including CDKN2A loss, PTEN mutation or loss and NOTCH1 mutation. In selected cases, we compared the clonal architecture of xenotransplanted samples with that observed in the diagnostic sample. This enabled us to infer the subclonal origins and genetic diversity of cells with propagating or stem cell activity.
Materials and methods
Diagnostic DNA from 19 T-ALL cases (patients aged 1–24 years) and 1 cell line (RPMI 8402) known to carry the STIL-TAL1 rearrangement was available. The study was conducted in accordance with the Declaration of Helsinki, and appropriate consent and ethical approval for the study were obtained (Ethics approval numbers CCR2285 and 16/SE/0219).
T-ALL molecular screening and cloning
Diagnostic DNA from all STIL-TAL1 cases was analysed for mutations in known T-ALL mutational hotspots in NOTCH1 (exons 26, 27 and 34), FBXW7 (exons 9 and 10), PTEN (exon 7) and IL7R (exon 6) using previously published methods [13,14,15,16]. All diagnostic samples were analysed by SNP array on the Affymetrix SNP 6.0 platform to identify genomic losses and gains. Genotyping and generation of QC data were performed in Genotyping Console™ v4.1.4 software (Affymetrix). CNAG version 188.8.131.52 beta was used to normalise output to a self-reference (patient remission DNA) or via a batch pairwise analysis using sex-matched control samples. The patient-specific STIL-TAL1 gene fusion was sequenced using previously published methods for the three cases that underwent single-cell genotyping analyses. The TA Cloning Kit® (Invitrogen by Life Technologies™) was used for cloning experiments according to the manufacturer’s instructions.
Next-generation sequencing (NGS)
Whole exome sequencing (WES) was undertaken by Oxford Gene Technology. See Supplementary Methods for details and bioinformatics. Any genomic ‘drivers’ included in the single-cell genotyping experiments were validated with Sanger sequencing using custom primers designed using Primer Blast (Table S1).
Fluorescence in situ hybridisation
Fixed cytospins were prepared from archived viable cells, and interphase FISH was performed with in-house, patient-specific probes for the various copy number losses (Table S2) using previously described methods. See Supplementary Methods for details.
Bioinformatic assessment of RAG recombinase activity at PTEN indel breakpoints
MEME Suite 4.11.4 was used to conduct an agnostic search for the RAG recombinase consensus heptamer and nonamer, and for the tetramer sequence (CACA) previously identified as recurrently present at RAG-mediated breakpoints. We also used a weighted matrix algorithm (script kindly provided by the laboratory of Dr Papaemmanuil) to generate RAG recombination signal sequence (RSS) scores for each deletion breakpoint of interest. In essence, the algorithm assigns a weight to each position in the putative heptamer-spacer-nonamer sequence: for the heptamer and nonamer the weight depends on how likely the observed base is to deviate from consensus, whereas for the spacer it depends on the number of bases rather than their identity (score details outlined in ref. ).
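The general shape of a heptamer-spacer-nonamer scan can be sketched as follows. This is not the script referred to above, which is not reproduced here. The consensus heptamer (CACAGTG), nonamer (ACAAAAACC) and the 12/23 bp spacer lengths are the canonical RSS features; the scoring rule (one point per consensus match, spacer judged by length only) and all function names are illustrative assumptions, so the resulting scores are not on the same scale as the RSS scores reported in this paper.

```python
from typing import Tuple

# Canonical RAG recombination signal sequence motifs.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACER_LENGTHS = (12, 23)  # only the spacer length is considered, not its bases


def motif_score(window: str, consensus: str) -> float:
    """Hypothetical weighting: one point per position matching the consensus."""
    return float(sum(a == b for a, b in zip(window, consensus)))


def best_rss_score(seq: str) -> Tuple[float, int, int]:
    """Scan seq for the best heptamer-spacer-nonamer arrangement.

    Returns (score, start index, spacer length). If seq is too short for any
    arrangement, the score is -inf and the start index is -1.
    """
    seq = seq.upper()
    best = (float("-inf"), -1, 0)
    for spacer in SPACER_LENGTHS:
        total_len = len(HEPTAMER) + spacer + len(NONAMER)
        for i in range(len(seq) - total_len + 1):
            hept = seq[i:i + len(HEPTAMER)]
            nona = seq[i + len(HEPTAMER) + spacer:i + total_len]
            score = motif_score(hept, HEPTAMER) + motif_score(nona, NONAMER)
            if score > best[0]:
                best = (score, i, spacer)
    return best


def has_caca(seq: str) -> bool:
    """Check for the CACA tetramer anywhere in the flanking sequence."""
    return "CACA" in seq.upper()


if __name__ == "__main__":
    # Toy flanking sequence containing a perfect 12-spacer RSS (made up for illustration).
    toy = "GG" + HEPTAMER + "T" * 12 + NONAMER + "GG"
    print(best_rss_score(toy), has_caca(toy))
```

A real weighted matrix would replace `motif_score` with position-specific weights derived from known functional RSS, which is what lets it penalise deviations at highly conserved positions more heavily than at tolerant ones.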
Single-cell genotyping and single-cell Sanger sequencing
Single-cell genetic analysis was performed using stored viable cells for cases CUL76, 6116 and 6030 and paired xenograft material. Our previously established multiplex qPCR approach was used with minor modifications. See Supplementary Methods for details.
Limited archived xenograft DNA and single-cell material prepared from xenograft bone marrow were available for samples CUL76, 6030 and 6116. Material generated using NOG (NOG-ShiSCID-IL2gamma null) mice was available for samples 6030 and 6116, and material generated using NRG (NOD-Rag1nullIL2rgnull, NOD rag gamma) mice was available for sample CUL76. In all cases, 1 × 10⁶ cells were injected and experiments were performed on material stored from primary passage bone marrow.
Molecular screening on STIL-TAL1+ T-ALL samples for recurrent drivers
Nineteen STIL-TAL1+ cases and one cell line were screened for common genetic rearrangements using copy number profiling and Sanger sequencing. Data on Brazilian samples BR75, BR74, S1 and S2 have been previously published [22, 23]. Four cases also underwent WES to identify both known driver targets for inclusion in single-cell experiments and novel drivers. Results of molecular screening are summarised in Table 1 (further details are available in Tables S8, S9 and S10). Mean target coverage across the samples (and paired remission samples) that underwent WES was 95×–126× (average 109×). A total of 7–10 protein-altering SNVs and 3–5 protein-altering indels (detectable at a read depth >20) were detected per sample (Tables S11 and S12). All drivers incorporated into single-cell experiments were validated with qPCR or Sanger sequencing.
In keeping with previous work showing that cases from the TAL/LMO gene expression subgroup tend to have a higher incidence of PTEN mutations and a lower incidence of NOTCH1 mutations, we detected PTEN inactivation (exon 7 mutation or copy number loss) in at least 40% of our cohort of 20 samples (19 cases and one cell line). This is around double the frequency (22%) reported in a recent study of 145 T-ALL cases that used heteroduplex analysis, mutation analysis and SNP arrays. Additionally, in the cases that underwent WES, PTEN mutations outside the exon 7 hotspot were detected (exon 8 in sample 6030 and exon 5 in CF10), suggesting that estimates based on copy number analysis and PTEN exon 7 sequencing alone may underestimate the true frequency. A high frequency of copy number losses of 9p (95%, incorporating the CDKN2A/CDKN2B locus) and 6q (30%) was also observed, and these were identified as key drivers to include in single-cell genotyping studies.
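When reading these frequencies it helps to keep the cohort sizes in view. The sketch below computes a 95% Wilson score interval for a proportion. The counts are assumptions back-calculated from the percentages in the text (8 of 20 for "at least 40%", and approximately 32 of 145 for 22%); they are not taken from the underlying data tables.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Assumed counts, derived from the reported percentages only.
    for label, k, n in [("this cohort", 8, 20), ("reference study", 32, 145)]:
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

With only 20 samples the interval around 40% is necessarily wide, which is one reason the text phrases the comparison as "around double" rather than as a formal test.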
Exome sequencing of four cases defined relevant known and potential T-ALL drivers to include in single-cell analysis experiments. Novel potential T-ALL drivers were detected: mutations in BMPR1A, FREM2 and PIK3CD. BMPR1A (bone morphogenetic protein receptor, type 1A) is a polyposis-associated gene encoding a transmembrane serine/threonine kinase and is listed in the COSMIC Cancer Gene Census of genes functionally linked to cancer. PIK3CD encodes a subunit of phosphatidylinositol-4,5-bisphosphate 3-kinase, a key member of the PTEN-PI3 kinase pathway. Samples 6030 and 6116 each contained one FREM2 mutation, and one of these (found reiteratively in sample 6116) is recorded in the COSMIC database as a somatic mutation in a GI carcinoma (COSM287123). Additionally, this gene was recurrently but non-significantly mutated in three T-ALL samples in a large T-ALL exome sequencing study. The gene encodes an integral membrane protein containing many chondroitin sulphate proteoglycan element repeats and calx-beta domains (although the identified mutations lie outside these domains, Figure S3). In view of the possible association with T-ALL, and the emerging role of the micro-environment/extracellular matrix in leukaemia and cancer, we included FREM2 mutations in our single-cell studies.
Reiterative inactivation of PTEN
Sanger sequencing and NGS data for samples 6030 and CF5 suggested the presence of multiple low-level PTEN exon 7 indels running in parallel (Fig. 1a, b). Parallel work by a collaborating group using HPLC wave technology (heteroduplex analysis) led to similar conclusions. Cloning experiments validated at least four independent indels for sample 6030. Sample CF10 also had two PTEN indels, in exons 5 and 7 (detected by WES and Sanger sequencing). Functionally, all but one of the PTEN indels analysed generated stop codons (10/11; Table S13). Given this striking reiterative inactivation of PTEN, we hypothesised that these small structural alterations could be RAG-mediated, in view of the involvement of aberrant RAG activity in the formation of the STIL-TAL1 fusion and CDKN2A deletions and the observation that RSS elsewhere in PTEN have been implicated in the formation of microdeletions. However, we did not find bioinformatic evidence to support this explanation: maximum RSS scores were 6.75 (Table S14), compared with >8.55 in B cell precursor (BCP)-ALL samples scored with the same weighted matrix algorithm. Only one sample (CF10) had the RAG-associated CACA tetramer identified in Papaemmanuil et al. (2014) close to the mutation breakpoint.
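The link between small indels and premature stop codons can be illustrated with a minimal sketch. A deletion whose length is not a multiple of three shifts the reading frame, so an early in-frame stop codon (TAA, TAG or TGA) typically appears downstream. The coding sequence below is a made-up toy, not PTEN sequence, and the helper names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete(cds: str, start: int, length: int) -> str:
    """Remove `length` bases from `cds`, beginning at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


if __name__ == "__main__":
    wild_type = "ATGGCCTTGACCGGATAA"          # ATG GCC TTG ACC GGA TAA (toy)
    frameshift = delete(wild_type, 3, 1)      # 1 bp deletion shifts the frame
    in_frame = delete(wild_type, 3, 3)        # 3 bp deletion keeps the frame
    print(first_stop(wild_type), first_stop(frameshift), first_stop(in_frame))
```

In this toy case the 1 bp deletion moves the first stop from codon 5 to codon 2, whereas the 3 bp deletion removes one codon and leaves the original stop as the first one encountered.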
Single-cell studies in STIL-TAL1+ T-ALL
We investigated the order of acquisition of CNAs with reference to the STIL-TAL1 fusion in samples 6030, CF5, CF6 and HK328. In the majority of cases, the earliest ancestor subclone contained the STIL-TAL1 fusion in combination with bi-allelic loss of 9p21.3 (containing the CDKN2A locus), as shown for samples CF5 and CF6 (Fig. 2). In contrast with BCP-ALL, in the majority of cases 9p21.3 loss could not be separated in time from the presumed founder gene fusion/translocation. Additional copy number losses, including 6q and PTEN, occurred in a secondary and subclonal fashion. However, in sample 6030 the earliest detectable subclone contained STIL-TAL1+ cells with 1 copy of 9p21.3. This observation was validated using two independent FISH probes, and the clone was also detected in xenograft bone marrow derived from sample 6030 (Figure S1a).
Single-cell multiplex qPCR
Although multicolour FISH provides proof-of-principle evidence for clonal heterogeneity in T-ALL, the number of driver genetic events that can be examined is limited and mutations in key T-ALL signalling pathways cannot be incorporated. We therefore undertook a detailed single-cell genotyping study on three samples that together represented all the key STIL-TAL1 genetic driver events identified in this study and that had archived paired xenograft material available. Archived diagnostic leukaemia cells from cases 6030, 6116 and CUL76 underwent high-throughput single-cell multiplex qPCR analysis, allowing simultaneous investigation of the patient-specific STIL-TAL1 gene fusion in each case along with the indels, SNVs and CNAs designated as drivers for each leukaemia sample. Quality control assessments and analysis were performed as previously described.
Phylogenetic trees for these three diagnostic cases, constructed from single-cell data using the principle of maximum parsimony, are shown in Fig. 3a–c. In all cases the root of the tree contains a common ancestor cell carrying the patient-specific STIL-TAL1 fusion and copy number loss of CDKN2A. Other key drivers are subclonal but in some cases reiterative, in keeping with observations made in prior studies of B cell leukaemias and solid tumours [2, 9, 28,29,30]. The STIL-TAL1 assays used were specific for the patient-specific fusions (Table S15) and were tested against both normal DNA and other STIL-TAL1 positive material to confirm assay specificity. This study provides the first evidence that a single STIL-TAL1 fusion occurs per case. Previous studies assessing stability through xenograft passage have used chromosome 1p copy number as a surrogate for the gene fusion, but this would not distinguish multiple overlapping fusions with a relatively conserved breakpoint region. NOTCH1 mutation and PTEN mutation are secondary and subclonal events in the cases studied. The multiple PTEN mutations noted in sample 6030 were tracked at single-cell level, and single-cell cloning and Sanger sequencing demonstrated unequivocally that in clones with >1 mutation the mutations occurred on independent alleles, suggesting a selective pressure for bi-allelic inactivation of PTEN (Fig. 1c).
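The idea of ordering clones from single-cell genotypes can be sketched in a few lines. The code below is a simplified stand-in for the maximum parsimony analysis described above, not a reimplementation of it: cells are grouped into clones by identical genotype, and each clone is attached to the earlier (smaller or equal-sized) clone that differs from it by the fewest events. The event labels and toy cells are hypothetical, and reiterative mutations are represented by giving each independent lesion its own label.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Optional, Set

Genotype = FrozenSet[str]


def call_clones(cells: Iterable[Set[str]]) -> Counter:
    """Group single cells by genotype (the set of events detected in each cell)."""
    return Counter(frozenset(c) for c in cells)


def build_tree(clones: Iterable[Genotype]) -> Dict[Genotype, Optional[Genotype]]:
    """Map each clone to its inferred parent; the root maps to None.

    Clones are ordered by number of events. Each clone's parent is the earlier
    clone with the smallest symmetric difference in events (fewest changes).
    """
    ordered = sorted(set(clones), key=lambda g: (len(g), sorted(g)))
    parent: Dict[Genotype, Optional[Genotype]] = {ordered[0]: None}
    for idx, clone in enumerate(ordered[1:], start=1):
        candidates = ordered[:idx]
        parent[clone] = min(candidates, key=lambda g: (len(g ^ clone), sorted(g)))
    return parent


def truncal_events(clones: Iterable[Genotype]) -> Set[str]:
    """Events shared by every clone, i.e. candidate founder lesions."""
    return set(frozenset.intersection(*clones))


if __name__ == "__main__":
    cells = [  # toy data, one set of detected events per cell
        {"STIL-TAL1", "CDKN2A_loss"},
        {"STIL-TAL1", "CDKN2A_loss", "NOTCH1_mut"},
        {"STIL-TAL1", "CDKN2A_loss", "NOTCH1_mut"},
        {"STIL-TAL1", "CDKN2A_loss", "PTEN_indel_A"},
        {"STIL-TAL1", "CDKN2A_loss", "PTEN_indel_A", "PTEN_indel_B"},
    ]
    clones = call_clones(cells)
    for child, par in build_tree(clones).items():
        print(sorted(child), "<-", sorted(par) if par else "root")
    print("truncal:", sorted(truncal_events(list(clones))))
```

A full parsimony analysis additionally has to handle allelic dropout, events that recur on different branches and hypothetical unobserved ancestors, none of which this greedy sketch attempts.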
Xenograft models can be used to assess clonal heterogeneity and evolution in the leukaemia-initiating or so-called ‘cancer stem-cell’ compartment, providing insight into genetic driver stability and evolutionary pressures on clonal selection. Single-cell assessment of xenograft bone marrow in the three cases examined demonstrated contrasting models of clonal evolution, although we acknowledge that limited conclusions can be drawn from the single xenograft experiment performed per case. Clones that read out in the xenograft are designated ‘T’ status in the evolutionary tree diagrams (Fig. 3a–c). In cases 6116 and CUL76, multiple clones read out in the xenograft, confirming the position of NOTCH1 as a subclonal driver and demonstrating multiple competing subclones in the leukaemia-initiating cell compartment. However, in case 6030 a single clone (C7) predominated, containing the STIL-TAL1 fusion, bi-allelic 9p21.3 deletion and a PTEN exon 8 mutation that was present in both heterozygous and homozygous form. The proportion of cells within this subclone with a homozygous (as opposed to heterozygous) PTEN exon 8 mutation was higher in the xenograft than in the diagnostic sample (88% versus 47%), and all xenograft single cells had either the heterozygous or the homozygous mutation (i.e., no PTEN exon 8 wild-type cells were detected by Sanger sequencing of 78 single cells). The clones with PTEN exon 7 mutations were not detected in the xenograft. These findings agreed with bulk DNA sequencing of the xenograft material, which showed the homozygous PTEN exon 8 mutation and wild-type PTEN exon 7 (data not shown).
Comparison of single-cell FISH and multiplex qPCR data for the 6030 xenograft initially appeared to give conflicting results: FISH suggested a clone with 1 copy of 9p21.3 in both diagnostic and xenograft material, whereas single-cell qPCR demonstrated bi-allelic loss in all subclones. Examination of SNP copy number data resolved this discrepancy. One of the 9p21.3 deletions was smaller than the FISH probe, so the FISH subclone with ‘1 copy’ of 9p21.3 and 2 copies of 4p/6q (Figure S1a) actually corresponded to clone C7 in the multiplex qPCR data (Fig. 3c); that is, CDKN2A showed bi-allelic deletion when small qPCR assays were used to assess copy number status. Paired diagnosis/xenograft copy number data for case 6030 also demonstrated that loss of the second CDKN2A allele is a subclonal event, as at least two independent clones had emerged with distinct breakpoints (Figure S1b). Because the multiplex qPCR data for CDKN2A were based on copy number assays, multiple distinct CDKN2A deletions may exist in any one sample without being detected by the analysis method used.
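The resolution issue described here can be made concrete with a small interval model. All coordinates, the probe and amplicon sizes, and the 50% signal threshold below are hypothetical values chosen only to illustrate how a deletion smaller than a FISH probe can still leave a visible signal while removing a short qPCR amplicon entirely.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in arbitrary base-pair coordinates


def overlap(a: Interval, b: Interval) -> int:
    """Length of the overlap between two intervals (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def fish_signal_retained(probe: Interval, deletion: Interval,
                         min_fraction: float = 0.5) -> bool:
    """Assume a probe still gives a signal if enough of its target remains."""
    probe_len = probe[1] - probe[0]
    remaining = probe_len - overlap(probe, deletion)
    return remaining / probe_len >= min_fraction


def qpcr_target_retained(amplicon: Interval, deletion: Interval) -> bool:
    """Assume a short amplicon is lost if the deletion touches it at all."""
    return overlap(amplicon, deletion) == 0


def copy_number(retained_per_allele: List[bool]) -> int:
    """Apparent copy number is the count of alleles still giving a signal."""
    return sum(retained_per_allele)


if __name__ == "__main__":
    probe = (0, 150_000)                # hypothetical FISH probe footprint
    amplicon = (70_000, 70_100)         # hypothetical qPCR assay within the probe
    deletions = [(-500_000, 500_000),   # allele 1: large deletion
                 (60_000, 90_000)]      # allele 2: deletion smaller than the probe
    fish = copy_number([fish_signal_retained(probe, d) for d in deletions])
    qpcr = copy_number([qpcr_target_retained(amplicon, d) for d in deletions])
    print(f"apparent copies by FISH: {fish}, by qPCR: {qpcr}")
```

Under these assumptions FISH reports one apparent copy while the qPCR assay reports none, mirroring the pattern described for clone C7.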
These single-cell genetic analyses allow us to infer phylogenetic trees describing clonal evolution. Technical limitations and the limited number of genetic markers mean that we will have under-estimated clonal complexity: more genetically distinct subclones will exist than we currently detect. For case 6116, the limited genetic markers used meant that we were unable to determine whether the reiterative mutation event was the PIK3CD or the FREM2 mutation. We did, however, validate the bioinformatic analysis by single-cell Sanger sequencing of the FREM2 and PIK3CD mutations in single cells from the diagnostic and xenograft samples, which confirmed that reiterative mutation had occurred (Figure S2).
Despite these caveats, several informative conclusions can be drawn. A consistent feature is that STIL-TAL1 fusion and CDKN2A loss are both early or truncal events, in contrast to other recurrent genetic changes, including NOTCH1 and PTEN mutation, that are secondary and subclonal. These observations have implications for the selection of mutations for minimal residual disease tracking and as targets for therapy. Given the position of the STIL-TAL1 fusion in leukaemia evolution, therapies targeting the TAL1 regulatory complex are worthy of further investigation.
It is difficult to discern from our study whether STIL-TAL1 fusion or CDKN2A loss is the initiating event, or which comes first. The most ancestral cell in the phylogenetic structure has the STIL-TAL1 fusion plus loss of both CDKN2A alleles. However, the two CDKN2A deletions are distinct and presumed to be independent events, whereas only one clone-specific STIL-TAL1 fusion exists. This finding suggests that at least one of the CDKN2A allele deletions occurs after the STIL-TAL1 fusion. This is in keeping with previously published work demonstrating reiterative CDKN2A deletions in STIL-TAL1 cases when breakpoints of 9p21.3 deletions in paired diagnostic/xenograft/relapse material were examined. We note that both STIL-TAL1 fusion and CDKN2A loss are likely to involve ‘off target’ RAG-dependent mutational mechanisms.
The clonal architectures in the STIL-TAL1+ ALL cases for which we had genome sequencing data (Fig. 3) showed a branching structure, as previously described in B cell precursor ALL. To some extent, this is driven by reiterative mutations of the same driver genes and resultant parallel clonal evolution. This is evident with several driver genes but most clearly with PTEN in patient 6030 (Fig. 3c). Cloning these multiple mutations from single cells confirmed their uniqueness (Fig. 1c). In BCP-ALL, reiterative copy number changes (e.g., in ETV6, PAX5, CDKN2A or BTG1) are the consequence of RAG-mediated mutation followed by selection [9, 20, 30]. We found no bioinformatic support for RAG involvement in the PTEN mutations we identified. We note, however, that RSS for RAGs have previously been implicated in PTEN microdeletions. Irrespective of the mutational mechanisms involved in these PTEN mutations, we conclude that in STIL-TAL1 ALL there is a strong selective pressure for these genetic lesions, most likely related to epistasis or a strong functional complementarity between PTEN inactivation/loss, STIL-TAL1 fusion and CDKN2A loss.
The objective of our limited xenograft studies was to determine the subclonal origin and genetic diversity of propagating or self-renewing cells in STIL-TAL1+ ALL. The data indicate that multiple subclones read out in the mice, reflecting, we suggest, the existence of genetically diverse stem cells. In patient 6116 (Fig. 3a), all four diagnostic subclones read out in the transplants, and in patient CUL76 (Fig. 3b) all three subclones did so. Patient 6030 gave a different result, however: of the seven subclones present in the diagnostic sample, only one dominant subclone (C7; Fig. 3c) read out. Nevertheless, diagnostic subclone C7 was heterogeneous with respect to PTEN exon 8 status and this was directly reflected in the transplant read-outs, indicating that two small subclones of clone C7 had propagated in the mice and providing further evidence for selection favouring PTEN-mutated clones. Genetically diverse propagating or stem cells have similarly been demonstrated in B cell precursor ALL [9, 32] and in glioblastoma, and are likely to be a common feature in cancer [34, 35]. The implication of this observation is that multiple subclones harbour the proliferative potential to fuel progression of disease, relapse and drug resistance.
We thank Mrs Susan M Colman for assistance with FISH protocols; Dr Claire Schwab for identification of STIL-TAL1 samples and Ms Tracey Perry for assistance with in vivo experiments. We acknowledge the ECOG group for the use of the sample 21922 and the Bloodwise Childhood Leukaemia Cell Bank for provision of primary haematological malignancy samples used in this study. We thank Dr Elli Papaemmanuil for the use of the weighted matrix algorithm for assessment of RAG recombinase activity.
CLF was supported by a clinical research training fellowship from Bloodwise (formerly, Leukaemia & Lymphoma Research) (grant number 11035). MBM was supported by the Partner Fellowship (European Haematology Association #2011/01); by the International Award for Research in Leukaemia (Lady Tata Memorial Trust) and by the Ministry of Health (INCA-Brazil). MSM was supported by Rally Foundation fellowship. AAF was supported by NIH (grant R35 CA210065) and Leukemia and Lymphoma Society (R0749-14). MG is supported by a Wellcome Trust award [105104/Z/14/Z] to the Centre for Evolution and Cancer.
CLF co-designed the study, conducted, supervised and analysed all the experiments and wrote the paper; MBM performed molecular investigations and analysed data; AMF, SJ and RG assisted with molecular investigations and analyses; FWvD assisted with SNP procedures and analyses; VJW and PK conducted in vivo experiments; MSM performed characterisation and selection of samples; AAF provided stored xenograft material for sample CUL76; LE performed the bioinformatics analyses; CJH, MSPO, AAF and PK provided clinical samples and immunophenotypic/cytogenetic and/or clinical data; IT supervised all single-cell sorting; NEP supervised all single-cell experiments and phylogenetic analysis; MG co-designed and supervised the study and co-wrote the paper. All authors critically reviewed and approved the final draft of the manuscript.