Transient abnormal myelopoiesis (TAM) is a myeloid proliferation resembling acute megakaryoblastic leukemia (AMKL), mostly affecting perinatal infants with Down syndrome. Although self-limiting in a majority of cases, TAM may evolve as non-self-limiting AMKL after spontaneous remission (DS-AMKL). Pathogenesis of these Down syndrome–related myeloid disorders is poorly understood, except for GATA1 mutations found in most cases. Here we report genomic profiling of 41 TAM, 49 DS-AMKL and 19 non-DS-AMKL samples, including whole-genome and/or whole-exome sequencing of 15 TAM and 14 DS-AMKL samples. TAM appears to be caused by a single GATA1 mutation and constitutive trisomy 21. Subsequent AMKL evolves from a pre-existing TAM clone through the acquisition of additional mutations, with major mutational targets including multiple cohesin components (53%), CTCF (20%), and EZH2, KANSL1 and other epigenetic regulators (45%), as well as common signaling pathways, such as the JAK family kinases, MPL, SH2B3 (LNK) and multiple RAS pathway genes (47%).
TAM represents a transient proliferation of immature megakaryoblasts that occurs in 5–10% of perinatal infants with Down syndrome1, 2. Although morphologically indistinguishable from AMKL, TAM is self-limiting in the majority of cases and usually terminates spontaneously within 3–4 months of birth1. Hepatic infiltration of myeloid cells is a common finding and can be severe enough to be fatal, owing to hepatic failure, with liver fibrosis occurring in 5–16% of cases2, 3, 4. Moreover, even when spontaneous remission is achieved, approximately 20–30% of surviving infants develop DS-AMKL years after remission, although some DS-AMKL cases have no documented history of TAM4. In contrast to non–Down syndrome–related AMKL (non-DS-AMKL), which generally shows poor prognosis, individuals with DS-AMKL typically have a favorable prognosis. In molecular pathogenesis of these Down syndrome–related myeloid disorders, GATA1 mutations are detected in virtually all affected infants, suggesting their central role in Down syndrome–related myeloid proliferation5, 6. However, it is still open to question whether a GATA1 mutation is sufficient for the development of TAM in individuals with Down syndrome, what is the cellular origin of the subsequent AMKL, whether additional gene mutations are required for progression to AMKL, and, if so, what are their gene targets, although several genes have been reported to be mutated in occasional cases with DS-AMKL, including JAK1, JAK2 and JAK3 (refs. 7,8,9,10), TP53 (refs. 10,11), FLT3 (ref. 8) and MPL12. We reasoned that identifying a comprehensive registry of gene mutations and tracking them at a clonal level using massively parallel sequencing would provide vital information for addressing these questions.
Genomic landscape of Down syndrome–related myeloid neoplasms
We performed whole-genome sequencing of 4 trios consisting of samples from TAM, AMKL and complete remission phases (Supplementary Figs. 1 and 2 and Supplementary Table 1). In total, we confirmed 411 single-nucleotide variants (SNVs) and 17 small nucleotide insertions and deletions (indels) by Sanger sequencing and/or deep resequencing (Supplementary Fig. 1 and Supplementary Table 2). We detected only a few structural variants, including deletion, amplification and uniparental disomy, in the TAM and DS-AMKL genomes (Fig. 1 and Supplementary Fig. 3). The mean number of validated somatic mutations in DS-AMKL samples (71 or 0.023 mutations/Mb) was twice the number observed in TAM samples (36 or 0.012 mutations/Mb) (Supplementary Fig. 1a). Mutation numbers in samples from both phases were substantially lower than in most other cancers (Supplementary Fig. 4), although differences in mutation rates could partly be affected by different definitions and algorithms for mutation calling. The spectrum of mutations was over-represented by C-to-T and G-to-A transitions in both TAM and DS-AMKL samples, resembling the mutational spectra in gastric and colorectal cancers13 and in other blood cancers (Supplementary Fig. 1b)14, 15. We unmasked the details of clonal evolution and expansion leading to AMKL through the use of deep sequencing of individual mutations detected by combined whole-genome and whole-exome sequencing (Fig. 2 and Supplementary Table 2). Intratumoral heterogeneity was evident at initial diagnosis with TAM and in the AMKL phase in all cases (Supplementary Fig. 5). In UPN001, UPN002 and UPN004, AMKL evolved from one of the major subclones in the TAM phase with a shared GATA1 mutation, as reported previously in relapsed acute myeloid leukemia (AML) in adults (Fig. 2a,b,d)15. 
In contrast, UPN003 showed a unique pattern of clonal evolution, in which AMKL originated from a minor subclone in the TAM phase that was totally unrelated to the predominant clone in terms of somatic mutations, with no mutation shared by both phases, and carried an independent GATA1 mutation (Fig. 2c). In both scenarios, progression to AMKL seemed to be accompanied by many additional mutations, including common driver mutations that were absent in the original TAM population, indicating a multistep process of leukemogenesis.
We further investigated non-silent mutations by whole-exome sequencing of additional samples to generate a full registry of driver mutations that are relevant to the development of TAM and subsequent progression to AMKL (Supplementary Fig. 6 and Supplementary Table 1). We detected GATA1 mutations in all TAM and DS-AMKL cases, indicating sufficient sensitivity in our whole-exome analysis. In total, we confirmed 26 and 81 non-silent somatic mutations identified in the exome analysis of 15 TAM and 14 DS-AMKL samples, respectively, with 3 GATA1 mutations common to both phases (Supplementary Table 3). The mean number of non-silent mutations was significantly higher in DS-AMKL samples (5.8; range of 1–11) than in TAM samples (1.7; range of 1–5) (P = 0.0002) (Fig. 3a). Of the 107 mutations, 84 were single-nucleotide substitutions that were mostly within coding sequences, except for 4 splice-site mutations. We also observed predominantly C-to-T and G-to-A transitions for non-silent substitutions (Supplementary Fig. 7). The remaining mutations were frameshift (n = 21) or non-frameshift (n = 2) indels, most frequently involving GATA1 (n = 13). One individual with DS-AMKL (UPN004) had no SNVs or indels (Fig. 3a), but copy number analysis identified a large deletion at 16q involving the CTCF locus (Supplementary Fig. 3), suggesting that the alteration of CTCF could be a driver event in this case. Therefore, at least one additional genetic lesion other than GATA1 mutation was detected in our whole-exome sequencing, despite the low frequency of leukemic cells appearing to show the morphology of immature megakaryoblasts (blast percentage) in many cases, which is a known characteristic of DS-AMKL samples16, 17. Whole-exome sequencing results suggested the presence of intratumoral heterogeneity in the majority of DS-AMKL cases (Fig. 3b).
Spectrum of recurrent mutations in DS-AMKL
Recurrently affected genes are of primary interest in identifying driver mutations. Whereas GATA1 was the only recurrent mutational target in TAM samples, an additional eight genes were recurrently mutated in the DS-AMKL samples, namely RAD21, STAG2, NRAS, CTCF, DCAF7, EZH2, KANSL1 and TP53 (Table 1). These genes are expressed in a wide variety of hematopoietic compartments, including in both myeloid and lymphoid cells, except for EZH2, whose expression is largely confined to CD34+ cells18 (Supplementary Fig. 8). We also found that these genes were expressed in DS-AMKL cells at similar levels to common hematopoietic genes19, although we did not observe a significant difference in their expression levels between DS-AMKL and non-DS-AMKL cells (Supplementary Fig. 9).
We then performed targeted deep sequencing of these 8 genes in an extended set of 109 samples (including 29 samples in 25 discovery cases) consisting of 41 TAM, 49 DS-AMKL and 19 non-DS-AMKL samples (Supplementary Tables 1 and 4). We also included additional genes in targeted sequencing that were either functionally related to the above eight genes or were mutated only in single cases but had been previously reported to be mutated in DS-AMKL (JAK3) or other myeloid neoplasms (SH2B3, SUZ12, SRSF2 and WT1), together with other common mutational targets in adult myeloid malignancies (Supplementary Fig. 10 and Supplementary Tables 5 and 6). We also analyzed by RT-PCR two recurrent fusion genes previously reported in non-DS-AMKL cases, RBM15-MKL1 (OTT-MAL)20, 21 and CBFA2T3-GLIS2 (refs. 22,23).
Mutations of cohesin and associated molecules
Major components of the cohesin complex, including RAD21 and STAG2, were frequent targets of gene mutations in DS-AMKL (Table 1). Including an additional mutation in NIPBL, 8 of the 14 discovery DS-AMKL cases (57%) had a mutated cohesin or associated component (Supplementary Table 3). Cohesin is a multiprotein complex consisting of 4 core components, including the SMC1, SMC3, RAD21 and STAG proteins24, 25. In concert with several functionally associated proteins, such as the NIPBL and ESCO proteins, cohesin is engaged in the cohesion of newly replicated sister chromatids by forming a ring-like structure25, preventing their premature separation before late anaphase. Cohesin has also been implicated in post-replicative DNA repair and long-range regulation of gene expression26, 27, 28, 29, 30. Targeted deep sequencing confirmed recurrent mutations and deletions in all core cohesin components (STAG2, RAD21, SMC3 and SMC1A) and in NIPBL in 26 of 49 DS-AMKL cases (53%) but in none of the 41 TAM cases, although 2 non-DS-AMKL cases (11%) had STAG2 mutations (Fig. 4a,b and Supplementary Tables 7 and 8). Strikingly, all mutations and deletions in different cohesin components were completely mutually exclusive, suggesting that cohesin function was the common target of these mutations. All but one STAG2 mutation (encoding a p.Arg370Gln substitution) was either a nonsense, frameshift or splice-site change (Fig. 4a,b, Supplementary Figs. 11 and 12a, and Supplementary Table 7). Similarly, 6 of 9 RAD21 mutations were heterozygous nonsense or frameshift alterations. Four of the five mutations in NIPBL, SMC1A and SMC3 were also nonsense or splice-site changes causing abnormal exon skipping (Fig. 4a and Supplementary Table 7). Thus, most of these mutations were thought to result in premature truncation, leading to loss of cohesin function. 
The leukemogenic mechanism of mutated cohesin components is still elusive; some studies have implicated aneuploidy caused by cohesin dysfunction in oncogenic actions31. However, DS-AMKL cases have been characterized by a largely normal karyotype32. We found no significant difference in the frequency of aneuploidy between cases with mutated and wild-type cohesin in the current DS-AMKL cohort. Many cases with mutated cohesin had completely normal karyotypes, except for constitutive trisomy 21, arguing against the hypothesis that aneuploidy has a major role in the pathogenesis of cohesin-mutated DS-AMKL (Fig. 5a).
Given the high frequency of cohesin mutations, new recurrent CTCF mutations were of particular interest because the functional interaction of cohesin and CTCF proteins has been of emerging interest in the long-range regulation of gene expression26, 30, 33, 34. CTCF is a zinc-finger protein implicated in diverse regulatory functions, including transcriptional activation and/or repression, insulation, formation of chromatin barrier, imprinting and X-chromosome inactivation35. CTCF binds to target sequence elements and blocks the interaction of enhancers and promoters through DNA loop formation (insulator activity)36, and several lines of evidence suggest that cohesin occupies CTCF-binding sites to contribute to the long-range regulation of gene expression by participating in the formation and stabilization of a repressive loop26, 37. CTCF was mutated or deleted in ten DS-AMKL cases (20%), one TAM case (2%) and four non-DS-AMKL cases (21%), with seven mutations representing nonsense, frameshift or splice-site changes and an additional six alterations representing deletions resulting in the loss of protein function (Fig. 4a,b, Supplementary Figs. 11 and 12b, and Supplementary Tables 7 and 8). To our knowledge, this is the first report of frequent recurrent CTCF mutations in cancer, although rare mutations (occurring in approximately 2% of cases) have recently been reported in breast cancer sequencing38.
Mutations in epigenetic regulators
EZH2, which encodes a catalytic subunit of the Polycomb repressive complex 2 (PRC2) that is responsible for di- and trimethylation of histone H3 lysine 27 (H3K27)39, is another recurrent mutational target in DS-AMKL (Table 1). Inactivating mutations in EZH2 have been reported in up to 13% of myelodysplastic syndromes and related chronic myeloid neoplasms40. Although rarely mutated in adult AML41, EZH2 represents one of the most frequently mutated and deleted genes in childhood AMKL, as we identified mutations or deletions in 16 of 49 DS-AMKL cases (33%) and in 3 of 19 non-DS-AMKL cases (16%) (Fig. 4a,b, Supplementary Fig. 12c and Supplementary Tables 7 and 8). No other PRC2 components were mutated, except for SUZ12, which was mutated in a single DS-AMKL case (Fig. 4a and Supplementary Table 7). Although frequent mutations in other epigenetic regulators, including in TET2, IDH1 or IDH2, DNMT3A and ASXL1, are cardinal features of myeloid neoplasms in adults, we rarely found these mutations in DS-AMKL and non-DS-AMKL cases, only identifying occasional DNMT3A (n = 1), ASXL1 (n = 1) and BCOR (n = 2) mutations in DS-AMKL (Fig. 4a).
KANSL1 (encoding KAT8 regulatory NSL complex subunit 1; also known as MSL1V1 or NSL1) represents a new recurrent mutational target in human cancer (Table 1), although haploinsufficiency of KANSL1 through germline deletions or mutations has been implicated in a congenital disease known as 17q21.31 microdeletion syndrome (MIM 610443)42, 43. We found heterozygous mutations in KANSL1 in three DS-AMKL and three non-DS-AMKL cases, and most of these mutations were nonsense or frameshifts, leading to loss of protein function (Fig. 4a and Supplementary Table 7). KANSL1 protein is necessary and sufficient for the activity of the KAT8 (MOF) histone acetyltransferase complex, which is engaged in the acetylation of histone H4 lysine 16 (H4K16), leading to transcriptional activation. Loss of acetylation of H4K16 has been reported to be a common hallmark of human cancer, and other histone acetyltransferases for H4K16 have been reported to form recurrent fusion partners in leukemia, including MOZ and MORF44, suggesting a role for compromised H4K16 acetylation by KANSL1 mutations in leukemogenesis. Of interest, KANSL1 is also responsible for the acetylation of the TP53 tumor suppressor that is important for TP53-dependent transcriptional activation45. KAT8 also interacts with a histone H3 lysine 4 (H3K4) methyltransferase, MLL, and the interaction of MLL and KAT8 complexes facilitates the cooperative recruitment of both complexes to gene promoters and enhances transcription initiation at target genes45. Thus, impaired TP53 function and/or deregulated expression of MLL gene targets could also contribute to leukemogenesis by KANSL1 mutations.
Other mutations in DS-AMKL
RAS pathway mutations are common in hematopoietic malignancies and other human cancers but have not to our knowledge been described in DS-AMKL. In the current cohort, we identified RAS pathway mutations in the NRAS, KRAS, PTPN11, NF1 and CBL genes in 8 DS-AMKL cases (16%) and 6 non-DS-AMKL cases (32%), but these mutations were rarely found in TAM cases (n = 3; 7%) (Fig. 4a). Tyrosine kinase and cytokine receptor mutations were also common in DS-AMKL. We found mutations in JAK1, JAK2, JAK3, MPL or SH2B3 (LNK) in 17 DS-AMKL cases (35%) but rarely in TAM (n = 1) and non-DS-AMKL (n = 2) cases. We found no FLT3 mutations in our cohort. The identified mutations were largely mutually exclusive. We found JAK2 mutations in 4 DS-AMKL cases and 1 non-DS-AMKL case, including mutations encoding p.Val617Phe (n = 2), p.Leu611Ser (n = 1), p.Arg683Ser (n = 1) and p.Arg867Gln (n = 1); of these, JAK2 mutations encoding p.Arg683Ser and p.Arg867Gln substitutions have been reported in acute lymphoblastic leukemia (ALL)46, 47 but not in myeloid malignancies8, 46. Thus, we re-evaluated the diagnosis of AMKL in both UPN097 (p.Arg683Ser) and UPN023 (p.Arg867Gln), in whom the initial diagnosis of AMKL was strongly supported by typical surface marker expression of CD41, CD41b, CD117, CD13, CD33, CD34 and CD36 in UPN097 and of CD7, CD13, CD34, CD41a and CD42b in UPN023, together with characteristic cytomorphology. Similarly, the mutation encoding p.Leu611Ser was reported in both ALL48 and polycythemia vera49. Thus, it seems that some JAK2 mutations are involved in both myeloid and lymphoid leukemogenesis. As reported previously10, 11, TP53 mutations were found in approximately 10% of DS-AMKL cases. Two identical somatic mutations found in the DCAF7 gene (encoding p.Leu340Phe) are of potential interest because the DCAF7 protein interacts with the DYRK1A kinase encoded within the Down syndrome critical region on chromosome 21 (ref. 50). 
DCAF7 has been shown to interact with DYRK1A through its N-terminal or C-terminal region, and the p.Leu340Phe substitution identified in our study was also located in the C-terminal domain. However, no additional mutation was detected in the extended cohort; therefore, the relevance of DCAF7 remains to be determined.
Allelic burden of major recurrent mutations relative to GATA1 mutations
We assessed intratumoral heterogeneity and the clonal origin of mutations by calculating the variant allele frequency (VAF) of each mutation relative to that of the GATA1 mutation using deep sequencing. Mutations in cohesin components, CTCF and EZH2 showed comparable VAFs to GATA1 mutations (Fig. 5b), suggesting their role in the early stage of DS-AMKL development. In contrast, RAS pathway and other tyrosine kinases and cytokine receptor mutations showed significantly lower VAFs than corresponding GATA1 mutations (P = 0.0001) (Fig. 5b), indicating that they are more likely to represent subclonal mutations, which were typically preceded by mutations in cohesin components, CTCF and EZH2 and were involved in the evolution of multiple DS-AMKL subclones. Although RAS and JAK pathways activated by gene mutations represent potentially druggable targets and several promising compounds are currently available, this observation may largely preclude the efficient use of such compounds in eradicating founding DS-AMKL clones.
Distinct genetic features of Down syndrome– and non–Down syndrome–related AMKL
Despite their morphological similarities, the two forms of AMKL in childhood are characterized by distinctive genetic features. According to the current study and a recent report of integrated analysis of non-DS-AMKL22, GATA1 mutations and trisomy 21 are less common in non-DS-AMKL than in DS-AMKL cases (Fig. 4a and Supplementary Table 9). In our series, DS-AMKL was characterized by high frequencies of mutations in the cohesin complex, EZH2 and other epigenetic regulators, as well as in JAK family kinases, which were less common mutational targets in non-DS-AMKL. Previous studies identified recurrent CBFA2T3-GLIS2 and RBM15-MKL1 gene fusions in non-DS-AMKL, which were found in 27% and 15.2% of non-DS-AMKL cases, respectively22, 51, whereas these fusions were not detected in DS-AMKL cases in another report (n = 10 cases)23. Similarly, in the current cohort, RT-PCR analysis identified 2 CBFA2T3-GLIS2 and 3 RBM15-MKL1 fusion genes in 19 non-DS-AMKL cases but none in TAM or DS-AMKL cases (Fig. 4a and Supplementary Table 10), illustrating the genetic differences between DS-AMKL and non-DS-AMKL. In addition, our RNA sequencing of the current cases (n = 17) (Supplementary Table 11) also showed no CBFA2T3-GLIS2 or RBM15-MKL1 fusions.
Whole-genome and/or whole-exome analyses and follow-up targeted sequencing identified several new aspects of the pathogenesis of Down syndrome–related myeloid proliferation. First, the initial TAM phase was characterized by a paucity of somatic mutations. The mean number of non-silent mutations per sample (1.7; range of 1–5) was surprisingly small compared with that reported in other human cancers (Supplementary Fig. 13), in line with a recent report that identified 1.2 (range of 1–2) mutations per sample by whole-exome sequencing in 5 TAM samples52. In addition to reporting a low somatic mutation frequency in their initial TAM phase, Nikolaev et al.52 also reported accumulation of somatic mutations (including single cases of SMC3 and EZH2 mutation) during progression from TAM to DS-AMKL. Excluding common GATA1 mutations, we identified no other recurrent mutations, with only 0.7 non-silent mutations per case, indicating that TAM could be caused by a single acquired GATA1 mutation in addition to constitutive trisomy 21.
Intratumoral heterogeneity was evident not only in the DS-AMKL phase but also at the initial diagnosis of TAM, and subsequent DS-AMKL originated from one of the multiple subclones present in the TAM phase, usually representing the progeny of the largest subpopulation. In most cases, the DS-AMKL clone was accompanied by newly acquired driver mutations not shared by the original TAM population, generating a unique landscape of gene mutations in DS-AMKL, which was characterized by high mutational frequencies in cohesin or CTCF (65%), other epigenetic regulators (45%), and RAS or signal-transducing molecules (47%) (Fig. 4a). Tumor recurrence or evolution has not to our knowledge been characterized by the distinct gene mutations in greater detail than in the present study. In total, 44 of the 49 DS-AMKL cases had additional mutations beyond those in GATA1 (Fig. 4a), even though there was a clear limitation on capturing mutations using the targeted sequencing approach.
The very high frequency of cohesin (53%) and EZH2 (33%) mutations and deletions in DS-AMKL but not in TAM or non-DS-AMKL cases was noteworthy because the reported mutation rates of cohesin and EZH2 in adult AML and other human cancers remain approximately 10% (refs. 14,40,41), underscoring a major role for these mutations in the pathogenesis of DS-AMKL. The leukemogenic mechanism of mutated cohesin remains elusive, and frequent CTCF mutations also need further evaluation to characterize their possible cooperative role with cohesin mutations26, 30, 33, 34. To our knowledge, KANSL1 mutations have not been reported previously and represent a new recurrent mutational target in human cancer, although their functional impact on AMKL development remains unknown. Evaluation of the allelic burden of these mutations by deep sequencing disclosed a clonal hierarchy among different driver mutations in which clonal mutations in cohesin, CTCF and epigenetic regulators frequently preceded subclonal mutations in RAS and signal transduction molecules.
In conclusion, Down syndrome–related myeloid proliferation is shaped by multiple rounds of acquisition of new mutations and clonal selection, which are initiated by a GATA1 mutation in the TAM phase and further driven by mutation in cohesin or CTCF, EZH2 or other epigenetic regulators, and RAS or signal-transducing molecules, leading to AMKL. DS-AMKL and non-DS-AMKL showed similar phenotypes but had distinct genetic features, which may underlie their different clinical characteristics.
Subjects and samples.
Genomic DNA from 84 individuals with Down syndrome–related myeloid disorders (41 samples from the TAM phase and 49 from the AMKL phase) and 19 with non-DS-AMKL was analyzed by whole-genome, whole-exome and/or targeted deep sequencing. In six cases with Down syndrome–related myeloid disorders, samples were collected from both the TAM and AMKL phases. RNA sequencing was also performed for 12 of the 49 DS-AMKL cases and for 5 additional DS-AMKL cases. RNA samples were also available for RT-PCR analysis from 30 cases with TAM, 32 cases with DS-AMKL and 15 cases with non-DS-AMKL. Written informed consent was obtained from each subject's parents before sample collection (Supplementary Note). This study was approved by the Ethics Committees of the University of Tokyo in accordance with the Declaration of Helsinki. GATA1 mutations were detected by Sanger sequencing of all TAM and DS-AMKL samples according to the previously described procedure5. Detailed information on subjects and samples is provided in Supplementary Tables 1, 4, 11 and 12. Tumor DNA was extracted from bone marrow– or peripheral blood–derived mononuclear cells at diagnosis. Genomic DNA samples from peripheral blood from subjects in remission or from nail tissues at diagnosis were used as germline controls. Genomic DNA was extracted using a QIAamp DNA Blood Mini kit and a QIAamp DNA Investigator kit (Qiagen). Total RNA was extracted using the RNeasy kit (Qiagen) with RNase-free DNase (Qiagen).
DNA samples were processed for whole-genome sequencing using NEBNext DNA sample Prep Reagent (New England Biolabs) according to the modified Illumina protocol. Sequence data were generated on the Illumina HiSeq 2000 platform in 100-bp paired-end reads. Data processing and variant calling were performed as described previously54. All candidate variants were validated by deep sequencing.
Validation and quantitative measurements of the frequencies of mutant alleles by deep sequencing.
Individual mutation sites were amplified by genomic PCR using primers tagged with NotI cleavage sites and subjected to high-throughput sequencing as described previously55, except that target DNA was not pooled. Deep sequencing was performed using the MiSeq or HiSeq 2000 platform. Data processing was performed according to the previously described method with minor modifications55. Briefly, each read was aligned to a set of PCR-amplified target sequences using BLAT56, and dichotomic variant alleles were differentially enumerated. For indels, individual reads were first aligned to each of the wild-type and indel sequences and then assigned to the one to which better alignment was obtained in terms of the number of matched bases. Each SNV and indel whose VAF in the tumor sample was equal to or greater than 2.0% and significantly higher than the frequency in the germline sample was adopted as a somatic mutation. The error size for estimated VAFs was evaluated by assuming binomial distributions in deep sequencing, which were confirmed by observed allele frequencies at heterozygous SNPs in normal DNA samples (Supplementary Fig. 14a), in which the variance (σ²) ranged from 4.0 × 10⁻⁴ to 11.0 × 10⁻⁴ (Supplementary Fig. 14b).
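The somatic-call criterion described above can be illustrated with a minimal Python sketch. The 2.0% VAF threshold and the binomial error model come from the text; the statistical test for "significantly higher than the frequency in the germline sample" is not specified there, so a one-sided Fisher's exact test and a cutoff of `alpha = 0.05` are assumed here purely for illustration. All function names are hypothetical.

```python
import math


def vaf_with_error(alt_reads, total_reads):
    """Return (VAF, standard error) assuming binomial sampling of reads."""
    vaf = alt_reads / total_reads
    std_err = math.sqrt(vaf * (1.0 - vaf) / total_reads)
    return vaf, std_err


def fisher_one_sided(tumor_alt, tumor_total, germ_alt, germ_total):
    """One-sided Fisher exact P value for variant reads enriched in tumor.

    ASSUMPTION: the text does not name the test; this is an illustrative choice.
    """
    alt = tumor_alt + germ_alt
    total = tumor_total + germ_total
    upper = min(alt, tumor_total)
    # Hypergeometric tail: P(variant reads in tumor >= observed count).
    tail = sum(
        math.comb(alt, k) * math.comb(total - alt, tumor_total - k)
        for k in range(tumor_alt, upper + 1)
    )
    return tail / math.comb(total, tumor_total)


def is_somatic(tumor_alt, tumor_total, germ_alt, germ_total,
               min_vaf=0.02, alpha=0.05):
    """VAF >= 2.0% in tumor (from the text) and enriched versus germline."""
    vaf, _ = vaf_with_error(tumor_alt, tumor_total)
    if vaf < min_vaf:
        return False
    p_value = fisher_one_sided(tumor_alt, tumor_total, germ_alt, germ_total)
    return p_value < alpha  # alpha is an assumed cutoff
```

For example, `is_somatic(50, 1000, 0, 1000)` accepts a variant seen in 5% of tumor reads and absent from the germline, whereas a variant at 1% VAF is rejected by the threshold regardless of the test.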
Clustering analysis of mutations.
To identify the chronological behavior of the structure of the tumor subpopulation for the TAM and AMKL phases, somatic mutations detected in both phases by whole-genome sequencing were clustered according to their VAFs as measured by deep sequencing. Copy number–adjusted deep sequencing data, in which the VAFs of genes on the X chromosome in male cases or in regions of uniparental disomy were halved, were subjected to unsupervised clustering. Six mutations located in amplified or deleted genomic regions were excluded from the analysis. Long indels of >3 bp, except for those affecting key genes such as GATA1 and RAD21, and mutations in repetitive regions were excluded from the analysis because their VAFs could tend to be underestimated.
All validated mutations were grouped into three categories according to the following criteria: (i) mutations found only in TAM (VAF in AMKL < 0.02), (ii) mutations found only in AMKL (VAF in TAM < 0.02) and (iii) mutations found in both TAM and AMKL (VAF in TAM > 0.02 and VAF in AMKL > 0.02). Clustering of mutations in each category was performed using Mclust, provided as an R package, on the basis of the VAFs of the mutations in the TAM and AMKL phases, where one-dimensional clustering of mutations in categories (i) and (ii) was performed on the basis of the homoscedastic model and two-dimensional clustering was performed for mutations in category (iii) on the basis of the ellipsoidal model. The most appropriate number of clusters was determined by using the Bayesian information criterion (BIC) score. Singleton points identified by this algorithm were regarded as outliers. Clonal subpopulations within tumors were also evaluated by kernel density analysis (Supplementary Fig. 5), where we drew kernel density estimate plots for the VAFs of validated variants using the density function in R.
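The copy number adjustment and three-way grouping that precede clustering can be sketched as follows. The clustering itself was performed with Mclust in R and is not reproduced here. The handling of a VAF exactly equal to 0.02 is not stated in the text; this sketch assumes that such a mutation counts as present. Function names and category labels are hypothetical.

```python
def adjust_vaf(vaf, hemizygous_or_upd):
    """Halve the VAF for X-linked genes in males or regions of uniparental disomy."""
    return vaf / 2.0 if hemizygous_or_upd else vaf


def categorize(vaf_tam, vaf_amkl, threshold=0.02):
    """Assign a mutation to category (i), (ii) or (iii) from its two VAFs.

    ASSUMPTION: a VAF exactly equal to the threshold is treated as present.
    """
    in_tam = vaf_tam >= threshold
    in_amkl = vaf_amkl >= threshold
    if in_tam and in_amkl:
        return "shared"      # category (iii): two-dimensional clustering
    if in_tam:
        return "TAM-only"    # category (i): one-dimensional clustering
    if in_amkl:
        return "AMKL-only"   # category (ii): one-dimensional clustering
    return "undetected"      # below threshold in both phases


# Example: group adjusted VAF pairs before handing each group to a clustering step.
mutations = {"mutA": (0.40, 0.01), "mutB": (0.00, 0.30), "mutC": (0.20, 0.45)}
groups = {name: categorize(tam, amkl) for name, (tam, amkl) in mutations.items()}
```

Separating the categories first means that mutations private to one phase are clustered on a single axis, while shared mutations keep both VAFs so that clones expanding or contracting between phases can be distinguished.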
Whole-exome sequencing and detection of somatic mutations.
Exome capture was performed using SureSelect Human All Exon V3 or V4 (Agilent Technologies) or the TruSeq Exome Enrichment kit (Illumina). Enriched exome fragments were then subjected to massively parallel sequencing using the Genome Analyzer IIx or HiSeq 2000 platform (Illumina). Candidate somatic mutations were detected using our in-house pipeline EBCall (Empirical Bayesian mutation Calling; see URLs)57. All candidates were validated by Sanger sequencing or independent deep sequencing.
PCR-based targeted deep sequencing.
Deep sequencing of DCAF7, EED, JAK1, JAK3, KANSL1, SH2B3, and SUZ12 was performed using the primers tagged with NotI cleavage sites whose sequences are listed in Supplementary Table 6. Data processing and variant calling were performed as described previously58. All candidate variants were validated by Sanger sequencing or independent deep sequencing using non-amplified DNA.
Targeted deep sequencing.
In total, 39 gene targets were exhaustively examined for mutations in all 109 cases using deep sequencing (Supplementary Table 5). Genomic DNA (1–1.5 μg) from bone marrow–derived mononuclear cells or peripheral blood was enriched for target exons using a SureSelect custom kit (Agilent Technologies) designed to capture all of the coding exons from the 39 target genes, and high-throughput sequencing was performed on the enriched targets using the HiSeq 2000 platform with a standard 100-bp paired-end read protocol. Sequencing reads were aligned to hg19 using Burrows-Wheeler Aligner (BWA) version 0.5.8 with default parameters. The allele frequencies of SNVs and indels were calculated at each genomic position by enumerating the relevant reads with SAMtools59. Initially, all variants showing VAF > 0.02 were extracted and annotated using ANNOVAR60 for further consideration if they were found in >6 reads out of >10 total reads and appeared in both plus- and minus-strand reads. For the cases for which no germline DNA was available, relevant somatic mutations were called by eliminating the following entries, unless they were registered in the Catalogue of Somatic Mutations in Cancer (COSMIC) v60 (ref. 61) or reported as somatic mutations in PubMed: (i) synonymous variants and those having ambiguous (unknown) annotations, (ii) known SNPs in public and private databases, including dbSNP131, the 1000 Genomes Project as of 23 November 2010 and our in-house database, (iii) sequencing or mapping errors, (iv) all missense SNVs with allele frequencies of 0.45–0.55 and (v) variants localized to duplicated regions found in SegDups of the UCSC Genome Browser. To eliminate sequencing errors in category (iii), we excluded all variants found in 31 normal Japanese samples at, on average, allele frequency > 0.25. Mapping errors were removed by visual inspection with the Integrative Genomics Viewer browser62. 
All candidate variants were validated by Sanger sequencing or independent deep sequencing.
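The initial read-level filter and exclusion criterion (iv) can be written as a short Python sketch. The thresholds (VAF > 0.02, >6 variant reads, >10 total reads, support on both strands, and the 0.45–0.55 window for missense SNVs) are taken from the text; the `Candidate` record and function names are hypothetical, and the database-based filters (known SNPs, COSMIC, segmental duplications) are omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    alt_plus: int      # variant-supporting reads on the plus strand
    alt_minus: int     # variant-supporting reads on the minus strand
    total_reads: int   # total reads covering the position

    @property
    def alt_reads(self):
        return self.alt_plus + self.alt_minus

    @property
    def vaf(self):
        return self.alt_reads / self.total_reads


def passes_initial_filter(c):
    """VAF > 0.02, >6 variant reads out of >10 total reads, both strands."""
    return (
        c.total_reads > 10
        and c.alt_reads > 6
        and c.alt_plus > 0
        and c.alt_minus > 0
        and c.vaf > 0.02
    )


def likely_germline_het(vaf, is_missense_snv):
    """Criterion (iv): missense SNVs with allele frequency of 0.45-0.55.

    Applied only to cases without germline DNA; window edges assumed inclusive.
    """
    return is_missense_snv and 0.45 <= vaf <= 0.55
```

The strand requirement guards against artifacts that appear in reads of only one orientation, and the 0.45–0.55 window removes variants whose frequency is consistent with a heterozygous germline polymorphism when no matched control is available.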
Calculation of copy numbers for target exons.
Letting $D^{s}_{i,j}$ be the sequencing depth at the ith nucleotide of the jth exon in sample s, the standardized depth of the jth exon is calculated as
$$\bar{D}^{s}_{j} = k_s \sum_{i} D^{s}_{i,j},$$
where $k_s$ is determined to satisfy
$$\sum_{j} \bar{D}^{s}_{j} = k_0$$
for a fixed constant $k_0$ (for example, $k_0 = 1$). The correlation coefficient ($R = R_{s,t}$) between two vectors $(\bar{D}^{s}_{j})$ and $(\bar{D}^{t}_{j})$ was calculated, where $\bar{D}^{s}_{j}$ and $\bar{D}^{t}_{j}$ represent the depth for a given sample (sample s) and each of the 443 samples (sample t), analyzed for other projects, with completely normal copy numbers in array–comparative genomic hybridization (aCGH; t = 1, 2, 3,..., 443), respectively, through which a total of $m_0$ (= 12) control samples showing the largest R values were selected ($T_m$; m = 1, 2, 3,..., $m_0$) and used for copy number calculation. The copy number of the ith target exon of sample s was calculated as
$$C^{s}_{i} = \bar{D}^{s}_{i} / \bar{D}^{T}_{i},$$
where $\bar{D}^{T}_{i}$ was calculated by averaging $m_0$ samples by
$$\bar{D}^{T}_{i} = \frac{1}{m_0} \sum_{m=1}^{m_0} \bar{D}^{T_m}_{i}.$$
Copy numbers were calculated for exons with mean depth of >500. Circular binary segmentation was also used to identify discrete copy number segments using DNACopy (see URLs); segmented copy number $\hat{C}^{s}_{i}$ was defined for the ith exon of sample s. The distribution of $\hat{C}^{s}_{i}$ was calculated for all samples, and exons deviating by > 4 s.d. from the mean were considered to have copy number losses or gains.
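The depth-based calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-exon depth is taken as the sum of per-base depths, copy number is expressed as a ratio to the control mean (1.0 meaning no change), and the function names are hypothetical. The mean-depth cutoff of >500 and the circular binary segmentation step are not shown.

```python
from math import sqrt


def standardize(depths_by_exon, k0=1.0):
    """Sum per-base depth within each exon, then rescale so exon values total k0."""
    raw = [sum(bases) for bases in depths_by_exon]
    k_s = k0 / sum(raw)   # sample-specific scaling constant
    return [k_s * r for r in raw]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def copy_number_ratios(sample, controls, m0=12):
    """Ratio of standardized exon depths to the mean of the m0 best-correlated controls."""
    best = sorted(controls, key=lambda c: pearson(sample, c), reverse=True)[:m0]
    mean_ctrl = [sum(col) / len(best) for col in zip(*best)]
    return [s / m for s, m in zip(sample, mean_ctrl)]
```

Here `sample` and each element of `controls` are outputs of `standardize`. Because depths are rescaled to a fixed total, a gain in one exon lowers the ratios of the others slightly, which is one reason a segmentation step and a cohort-wide threshold are applied afterwards.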
Screening for CBFA2T3-GLIS2 and RBM15-MKL1 fusion genes.
CBFA2T3-GLIS2 and RBM15-MKL1 fusion genes were screened by RT-PCR22, 63. Primer sequences are given in Supplementary Table 13. PCR amplification was performed by 40 cycles at 94 °C for 2 min, 60 °C for 30 s and 68 °C for 1 min, followed by denaturation at 94 °C for 2 min and extension at 68 °C for 7 min.
SNP array analyses.
All tumor samples subjected to whole-exome sequencing were also analyzed for copy number alterations using SNP arrays (Affymetrix GeneChip Human Mapping 250K NspI Array or Genome-Wide Human SNP Array 6.0) as described previously10, 64, 65.
RT-PCR analysis of STAG2 and CTCF transcripts.
To confirm abnormal splicing of CTCF in UPN016 and UPN071 and that of STAG2 in UPN067, RT-PCR was performed using cDNA derived from each subject, with cDNA from CMK11-5 (a DS-AMKL–derived cell line with no known mutations in either gene) used as a control (Supplementary Fig. 11). Primer sequences are given in Supplementary Table 14. Total RNA (1 μg) was subjected to reverse transcription using M-MLV reverse transcriptase (Invitrogen) according to the manufacturer's instructions. Electrophoresis was performed using Experion (Bio-Rad).
RNA sequencing.
Detailed information on samples is provided in Supplementary Table 11. Library preparation and sequencing were performed as described previously54. Fusion transcripts were detected using Genomon-fusion.
Gene expression analysis of recurrently mutated genes.
Expression data for the recurrently mutated genes in whole-exome sequencing were retrieved from the BioGPS database18 for normal hematopoietic cells, including whole bone marrow, CD33+ myeloid cells, CD34+ cells, CD19+ B cells and CD4+ T cells, and from published data19 and our RNA sequencing data for DS-AMKL samples.
Statistical analysis.
The number of non-silent mutations identified by whole-exome sequencing in TAM and DS-AMKL samples (Fig. 2a) and the number of chromosome abnormalities in DS-AMKL cases with and without cohesin mutations or deletions (Fig. 5a) were compared using the Mann-Whitney U test. The difference in VAF between two mutations (Fig. 5b) was tested using the Wilcoxon signed-rank test.
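For readers unfamiliar with the two tests, the statistics they are built on can be sketched in a few lines. This is an illustrative sketch with hypothetical function names; it computes only the test statistics, not P values, and does not reflect the software the authors used. In practice a statistics library (for example, `scipy.stats.mannwhitneyu` and `scipy.stats.wilcoxon`) would supply P values.

```python
def mann_whitney_u(a, b):
    """U statistic for group a versus group b; tied pairs count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples; zero differences are dropped."""
    diffs = [p - q for p, q in zip(x, y) if p != q]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1   # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

The Mann-Whitney statistic applies to unpaired groups, such as mutation counts per sample in two disease categories, whereas the signed-rank statistic applies to paired values, such as the VAFs of two mutations measured in the same samples.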
URLs.
European Genome-phenome Archive (EGA), https://www.ebi.ac.uk/ega/; EBCall, https://github.com/friend1ws/EBCall; Catalogue of Somatic Mutations in Cancer (COSMIC), http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/; PubMed, http://www.ncbi.nlm.nih.gov/pubmed/; UCSC Genome Browser, http://genome.ucsc.edu/; Integrative Genomics Viewer, http://www.broadinstitute.org/igv/; DNACopy, http://biostatistics.oxfordjournals.org/content/5/4/557.full.pdf; Genomon-fusion (in Japanese), http://genomon.hgc.jp/rna/.
Sequencing data have been deposited in the European Genome-phenome Archive (EGA) under accession EGAS00001000546.
1. Myeloid leukemia in Down syndrome. Crit. Rev. Oncog. 16, 25–36 (2011).
2. A prospective study of the natural history of transient leukemia (TL) in neonates with Down syndrome (DS): Children's Oncology Group (COG) study POG-9481. Blood 107, 4606–4613 (2006).
3. Risk factors for early death in neonates with Down syndrome and transient leukaemia. Br. J. Haematol. 142, 610–615 (2008).
4. Treatment and prognostic impact of transient leukemia in neonates with Down syndrome. Blood 111, 2991–2998 (2008).
5. Frequent mutations in the GATA-1 gene in the transient myeloproliferative disorder of Down syndrome. Blood 102, 2960–2968 (2003).
6. Acquired mutations in GATA1 in the megakaryoblastic leukemia of Down syndrome. Nat. Genet. 32, 148–152 (2002).
7. Activating alleles of JAK3 in acute megakaryoblastic leukemia. Cancer Cell 10, 65–75 (2006).
8. Activating mutations in human acute megakaryoblastic leukemia. Blood 112, 4220–4226 (2008).
9. Frequency and prognostic implications of JAK 1–3 aberrations in Down syndrome acute lymphoblastic and myeloid leukemia. Leukemia 25, 1365–1368 (2011).
10. Molecular lesions in childhood and adult acute megakaryoblastic leukaemia. Br. J. Haematol. 156, 316–325 (2012).
11. The role of p53 in megakaryocyte differentiation and the megakaryocytic leukemias of Down syndrome. Cancer Genet. Cytogenet. 116, 1–5 (2000).
12. MPLW515L mutation in acute megakaryoblastic leukaemia. Leukemia 23, 852–855 (2009).
13. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).
14. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
15. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
16. Diagnosis and management of acute myeloid leukemia in children and adolescents: recommendations from an international expert panel. Blood 120, 3187–3205 (2012).
17. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (International Agency for Research on Cancer, Lyon, France, 2008).
18. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol. 10, R130 (2009).
19. Identification of distinct molecular phenotypes in acute megakaryoblastic leukemia by gene expression profiling. Proc. Natl. Acad. Sci. USA 103, 3339–3344 (2006).
20. Involvement of a human gene related to the Drosophila spen gene in the recurrent t(1;22) translocation of acute megakaryocytic leukemia. Proc. Natl. Acad. Sci. USA 98, 5776–5779 (2001).
21. Fusion of two novel genes, RBM15 and MKL1, in the t(1;22)(p13;q13) of acute megakaryoblastic leukemia. Nat. Genet. 28, 220–221 (2001).
22. An inv(16)(p13.3q24.3)-encoded CBFA2T3-GLIS2 fusion protein defines an aggressive subtype of pediatric acute megakaryoblastic leukemia. Cancer Cell 22, 683–697 (2012).
23. Characterization of novel genomic alterations and therapeutic approaches using acute megakaryoblastic leukemia xenograft models. J. Exp. Med. 209, 2017–2031 (2012).
24. Chromosomal cohesin forms a ring. Cell 112, 765–777 (2003).
25. Cohesin: its roles and mechanisms. Annu. Rev. Genet. 43, 525–558 (2009).
26. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796–801 (2008).
27. Postreplicative formation of cohesion is required for repair and induced by a single DNA break. Science 317, 242–245 (2007).
28. The cohesin complex is required for the DNA damage–induced G2/M checkpoint in mammalian cells. EMBO J. 28, 2625–2635 (2009).
29. Effects of sister chromatid cohesion proteins on cut gene expression during wing development in Drosophila. Development 132, 4743–4753 (2005).
30. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433 (2008).
31. Mutational inactivation of STAG2 causes aneuploidy in human cancer. Science 333, 1039–1043 (2011).
32. Cytogenetic features of acute lymphoblastic and myeloid leukemias in pediatric patients with Down syndrome: an iBFM-SG study. Blood 111, 1575–1583 (2008).
33. CTCF physically links cohesin to chromatin. Proc. Natl. Acad. Sci. USA 105, 8309–8314 (2008).
34. Cohesins localize with CTCF at the KSHV latency control region and at cellular c-myc and H19/Igf2 insulators. EMBO J. 27, 654–666 (2008).
35. CTCF shapes chromatin by multiple mechanisms: the impact of 20 years of CTCF research on understanding the workings of chromatin. Chromosoma 119, 351–360 (2010).
36. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).
37. How cohesin and CTCF cooperate in regulating gene expression. Chromosome Res. 17, 201–214 (2009).
38. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
39. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039–1043 (2002).
40. Inactivating mutations of the histone methyltransferase gene EZH2 in myeloid disorders. Nat. Genet. 42, 722–726 (2010).
41. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N. Engl. J. Med. 366, 1079–1089 (2012).
42. Mutations in the chromatin modifier gene KANSL1 cause the 17q21.31 microdeletion syndrome. Nat. Genet. 44, 639–641 (2012).
43. Mutations in KANSL1 cause the 17q21.31 microdeletion syndrome phenotype. Nat. Genet. 44, 636–638 (2012).
44. The diverse superfamily of lysine acetyltransferases and their roles in leukemia and other diseases. Nucleic Acids Res. 32, 959–976 (2004).
45. Two mammalian MOF complexes regulate transcription activation by distinct mechanisms. Mol. Cell 36, 290–301 (2009).
46. Mutations of JAK2 in acute lymphoblastic leukaemias associated with Down's syndrome. Lancet 372, 1484–1492 (2008).
47. JAK mutations in high-risk childhood acute lymphoblastic leukemia. Proc. Natl. Acad. Sci. USA 106, 9414–9418 (2009).
48. Mutational screen reveals a novel JAK2 mutation, L611S, in a child with acute lymphoblastic leukemia. Leukemia 20, 381–383 (2006).
49. Detection of JAK2 mutations in paraffin marrow biopsies by high resolution melting analysis: identification of L611S alone and in cis with V617F in polycythemia vera. Leuk. Lymphoma 53, 2479–2486 (2012).
50. DYRK1A binds to an evolutionarily conserved WD40-repeat protein WDR68 and induces its nuclear translocation. Biochim. Biophys. Acta 1813, 1728–1739 (2011).
51. NUP98/JARID1A is a novel recurrent abnormality in pediatric acute megakaryoblastic leukemia with a distinct HOX gene expression pattern. Leukemia doi:10.1038/leu.2013.87 (27 March 2013).
52. Exome sequencing identifies putative drivers of progression of transient myeloproliferative disorder to AMKL in infants with Down Syndrome. Blood 122, 554–561 (2013).
53. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
54. Integrated molecular analysis of clear-cell renal cell carcinoma. Nat. Genet. 45, 860–867 (2013).
55. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 478, 64–69 (2011).
56. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
57. An empirical Bayesian framework for somatic mutation detection from cancer genome sequencing data. Nucleic Acids Res. 41, e89 (2013).
58. Exome sequencing identifies secondary mutations of SETBP1 and JAK3 in juvenile myelomonocytic leukemia. Nat. Genet. 45, 937–941 (2013).
59. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
60. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
61. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011).
62. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
63. Acute megakaryoblastic leukemia with a four-way variant translocation originating the RBM15-MKL1 fusion gene. Pediatr. Blood Cancer 56, 846–849 (2011).
64. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 65, 6071–6079 (2005).
65. Highly sensitive method for genomewide detection of allelic composition in nonpaired, primary tumor specimens by use of Affymetrix single-nucleotide-polymorphism genotyping microarrays. Am. J. Hum. Genet. 81, 114–126 (2007).
We thank Y. Mori, M. Nakamura, O. Hagiwara and N. Mizota for their technical assistance. This work was supported by the Research on Measures for Intractable Diseases Project and Health and Labor Sciences Research grants (Research on Intractable Diseases) from the Ministry of Health, Labour and Welfare, by Grants-in-Aid from the Ministry of Health, Labor and Welfare of Japan and KAKENHI (22134006, 23249052, 23118501, 23390266 and 25461579) and by the Japan Society for the Promotion of Science (JSPS) through the Funding Program for World-Leading Innovative Research and Development on Science and Technology (FIRST Program), initiated by the Council for Science and Technology Policy (CSTP) and research grants from the Japan Science and Technology Agency CREST.
Supplementary Figures 1–14, Supplementary Tables 1–14 and Supplementary Note