The genomic landscape of pediatric acute lymphoblastic leukemia

Abstract

Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. Here, using whole-genome, exome and transcriptome sequencing of 2,754 childhood patients with ALL, we find that, despite a generally low mutation burden, ALL cases harbor a median of four putative somatic driver alterations per sample, with 376 putative driver genes identified varying in prevalence across ALL subtypes. Most samples harbor at least one rare gene alteration, including 70 putative cancer driver genes associated with ubiquitination, SUMOylation, noncoding transcripts and other functions. In hyperdiploid B-ALL, chromosomal gains are acquired early and synchronously before ultraviolet-induced mutation. By contrast, ultraviolet-induced mutations precede chromosomal gains in B-ALL cases with intrachromosomal amplification of chromosome 21. We also demonstrate the prognostic significance of genetic alterations within subtypes. Intriguingly, DUX4- and KMT2A-rearranged subtypes separate into CEBPA/FLT3- or NFATC4-expressing subgroups with potential clinical implications. Together, these results deepen understanding of the ALL genomic landscape and associated outcomes.

Fig. 1: ALL cohort, mutational burden and mutational signatures.
Fig. 2: Temporal evolution of UV-associated mutations and copy gains in aneuploid B-ALL subtypes.
Fig. 3: Mutational landscapes across ALL subtypes.
Fig. 4: Clonality of driver SNVs and indels.
Fig. 5: Alterations in rare ALL genes.
Fig. 6: Association of secondary genetic alterations with outcome.
Fig. 7: Dichotomous CEBPA and NFATC4 expression identifies subgroups of KMT2A- and DUX4-rearranged subtypes.

Data availability

Genomic data are publicly available, and data accessions for RNA-seq, WES, WGS and SNP are listed for each case in Supplementary Table 1. TARGET ALL data may be accessed through the TARGET website at https://ocg.cancer.gov/programs/target/data-matrix. TARGET BAM and FASTQ sequence files are accessible through the database of genotypes and phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000218.v24.p8) under accession no. phs000218 (TARGET) and at NCI’s Genomic Data Commons (http://gdc.cancer.gov) under project TARGET. The remaining (non-TARGET) data have been deposited in the European Genome Phenome Archive, accession nos. EGAS00001000447, EGAS00001000654, EGAS00001001923, EGAS00001001952, EGAS00001002217, EGAS00001003266, EGAS00001004810, EGAS00001004998, EGAS00001005084 and EGAS00001005250, and are also accessible through St. Jude Cloud at https://platform.stjude.cloud/data/cohorts?dataset_accession=SJC-DS-1009. All raw sequencing data are available under controlled access for protection of germline information and to ensure appropriate data usage, and approval can be obtained by application through the dbGaP portal (for TARGET datasets) or by contacting the PCGP steering committee (PCGP_data_request@stjude.org) for non-TARGET (EGA-deposited) datasets. Somatic mutation data can also be explored interactively using ProteinPaint (ref. 81) and GenomePaint (ref. 86) on St. Jude Cloud at https://viz.stjude.cloud/mullighan-lab/collection/the-genomic-landscape-of-pediatric-acute-lymphoblastic-leukemia~15.

Code availability

This study did not involve the development of custom code.

References

  1. Iacobucci, I., Kimura, S. & Mullighan, C. G. Biologic and therapeutic implications of genomic alterations in acute lymphoblastic leukemia. J. Clin. Med. 10, 3792 (2021).

  2. Roberts, K. G. & Mullighan, C. G. The biology of B-progenitor acute lymphoblastic leukemia. Cold Spring Harb. Perspect. Med. 10, a034835 (2020).

  3. Den Boer, M. L. et al. A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study. Lancet Oncol. 10, 125–134 (2009).

  4. Mullighan, C. G. et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. N. Engl. J. Med. 360, 470–480 (2009).

  5. Roberts, K. G. Why and how to treat Ph-like ALL? Best Pract. Res. Clin. Haematol. 31, 351–356 (2018).

  6. Tasian, S. K., Loh, M. L. & Hunger, S. P. Philadelphia chromosome-like acute lymphoblastic leukemia. Blood 130, 2064–2072 (2017).

  7. Lilljebjorn, H. et al. Identification of ETV6-RUNX1-like and DUX4-rearranged subtypes in paediatric B-cell precursor acute lymphoblastic leukaemia. Nat. Commun. 7, 11790 (2016).

  8. Zhang, J. et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat. Genet. 48, 1481–1489 (2016).

  9. Gu, Z. et al. Genomic analyses identify recurrent MEF2D fusions in acute lymphoblastic leukaemia. Nat. Commun. 7, 13331 (2016).

  10. Gocho, Y. et al. A novel recurrent EP300-ZNF384 gene fusion in B-cell precursor acute lymphoblastic leukemia. Leukemia 29, 2445–2448 (2015).

  11. Lilljebjorn, H. & Fioretos, T. New oncogenic subtypes in pediatric B-cell precursor acute lymphoblastic leukemia. Blood 130, 1395–1401 (2017).

  12. Bastian, L. et al. PAX5 biallelic genomic alterations define a novel subgroup of B-cell precursor acute lymphoblastic leukemia. Leukemia 33, 1895–1909 (2019).

  13. Passet, M. et al. PAX5 P80R mutation identifies a novel subtype of B-cell precursor acute lymphoblastic leukemia with favorable outcome. Blood 133, 280–284 (2019).

  14. Gu, Z. et al. PAX5-driven subtypes of B-progenitor acute lymphoblastic leukemia. Nat. Genet. 51, 296–307 (2019).

  15. Li, J. F. et al. Transcriptional landscape of B cell precursor acute lymphoblastic leukemia based on an international study of 1,223 cases. Proc. Natl Acad. Sci. USA 115, E11711–E11720 (2018).

  16. Ueno, H. et al. Landscape of driver mutations and their clinical impacts in pediatric B-cell precursor acute lymphoblastic leukemia. Blood Adv. 4, 5165–5173 (2020).

  17. Liu, Y. et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat. Genet. 49, 1211–1218 (2017).

  18. Jeha, S. et al. Clinical significance of novel subtypes of acute lymphoblastic leukemia in the context of minimal residual disease-directed therapy. Blood Cancer Discov. 2, 326–337 (2021).

  19. Montefiori, L. E. et al. Enhancer hijacking drives oncogenic BCL11B expression in lineage-ambiguous stem cell leukemia. Cancer Discov. 11, 2846–2867 (2021).

  20. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  21. Pounds, S. et al. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics 29, 2088–2095 (2013).

  22. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  23. Edmonson, M. N. et al. Pediatric Cancer Variant Pathogenicity Information Exchange (PeCanPIE): a cloud-based platform for curating and classifying germline variants. Genome Res. 29, 1555–1565 (2019).

  24. Ma, X. et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371–376 (2018).

  25. Sondka, Z. et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer 18, 696–705 (2018).

  26. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  27. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).

  28. Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).

  29. Petljak, M. et al. Characterizing mutational signatures in human cancer cell lines reveals episodic APOBEC mutagenesis. Cell 176, 1282–1294 (2019).

  30. Maia, A. T. et al. Prenatal origin of hyperdiploid acute lymphoblastic leukemia in identical twins. Leukemia 17, 2202–2206 (2003).

  31. Li, Y. et al. Constitutional and somatic rearrangement of chromosome 21 in acute lymphoblastic leukaemia. Eur. J. Cancer 50, S97–S97 (2014).

  32. Albig, W. et al. All known human H1 histone genes except the H1(0) gene are clustered on chromosome 6. Genomics 16, 649–654 (1993).

  33. Albig, W., Trappe, R., Kardalinou, E., Eick, S. & Doenecke, D. The human H2A and H2B histone gene complement. Biol. Chem. 380, 7–18 (1999).

  34. Ederveen, T. H., Mandemaker, I. K. & Logie, C. The human histone H3 complement anno 2011. Biochim. Biophys. Acta 1809, 577–586 (2011).

  35. Miles, D. M., Desdouets, C. & Geli, V. Histone stress: an unexplored source of chromosomal instability in cancer? Curr. Genet. 65, 1081–1088 (2019).

  36. Elia, M. C. & Bradley, M. O. Influence of chromatin structure on the induction of DNA double strand breaks by ionizing radiation. Cancer Res. 52, 1580–1586 (1992).

  37. Rosidi, B. et al. Histone H1 functions as a stimulatory factor in backup pathways of NHEJ. Nucleic Acids Res. 36, 1610–1623 (2008).

  38. Li, Y. et al. Constitutional and somatic rearrangement of chromosome 21 in acute lymphoblastic leukaemia. Nature 508, 98–102 (2014).

  39. Jerchel, I. S. et al. RAS pathway mutations as a predictive biomarker for treatment adaptation in pediatric B-cell precursor acute lymphoblastic leukemia. Leukemia 32, 931–940 (2018).

  40. Mullighan, C. G. et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446, 758–764 (2007).

  41. Weng, A. P. et al. Activating mutations of NOTCH1 in human T cell acute lymphoblastic leukemia. Science 306, 269–271 (2004).

  42. Sulong, S. et al. A comprehensive analysis of the CDKN2A gene in childhood acute lymphoblastic leukemia reveals genomic deletion, copy number neutral loss of heterozygosity, and association with specific cytogenetic subgroups. Blood 113, 100–107 (2009).

  43. Gutierrez, A. et al. High frequency of PTEN, PI3K, and AKT abnormalities in T-cell acute lymphoblastic leukemia. Blood 114, 647–650 (2009).

  44. Holmfeldt, L. et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat. Genet. 45, 242–252 (2013).

  45. Tarabichi, M. et al. A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat. Methods 18, 144–155 (2021).

  46. Okuma, T., Honda, R., Ichikawa, G., Tsumagari, N. & Yasuda, H. In vitro SUMO-1 modification requires two enzymatic steps, E1 and E2. Biochem. Biophys. Res. Commun. 254, 693–698 (1999).

  47. Lammer, F., Klaumuenzer, M., Mossner, M. & Jann, J. C. Next generation RNA sequencing of acute promyelocytic leukemia (APL) identifies novel long non coding RNAs including new variants of MIR181A1HG that are differentially expressed during myeloid differentiation. Blood 124, 1031 (2014).

  48. Armenia, J. et al. The long tail of oncogenic drivers in prostate cancer. Nat. Genet. 50, 645–651 (2018).

  49. Stanulla, M. et al. IKZF1(plus) defines a new minimal residual disease-dependent very-poor prognostic profile in pediatric B-cell precursor acute lymphoblastic leukemia. J. Clin. Oncol. 36, 1240–1249 (2018).

  50. Hystad, M. E. et al. Characterization of early stages of human B cell development by gene expression profiling. J. Immunol. 179, 3662–3671 (2007).

  51. Guo, H., Barberi, T., Suresh, R. & Friedman, A. D. Progression from the common lymphoid progenitor to B/myeloid PreproB and ProB precursors during B lymphopoiesis requires C/EBPalpha. J. Immunol. 201, 1692–1704 (2018).

  52. Mullighan, C. G. et al. BCR-ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 453, 110–114 (2008).

  53. Papaemmanuil, E. et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat. Genet. 46, 116–125 (2014).

  54. Heerema, N. A. et al. Specific extra chromosomes occur in a modal number dependent pattern in pediatric acute lymphoblastic leukemia. Genes Chromosomes Cancer 46, 684–693 (2007).

  55. Paulsson, K. et al. The genomic landscape of high hyperdiploid childhood acute lymphoblastic leukemia. Nat. Genet. 47, 672–676 (2015).

  56. Alpar, D. et al. Sequential and hierarchical chromosomal changes and chromosome instability are distinct features of high hyperdiploid pediatric acute lymphoblastic leukemia. Pediatr. Blood Cancer 61, 2208–2214 (2014).

  57. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 (2019).

  58. Nielsen, K. P., Zhao, L., Stamnes, J. J., Stamnes, K. & Moan, J. The importance of the depth distribution of melanin in skin for DNA protection and other photobiological processes. J. Photochem. Photobiol. B 82, 194–198 (2006).

  59. Maia, A. T. et al. Identification of preleukemic precursors of hyperdiploid acute lymphoblastic leukemia in cord blood. Genes Chromosomes Cancer 40, 38–43 (2004).

  60. Jones, C. L. et al. Spectrum of mutational signatures in T-cell lymphoma reveals a key role for UV radiation in cutaneous T-cell lymphoma. Sci. Rep. 11, 3962 (2021).

  61. Ma, X. et al. Rise and fall of subclones from diagnosis to relapse in pediatric B-acute lymphoblastic leukaemia. Nat. Commun. 6, 6604 (2015).

  62. Brown, P. A. et al. FLT3 inhibitor lestaurtinib plus chemotherapy for newly diagnosed KMT2A-rearranged infant acute lymphoblastic leukemia: Children’s Oncology Group trial AALL0631. Leukemia 35, 1279–1290 (2021).

  63. Petit, A. et al. Oncogenetic mutations combined with MRD improve outcome prediction in pediatric T-cell acute lymphoblastic leukemia. Blood 131, 289–300 (2018).

  64. Pui, C. H. et al. Treating childhood acute lymphoblastic leukemia without cranial irradiation. N. Engl. J. Med. 360, 2730–2741 (2009).

  65. Jeha, S. et al. Improved CNS control of childhood acute lymphoblastic leukemia without cranial irradiation: St Jude Total Therapy Study 16. J. Clin. Oncol. 37, 3377–3391 (2019).

  66. Bowman, W. P. et al. Augmented therapy improves outcome for pediatric high risk acute lymphocytic leukemia: results of Children’s Oncology Group trial P9906. Pediatr. Blood Cancer 57, 569–577 (2011).

  67. Larsen, E. C. et al. Dexamethasone and high-dose methotrexate improve outcome for children and young adults with high-risk B-acute lymphoblastic leukemia: a report from Children’s Oncology Group Study AALL0232. J. Clin. Oncol. 34, 2380–2388 (2016).

  68. Salzer, W. L. et al. Impact of intrathecal triple therapy versus intrathecal methotrexate on disease-free survival for high-risk B-lymphoblastic leukemia: Children’s Oncology Group Study AALL1131. J. Clin. Oncol. 38, 2628–2638 (2020).

  69. Maloney, K. W. et al. Outcome in children with standard-risk B-cell acute lymphoblastic leukemia: results of Children’s Oncology Group Trial AALL0331. J. Clin. Oncol. 38, 602–612 (2020).

  70. Angiolillo, A. L. et al. Excellent outcomes with reduced frequency of vincristine and dexamethasone pulses in standard-risk B-lymphoblastic leukemia: results from Children’s Oncology Group AALL0932. J. Clin. Oncol. 39, 1437–1447 (2021).

  71. Winter, S. S. et al. Improved survival for children and young adults with T-lineage acute lymphoblastic leukemia: results from the Children’s Oncology Group AALL0434 Methotrexate Randomization. J. Clin. Oncol. 36, 2926–2934 (2018).

  72. Wang, J. et al. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat. Methods 8, 652–654 (2011).

  73. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).

  74. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).

  75. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).

  76. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).

  77. Edmonson, M. N. et al. Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format. Bioinformatics 27, 865–866 (2011).

  78. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

  79. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  80. Ioannidis, N. M. et al. REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).

  81. Zhou, X. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat. Genet. 48, 4–6 (2016).

  82. Yang, M. et al. 13q12.2 Deletions in acute lymphoblastic leukemia lead to upregulation of FLT3 through enhancer hijacking. Blood 136, 946–956 (2020).

  83. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  84. Li, B. et al. Therapy-induced mutations drive the genomic landscape of relapsed acute lymphoblastic leukemia. Blood 135, 41–55 (2020).

  85. Alberti-Servera, L. et al. Single-cell DNA amplicon sequencing reveals clonal heterogeneity and evolution in T-cell acute lymphoblastic leukemia. Blood 137, 801–811 (2021).

  86. Zhou, X. et al. Exploration of coding and non-coding variants in cancer using GenomePaint. Cancer Cell 39, 83–95 (2021).

Acknowledgements

We thank the Biorepository, the Genome Sequencing Facility of the Hartwell Center for Bioinformatics and Biotechnology and the Flow Cytometry and Cell Sorting Shared Resource and Cytogenetics core facility of St. Jude Children’s Research Hospital (SJCRH). We thank L. Hermida and T. Davidsen for assistance with genomic data management. This work was supported by the American Lebanese Syrian Associated Charities of SJCRH; St. Baldrick’s Foundation Robert J. Arceci Innovation Award (to C.G.M.); Alex’s Lemonade Stand Foundation Innovation Award (to C.G.M.); American Society of Hematology Scholar Award (to Z.G.); the Leukemia & Lymphoma Society’s Career Development Program Special Fellow (to Z.G.); NCI grant no. P30 CA021765 (SJCRH Cancer Center Support Grant), Outstanding Investigator Award no. R35 CA197695 (to C.G.M.) and R01 CA216391 (to J.Z.); Pathway to Independence Award no. R00 CA241297 (to Z.G.); National Institute of General Medical Sciences no. P50 GM115279 (to M.V.R., C.G.M., J.J.Y., M.L.L., W.E.E., J.Z. and C.C.); and St. Baldrick’s Foundation Award (to M.L.L. and S.P.H.). C.G.M. is the William E. Evans Endowed Chair at SJCRH. This study is dedicated to the memory of D. S. Gerhard.

Author information

Contributions

S.W.B. performed genomic data analysis, data interpretation and manuscript writing. K.G.R. was responsible for study design, sample preparation, data interpretation and manuscript writing. Z.G., L.S., S.P., C.Q., X.M., L.W., S.A., K.H., Yanling Liu, D.A.F., O.P., C.R., Yu Liu, M.P., X.C., Yongjin Li, Y.F., J.R.M., M.R., M.R.W., S.F., D.H., S.N., X.Z., J.W., E.S., S.V.R., V.B.P., G.W., A.H.E. and E.R. carried out data analysis. D.P. and C.C. performed statistical analysis. A.N.H. and D.P.-T. were responsible for sample preparation. I.I. and P.B. undertook sample preparation and experimental work. M.J. Borowitz and B.L.W. reviewed flow cytometry. Y.D., M.D., S.C.R., J.G.-F., J.M.G.A., P.G., M.A.S., N.W., A.J.C., N.A.H., R.C.H., C.L.W., E.L., E.R., W.L.C., P.A.Z.-M., K.R.R., L.A.M., K.W.M., S.S.W., M.J. Burke, W.S., K.P.D., A.L.A., K.R.C., J.R.D., S.J., C.-H.P., W.E.E., J.J.Y., M.V.R., D.S.G., M.L.L. and S.P.H. were responsible for patient samples and clinical data. J.Z. carried out data interpretation and manuscript writing. C.G.M. designed and oversaw the study, data interpretation and manuscript writing.

Corresponding authors

Correspondence to Stephen P. Hunger, Jinghui Zhang or Charles G. Mullighan.

Ethics declarations

Competing interests

C.G.M. has received consulting and speaking fees from Illumina and Amgen, and research support from Loxo Oncology, Pfizer and AbbVie. M.J. Borowitz is on the advisory board for Amgen and Blueprint Medicines. I.I. has received honoraria from Amgen and Mission Bio. M.P. is an employee of Oxford Nanopore Technologies PLC. J.J.Y. has received research funding from Takeda Pharmaceuticals. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Marc Mansour, Mel Greaves and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of ALL cohort.

(a) Number of acute lymphoblastic leukemia (ALL) patients studied (n = 2,754), the different modalities of sequencing performed, and the genomic alterations identified by each. (b) Venn diagram of samples analysed by transcriptome sequencing (RNA-seq), whole exome sequencing (WES), whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) profiling across the whole cohort (Pan ALL; left), in B-ALL only (middle) and in T-ALL only (right). (c) Distribution of patients according to lineage (left), sex (middle left), risk group (middle right) and age at diagnosis (right). Risk groups are NCI standard-risk (SR), age 1 to 9.99 yrs and WBC < 50,000/μl; high-risk (HR), age 10 to 15.9 yrs and/or WBC ≥ 50,000/μl; and adolescent and young adult (AYA).

Extended Data Fig. 2 Subtype classification of B-ALL.

(a) Flow chart for B-ALL subtype classification; for detailed description of criteria, see Supplementary Methods. (b) Left, tSNE of B-ALL cases with RNA-seq. Right, copy number heatmap of B-ALL samples as determined by WGS or SNP copy array (n = 1,630 samples), with subtype indicated by color at top. (c) Kaplan-Meier survival curves with overall survival distributions for each B-ALL subtype. Subtypes are separated into five graphs for ease of visualizing the various subtypes. Subtypes with at least 5 samples are shown. P value shown is by two-sided log-rank test comparing all subtypes shown in all five graphs. (d) Age at diagnosis by B-ALL subtype. Boxplot shows median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5*interquartile range rule is used). Text at top shows median age in the subtype. P values compare ages from the subtype vs. all other B-ALL samples by Wilcoxon rank-sum test; P values ≤ 0.05 are shown. Numbers of patients are shown at bottom, and yellow line indicates median age across B-ALL.
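
The 1.5*interquartile range whisker rule mentioned in this and later legends can be illustrated with a minimal Python sketch. It is an approximation under stated assumptions, not the authors' code: the function name tukey_whiskers is hypothetical, and quartiles are computed with Python's inclusive quantile method, whereas R's boxplot uses Tukey hinges, which can differ slightly for small samples.

```python
import statistics
from typing import List, Sequence, Tuple


def tukey_whiskers(values: Sequence[float], coef: float = 1.5) -> Tuple[float, float, List[float]]:
    """Return (lower whisker, upper whisker, outliers) under a 1.5*IQR rule.

    Hypothetical helper for illustration; quartiles only approximate the
    hinges used by R's boxplot.
    """
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - coef * iqr
    high_fence = q3 + coef * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    lower = min(v for v in data if v >= low_fence)
    upper = max(v for v in data if v <= high_fence)
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return lower, upper, outliers
```

For instance, with ages 1 to 9 plus a single value of 100, the whiskers would span 1 to 9 and 100 would be drawn as an outlier point.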

Extended Data Fig. 3 Subtype classification of T-ALL.

(a) Flow chart for T-ALL subtype classification and inclusion in clusters 1-4 as drawn on the tSNE plot. Classification begins at the top and samples meeting the indicated criteria are assigned to subtypes shown at right. Boxplots to the right show the expression of these genes in samples assigned to the indicated subtype (+) or not assigned (-). Samples bearing a detected fusion or rearrangement defining the subtype are marked with yellow circles with X marks. The gene expression thresholds indicated at left were determined based on the expression levels in fusion-positive samples. Samples where gene expression was above these thresholds but no fusion was detected were assumed to likely have a fusion and were thus assigned to that subtype, since the fusion may have been undetected due to technical issues (for example TLX3 enhancer hijacking rearrangements may be hard to detect with RNA-seq since they do not always create fusion transcripts). Boxplots show median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5*interquartile range rule is used). (b) tSNE of T-ALL cases with RNA-seq. (c) Kaplan-Meier survival curves with overall survival distributions for each T-ALL subtype, shown in three graphs for ease of visualization. Subtypes with at least 5 samples are shown. P value shown is by two-sided log-rank test comparing all subtypes shown in all graphs. (d) Age at diagnosis by T-ALL subtype. Boxplot shows median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5*interquartile range rule is used). Text at top shows median age in the subtype. P values compare ages from the subtype vs. all other T-ALL samples by two-sided Wilcoxon rank-sum test; P values ≤ 0.05 are shown. Numbers of patients are shown at bottom, and yellow line indicates median age across T-ALL.

Extended Data Fig. 4 Sequencing coverage and identification of significantly mutated genes.

(a) Each sample’s median sequencing coverage based on WGS (n = 768) or WES (n = 1,729) is shown, including both germline and cancer (ALL) samples for each patient. Boxplot shows median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5*interquartile range rule is used). The median coverage across all samples is indicated by text (for example ‘39X’). For WGS, the genome-wide coverage for each sample is indicated by each point. For WES, the median coverage in all protein-coding regions of exons (excluding 5’ and 3’ untranslated regions), as defined by the UCSC refGene.txt file, is shown. (b) Approach for identification of significantly mutated genes. The sequencing platform is shown on top, followed by the variant types detected by each platform below, and the third layer shows the tools used to identify significantly altered genes, with arrows indicating the variant types used as input to these tools. Intragenic SV outliers were identified initially by frequent SVs within the gene, and were corroborated manually with copy number analysis (dotted gray line) as the SVs were usually at the boundaries of focal deletions. All significantly mutated genes’ focal deletion and SNV/indel mutation site localization were manually inspected and those considered unlikely drivers were excluded. When combining the significantly mutated genes thus identified with the list of known drivers in ALL, a list of 376 driver or putative driver genes was identified.

Extended Data Fig. 5 Correlation of COSMIC signatures with age and genetic alterations.

For each B-ALL and T-ALL subtype, the correlation between signature abundance (in number of SNVs, y-axis) and the age at diagnosis (x-axis) is shown. This includes samples sequenced by WGS which had mutational signature cosine similarities (comparing the sample profile vs. the profile as reconstructed by signatures) of 0.85 or above, and which also had available age information. Only subtypes with at least 5 samples meeting these criteria are shown, and the number of samples in each subtype is shown above each plot. Two-sided Pearson r correlation was performed to obtain the P and r values shown for each subtype. For subtypes with P < 0.05, linear regression was performed resulting in the linear fits shown, along with text indicating the slope of the line in mutations per year. (a) Signature 1 (5mC deamination). (b) Signature 5 (clock-like). (c) Signatures 2 and 13 (APOBEC). (d) Signature 7 (UV). (e) Signature 18 (ROS; left). Somatic alterations significantly correlating with signature 18 (right). Each somatic alteration (chromosome-level copy alterations and driver/putative driver genes) was tested for correlation with the presence vs. absence of signature 18, and 20q deletion and 9p deletion were significantly associated with signature 18 in the subtypes shown. P values are by two-sided Fisher’s exact test, and the number of samples in each group is shown below (n). Only WGS samples were analyzed.
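
The cosine similarity filter described in this legend can be sketched as follows. This is a minimal illustration, assuming each profile is a vector of mutation counts over the same ordered set of mutation categories; the function names cosine_similarity and passes_signature_qc are hypothetical, and only the 0.85 cutoff comes from the legend.

```python
import math
from typing import Sequence

COSINE_THRESHOLD = 0.85  # cutoff stated in the legend


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length mutation profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of categories")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar (assumption)
    return dot / (norm_a * norm_b)


def passes_signature_qc(observed: Sequence[float], reconstructed: Sequence[float]) -> bool:
    """True if the reconstructed profile matches the observed one closely enough."""
    return cosine_similarity(observed, reconstructed) >= COSINE_THRESHOLD
```

A sample would be retained for the age correlation only if its observed profile and the profile rebuilt from the fitted signatures pass this check.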

Extended Data Fig. 6 Copy gain schemes in each hyperdiploid sample.

Each hyperdiploid sample sequenced by WGS is shown. This analysis tests whether copy gains likely occurred simultaneously or sequentially and is an expanded version of the examples shown in Fig. 2d, showing all 72 samples. Only 3-copy chromosomes with at least 20 somatic SNVs in the sample were analyzed, and only samples with two or more chromosomes meeting this criterion were analyzed. On density plots, x-axes show VAF adjusted for tumor purity, and y-axes show each 3-copy whole-chromosome or arm gain in the sample. Vertical ticks on x-axis show individual SNV VAFs; an abundance of VAFs around 0.67 indicates late copy gains since the SNVs occurred prior to the copy gains (2 of 3 copies), while a preponderance of VAFs around 0.33 indicates early copy gains since most SNVs occurred after the copy gains (1 of 3 copies). Blue indicates an inferred early copy gain and red a late copy gain. (a) Samples falling into the asynchronous with late arm gain scheme, where most copy gains occur early with one chromosome arm gain occurring later. (b) Samples falling into the asynchronous with whole-chromosome gain scheme, where most copy gains occur early with one whole-chromosome gain occurring later. (c) Lone sample belonging to the synchronous late gain scheme, where all copy gains appear to occur simultaneously and occur late, after substantial point mutations have had time to accumulate (thus present on 2 of 3 copies). (d) Samples belonging to the synchronous early gain scheme, where all copy gains appear to occur simultaneously and occur early, before substantial point mutations have had time to accumulate (SNVs are present on 1 of 3 copies).
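The early-versus-late reasoning in this legend can be sketched as a simple rule on purity-adjusted VAFs from one 3-copy chromosome. This is only an illustrative simplification, not the authors' procedure: the function name, the 0.5 midpoint used to assign each SNV to the 0.33 or 0.67 cluster, and the majority-vote cutoff are all assumptions. Only the minimum of 20 SNVs per chromosome comes from the legend.

```python
def classify_copy_gain(adjusted_vafs, min_snvs=20, late_fraction=0.5):
    """Label a 3-copy chromosome gain as 'early' or 'late' from purity-adjusted SNV VAFs.

    SNVs nearer 2/3 are taken to be on 2 of 3 copies (acquired before the gain);
    SNVs nearer 1/3 are taken to be on 1 of 3 copies. Returns None when the
    chromosome has fewer than `min_snvs` SNVs and is therefore not analyzed.
    The 0.5 midpoint and the `late_fraction` cutoff are assumed values.
    """
    if len(adjusted_vafs) < min_snvs:
        return None
    on_two_copies = sum(1 for v in adjusted_vafs if v > 0.5)
    fraction = on_two_copies / len(adjusted_vafs)
    return "late" if fraction > late_fraction else "early"


# Hypothetical chromosomes: mostly ~0.33 VAFs, mostly ~0.67 VAFs, and too few SNVs.
print(classify_copy_gain([0.33] * 15 + [0.66] * 5))
print(classify_copy_gain([0.65] * 18 + [0.30] * 6))
print(classify_copy_gain([0.33] * 10))
```

A sample whose analyzed chromosomes all receive the same label would correspond to a synchronous scheme, while mixed labels would suggest an asynchronous one. Note that even a late gain leaves some SNVs at about 0.33 (those on the non-duplicated copy), so a real analysis would look at the VAF density rather than a single cutoff.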

Extended Data Fig. 7 Genetic alterations affecting histone genes on chromosome 6p22.2 and 6p22.1.

(a) Prevalence of genetic alterations affecting any of the histones on 6p22.2 (top) or 6p22.1 (bottom) in each ALL subtype. Y-axis indicates the percentage of samples affected in each subtype, and the exact number of samples altered along with the number of samples analyzed in each subtype is shown above each plot. Samples with characterisation of both SNVs/indels and copy number alterations (through WGS or WES combined with SNP array) were analyzed. Alteration types are indicated by color (see legend at top right) and exclude fusions. If a sample had an alteration in more than one histone or more than one alteration type, only one alteration at the highest rank in the legend of alterations (for example ‘nonsense’ has top priority) was shown. (b) Focal deletions (5 Mb or less; blue indicates degree of copy loss in each sample (row) and circles indicate SVs which were available for WGS samples only) at 6p22.2 (left) or 6p22.1 (right) affecting at least one histone in either region. Color at left indicates the subtype and lineage (B-ALL or T-ALL) as indicated by legend at bottom. (c) Sites of non-silent SNVs and indels in histones on 6p22 which were recurrently altered. Protein domains are indicated in color. (d) Somatic structural variant (SV) burden in patients with or without (w/o) deletion of one or more histones on 6p22.2 or 6p22.1 or other SNV/indel alterations in histone genes such as those in c. Only patients with Illumina WGS data were analyzed, and only ALL subtypes with at least three histone-altered samples are shown. P values were determined by two-sided Wilcoxon rank-sum test. Boxplots show median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5 × interquartile range rule is used).
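The rule of showing only the highest-ranked alteration per sample can be sketched as a lookup against the legend order. The order below is hypothetical apart from 'nonsense' being first, which is the only rank the legend states; the function name is likewise an assumption.

```python
# Hypothetical legend order; only 'nonsense' having top priority comes from the legend text.
LEGEND_ORDER = ["nonsense", "frameshift", "splice", "missense", "copy_loss"]


def displayed_alteration(alterations, legend_order=LEGEND_ORDER):
    """Return the single alteration type to plot for a sample, or None if it has none."""
    if not alterations:
        return None
    rank = {name: position for position, name in enumerate(legend_order)}
    # The lowest rank index corresponds to the top-most entry in the legend.
    return min(alterations, key=lambda alteration: rank[alteration])


print(displayed_alteration(["missense", "nonsense"]))
print(displayed_alteration(["copy_loss", "missense"]))
```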

Extended Data Fig. 8 Clonality of driver SNVs and indels in B-ALL and T-ALL.

(a-b) The cancer cell fraction (CCF, x-axis), that is the percentage of cancer cells harboring each mutation, of alterations in each driver or putative driver gene is shown in (a) all B-ALL samples or the indicated B-ALL subtype, or (b) all T-ALL samples or the indicated T-ALL subtype. The CCF was calculated based on the VAF, copy number, and tumor purity of each sample; calculated CCFs above 1.0 were considered 1.0. Samples with both SNV/indel and copy number characterisation are shown. For subtype-specific plots, only subtypes with at least 20 samples meeting this criterion are shown. Each plot shows the number of samples analyzed (n) at top. For most samples, only SNVs/indels in 2-copy regions were analyzed, except for near haploid and low hypodiploid where only SNVs/indels in 1-copy regions were analyzed. SNVs are shown in blue and indels in red; each point represents one somatic mutation. Boxplots show median (thick center line) and interquartile range (box). Whiskers are described in R boxplot documentation (a 1.5*interquartile range rule is used). Known or putative driver genes with at least 10 SNVs/indels in 2-copy regions across all B-ALL samples, or 8 SNVs/indels in 2-copy regions across all T-ALL samples, are shown. (c-d) Targeted single-cell DNA sequencing plus protein analysis of two B-ALL samples (c) and one T-ALL sample (d). For each patient, a heatmap is shown with each row representing one cell, and each column representing either one mutation (left side) or one protein (right side). Mutation VAF is indicated by blue color, while protein level (as a percent of all protein-associated reads detected in the cell) is indicated by red color. At bottom of heatmap likely normal cells are indicated. The bulk VAF of each mutation is indicated below, along with bulk CCF (if copy number was available).
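The legend states that CCF was derived from VAF, copy number and tumor purity, with values above 1.0 set to 1.0, but does not give the formula. The sketch below uses one commonly used formulation and should be read as an assumption rather than the authors' exact calculation; the function name, the normal-cell copy number of 2 and the default multiplicity of one mutated copy per cell are all assumed.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number=2, multiplicity=1):
    """Estimate the fraction of cancer cells carrying a mutation, capped at 1.0.

    Assumed model: expected VAF = purity * multiplicity * CCF
                                  / (purity * tumor_copy_number + 2 * (1 - purity)),
    where normal cells contribute 2 copies and `multiplicity` is the number of
    mutated copies per mutated cancer cell.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_copies = purity * tumor_copy_number + 2 * (1 - purity)
    ccf = vaf * total_copies / (purity * multiplicity)
    return min(1.0, ccf)


# Hypothetical mutations in 2-copy regions of a sample with 80% purity.
print(cancer_cell_fraction(vaf=0.2, purity=0.8))
print(cancer_cell_fraction(vaf=0.5, purity=0.8))

# Hypothetical mutation in a 1-copy region (as for near haploid samples), 50% purity.
print(cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_copy_number=1))
```

Under this model, a heterozygous clonal mutation in a 2-copy region at 80% purity would have a VAF of 0.4, so a VAF of 0.2 corresponds to roughly half of the cancer cells.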

Extended Data Fig. 9 Alterations in rarely mutated genes affecting gene expression.

(a) Selected alterations in rare epigenetic modulators. Putative cancer driver genes are shown in blue text. Left shows an oncoprint showing only samples with alterations in at least one of these genes, with alteration type indicated by color and the percentage of samples in B-ALL or T-ALL altered at right. Top-middle shows the percentage of each subtype with alterations in these genes, color-coded by the specific gene altered. In samples with alterations in more than one gene, only the top-most gene in the legend is shown. Number of samples in each subtype is as in Fig. 5b. Right shows example gene alterations, including focal deletions (5 Mb or less; blue indicates degree of copy loss in each sample (row) and circles indicate SVs which were available for WGS samples only) in PSPC1-ZMYM2 and INO80. Sites of sequence alterations in HDAC7 and TRRAP are shown at middle-bottom. (b-d) Oncoprints and subtype bar plots as in (a) except that shown are selected transcription factors (b), RNA processing factors (c), and cohesion-associated genes (d). P values (asterisks) are by two-sided Fisher’s exact test comparing prevalence in the indicated subtype vs. all samples not belonging to that subtype (within that lineage (B-ALL or T-ALL), so that ETV6-RUNX1 subtype would be compared to B-ALL samples of other subtypes, while TLX3 subtype would be compared to T-ALL samples of other subtypes). In (c), exact P values are 8.6 × 10⁻⁴ (Ph), 0.0047 (Ph-like other), 0.015 (ETV6-RUNX1), 9.7 × 10⁻⁶ (iAMP21), and 0.035 (NKX2-1). In (d), exact P values are 0.020 (DUX4), 0.016 (TCF3-PBX1), 0.0027 (ETV6-RUNX1), 0.023 (iAMP21), and 2.6 × 10⁻⁴ (TLX3).

Extended Data Fig. 10 Alterations in rarely mutated genes affecting the cytoskeleton and other miscellaneous alterations.

(a) Selected alterations in cytoskeleton-related genes. Putative cancer driver genes are shown in blue text. Top shows an oncoprint showing only samples with alterations in at least one of these genes, with alteration type indicated by color and the percentage of samples in B-ALL or T-ALL altered at right. Bottom shows the percentage of each subtype with alterations in these genes, color-coded by the specific gene altered. In samples with alterations in more than one gene, only the top-most gene in the legend is shown. Number of samples in each subtype is as in Fig. 5b. (b-c) Oncoprints and subtype bar plots as in (a) except that shown are selected rare alterations affecting DNA damage or the cell cycle (b), and miscellaneous alterations affecting various pathways as indicated by text to the left (c). P values (asterisks) are by two-sided Fisher’s exact test comparing prevalence in the indicated subtype vs. all samples not belonging to that subtype (within that lineage (B-ALL or T-ALL), so that ETV6-RUNX1 subtype would be compared to B-ALL samples of other subtypes, while TLX3 subtype would be compared to T-ALL samples of other subtypes). In (a), exact P values are 2.2 × 10⁻⁴ (Ph), 1.1 × 10⁻⁴ (Ph-like other), and 9.5 × 10⁻⁴ (Ph-like CRLF2). In (b), exact P values are 0.011 (Ph-like other), and 0.048 (ETV6-RUNX1). In (c), exact P values are 1.4 × 10⁻⁸ (ETV6-RUNX1), 4.2 × 10⁻⁴ (iAMP21), and 0.019 (TLX3).

Supplementary information

Supplementary Information

Supplementary Methods, Datasets 1–8 descriptions, table descriptions, Fig. 1 and references.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Table 1. Overview of cohort by case. Clinical metadata, sequencing modality, ALL subtypes.
Supplementary Table 2. Summary of clinical presentation by subtype.
Supplementary Table 3. ALL nonsilent SNVs and indels. List of coding-region somatic SNVs and indels identified in ALL samples sequenced by WGS and/or WES. Hg19 coordinates are used.
Supplementary Table 4. ALL SVs from WGS. List of somatic structural variants identified (genome wide) in ALL samples sequenced by WGS. Hg19 coordinates are used.
Supplementary Table 5. ALL fusion transcripts detected by transcriptome sequencing (RNA-seq) by case. List of driver or potential driver fusions detected in ALL samples by RNA-seq or WGS.
Supplementary Table 6. List of driver genes identified in ALL. List of 376 driver and putative driver genes analyzed or newly identified in ALL by SNV, indel, CNA or rearrangement. This list focuses on somatic alterations. Genes are annotated as known ALL drivers, known cancer drivers or first time reported in cancer. Lineage preference (B-ALL, T-ALL) is also shown.
Supplementary Table 7. ALL mutational signatures by case. Somatic SNV mutational signatures detected in each sample sequenced by WGS, as determined by SigProfilerSingleSample.
Supplementary Table 8. Subtype associations with mutational signature. Summary of mutational signature associations for each B-ALL and T-ALL subtype, and between B-ALL and T-ALL. The similarity of somatic alteration prevalence in different leukemia subtypes was evaluated by two-sided Fisher's exact test.
Supplementary Table 9. VAF for all high hyperdiploid cases with triploid chromosomes. Somatic SNVs in three-copy regions in hyperdiploid B-ALL as determined by WGS, which were used to determine the relative order of each copy gain in individual samples.
Supplementary Table 10. VAF for UV-induced mutations in high hyperdiploid and iAMP21 by case. Somatic SNV mutational signatures detected in hyperdiploid and iAMP21 samples sequenced by WGS, as determined by SigProfilerSingleSample.
Supplementary Table 11. Case lesion matrix. Sample by mutation matrix, indicating driver and putative driver genes somatically altered in each patient. This table includes only samples with SNV/indel and copy number alteration.
Supplementary Table 12. Subtype associations with genes/pathways. Summary of analysis of genetic and pathway associations for each B-ALL and T-ALL subtype (with ten or more cases), and between B-ALL and T-ALL. Genes with alteration in ten or more cases were included. The similarity of somatic alteration prevalence in different leukemia subtypes was evaluated by two-sided Fisher's exact test.
Supplementary Table 13. Comparison of altered genes and pathway by NCI risk. This table provides a breakdown of gene and pathway alteration by NCI standard risk (SR) and high risk (HR). P values were derived by Fisher’s exact test.
Supplementary Table 14. Genes that define recurrently altered pathways.
Supplementary Table 15. Gene–gene and gene–pathway associations. Comutation and mutual exclusivity analysis. Each gene–gene pair was analyzed for comutation or mutual exclusivity by Fisher's exact test within B-ALL (top) or T-ALL (bottom). For each association, the odds ratio for the association is shown in the first column and the P value by two-sided Fisher's exact test in the second column.
Supplementary Table 16. CCF values for driver genes by case. Cancer cell fractions of SNVs and indels in two-copy regions (for low hypodiploid and near haploid samples, one-copy-region mutations were also analyzed).
Supplementary Table 17. Overall survival within each subtype and lineage by gene and pathway using univariate analysis. This table provides univariate overall survival from diagnosis for each gene and pathway (mutated versus WT) within each subtype and lineage. P values were derived by log-rank test. Only subtypes with at least ten samples were analyzed.
Supplementary Table 18. EFS within each subtype and lineage by gene and pathway using univariate analysis. This table provides univariate EFS for each gene and pathway (mutated versus WT) within each subtype and lineage. P values were derived by log-rank test. Only subtypes with at least ten samples were analyzed.
Supplementary Table 19. Minimal residual disease within each subtype and lineage by gene and pathway. This table compares the frequency of minimal residual disease negative (<0.01%) versus positive (≥0.01%) at the end of induction for each gene and pathway (mutated versus WT) within each subtype and lineage. P values were derived by log-rank test. Only subtypes with at least ten samples were analyzed.
Supplementary Table 20. OS within subtypes by gene alteration using multivariate analysis. This table provides multivariate OS analysis from diagnosis for each gene subtype analysis in which univariate analysis was statistically significant. P values were derived by Cox proportional-hazards modeling.
Supplementary Table 21. EFS within subtypes by gene alteration using multivariate analysis. This table provides multivariate EFS analysis from diagnosis for each gene subtype analysis in which univariate analysis was statistically significant. P values were derived by Cox proportional-hazards modeling.
Supplementary Table 22. Differential gene expression of DUX4-a versus DUX4-b samples analyzed by RNA-seq. The output was generated by Limma. Each row is one gene. The logFC is log fold change (positive means higher in DUX4-b, negative higher in DUX4-a samples). The adjusted P value is in the 'adj.P.Val' column. P values were obtained with Limma (which uses an empirical Bayes moderated approach), are two-sided and are adjusted for multiple comparisons using Benjamini–Hochberg correction.
Supplementary Table 23. Differential gene expression of KMT2A-a versus KMT2A-b samples analyzed by RNA-seq. The output was generated by Limma. Each row is one gene. The logFC is log fold change (positive means higher in KMT2A-b, negative higher in KMT2A-a samples). The adjusted P value is in the 'adj.P.Val' column. P values were obtained with Limma (which uses an empirical Bayes moderated approach), are two-sided and are adjusted for multiple comparisons using Benjamini–Hochberg correction.
Supplementary Table 24. ALL germline variants. Germline variants probably or potentially promoting cancer predisposition across patients with ALL (both B-ALL and T-ALL) analyzed by exome sequencing.

Supplementary Data 1

Mutation sites of SNVs and indels in each driver or putative driver gene. For each known or putative driver gene, the protein site of each SNV and indel is shown as plotted with ProteinPaint, with B-ALL and T-ALL combined in each plot. This includes all samples sequenced by WGS or WES (n = 2,141). The x axis indicates amino acid position, and protein domains are indicated by color. Mutation types are also indicated by color, with the number of samples at each mutation site indicated in circles at the top of each ‘lollipop’; if no number is indicated then a single sample had the alteration.

Supplementary Data 2

Focal deletions in each driver or putative driver tumor suppressor gene. For each known or putative driver gene considered to be a tumor suppressor, each focal deletion is shown in samples sequenced by WGS or in those with SNP array plus WES (n = 1,815). Only samples with focal deletions (5 Mb or less) affecting any exon that includes protein-coding portions of the gene are shown. Blue indicates degree of copy loss in each sample (row), and circles indicate SVs available for WGS samples only. The x axis represents the chromosomal coordinates on the chromosome containing the gene of interest (hg19), with refGene transcripts shown at top and gray arrows indicating the direction of transcription. B-ALL sample rows are shown in gray and T-ALL in dark gray. Deletions that extend from left to right on the x axis are still focal deletions of 5 Mb or less, but may also affect other genes not visible in the range shown.

Supplementary Data 3

Somatic alterations associated with risk status. Within each ALL subtype, the presence of each somatic alteration (gene or chromosome-level copy alteration) was analyzed for association with clinical risk status. Patients were grouped into childhood standard risk (SR; left bar) or childhood high risk (HR), combined with AYA risk (right bar). Only statistically significant associations are shown (P ≤ 0.05 by two-sided Fisher’s exact test). Samples with SNV/indel and copy number characterization (either WGS or WES plus SNP array) were analyzed. Associations were analyzed only for somatic alterations with at least three somatically altered samples and three nonaltered (WT) samples in the subtype. Subtypes with fewer than ten samples in total were excluded from analysis. The numbers of altered and total samples represented by each bar are shown by text on or above the bar (for example, 31/245 indicates 31 of 245 samples with a specific alteration).

Supplementary Data 4

Oncoprints of the mutational landscape of each ALL subtype. Each page shows the mutational landscape of a single ALL subtype, with the number of samples (n) indicated at top. Samples with both SNV/indel and copy number characterization (WGS or WES plus SNP array) are shown. Within the main region (left), each row represents a single gene, sorted from top to bottom by the frequency of alterations, and each column represents one sample. The presence of a mutation is indicated by color, as shown in the legend at far right. Fusions are not included as an alteration type because they are implied by the subtype. To the right, the percentage of samples in the subtype with each alteration is indicated (x axis), color coded by alteration type. If a sample had more than one alteration in a gene, only the alteration top-most in the alteration legend (starting with ‘nonsense’) is shown in the barplot.

Supplementary Data 5

Subtype specificity of each driver or putative driver gene. Somatic alterations in each driver or putative driver gene are shown, with the y axis indicating the percentage of samples affected in each subtype. The exact number of samples altered, along with the number of samples analyzed in each subtype, are shown above each plot. Samples with characterization of both SNVs/indels and copy number alterations (through WGS or WES combined with SNP array) were analyzed. Alteration types are indicated by color (see legend at top right) and exclude fusions. If a sample had more than one alteration type in the gene, only one alteration at the highest rank in the legend of alterations (for example, ‘nonsense’ has top priority) is shown.

Supplementary Data 6

Coassociation and mutual exclusivity of gene alterations within each ALL subtype. Each page shows the comutation and mutual exclusivity of somatic alterations within one ALL subtype (with number of samples in the subtype noted towards bottom right). Each row or column header indicates somatic alterations in an individual gene, chromosome or chromosome arm (the latter two indicated by ‘gain’ for copy gains and ‘del’ for copy losses), with the total number of samples in the subtype bearing each alteration indicated in parentheses. The row–column intersection point of two genetic alterations in the table denotes the number of samples with both alterations and the statistical significance of this association by two-sided Fisher’s exact test (red or blue circles are shown for P values of 0.05 or less). Blue indicates mutual exclusivity and red indicates frequent comutation (see scale at bottom right). Within each subtype, q-values were computed using all P values computed in that subtype, and instances of Q ≤ 0.05 are encircled in black. The P values themselves are not adjusted for multiple comparisons. Only samples with SNV/indel data and copy number data were analyzed (samples with WGS, or WES combined with SNP copy array). Only genes or chromosomes with at least five alterations in the subtype are shown. Only subtypes with at least two genes or chromosomes meeting the five-alteration criterion are shown.
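The pairwise test described here can be sketched in plain Python as a two-sided Fisher's exact test on a 2×2 table of sample counts, followed by a q-value step. The description does not say how q-values were computed, so the Benjamini–Hochberg adjustment below is a stand-in assumption; the function names and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(both, a_only, b_only, neither):
    """Two-sided Fisher's exact P value for a 2x2 table of sample counts.

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = both + a_only, b_only + neither
    col1 = both + b_only
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    low, high = max(0, col1 - row2), min(row1, col1)
    observed = prob(both)
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted values (used here as an assumed q-value method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical subtype with 6 samples: alterations A and B never co-occur with absence.
p_value = fisher_exact_two_sided(both=3, a_only=0, b_only=0, neither=3)
print(round(p_value, 3))

# Hypothetical P values from several gene pairs within one subtype.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Whether an association reflects comutation or mutual exclusivity would be read from the direction of the odds ratio, which this sketch does not compute.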

Supplementary Data 7

Significant OS associations by ALL subtype. Within each ALL subtype, the presence of each somatic alteration (gene or chromosome-level copy alteration) was analyzed for association with OS. Statistically significant survival curves are shown (P ≤ 0.05 by univariate two-sided log-rank test). OS was analyzed only for somatic alterations with at least three somatically altered samples and three nonaltered (WT) samples in the subtype; at least two deaths were required in the WT or altered group for analysis to be performed.

Supplementary Data 8

Significant EFS associations by ALL subtype. Within each ALL subtype, the presence of each somatic alteration (gene or chromosome-level copy alteration) was analyzed for association with EFS. Statistically significant EFS curves are shown (P ≤ 0.05 by univariate two-sided log-rank test). EFS was analyzed only for somatic alterations with at least three somatically altered samples and three nonaltered (WT) samples in the subtype; at least two events were required in the WT or altered group for analysis to be performed.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Brady, S.W., Roberts, K.G., Gu, Z. et al. The genomic landscape of pediatric acute lymphoblastic leukemia. Nat Genet 54, 1376–1389 (2022). https://doi.org/10.1038/s41588-022-01159-z

  • DOI: https://doi.org/10.1038/s41588-022-01159-z
