Recent genomic studies have identified chromosomal rearrangements defining new subtypes of B-progenitor acute lymphoblastic leukemia (B-ALL); however, many cases lack a known initiating genetic alteration. Using integrated genomic analysis of 1,988 childhood and adult cases, we describe a revised taxonomy of B-ALL incorporating 23 subtypes defined by chromosomal rearrangements, sequence mutations or heterogeneous genomic alterations, many of which show marked variation in prevalence according to age. Two subtypes have frequent alterations of the B lymphoid transcription-factor gene PAX5. One, PAX5alt (7.4%), has diverse PAX5 alterations (rearrangements, intragenic amplifications or mutations); a second subtype is defined by PAX5 p.Pro80Arg and biallelic PAX5 alterations. We show that p.Pro80Arg impairs B lymphoid development and promotes the development of B-ALL with biallelic Pax5 alteration in vivo. These results demonstrate the utility of transcriptome sequencing to classify B-ALL and reinforce the central role of PAX5 as a checkpoint in B lymphoid maturation and leukemogenesis.
The raw and analyzed data are provided in a graphical, interactive platform (see URLs). Genomic data generated for this study have been deposited in the European Genome-phenome Archive (EGA) under accession number EGAS00001003266. Other legacy data used in this study have been deposited in the EGA in previous projects under accession numbers EGAS00001000654, EGAS00001001952, EGAS00001001923, EGAS00001002217 and EGAS00001000447. The TARGET genomic data used in this study are available through the TARGET website (see URLs) and also in dbGaP (see URLs) under accession number phs000218 (TARGET). The other data supporting this study are available from the corresponding author upon reasonable request.
Hunger, S. P. & Mullighan, C. G. Acute lymphoblastic leukemia in children. N. Engl. J. Med. 373, 1541–1552 (2015).
Iacobucci, I. & Mullighan, C. G. Genetic basis of acute lymphoblastic leukemia. J. Clin. Oncol. 35, 975–983 (2017).
Roberts, K. G. et al. Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia. Cancer Cell 22, 153–166 (2012).
Iacobucci, I. et al. Truncating erythropoietin receptor rearrangements in acute lymphoblastic leukemia. Cancer Cell 29, 186–200 (2016).
Roberts, K. G. et al. Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N. Engl. J. Med. 371, 1005–1015 (2014).
Zhang, J. et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat. Genet. 48, 1481–1489 (2016).
Gu, Z. et al. Genomic analyses identify recurrent MEF2D fusions in acute lymphoblastic leukaemia. Nat. Commun. 7, 13331 (2016).
Suzuki, K. et al. MEF2D-BCL9 fusion gene is associated with high-risk acute B-cell precursor lymphoblastic leukemia in adolescents. J. Clin. Oncol. 34, 3451–3459 (2016).
Gocho, Y. et al. A novel recurrent EP300-ZNF384 gene fusion in B-cell precursor acute lymphoblastic leukemia. Leukemia 29, 2445–2448 (2015).
Yasuda, T. et al. Recurrent DUX4 fusions in B cell acute lymphoblastic leukemia of adolescents and young adults. Nat. Genet. 48, 569–574 (2016).
Lilljebjorn, H. et al. Identification of ETV6-RUNX1-like and DUX4-rearranged subtypes in paediatric B-cell precursor acute lymphoblastic leukaemia. Nat. Commun. 7, 11790 (2016).
Lilljebjorn, H. & Fioretos, T. New oncogenic subtypes in pediatric B-cell precursor acute lymphoblastic leukemia. Blood 130, 1395–1401 (2017).
Den Boer, M. L. et al. A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study. Lancet. Oncol. 10, 125–134 (2009).
Mullighan, C. G. et al. Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. N. Engl. J. Med. 360, 470–480 (2009).
Tibshirani, R., Hastie, T., Narasimhan, B. & Chu, G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99, 6567–6572 (2002).
Harrison, C. J. et al. An international study of intrachromosomal amplification of chromosome 21 (iAMP21): cytogenetic characterization and outcome. Leukemia 28, 1015–1021 (2014).
Holmfeldt, L. et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat. Genet. 45, 242–252 (2013).
Johnson, N. A. et al. Lymphomas with concurrent BCL2 and MYC translocations: the critical factors associated with survival. Blood 114, 2273–2279 (2009).
Zhu, X. et al. Identification of functional cooperative mutations of SETD2 in human acute leukemia. Nat. Genet. 46, 287–293 (2014).
Mar, B. G. et al. Mutations in epigenetic regulators including SETD2 are gained during relapse in paediatric acute lymphoblastic leukaemia. Nat. Commun. 5, 3469 (2014).
Schebesta, A. et al. Transcription factor Pax5 activates the chromatin of key genes involved in B cell signaling, adhesion, migration, and immune function. Immunity 27, 49–63 (2007).
Churchman, M. L. et al. Efficacy of retinoids in IKZF1-mutated BCR-ABL1 acute lymphoblastic leukemia. Cancer Cell 28, 343–356 (2015).
Hu, Y., Yoshida, T. & Georgopoulos, K. Transcriptional circuits in B cell transformation. Curr. Opin. Hematol. 24, 345–352 (2017).
Lauberth, S. M. & Rauchman, M. A conserved 12-amino acid motif in Sall1 recruits the nucleosome remodeling and deacetylase corepressor complex. J. Biol. Chem. 281, 23922–23931 (2006).
Miller, N. L. et al. A non-canonical role for Rgnef in promoting integrin-stimulated focal adhesion kinase activation. J. Cell. Sci. 126, 5074–5085 (2013).
Larsen, E. C. et al. Dexamethasone and high-dose methotrexate improve outcome for children and young adults with high-risk B-acute lymphoblastic leukemia: a report from Children’s Oncology Group study AALL0232. J. Clin. Oncol. 34, 2380–2388 (2016).
Mullighan, C. G. et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446, 758–764 (2007).
Shah, S. et al. A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia. Nat. Genet. 45, 1226–1231 (2013).
Dang, J. et al. Pax5 is a tumor suppressor in mouse mutagenesis models of acute lymphoblastic leukemia. Blood 125, 3609–3617 (2015).
Adams, B. et al. Pax-5 encodes the transcription factor BSAP and is expressed in B lymphocytes, the developing CNS, and adult testis. Genes Dev. 6, 1589–1607 (1992).
Urbanek, P., Wang, Z. Q., Fetka, I., Wagner, E. F. & Busslinger, M. Complete block of early B cell differentiation and altered patterning of the posterior midbrain in mice lacking Pax5/BSAP. Cell 79, 901–912 (1994).
Kuiper, R. P. et al. High-resolution genomic profiling of childhood ALL reveals novel recurrent genetic lesions affecting pathways involved in lymphocyte differentiation and cell cycle progression. Leukemia 21, 1258–1266 (2007).
Fortschegger, K., Anderl, S., Denk, D. & Strehl, S. Functional heterogeneity of PAX5 chimeras reveals insight for leukemia development. Mol. Cancer Res. 12, 595–606 (2014).
Novershtern, N. et al. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144, 296–309 (2011).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
Hardy, R. R. & Hayakawa, K. B cell development pathways. Annu. Rev. Immunol. 19, 595–621 (2001).
Pui, C. H. et al. Treating childhood acute lymphoblastic leukemia without cranial irradiation. N. Engl. J. Med. 360, 2730–2741 (2009).
Bowman, W. P. et al. Augmented therapy improves outcome for pediatric high risk acute lymphocytic leukemia: results of Children’s Oncology Group trial P9906. Pediatr. Blood. Cancer 57, 569–577 (2011).
Goldstone, A. H. et al. In adults with standard-risk acute lymphoblastic leukemia, the greatest benefit is achieved from a matched sibling allogeneic transplantation in first complete remission, and an autologous transplantation is less effective than conventional consolidation/maintenance chemotherapy in all patients: final results of the International ALL Trial (MRC UKALL XII/ECOG E2993). Blood 111, 1827–1833 (2008).
Kantarjian, H. et al. Long-term follow-up results of hyperfractionated cyclophosphamide, vincristine, doxorubicin, and dexamethasone (Hyper-CVAD), a dose-intensive regimen, in adult acute lymphocytic leukemia. Cancer 101, 2788–2801 (2004).
Ravandi, F. et al. First report of phase 2 study of dasatinib with hyper-CVAD for the frontline treatment of patients with Philadelphia chromosome-positive (Ph+) acute lymphoblastic leukemia. Blood 116, 2070–2077 (2010).
Thomas, D. A. et al. Treatment of Philadelphia chromosome-positive acute lymphocytic leukemia with hyper-CVAD and imatinib mesylate. Blood 103, 4396–4407 (2004).
Thomas, D. A. et al. Chemoimmunotherapy with a modified hyper-CVAD and rituximab regimen improves outcome in de novo Philadelphia chromosome-negative precursor B-lineage acute lymphoblastic leukemia. J. Clin. Oncol. 28, 3880–3889 (2010).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Nicorici, D. et al. FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data. Preprint at https://www.biorxiv.org/content/early/2014/11/19/011650 (2014).
Edgren, H. et al. Identification of fusion genes in breast cancer by paired-end RNA-sequencing. Genome. Biol. 12, R6 (2011).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Alexander, T. B. et al. The genetic basis and cell of origin of mixed phenotype acute leukaemia. Nature 562, 373–379 (2018).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome. Biol. 11, R106 (2010).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Pounds, S. et al. Reference alignment of SNP microarray signals for copy number analysis of tumors. Bioinformatics 25, 315–321 (2009).
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
Yau, C. OncoSNP-SEQ: a statistical approach for the identification of somatic copy number alterations from next-generation sequencing of cancer genomes. Bioinformatics 29, 2482–2484 (2013).
Gu, Z. & Mullighan, C. G. ShinyCNV: a Shiny/R application to view and annotate DNA copy number variations. Bioinformatics https://doi.org/10.1093/bioinformatics/bty546 (2018).
Zambon, A. C. et al. Go-elite: a flexible solution for pathway and ontology over-representation. Bioinformatics 28, 2209–2210 (2012).
Nutt, S. L., Urbanek, P., Rolink, A. & Busslinger, M. Essential functions of Pax5 (BSAP) in pro-B cell development: difference between fetal and adult B lymphopoiesis and reduced V-to-DJ recombination at the IgH locus. Genes Dev. 11, 476–491 (1997).
Pelletier, S., Gingras, S. & Green, D. R. Mouse genome engineering via CRISPR-Cas9 for study of immune function. Immunity 42, 18–27 (2015).
Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
Mantel, N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother. Rep. 50, 163–170 (1966).
Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. Series B Stat. Methodol. 34, 187–220 (1972).
We thank the Biorepository, the Genome Sequencing Facility of the Hartwell Center for Bioinformatics and Biotechnology, and the Cytogenetics core facility of SJCRH. This work was supported by the American Lebanese Syrian Associated Charities of SJCRH, American Society of Hematology Scholar Award (to Z.G. and K.G.R.), the Leukemia & Lymphoma Society’s Career Development Program Special Fellow Award (to Z.G.), St. Baldrick’s Foundation Robert J. Arceci Innovation Award (to C.G.M.), Amgen, Inc. to ECOG-ACRIN, NCI Outstanding Investigator Award R35 CA197695 (to C.G.M.), National Institute of General Medical Sciences grant P50 GM115279 (to C.G.M.), NCI grants P30 CA021765 (St. Jude Cancer Center Support Grant), ECOG-ACRIN Operations Center grants CA180820 (to P. O’Dwyer from University of Pennsylvania and the Abramson Cancer Center), CA189859 (to E.P.), CA180790 (to M.R.L.) and CA180791 (to M.S.T. and Y.Z.).
The authors declare no competing financial interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
a, Heatmap of 1,988 cases’ gene expression profile clustered by Pearson correlation and Ward’s clustering method based on the 500 most variable genes (evaluated by median absolute deviation). B-ALL subtypes specified in Fig. 1a are annotated at the top of the heatmap. The newly identified subtypes including PAX5 P80R, PAX5alt and IKZF1 N159Y are highlighted in red; the subtypes not annotated in Fig. 1a are highlighted in blue. PAX5alt group was defined by this hierarchical clustering of gene expression profiles. Among the cluster (highlighted in red rectangle) of 173 cases enriched with PAX5 alterations, 25 were classified as other subtypes including NUTM1 (N = 1), BCL2/MYC (N = 1) and Ph-like (N = 23), and the remaining 148 cases were classified as PAX5alt. b, Distribution of B-ALL subtypes in different age groups. Definition of age groups is described in Table 1. The KMT2A-like (N = 5) and ZNF384-like (N = 4) subtypes are merged with KMT2A and ZNF384, respectively. The subtypes are grouped as gross chromosomal alteration, transcription factor (TF) rearrangement, other TF alteration, kinase driven and others. AYA, adolescent and young adult. SR, standard risk; HR, high risk.
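The gene selection and distance step described in panel a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and toy values are hypothetical, and the resulting distance matrix would still need to be passed to a hierarchical clustering routine using Ward's method.

```python
from math import sqrt
from statistics import median


def mad(values):
    """Median absolute deviation of one gene's expression values."""
    m = median(values)
    return median(abs(v - m) for v in values)


def top_variable_genes(expr, n_top):
    """expr maps gene name -> list of per-sample expression values.
    Returns the n_top gene names with the largest MAD."""
    ranked = sorted(expr, key=lambda g: mad(expr[g]), reverse=True)
    return ranked[:n_top]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_distance_matrix(expr, genes):
    """1 - Pearson correlation between samples over the selected genes."""
    n_samples = len(expr[genes[0]])
    profiles = [[expr[g][s] for g in genes] for s in range(n_samples)]
    return [[1.0 - pearson(p, q) for q in profiles] for p in profiles]


# Toy data: 4 genes x 4 samples (hypothetical values, not study data).
toy = {
    "GENE_A": [1.0, 1.1, 5.0, 5.2],
    "GENE_B": [2.0, 2.0, 2.1, 2.0],
    "GENE_C": [8.0, 7.5, 1.0, 1.2],
    "GENE_D": [3.0, 3.1, 3.0, 2.9],
}
selected = top_variable_genes(toy, n_top=2)  # the study used the top 500
dist = sample_distance_matrix(toy, selected)
```

In the toy data the two genes with the largest MAD separate samples 0 and 1 from samples 2 and 3, so the within-pair distances are near 0 and the between-pair distances are near 2.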
Details of cohort distribution across different age groups are shown at the top. Gene rearrangements identified by transcriptome sequencing (RNA-seq) were used to define 10 B-ALL subtypes. Available karyotypic information and chromosomal level copy number alterations called from RNA-seq were used to distinguish aneuploidy: high hyperdiploid, low hypodiploid and near haploid. B-ALL subtypes (N = 11, yellow box) showing distinct gene expression profiles and with sufficient number of cases (n ≥ 10) were used as training dataset for Prediction Analysis for Microarray (PAM; Tibshirani, R. et al., Proc Natl Acad Sci U S A. 99, 6567–6572, 2002) to predict “-like” subtypes. Detailed rules are provided in Table 1. chr no., chromosome number.
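The shrunken-centroid idea behind PAM, used above to predict “-like” subtypes, can be illustrated with a simplified sketch. It is not the PAM software itself: the exact form of the class-size factor `m_k`, the default `s0` offset, the function names and the toy data are all assumptions made for illustration, and a tested implementation should be preferred for real analyses.

```python
from math import log, sqrt


def fit_shrunken_centroids(X, y, delta, s0=0.0):
    """X: list of samples (each a list of gene values); y: class labels.
    Returns (shrunken centroids per class, per-gene scale, class priors)."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    centroids = {k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
                 for k, rows in members.items()}
    # Pooled within-class standard deviation per gene, plus the offset s0.
    s = []
    for j in range(p):
        ss = sum((r[j] - centroids[k][j]) ** 2
                 for k, rows in members.items() for r in rows)
        s.append(sqrt(ss / (n - len(classes))) + s0)
    shrunken = {}
    for k, rows in members.items():
        m_k = sqrt(1.0 / len(rows) + 1.0 / n)  # assumed class-size factor
        shrunk_row = []
        for j in range(p):
            d = (centroids[k][j] - overall[j]) / (m_k * s[j])
            # Soft-threshold the standardized difference by delta.
            d_shrunk = (1.0 if d > 0 else -1.0) * max(abs(d) - delta, 0.0)
            shrunk_row.append(overall[j] + m_k * s[j] * d_shrunk)
        shrunken[k] = shrunk_row
    priors = {k: len(rows) / n for k, rows in members.items()}
    return shrunken, s, priors


def predict(x, model):
    """Assign x to the class with the smallest standardized distance score."""
    shrunken, s, priors = model

    def score(k):
        dist = sum(((xi - c) / sj) ** 2
                   for xi, c, sj in zip(x, shrunken[k], s))
        return dist - 2.0 * log(priors[k])

    return min(shrunken, key=score)


# Toy training set (hypothetical): gene 0 is informative, gene 1 is noise.
X = [[5.0, 1.0], [5.2, 1.2], [4.8, 0.9], [1.0, 1.1], [1.2, 0.9], [0.8, 1.0]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_shrunken_centroids(X, y, delta=1.0)
label_high = predict([4.5, 1.0], model)
label_low = predict([1.5, 1.0], model)
```

With `delta=1.0` the noise gene is shrunk to the overall mean in both classes, so only the informative gene contributes to the prediction.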
a, Distribution of PAX5 rearrangements (PAX5r) and sequence mutations (PAX5mut). The same parameters as Fig. 1a are used in this tSNE plot and all samples in this study are included (N = 1,988). PAX5 rearrangements with JAK2 and ZCCHC7 are most commonly observed in Ph/Ph-like subtypes, while the other PAX5 rearrangements (PAX5r-other) are clustered in the PAX5alt group. Two frequently observed rearrangements, PAX5-ETV6 and PAX5-NOL4L, are highlighted. b, Distribution of all types of PAX5 alterations in 1,141 cases with both SNP array and RNA-seq data available. The copy number alterations (CNAs) were called from SNP array data and divided into the following types: 1 copy gain/loss, a broad copy gain/loss that could be a chromosomal or arm-level CNA; focal 1 copy gain/loss, in which no more than 10 canonical genes are encompassed in the CNA region; partial 1 copy loss/gain, in which the CNA breakpoint is in the PAX5 gene body; CN-LOH, copy-neutral loss of heterozygosity; del between PAX5 and ZCCHC7, a deletion commonly observed in B-ALL that could result in a PAX5-ZCCHC7 fusion; and partial 1 copy loss within PAX5, a type of focal deletion with both start and end breakpoints in the PAX5 gene body (intragenic). Focal intragenic amplifications of PAX5 (N = 10; PAX5amp) are specifically enriched in the PAX5alt group (N = 8). c, Distribution of PAX5 alterations in each B-ALL subtype.
a, PAX5-ITD (PAX5amp) detected by whole genome sequencing (WGS) in Integrative Genomics Viewer (Robinson, J. T. et al., Nat. Biotechnol. 29, 24–26, 2011). Upper scatter plot shows WGS coverage relative to the germline sample. The genomic region with elevated copy number is highlighted by a red bar. The red arc denotes a tandem duplication encompassing PAX5 exons 2–5. Transcriptome sequencing coverage from the same sample is shown as a blue histogram and elevated expression of gained exons is shown. Below are the aligned WGS reads, and the discordant pairs are shown in red, supporting the structural variation. b, Wild-type and mutant PAX5 with amplified exons (e) 2–5. Primers (shown as arrows) were designed to amplify the fragments with the 5ʹ end in exon 5 and 3ʹ end in exon 2 (primer e5: GACACCAACAAGCGCAAGAGAGAC; e2: TGATGAGCAAGTTCCACTATCCTC). c, Representative electropherogram of Sanger sequencing showing the junction of exon boundaries characterizing the duplication of exons 2–5. d, Fluorescence in situ hybridization (FISH) confirming the presence of a PAX5 (exons 2–5) tandem duplication. Duplication is indicated by paired red signals (PAX5 exons 2–5 fosmid clone) associated with a green signal (chromosome 9 control probe). Sixty-three percent of analyzed cells were determined to be positive for the PAX5 duplication. A FISH validation in normal metaphase cells confirming the localization is shown in the left panel.
Kaplan-Meier estimates for EFS and OS of children treated on St. Jude (SJ) Total protocols. P values are calculated by the two-sided time-stratified Cochran–Mantel–Haenszel test across all the subtypes in each panel. Detailed analysis results are provided in Supplementary Table 25. Favorable subtypes include High hyperdiploid, ETV6-RUNX1, TCF3-PBX1 and DUX4, 304 patients; KMT2A, 33; PAX5 P80R, 6; PAX5alt, 31; Ph, 17; Ph-like, 27; other includes BCL2/MYC, CRLF2(non-Ph-like), ETV6-RUNX1-like, TCF3-HLF, iAMP21, MEF2D, NUTM1, ZNF384 and all other, 56.
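For readers unfamiliar with the estimator named in this caption, a minimal Kaplan-Meier sketch follows. The function name and the follow-up data are hypothetical; the stratified test used for the P values is not reproduced here.

```python
def kaplan_meier(times, events):
    """times: follow-up durations; events: 1 if the event was observed,
    0 if the case was censored at that time.
    Returns [(time, survival probability)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored cases leave the risk set after t.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in years (not study data).
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Censored cases reduce the number at risk without lowering the curve, which is why the estimate steps down only at times 1, 2 and 3 in this example.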
a, Schematic representation of the Pax5 gene and mutations introduced to generate Pax5P80R and Pax5G183S mouse lines. sgRNAs (blue) targeting exon 3 or exon 5 were used to introduce Pax5 P80R or Pax5 G183S mutations (red). Several silent mutations were also introduced to facilitate PCR genotyping and prevent Cas9-mediated cleavage of the loci after repair. These include mutations disrupting the protospacer adjacent motif (PAM) sequences (underlined) and protospacer elements (orange). Blue arrow, Pax5-e3-F1; green arrow, Pax5-e3-R1; yellow arrow, Pax5-e5-F1; red arrow, Pax5-e5-R1; arrowhead, Cas9 cleavage sites. b, Representative Sanger sequencing validation of Pax5 wild-type, P80R, and G183S alleles to assign genotypes.
Supplementary Figure 7 Inference of chromosomal copy number alterations (CNAs) from transcriptome sequencing data.
Gene expression level (rlog) evaluated by DESeq2 was normalized and shown on each chromosome to indicate whole chromosomal copy number gain or loss (upper). The boxplot for each chromosome shows the median value as the center line, and the 25% and 75% quantiles as the lower and upper hinges of each box, respectively. The lower whisker equals the smallest observation greater than or equal to the lower hinge - 1.5 * IQR (interquartile range), and the upper whisker reaches the largest observation less than or equal to the upper hinge + 1.5 * IQR. The skyblue line indicates the rlog from all the chromosomes, and the red line shows the median expression level of genes on chromosomes with 2 copies. With copy number changes, the mutant allele frequencies (MAFs) of SNVs shift and the density peaks of MAF are skewed (lower, highlighted in red if the highest peak is not around 0.5). Homozygous duplication of a chromosome could be recognized by elevated gene expression level, but is not noticeable on the MAF density plot (for example, chromosomes 14 and 21). The example patient ID is SJALL040088 and the figure, with the exception of chromosome 15, was highly consistent with the karyotype: 61,XX,+X,+3,+4,+5,+6,+10,+11,+12,del(12)(p11.2),+14,+15,+16,+17,+18,+21,+21 (14/70%) 62,idem,+mar (3/15%) 46,XX (3/15%). As shown in Supplementary Table 34, we observed consistency in calling of aneuploidy of autosomes between SNP array CNA data and RNA-seq data; erroneous calling on karyotyping may arise from miscalling of suboptimal metaphase data.
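The hinge and whisker rule stated in this caption can be written out as a short sketch. The quartile interpolation method chosen here (`method="inclusive"`) is an assumption, since the caption does not specify one, and the function name and values are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Hinges, whiskers and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    # Quartile convention is an assumption; other conventions differ slightly.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "lower_hinge": q1,
        "median": med,
        "upper_hinge": q3,
        # Smallest observation >= lower hinge - 1.5 * IQR.
        "lower_whisker": min(v for v in data if v >= lower_fence),
        # Largest observation <= upper hinge + 1.5 * IQR.
        "upper_whisker": max(v for v in data if v <= upper_fence),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical per-gene values for one chromosome (not study data).
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

In this example the value 30 lies beyond the upper fence, so the upper whisker stops at 9 and 30 is reported as an outlier.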
About this article
Cite this article
Gu, Z., Churchman, M.L., Roberts, K.G. et al. PAX5-driven subtypes of B-progenitor acute lymphoblastic leukemia. Nat Genet 51, 296–307 (2019). https://doi.org/10.1038/s41588-018-0315-5