Heritable variation at the chromosome 21 gene ERG is associated with acute lymphoblastic leukemia risk in children with and without Down syndrome

of common heritable ALL risk alleles at ARID5B, GATA3, and PIP4K2A in Latinos [2][3][4]. However, the etiologies of the increased ALL risk in Latinos have not been fully elucidated. We previously performed a large, multi-ethnic genome-wide association study (GWAS) of childhood ALL, including 3,263 cases, of which ~60% were of Latino ethnicity [5]. While we identified two novel risk loci, we did not identify Latino-specific risk loci, unlike a recent report from Qian et al. [6]. We have performed whole-genome imputation of our Latino dataset and combined it with GWAS data from two additional, non-overlapping Latino childhood ALL case-control datasets to identify novel and/or Latino-specific risk loci.
The GWAS meta-analysis included the following: (i) 1,949 ALL cases and 2,120 controls from the California Cancer Records Linkage Project (CCRLP-LAT) study, supplemented with 6,464 Kaiser GERA study controls [5]; (ii) 38 cases and 49 controls from a Guatemalan ALL case-control study (GTM); and (iii) 312 cases and 454 controls from the California Childhood Leukemia Study (CCLS) [7] (Supplementary Material). Methods for haplotype phasing, whole-genome imputation, and quality control of imputed genotypes are described in Supplementary Material. Case-control association analyses were performed separately in each study using logistic regression in SNPTEST V2, adjusting for ten ancestry-informative principal components, calculated separately within each dataset. Within-study genomic inflation factors were low (λ_CCRLP = 1.034, λ_GTM = 1.01, λ_CCLS = 1.025). A fixed-effects meta-analysis was performed, and QQ plots indicated adequate control of type I error and minimal population stratification (λ_Meta = 1.029) (Supplementary Fig. S1).
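As a rough illustration of the two quantities named above, the following sketch shows a generic inverse-variance-weighted fixed-effects meta-analysis and a genomic inflation factor computed from z-scores. It is not the pipeline used in this study. The function names and inputs (per-study log odds ratios and standard errors, such as those reported by a per-study logistic regression) are assumptions made for illustration.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study log odds ratios (hypothetical inputs).
    ses:   matching standard errors.
    Returns the pooled log OR, its SE, the z-score, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p

def genomic_inflation(z_scores):
    """Genomic inflation factor: median observed chi-square over its null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN
```

Under this scheme a study with a smaller standard error contributes more weight to the pooled estimate, and a genomic inflation factor close to 1 indicates little inflation of test statistics.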
The effect of this locus on ALL risk was recently reported to increase with increasing global Native American (NA) ancestry [6]. Here we examined local ancestry at the ERG locus (Supplementary Material, Supplementary Fig. S2), and found a larger effect size for rs8131436 in Latinos with ≥1 copy of the NA haplotype (OR = 1.30; 95% CI = 1.15-1.47; P = 2.4 × 10⁻⁵) than in Latinos with zero NA haplotypes (OR = 1.15; 95% CI = 0.98-1.34; P = 0.09), further supporting a positive association between NA ancestry and the effect of ERG heritable variation on ALL risk. The frequency of NA haplotypes at rs8131436 was slightly higher in cases (42.7%) than controls (40.9%) (Supplementary Fig. S3); however, taking into account the proportion of global NA ancestry, the case-control difference in local NA ancestry at ERG was not significant (P = 0.44) (Supplementary Table S3).
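For readers who want to compare the two stratum-specific estimates directly, the sketch below recovers approximate standard errors from the reported 95% confidence intervals and applies a simple z-test for a difference between two independent log odds ratios. This comparison is an illustrative assumption and not an analysis reported by the authors. Because it uses the rounded values quoted above, it is only approximate, and the function names are hypothetical.

```python
import math

def log_or_and_se(odds_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Recover the log odds ratio and an approximate SE from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    return beta, se

def difference_test(beta1, se1, beta2, se2):
    """Two-sided z-test for a difference between two independent log ORs."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Rounded estimates for rs8131436 as quoted in the text.
b_na, se_na = log_or_and_se(1.30, 1.15, 1.47)      # >=1 NA haplotype
b_none, se_none = log_or_and_se(1.15, 0.98, 1.34)  # zero NA haplotypes

z_diff, p_diff = difference_test(b_na, se_na, b_none, se_none)
print(f"z = {z_diff:.2f}, two-sided p = {p_diff:.2f}")
```

The test treats the two strata as independent samples, which is reasonable here because each participant falls into only one local-ancestry group.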
ERG is within the Down syndrome (DS) critical region on chromosome 21, and children with trisomy 21 have an ~20-fold increased risk of ALL [12]. Therefore, we explored whether ERG variation may contribute to DS-ALL risk. We genotyped rs2836371 (lead SNP across Latino discovery and non-Latino white replication sets) using a TaqMan SNP genotyping assay in a Latino case-control set (DS-ALL cases, n = 103 and DS non-leukemia controls, n = 96) from the International Study of Down Syndrome Acute Leukemia (IS-DSAL, Supplementary Material). Trisomic genotypes were manually clustered to delineate the two heterozygote genotypes (TTC or TCC) (Supplementary Fig. S4). We found that rs2836371 was significantly associated with risk of DS-ALL (P = 0.016) with a per-allele OR of 1.44 (95% CI: 1.08-1.96), which was higher, though not significantly so, than that in non-DS Latinos (OR = 1.19, Supplementary Table S2) (P_interaction = 0.21). Furthermore, subjects with three risk alleles at Supplementary Fig. S5). In a smaller set of non-Latino white DS-ALL cases (n = 83) and DS controls (n = 78), rs2836371 was not significantly associated with DS-ALL risk (OR = 1.07, 95% CI: 0.77-1.49), reflecting similar inter-ethnic differences in effect size observed in non-DS participants.
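To make the trisomic genotype coding concrete, the sketch below counts risk alleles (0 to 3) in a three-allele genotype string and computes the odds ratio implied by a log-additive (per-allele) model. The text does not state which allele of rs2836371 is the risk allele, so it is passed in as a parameter. The function names are hypothetical, and the implied odds ratios are model-based values, not the observed estimates from this study.

```python
def risk_allele_dosage(genotype, risk_allele):
    """Count copies of the risk allele in a trisomic genotype such as 'TTC'."""
    genotype = genotype.upper()
    risk_allele = risk_allele.upper()
    if len(genotype) != 3 or set(genotype) - set("ACGT"):
        raise ValueError(f"expected three nucleotides, got {genotype!r}")
    if risk_allele not in ("A", "C", "G", "T"):
        raise ValueError(f"invalid risk allele: {risk_allele!r}")
    return genotype.count(risk_allele)

def implied_odds_ratio(per_allele_or, dosage):
    """OR versus zero risk alleles, assuming a log-additive model."""
    return per_allele_or ** dosage

# Example only: 'C' is assumed to be the risk allele for illustration.
for gt in ("TTT", "TTC", "TCC", "CCC"):
    dosage = risk_allele_dosage(gt, "C")
    print(gt, dosage, round(implied_odds_ratio(1.44, dosage), 2))
```

With trisomy there are four genotype classes instead of three, so a per-allele model spans dosages 0 through 3 and the two heterozygote classes (TTC and TCC) must be distinguished.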
Observed inter-ethnic differences in SNP effect size suggest potential interactions with environmental factors, or with additional germline or somatic genetic alterations. Intriguingly, several published GWAS loci for white blood cell (WBC) traits in adults lie ~50 kb downstream of rs2836371 within ERG [13]. These SNPs are in very low linkage disequilibrium (LD) with our ALL-associated SNPs, and are positioned on the other side of a strong recombination peak (Supplementary Fig. S6). Novel analysis of selection signals across ERG in Latinos revealed no evidence of positive selection for ALL risk SNPs, but identified a strongly significant signal (population branch statistic >99th percentile genome-wide; haplotype statistic >97th percentile) at the downstream WBC trait locus (Supplementary Fig. S6). SNP rs2836426 showed the strongest selection signal (P = 2.2 × 10⁻⁴) and, though in low LD with ALL risk SNP rs2836371 (D′ = 0.16 in AMR, 1000 Genomes), it is in high LD with several WBC trait-associated SNPs (D′ = 1 in AMR). No direct association was detected between the low-frequency WBC trait-associated SNPs and ALL risk; however, we found a marginally significant synergistic interaction between ALL-associated SNP rs2836371 and three perfectly linked WBC trait SNPs (rs80109907, rs7275212, and rs58030288) on ALL risk in Latinos (P = 0.079, OR = 2.00) but not in non-Latino whites (P = 0.48, OR = 0.78) (Supplementary Table S4), suggesting Latino-specific cooperation between these two independent trait-associated loci in ALL predisposition.
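Because the LD values above are reported as D′, and low-frequency variants can show D′ = 1 while sharing little correlation with a common variant, a short sketch of the standard two-locus LD definitions may help. The haplotype and allele frequencies in the example are invented for illustration and are not taken from 1000 Genomes or from this study.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or denom == 0:
        raise ValueError("LD is undefined when a locus is monomorphic")
    d_prime = abs(d) / d_max
    r2 = d * d / denom
    return d, d_prime, r2

# Hypothetical example: a rare allele B (5%) found only on haplotypes
# carrying a common allele A (50%).
d, d_prime, r2 = ld_metrics(p_ab=0.05, p_a=0.50, p_b=0.05)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

In this invented case D′ reaches its maximum because one of the four possible haplotypes is absent, yet r² stays small because the allele frequencies differ widely.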
To explore potential functional effects of ALL-associated SNPs in ERG, we assessed 32 SNPs with P < 5.0 × 10⁻⁵ in the Latino meta-analysis, of which 19 replicated in the European data (P < 0.05). ERG protein is expressed at low levels in lymphoblastoid cell lines, which prevented accurate expression quantitative trait locus (eQTL) analysis within Genotype-Tissue Expression (GTEx) or GEUVADIS RNA-Seq datasets. In silico analyses, using HaploReg, RegulomeDB, UCSC Genome Browser, and Epigenome Browser, revealed no protein-coding variants, nor any obvious functional candidates based on overlap with putative regulatory elements and transcription factor binding sites.
A recently identified ALL tumor subtype, "DUX4-rearranged ALL", is characterized by somatic DUX4 rearrangements that result in alternative splicing of ERG using an alternative start site at "exon 6 alt" [14]. ALL-associated SNPs at ERG did not alter known DUX4 binding motifs, and TF-binding motif analysis did not reveal any SNPs creating novel DUX4 binding motifs.
We assessed whether any SNPs overlapped ERG exon 6 alt and found that SNP rs2836361, in tight LD with rs2836371 (R² = 0.93 and D′ = 0.97 in 1000 Genomes individuals of Mexican ancestry; R² = 0.99 and D′ = 0.99 in Europeans), was located 3 bp upstream of the first exon 6 alt codon (Supplementary Fig. S7). SNP rs2836361 disrupts a strong exonic splicing silencer (ESS), with the risk allele reducing the score of a silencer motif "TCTCCCAA" [15] from 88.1 (TCTGCCAA containing the rs2836361 protective allele) to 70.9 (TCTGTCAA containing the risk allele). This ESS had the highest predicted score within a region encompassing exon 6 alt ± 100 bp. Moreover, we found that the rs2836361 risk allele may increase exonic splicing enhancer activity by elevating the RNA recognition motif score for serine/arginine-rich pre-mRNA splicing factor (SRp40). Hence, the rs2836361 risk allele may increase splicing of the non-canonical ERG exon 6 alt, conferring dominant negative effects on wildtype ERG and increased risk of ALL. Further analysis is needed to confirm the causal variant at this locus and its functional effects.
In sum, we report the largest GWAS of childhood ALL among Latinos to date, identifying a risk locus at chromosome 21q22.2, encompassing the hematopoietic transcription factor ERG. This gene is frequently somatically mutated in ALL, adding to a growing list of genes that both predispose to ALL and drive tumorigenesis following somatic mutations. Insufficient patient data were available to investigate the relationship between ERG SNPs and somatic alterations; however, during preparation of this manuscript, Qian et al. reported that the ERG risk genotype was negatively correlated with somatic ERG deletions [6], suggesting that the SNP may partly mimic the effects of somatic loss of ERG.
Novel to our study, we replicated the ERG association in a case-control study of Down syndrome-ALL; this is the first reported heritable risk factor for DS-ALL, and may inform future risk stratification in this vulnerable population. Current methods to accurately assess trisomic genotypes using SNP arrays are sub-optimal; next-generation sequencing strategies are warranted to elucidate the contribution of heritable variation across chromosome 21 to DS-ALL risk.
Our study highlights the importance of Latino subjects in elucidating the germline genetic architecture of childhood ALL, and suggests that larger sample sizes may reveal additional important susceptibility loci that inform the biology of leukemogenesis.

Disclaimer
The ideas and opinions expressed herein are those of the author(s) and do not necessarily reflect the opinions of the State of California, Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their Contractors and Subcontractors.