Introduction

Acute lymphoblastic leukemia (ALL) is characterized by the malignant clonal proliferation of lymphoid cells that are blocked at an early stage of differentiation. More than 60% of patients diagnosed with ALL are children below the age of 15 years.1, 2, 3 The annual incidence rates of childhood ALL vary worldwide between one and four new cases per 100 000 children younger than 15 years, with a peak incidence at about 2–5 years of age.3, 4, 5

As for most other human malignancies that lack a clear hereditary genetic basis, the etiology of childhood ALL is believed to be multifactorial, involving environmental risk factors such as radiation and other toxins as well as genetic variables.6, 7 Subtypes of childhood ALL are classified according to either immunophenotype (for example, precursor B-cell vs T-cell lineage ALL) or a steadily growing number of recurrent somatic genetic aberrations that have been characterized during the last decades.1, 2, 8, 9 The two major genetic subtypes of childhood ALL are characterized by either hyperdiploidy (>50 chromosomes per leukemic cell) or the chromosomal translocation t(12;21)(p13;q22) leading to an ETV6–RUNX1 (TEL–AML1) gene fusion.10 Hyperdiploid and ETV6–RUNX1-rearranged childhood ALL account for 25% and 22% of the entire childhood ALL population, respectively.

A minority of childhood ALL cases (<5%) is related to specific genetic conditions such as Down's syndrome or disorders associated with impaired DNA repair capacities (for example, ataxia telangiectasia, Nijmegen breakage syndrome or Bloom's syndrome).6, 7 Recently, two genome-wide association studies (GWAS) on childhood ALL performed in the UK and the US have identified 13 susceptibility loci.11, 12 The UK study screened around 291 000 single-nucleotide polymorphisms (SNPs) in 907 ALL cases (a mixture of different subtypes) in a meta-analysis without independent replication, while the US study investigated nearly 308 000 SNPs in 317 ALL cases (also a mixture) after quality control. The two highest scoring loci were detected by both studies and subsequently verified by others:13 IKZF1, a transcription regulator of lymphoid cell differentiation, and ARID5B, a member of the AT-rich interaction domain family of transcription factors.

We report here the results of a genome-wide association study conducted with patients displaying a subtype of childhood ALL characterized by the chromosomal translocation that leads to an ETV6–RUNX1 gene fusion. To the best of our knowledge, our study is thus the first genome-wide search for susceptibility loci that primarily addresses one of several well-defined and specific subgroups of the heterogeneous disease entity ‘childhood ALL’.

Patients and methods

Recruitment of patients and healthy controls

Samples were organized in panels that corresponded to the successive steps of the present study. All individual panels (A–D, Supplementary Table S1) were independent from each other.

All patients included in panel A, 206 patients from panel B, and all patients from panels C and D were enrolled in the Austrian–German–Italian–Swiss multicenter clinical trial AIEOP–BFM ALL 2000 on treatment of childhood ALL after informed consent had been obtained. Patients were diagnosed and treated at one of 121 participating study centers in Austria, Germany and Italy between 1999 and 2008. A further 200 ALL patients from panel B were enrolled in two consecutive clinical trials of the German COALL study group, COALL-06-97 and COALL-07-03, between 1997 and 2008 at 19 German treatment centers. To generate the extended panel B with 664 patients, the 406 patients from panel B were complemented with an additional 258 patients enrolled in 2009 and 2010 in the Austrian–German–Italian–Swiss multicenter clinical trial AIEOP–BFM ALL 2000 or its successor, the ongoing trial AIEOP–BFM ALL 2009, after informed consent had been obtained. Diagnosis was based on cytomorphology (FAB criteria) and cytochemistry when at least 25% lymphoblasts were present in the bone marrow, or when lymphoblasts were present in the peripheral blood or cerebrospinal fluid. Flow cytometric immunophenotyping was performed according to consensus protocols based on the guidelines proposed by the European Group for the Immunological Characterization of Leukemias. The presence or absence of ETV6–RUNX1, BCR–ABL and MLL–AF4 fusion gene transcripts was analyzed by a multiplex PCR assay enabling the detection of M-BCR–ABL, m-BCR–ABL, ETV6–RUNX1 and MLL–AF4 fusion transcripts in a single PCR reaction. Positive results were confirmed by interphase fluorescence in situ hybridization in the majority of the patients. Written informed consent was obtained from all study participants or their legal guardians to use spare diagnostic specimens for research purposes, and all protocols were approved by the respective institutional ethical review committees.

Bone marrow samples were obtained at initial diagnosis and at consecutive follow-up time points during therapy, mainly after induction and consolidation. Mononuclear cells were isolated from bone marrow samples by Ficoll-Paque gradient centrifugation (Pharmacia, Freiburg, Germany), followed by extraction of high-molecular-weight DNA according to standardized protocols. Quality and quantity of genomic DNA were determined by spectrophotometry. In the present study, only remission DNA obtained at follow-up time points after induction or consolidation was analyzed.

The German healthy control individuals of panels A and B were obtained from the popgen biobank.14 The healthy Italian controls of panel C were recruited through the blood bank of the San Gerardo Hospital at Monza, Italy. Written informed consent was obtained from all study participants and all protocols were approved by the institutional ethical review committees of the participating centers. To account for their heterogeneous sources, all DNA samples were first quality-controlled by gel electrophoresis.

SNP genotyping for genome-wide screening

The genotyping of the GWAS samples (panel A), which was part of the German NGFN GWAS initiative funded by the NGFN, was performed as a service project by Affymetrix (South San Francisco, CA, USA) using the Genome-Wide Human SNP Array 5.0 (500 k) with 443 816 markers. The array is based on an assay termed whole-genome sampling analysis, developed for highly multiplexed SNP genotyping of complex DNA. This method reproducibly amplifies a subset of the human genome through a single-primer amplification reaction using restriction enzyme-digested and adapter-ligated human genomic DNA.

In brief, 5 μl of each genomic DNA sample at 50 ng/μl was aliquoted to the corresponding wells of two 96-well plates. The first run of samples was processed as an entire plate. In the lab, transfers were made with a 12-channel pipette, reducing the risk of sample tracking errors. One plate was digested with Nsp I and the other plate was digested with Sty I. The reaction was incubated at 37 °C for 2 h and at 65 °C for 20 min to deactivate the enzyme. The digested DNA was then ligated to the respective Nsp I or Sty I adaptor. The ligated product was then PCR-amplified using a common primer. The Nsp I and Sty I PCR products were combined and then purified by ethanol precipitation in combination with a membrane filter plate. Purified PCR product was further fragmented with DNase I and then labeled with biotin. Labeled DNA was combined with hybridization mix and then injected into the array. Arrays were hybridized for 18–22 h at 50 °C. DNA samples were recovered from the arrays, which were then washed and stained using Affymetrix FS450 fluidics stations. Stained arrays were scanned using the Affymetrix GeneChip Scanner 3000 7G.

Raw image files were converted into CEL files by the Affymetrix Genotyping Console. A preliminary QC call rate of 86% was used to pass arrays for further data analysis. Passed arrays were clustered in the same batches in which they had been processed together in the lab. Genotypes were assigned using the BRLMM-p algorithm. There were 12 positive control samples run in each project. The positive controls were from a CEPH trio family: NA12740 (daughter, six repeats), NA12750 (father, four repeats) and NA12751 (mother, four repeats). This set of controls allowed experimental reproducibility, trio accuracy and HapMap concordance to be calculated. The average call rate was above 96%, and the average BRLMM-p call rate, average reproducibility (controls), average HapMap concordance (controls) and average trio accuracy were above 99%.

Samples that had more than 5% missing genotypes, showed excess genetic dissimilarity to the other subjects or showed evidence for cryptic relatedness to other study participants (Supplementary Figure S1) were removed. These quality control measures left 419 ETV6–RUNX1-positive ALL patients and 474 healthy control samples for inclusion in screening panel A. All gender assignments could be verified by reference to the proportion of heterozygous SNPs on the X chromosome. Before analysis, all SNPs that had a low genotyping rate (<95% in cases or controls), were monomorphic or rare (minor allele frequency <2% in cases or controls), or deviated from Hardy–Weinberg equilibrium in the control sample (PHWE≤0.001) were excluded (n=88 066; 19.8% of all SNPs).
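The marker-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, genotype counts are passed as (AA, AB, BB) tuples, and the Hardy–Weinberg check uses a simple 1-df chi-square goodness-of-fit test as a stand-in for whatever test the analysis software applied.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_p(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit P-value for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2_sf_1df(chi2)


def call_rate(counts, n_missing):
    """Fraction of samples with a called genotype."""
    called = sum(counts)
    return called / (called + n_missing)


def minor_allele_frequency(counts):
    n_aa, n_ab, _ = counts
    p = (2 * n_aa + n_ab) / (2 * sum(counts))
    return min(p, 1.0 - p)


def passes_qc(case_counts, case_missing, ctrl_counts, ctrl_missing,
              min_call=0.95, min_maf=0.02, hwe_alpha=0.001):
    """Apply the three marker filters; thresholds default to those stated in the text."""
    for counts, missing in ((case_counts, case_missing), (ctrl_counts, ctrl_missing)):
        if call_rate(counts, missing) < min_call:
            return False
        if minor_allele_frequency(counts) < min_maf:
            return False
    # Hardy-Weinberg equilibrium is checked in controls only, as in the text.
    return hwe_p(*ctrl_counts) > hwe_alpha
```

A marker is kept only if it passes the call-rate and allele-frequency filters in both cases and controls and shows no marked deviation from Hardy–Weinberg equilibrium in controls.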

SNPlex, TaqMan and Sequenom genotyping

All downstream genotyping (panels B through D) was performed with SNPlex and TaqMan technologies (Applied Biosystems, Foster City, CA, USA) or the Sequenom platform using an automated laboratory setup and all process data were written to and administered by a database-driven laboratory information management system.

Of 100 selected SNPs 87 SNPs passed quality control in both replication samples. These SNPs had a high call rate (>95% in cases or controls), were not monomorphic or rare (minor allele frequency >1% in cases or controls) and did not deviate from Hardy–Weinberg equilibrium in the control population (PHWE>0.001).

Mutation detection

Amplicons were generated using the following touchdown PCR program: 95 °C for 12 min, (95 °C for 30 s, Tm for 30 s, 72 °C for 30 s) × 16 cycles (td −0.5 °C), (95 °C for 30 s, Tm for 30 s, 72 °C for 30 s) × 19 cycles, 72 °C for 10 min and 10 °C for ∞.

Sequencing of genomic DNA was performed using BigDye Terminator v3.1 chemistry (Applied Biosystems) and an ABI3730 capillary sequencer (Applied Biosystems) according to manufacturer protocols. Traces were inspected for SNPs and InDels using novoSNP v2.03.

Statistical analyses

Our screening panel A had 80% power to detect a variant with an odds ratio of 1.48 or higher at the 5% significance level, assuming a frequency of the disease-associated allele of at least 30% in controls (calculated with PS Power and Sample Size v3.0.12; ref. 15). For subsequent statistical analyses, we used a data set that passed stringent quality control filters (see section ‘SNP genotyping for genome-wide screening’), resulting in 355 750 SNPs genotyped in 419 cases and 474 healthy controls. The total genotyping rate across these samples was 99.8%. We found minimal evidence for an overall inflation of the test statistics due to population stratification with a moderate genomic control value of λGC=1.14. Furthermore, a multidimensional scaling analysis showed genuine European ancestry of panel A and identity-by-state analysis revealed neither non-European ‘outliers’ nor cryptically related individuals after quality control (Supplementary Figure S1). Genome-wide association analyses were conducted with gPLINK v2.050 in combination with PLINK v1.07 (ref. 16). GWAS data were also analyzed with the R statistical environment version 2.10.0.
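For readers unfamiliar with the genomic control value, the following sketch shows one common way to compute it: the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455). The function name is hypothetical and the exact constant used by the analysis software may differ slightly.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median test statistic over its expected value under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value of 1.0 indicates no inflation; values above 1.0 indicate that test statistics are, on average, larger than expected under the null hypothesis.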

After the initial comparison of case–control frequencies (Supplementary Figure S2), we pruned the SNP list for redundancy due to linkage disequilibrium (using the ‘clump’ command in PLINK) and visually inspected the cluster plots of the obtained index SNPs with a P-value <2 × 10−4 (Supplementary Figure S3). The 100 most strongly associated SNPs were subsequently selected for replication analysis and genotyped in the German replication panel B (consisting of 406 cases and 1682 controls) and the Italian replication panel C (comprising 287 cases and 579 controls). After quality control, we excluded 12 follow-up SNPs with a call rate less than 95% in cases or controls and one SNP with significant deviation from Hardy–Weinberg equilibrium in the controls (P<0.001). To take the different geographic origin of the two replication panels into account, we used a Cochran–Mantel–Haenszel test (PCMH) and a Breslow–Day test for odds ratio heterogeneity (PBD) in both the combined analysis of replication panels B and C and in the overall study sample (panels A through C) (Supplementary Table S2). The eight SNPs with a P-value <5 × 10−7 in the combined analysis of panels A and B or A through C were additionally genotyped in 258 ETV6–RUNX1-positive ALL cases and 800 healthy controls from Germany (extended replication panel B: 664 cases and 2482 controls).
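The stratified analysis described above can be sketched as follows. This is an illustrative Cochran–Mantel–Haenszel computation on allele counts, with one 2×2 table per replication panel; it applies no continuity correction, and the function name and table layout are assumptions made for the example rather than the interface of the software used in the study.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test across strata.

    Each table is (a, b, c, d):
      a = risk-allele count in cases,    b = other-allele count in cases,
      c = risk-allele count in controls, d = other-allele count in controls.
    Returns (chi-square statistic, P-value, Mantel-Haenszel common odds ratio).
    """
    diff = 0.0    # sum over strata of observed minus expected a
    var = 0.0     # sum over strata of the hypergeometric variance of a
    or_num = 0.0  # numerator of the Mantel-Haenszel odds ratio
    or_den = 0.0  # denominator of the Mantel-Haenszel odds ratio
    for a, b, c, d in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff * diff / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value, or_num / or_den
```

Pooling within strata in this way avoids the confounding that could arise if allele frequencies differ between the German and Italian panels.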

Imputation

The software package BEAGLE v3.1.1 (ref. 17) was used to impute the genotypes of autosomal SNPs based on the 1000 Genomes data. As input for imputation, we used only SNPs on the Affymetrix platform that passed quality control. To take imputation uncertainty into account, association analysis between the phenotype and the dosage data (expected allele counts) was performed using PLINK's logistic regression framework for dosage data (with the ‘dosage’ command).
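As a brief illustration of what "dosage" means here, the sketch below converts the three posterior genotype probabilities of an imputed SNP into an expected allele count. The function name and argument order are hypothetical and do not reflect the file formats of the imputation or association software.

```python
def expected_allele_count(p_aa, p_ab, p_bb):
    """Expected number of B alleles (0 to 2) given posterior genotype probabilities."""
    total = p_aa + p_ab + p_bb
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ab + 2.0 * p_bb
```

A confidently imputed heterozygote yields a dosage of 1.0, whereas an uncertain call such as (0.0, 0.5, 0.5) yields 1.5, so the regression sees the uncertainty rather than a hard genotype.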

Plotting

Regional association plots were generated by using a modified version of de Bakker's R script. SNPs are represented with their corresponding P-values (on −log10 scale) on the vertical axis as a function of physical position. Single-marker association analysis was performed using PLINK's standard case–control allelic test (with the ‘assoc’ command), and study SNPs as well as imputed SNPs were used as input. Physical positions on chromosomes are based on NCBI build 36. Estimated recombination rates from phased haplotypes in HapMap release 22 (build 36) were downloaded from the HapMap website; gene annotations were downloaded from the UCSC genome browser (using build 36 coordinates).
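The quantity plotted on the vertical axis can be sketched in a few lines. The example below computes a basic 1-df allelic chi-square test from allele counts in cases and controls and converts the P-value to the −log10 scale; the function names are hypothetical and this is an illustration of the test type, not a reimplementation of the software used.

```python
import math


def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square test (1 df) on a 2x2 table of allele counts; returns (chi2, P-value)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def neg_log10(p_value):
    """Transform a P-value to the scale used on the vertical axis of the regional plots."""
    return -math.log10(p_value)
```

Note that this sketch does not guard against P-values that underflow to zero for extremely strong associations.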

eQTL analysis

Expression data of TP63 (TP73L), PTPRJ and INTS10 (FLJ10569) in EBV-transformed lymphoblastoid cell lines from the HapMap samples were available from the GENEVAR project,18 and expression in the n=60 CEU parents was correlated to SNP genotypes using linear regression in the web-based tool SNPexp v1.1 (http://app3.titan.uio.no/biotools/tool.php?app=snpexp).
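The regression underlying such an eQTL test can be sketched as an ordinary least-squares fit of expression level on genotype coded as the number of minor alleles (0, 1 or 2). The additive coding and the function name are assumptions made for illustration; the web-based tool may use different options.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares slope and intercept of expression regressed on allele count."""
    n = len(genotypes)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long sequences with at least two samples")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotypes are monomorphic; slope is undefined")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept
```

A slope that differs significantly from zero would indicate that expression changes with each additional copy of the allele.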

Results

Associations with ETV6–RUNX1 subtype of childhood ALL

Novel significant disease associations that withstood Bonferroni correction (adjusted significance threshold α=5.75 × 10−4 (0.05/87)) in the combined analysis of replication panels B and C were obtained for rs17505102, an intronic SNP in the gene encoding the tumor protein p63 isoform 3 (TP63; also known as tumor protein p73-like: TP73L) on chromosome 3q28 (PCMH=4.87 × 10−7, OR=0.63, 95% CI=0.52-0.75) and for the intronic SNP rs3942852, located in the gene PTPRJ (protein tyrosine phosphatase, receptor type, J) on chromosome 11p11.2 (PCMH=2.54 × 10−4, OR=0.77, 95% CI=0.68-0.89) (Table 1 and Figure 1). Although the associations of these loci with ETV6–RUNX1-positive ALL did not achieve genome-wide significance (conventional level of P<5.0 × 10−8) in the discovery panel, we obtained genome-wide significance for rs17505102 (PCMH=8.94 × 10−9) within the TP63 gene in the overall sample of panels A through C comprising 1370 ETV6–RUNX1-positive ALL cases and 3535 healthy controls. For rs3942852, the combined analysis of the overall sample yielded PCMH=1.00 × 10−6.
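The Bonferroni threshold quoted above follows directly from the 87 SNPs tested in the replication stage. As a short arithmetic check (the variable and function names are illustrative only):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 87)

# Replication P-values (panels B and C combined) as reported in the text.
reported = {"rs17505102": 4.87e-7, "rs3942852": 2.54e-4}
significant = {snp: p < threshold for snp, p in reported.items()}
```

Both reported P-values fall below the corrected threshold of 0.05/87.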

Table 1 Summary of association results in combined replication panels B and C
Figure 1

Regional plots of the confirmed associations at TP63, at OR8U8, on 8p21.3 and at PTPRJ. The regional plots show the negative decadic logarithm of the P-values obtained in the GWAS (panel A). Panel A was imputed with CEU haplotypes generated by the 1000 Genomes Project (August 2010 release) as reference. For the central lead SNP of each plot, the combined P-value of panels A and B (rs1945213, rs920590 and rs3942852) or panels A through C (rs17505102) is indicated. The magnitude of linkage disequilibrium (LD) with the central SNP measured by r2 is reflected by the color of each SNP symbol (for color coding, see upper right corner of each plot). Recombination activity (in centimorgans (cM) per Mb) is depicted by a blue line. Positions are given as NCBI's build 36 coordinates. For details, see Table 1.

As the initial screening (discovery panel A) was performed with German samples only, we additionally analyzed the follow-up SNPs in the German/Austrian replication panel B exclusively to identify potential loci that are more specific for northern Europeans as compared with those that confer susceptibility in the European population in general. Apart from the two novel shared susceptibility loci (rs17505102 and rs3942852), we found another two associated SNPs that remained significant after correction for multiple testing: rs1945213 on chromosome 11q11 within the OR8U8 gene (olfactory receptor, family 8, subfamily U, member 8; P=1.63 × 10−7) and rs920590 on 8p21.3, 24 kb upstream of INTS10 (Integrator complex subunit 10; P=1.05 × 10−5). Furthermore, three of these associations (rs17505102, rs1945213 and rs920590) were genome-wide significant in the combined analysis of the German/Austrian panels A and B comprising 1083 cases and 2956 controls. For rs3942852 we obtained a P-value of 4.95 × 10−7 in the combined analysis (Table 1 and Figure 1). Among the newly identified susceptibility loci, PTPRJ and TP63 are particularly interesting candidates on account of their functional role and strong homology with the tumor suppressor gene TP53, respectively. As the marker rs17505102 is located in the first intron of the TP63 gene, we sequenced exons 1–3 in 47 ALL cases to determine whether the lead SNP might tag one or more potential causative coding polymorphisms via linkage disequilibrium. However, we did not identify any coding SNP in these regions. Additionally, we checked NCBI's dbSNP (ref. 19) build 134 for reported and validated non-synonymous coding SNPs (≥1% frequency) within the TP63 gene, but found no SNP entries satisfying these criteria.

Using a publicly available resource (web-based tool SNPexp v1.1; ref. 20), we were not able to identify any significant correlation (a so-called eQTL effect) between the genotypes at the lead SNPs rs17505102, rs1945213, rs3942852 and rs920590 and the expression levels of the nearby genes (TP63, PTPRJ and INTS10) in EBV-transformed lymphoblastoid cell lines from the HapMap samples (available from the GENEVAR project18). For OR8U8, no expression data were available. However, it is noteworthy that the SNP rs3942852, located in the first intron of the PTPRJ gene, resides within a binding site for the transcription factors BAF155 (SMARCC1), BAF170 (SMARCC2) and the proto-oncogene JunD, as assayed by ChIP-seq in HeLa-S3 cells (UCSC Browser hg18). Further experiments and analyses are therefore required to elucidate the mechanism by which the SNPs rs17505102, rs1945213, rs3942852 and rs920590 affect the risk of ALL.

Associations with childhood ALL independent of ETV6–RUNX1 subtype

In order to assess whether our findings were specifically associated with the ETV6–RUNX1-rearranged subtype, we genotyped the 20 top SNPs displayed in Table 1 in a second German replication cohort (panel D) consisting of 326 ETV6–RUNX1-negative childhood ALL cases. We analyzed these cases together with the same 1682 controls of panel B and obtained genotypes of 19 SNPs after quality control. Of the 19 SNPs only rs7738636 on chromosome 6q14.1 was significantly associated (P=2.55 × 10−3). The combined analysis of the replication panels B/Bextended through D yielded only a marginally stronger association with PCMH=1.42 × 10−3 for rs7738636 (Supplementary Table S3). These results imply that rs7738636 might represent a susceptibility locus for ETV6–RUNX1-negative ALL patients. However, due to the small sample of ETV6–RUNX1-negative cases, our analysis does not have enough power to determine whether the newly identified loci distinguish ETV6–RUNX1-positive from -negative patients.

Associations of previously reported susceptibility-conferring SNPs

Additionally, we genotyped the susceptibility-conferring SNPs identified by Papaemmanuil et al.11 and Treviño et al.,12 respectively, in panels A through D in order to confirm the previously reported loci and as a positive control for our experiment. We observed genome-wide significant association (P<5 × 10−8) for IKZF1, DDC, ARID5B and CEBPE in the combined sample (panels A–D) comprising 1438 cases and 2735 controls. Suggestive evidence of association (P<0.05) was found for SIAT7C, 1q31.3 (rs6428370), KCNMB2, PARD3 and C12orf5 (Table 2).

Table 2 Association results for known loci

Discussion

In this study, we were able to identify attractive, novel risk loci for ALL, even with a screening sample of only 419 patients, which is small compared with other GWAS and other diseases. This success might be due to the exclusive selection of patients with the homogeneous phenotype of ETV6–RUNX1-rearranged childhood ALL. Given their interesting biological functions, we discuss below the relevance of TP63 and PTPRJ in ALL etiology.

The TP63 gene, a member of the TP53 gene family, codes for p63, which has essential roles in embryonic development. TP63 contains two transcriptional start sites leading to p63 isoforms either containing (TAp63) or lacking (ΔNp63) the trans-activation domain.21 TP63-null mice die cancer-free within a few hours of birth, showing severe loss of limbs and a wide range of epithelial structures including skin, prostate, breast and urothelia.22, 23 This suggests that p63 acts as an important regulator of stemness in developing epithelia.24, 25 In humans, TP63 mutations are associated with ectodermal dysplasia, cleft lip or palate and limb malformations, but not with a higher tumor incidence.21, 26 In line with these observations, the TP63 coding region is only rarely found mutated in human cancers.27, 28 To date, recurrent somatic TP63 mutations have only been described for non-small cell lung cancer and chronic myeloid leukemia,28, 29 and TP63 germline genetic variation (rs710521) has so far only been associated with urinary bladder cancer.30 However, mice heterozygous for TP63 mutations develop malignant lesions, demonstrating that TP63 can act as a tumor suppressor, and mice heterozygous for mutations in both TP53 and TP63 display higher tumor burden and metastasis compared with mice only heterozygous for TP53 mutations.31 Of particular interest to our study, TAp63 was described to be involved in an antiapoptotic pathway regulating normal B and chronic lymphocytic leukemia cell survival in a CD74-dependent manner.32, 33 In chronic lymphocytic leukemia, TAp63 expression upregulates VLA-4 integrin expression, leading to augmented migration and homing of chronic lymphocytic leukemia cells to the bone marrow.33 Also, it was found that thymomas and certain B-cell lymphomas express high levels of TP63, suggesting a role of TP63 in these cancers.34

PTPRJ is a receptor type protein tyrosine phosphatase involved in the regulation of cellular processes including cell growth, differentiation, mitotic cycle and oncogenic transformation. Somatic mutations of PTPRJ have been described in a wide spectrum of cancers like thyroid carcinomas, colon cancer, breast and lung cancer.35, 36, 37, 38 Furthermore, a specific PTPRJ haplotype was shown to be associated with the risk of breast cancer demonstrating that common PTPRJ variants may act as breast cancer susceptibility alleles.39 Interestingly, the PTPRJ gene is expressed at high levels in hematopoietic cells.35, 37, 40 Of particular interest to the results reported in our study, mice with a constitutively deleted transmembrane domain allele of PTPRJ display a partial peripheral B-cell developmental block.41 Heterozygous mice do not show any notable phenotype.

Although the current knowledge on TP63 and PTPRJ may provide some evidence for their role in tumorigenesis as well as B-cell development and/or maintenance, the precise mechanisms by which germline genetic variation at the TP63 and PTPRJ loci may modulate susceptibility to precursor B-cell leukemia remain speculative and underscore the need for functional studies.

Apart from our novel findings, our study could clearly confirm the association of childhood ALL with the four previously published susceptibility loci IKZF1, DDC, ARID5B and CEBPE. Nominally significant P-values (P<0.05) were obtained for SIAT7C, 1q31.3, KCNMB2, PARD3 and C12orf5. Although our study panel had more than 80% power to detect all of the SNPs/loci published by Papaemmanuil et al.11 and Treviño et al.,12 respectively, we could not replicate the loci RYR2, OR2C3, KCNE4, 6q24.1, KRTHB5, 18p11.32 and ZNF230, which therefore require further investigation.

Overall, our findings demonstrate that germline genetic variation can contribute specifically to the risk of defined ALL subtypes, in this case ETV6–RUNX1-positive childhood ALL. Our results also suggest that germline genetic variation can act either as a risk factor for childhood ALL in general or as one restricted to specific subtypes.