Original Article

Leukemia (2012) 26, 2212–2215; doi:10.1038/leu.2012.89; published online 24 April 2012

Acute Leukemias

Common genetic variation contributes significantly to the risk of childhood B-cell precursor acute lymphoblastic leukemia

V Enciso-Mora1, F J Hosking1, E Sheridan2, S E Kinsey3, T Lightfoot4, E Roman4, J A E Irving5, I P M Tomlinson6, J M Allan5, M Taylor7, M Greaves8 and R S Houlston1

  1. Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, UK
  2. Yorkshire Regional Genetic Service, St James’s University Hospital, Leeds, UK
  3. Department of Paediatric and Adolescent Oncology and Haematology, St James’s University Hospital, Leeds, UK
  4. Epidemiology and Genetics Unit, Department of Health Sciences, University of York, York, UK
  5. Northern Institute for Cancer Research, Newcastle University, Framlington Place, Newcastle upon Tyne, UK
  6. Wellcome Trust Centre, University of Oxford, Oxford, UK
  7. Cancer Immunogenetics Group, School of Cancer and Enabling Sciences, University of Manchester, St Mary's Hospital, Manchester, UK
  8. Division of Molecular Pathology, The Institute of Cancer Research, Sutton, UK

Correspondence: Dr RS Houlston, Division of Genetics and Epidemiology, Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey SM2 5NG, UK. E-mail: richard.houlston@icr.ac.uk.

Received 12 December 2011; Revised 22 February 2012; Accepted 8 March 2012
Accepted article preview online 29 March 2012; Advance online publication 24 April 2012



Recent genome-wide association studies (GWAS) have provided the first unambiguous evidence that common genetic variation influences the risk of childhood B-cell precursor acute lymphoblastic leukemia (BCP-ALL), identifying risk single-nucleotide polymorphisms (SNPs) localizing to 7p12.2, 9p21.3, 10q21.2 and 14q11.2. The testing of SNPs individually for an association in GWAS necessitates the imposition of a very stringent P-value to address the issue of multiple testing. While this reduces false positives, real associations may be missed and therefore any estimate of the total heritability will be negatively biased. Using GWAS data on 823 BCP-ALL cases and considering all typed SNPs simultaneously, we have calculated that 24% of the total variation in BCP-ALL risk is accounted for by common genetic variation (95% confidence interval 6–42%). Our findings provide support for a polygenic basis for susceptibility to BCP-ALL and have wider implications for future searches for novel disease-causing risk variants.


Keywords: heritability; acute lymphoblastic leukemia; pediatric



Acute lymphoblastic leukemia (ALL) is the most commonly diagnosed pediatric cancer in developed countries.1, 2 B-cell precursor (BCP) ALL accounts for approximately 70% of childhood ALL and characteristically affects children between 3 and 5 years of age.

Evidence for an inherited genetic predisposition to ALL is provided by the high risk associated with Bloom's syndrome, neurofibromatosis, ataxia telangiectasia and constitutional trisomy 21 (collectively <5% of ALL).3 A heritable basis for susceptibility to ALL outside these syndromes is presently largely undefined. Recent genome-wide association studies (GWAS) have provided the first unambiguous evidence for common genetic susceptibility to ALL.4, 5, 6 These studies have robustly shown that single-nucleotide polymorphisms (SNPs) annotating the IKZF1 (7p12.2), CDKN2A (9p21.3), ARID5B (10q21.2) and CEBPE (14q11.2) genes influence disease risk (that is, P-values for associations <5.0 × 10⁻⁸).4, 5, 6 While each SNP only has a modest effect on an individual developing ALL, relative risks are among the strongest cancer associations identified by GWAS.7 This is compatible with a scenario in which the inherited susceptibility to this malignancy has a strong polygenic basis.

Although the tagging SNPs used in GWAS typically capture ~80% of common variation in the genome, testing SNPs individually for an association in GWAS necessitates the imposition of a very stringent P-value to address the issue of multiple testing. While this reduces the occurrence of false positives, it may result in true associations being missed, especially if individual SNPs have a small effect. Thus any overall estimate of the total heritability, that is, the proportion of the ALL risk ascribable to genetic variation, will be negatively biased. An alternative approach is to fit all the SNPs simultaneously; the effects of the SNPs are treated statistically as random effects, and the variance explained by all the SNPs together is estimated. The variance calculated in this way can be used to provide an unbiased estimate of the heritability explained by all SNPs.8 Here, we apply this methodology to a GWAS of BCP-ALL to quantify the variation in individual risk accounted for by common genetic variation.


Subjects and methods


Cases analyzed had been diagnosed with BCP-ALL and have been the subject of GWAS of childhood ALL we have previously reported.4, 6 Briefly, we analyzed 824 pediatric BCP-ALL patients ascertained from the United Kingdom (UK; 464 male, 360 female; mean age at diagnosis 5.4 years, s.d.=3.6), derived from the United Kingdom Childhood Cancer Study9 (UKCCS), the UK Medical Research Council (MRC) ALL 97 (99) trial and the Northern Institute for Cancer Research (NICR). Genotyping of patient samples was undertaken using Illumina Infinium HD Human 370 Duo BeadChips according to the manufacturer's protocols (Illumina, San Diego, CA, USA). For controls we used Illumina Hap550K BeadChip genotype data, comprising publicly accessible data for 1438 individuals from the 1958 Birth Cohort (58C, also known as the National Child Development Study)10 and data generated on 960 healthy UK individuals as part of a study of colorectal cancer.11


Collection of blood samples and clinical information from subjects was undertaken with informed consent and relevant ethical review board approval in accordance with the tenets of the Declaration of Helsinki.

Quality control of SNP genotyping

We have previously confirmed an absence of systematic genetic differences between cases and controls, and shown no evidence of population stratification in these sample sets.4, 6 Artefactual differences in allele frequencies between cases and controls can contribute to the estimation of spurious genetic variation; therefore, for the current analysis we imposed a number of additional quality control measures on the data set, as advocated by Lee et al.8 when estimating heritability. Using PLINK software12 we excluded SNPs in cases and controls that had a minor allele frequency (MAF)<0.01 or a Hardy–Weinberg equilibrium test with P<0.05. We also performed a differential missingness test between the cases and controls and excluded those SNPs with P<0.05. In addition, we excluded individuals having a relatedness score of >0.05. To investigate any potential overrepresentation of associations we generated quantile–quantile plots of Cochran-Armitage trend tests of association. The inflation factor λ was based on the 90% least significant SNPs.
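The per-SNP filters described above can be illustrated with a minimal sketch. It is not PLINK's implementation: PLINK applies its own Hardy–Weinberg test, whereas the sketch assumes a simple Pearson chi-square test with 1 d.f. as a stand-in, and the function names and the genotype-count inputs are hypothetical.

```python
import math


def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a Pearson chi-square (1 d.f.) test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used here for illustration only.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=0.05):
    """Keep a SNP only if MAF >= maf_min and the HWE P-value >= hwe_p_min."""
    return (maf(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min)
```

The default thresholds mirror those stated in the text (MAF<0.01 and HWE P<0.05 lead to exclusion); the differential missingness and relatedness filters are not covered by this sketch.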

Statistical analysis

Statistical analyses were performed using the methodology of Yang et al.13 and Lee et al.8 This method provides an estimate of the variance explained by all the SNPs in the data set while accounting for linkage disequilibrium (LD) between the genotyped SNPs and unknown causal variants (that is, correlations between SNP genotypes). Briefly, the method fits a linear mixed model of the form: $y = \mu + g + e$, whereby $y$ is the vector of disease status, $\mu$ is the mean vector, $g$ is a vector of random additive genetic effects obtained from SNP data and $e$ is a vector of residual effects. The covariance structure fitted in the data is the individual relationship estimated from the SNPs; $\mathrm{var}(y) = A\sigma_g^2 + I\sigma_e^2$, where $A_{jk}$ is the genetic relationship between individuals $j$ and $k$ derived from the SNPs, $I$ is the identity matrix, $\sigma_g^2$ is the additive genetic variance and $\sigma_e^2$ is the residual variance. Under this model disease heritability, $h^2$, is defined by: $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$. For quantitative traits the scale of measurement is the same as the scale on which heritability is expressed. In a case–control study phenotypes are measured on the 0–1 scale.
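As an illustration of how a SNP-derived relationship between individuals can be computed, the following minimal sketch builds a genetic relationship matrix from allele counts and evaluates the heritability ratio. It assumes the standardised cross-product estimator described by Yang et al.13 for off-diagonal elements and, as a simplification, applies the same expression to the diagonal; the function names and the toy genotype layout are hypothetical, and no variance components are estimated here (that step requires restricted maximum likelihood).

```python
def genetic_relationship_matrix(genotypes):
    """Relationship matrix from allele counts.

    genotypes[i][j] is the count (0, 1 or 2) of the reference allele carried
    by individual j at SNP i. Each SNP is centred by 2p and scaled by
    2p(1 - p), with p the sample allele frequency, then averaged over SNPs.
    Simplification: the off-diagonal formula is also used on the diagonal.
    """
    n_ind = len(genotypes[0])
    a = [[0.0] * n_ind for _ in range(n_ind)]
    n_used = 0
    for snp in genotypes:
        p = sum(snp) / (2 * n_ind)
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNPs carry no information
        scale = 2 * p * (1 - p)
        centred = [x - 2 * p for x in snp]
        for j in range(n_ind):
            for k in range(n_ind):
                a[j][k] += centred[j] * centred[k] / scale
        n_used += 1
    if n_used == 0:
        raise ValueError("no polymorphic SNPs available")
    return [[value / n_used for value in row] for row in a]


def heritability(var_g, var_e):
    """Additive genetic variance as a proportion of total variance."""
    return var_g / (var_g + var_e)


# Toy example: 3 SNPs typed in 4 individuals (hypothetical data).
toy = [[0, 1, 2, 1], [2, 1, 0, 1], [0, 0, 1, 1]]
grm = genetic_relationship_matrix(toy)
```

Because each SNP is centred on the sample allele frequency, every row of the resulting matrix sums to zero, which is a convenient sanity check on an implementation.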
The relationship between observations on the observed scale and liabilities on the unobserved continuous scale is modeled through a liability threshold model, that is, BCP-ALL liability on the underlying scale follows a standard normal distribution whereby if liability exceeds a certain threshold, $t$, then individuals will be affected. The estimate of variance explained by the SNPs on the observed 0–1 scale, $h_o^2$, is linearly transformed to that on the unobserved continuous liability scale, $h_l^2$, such that $h_l^2 = h_o^2 K(1-K)/z^2$, where $K$ is the prevalence of the disease and $z$ is the value of the standard normal probability density function at the threshold $t$. The incidence of BCP-ALL is 30–45 per 10⁶ per year.1 As this translates to a cumulative risk of ~1 in 2000 we set the prevalence of ALL to be 1 in 2000. The relationship between additive genetic variance on the observed 0–1 and unobserved liability scales is extended to account for ascertainment bias in a case–control study.8 Estimation of the additive genetic variance was performed using restricted maximum likelihood via genome-wide complex trait analysis (GCTA) software.14

The MAF spectrum of the unobserved causal variants may differ from that of the genotyped SNPs. We thus followed the procedure in Yang et al.13 to adjust the crude heritability estimate to account for incomplete LD between the genotyped SNPs and unknown causal variants. SNPs were randomly assigned into two groups, with one of the groups being treated as representing ‘true’ causal variants. The covariance between both groups is therefore reflective of the true variance of relatedness between individuals, while the variance derived from the SNP group equals the variation of relatedness plus estimation error. The prediction error can therefore be derived by regressing the relationships of the ‘true’ causal variants on the SNPs. As advocated, we calibrated the prediction error using data on SNPs representing causal variants having MAF<0.1.13
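The transformation from the observed 0–1 scale to the liability scale can be sketched as follows. This is a minimal illustration rather than the GCTA implementation: the function name and arguments are hypothetical, and the ascertainment-corrected branch assumes the form attributed to Lee et al.8, in which the population transformation is multiplied by K(1−K)/(P(1−P)), with P the proportion of cases in the sample.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction=None):
    """Convert variance explained on the 0-1 scale to the liability scale.

    prevalence    -- K, the population prevalence of disease
    case_fraction -- P, the proportion of cases in the sample; if None, the
                     sample is treated as an unascertained population sample
    """
    std = NormalDist()
    k = prevalence
    t = std.inv_cdf(1 - k)  # liability threshold
    z = std.pdf(t)          # height of the normal density at the threshold
    if case_fraction is None:
        return h2_observed * k * (1 - k) / z ** 2
    p = case_fraction
    # Assumed ascertainment correction for a case-control sample
    return h2_observed * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


# Values taken from the text: K = 1 in 2000, 823 cases among 3017 samples
h2_high = liability_scale_h2(0.59, 1 / 2000, case_fraction=823 / 3017)
h2_low = liability_scale_h2(0.46, 1 / 2000, case_fraction=823 / 3017)
```

Applied to the LD-adjusted observed-scale estimates of 0.59 and 0.46 given in Table 1, with a prevalence of 1 in 2000, the sketch returns values close to the liability-scale estimates of 0.24 and 0.18 reported in the text. When P equals K the corrected form reduces to the population form.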

We made use of receiver operator characteristic curve analysis to estimate the proportion of the genetic variance on the liability scale attributable to 7p12.2, 9p21.3, 10q21.2 and 14q11.2 SNPs. The area under the curve provides a means of classifying diseased and non-diseased individuals, and can be used to estimate the proportion of the genetic variance on the liability scale attributable to loci.15



Results

We restricted our analysis to SNPs mapping to the autosomes and, following quality control filtering, a total of 247761 SNPs common to 823 cases and 2194 controls were available for analysis.

We investigated the impact of SNP missingness on the estimates of variance explained by SNPs globally for BCP-ALL by imposing varying thresholds for SNP missingness. As the threshold imposed for missingness became more stringent, the number of SNPs reduced from 247761 to 202089 when allowing for only three missing genotypes per SNP (that is, a missingness of 0.001; Table 1), and with this the crude estimate of the proportion of variance dropped from 0.30 to 0.22 (Table 1). This decline is in part a consequence of the reduced number of SNPs, and therefore of the total proportion of the genome tagged, on which the estimate is based rather than a genotype missingness artifact per se. With adjustment for incomplete LD, the estimate of the proportion of variance fell from 0.59 to 0.46 (Table 1). After transforming the data to account for prevalence and ascertainment on the liability scale, the variance in liability of BCP-ALL explained by the SNPs ranged from 0.24 to 0.18 (Table 1). Comparable results were obtained when the analysis was restricted to SNPs having a MAF>0.05 (Table 1). The heritability of 0.24 obtained translates to a sibling relative risk of 3.93 being associated with common genetic variation.
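The link between a liability-scale heritability and a sibling relative risk can be illustrated with a minimal sketch. The text does not state how the conversion was made, so the sketch assumes a standard liability threshold approximation in which siblings of affected individuals have a shifted mean liability and a slightly reduced variance; the function name is hypothetical and only additive effects are considered (sibling liability correlation of h²/2).

```python
from statistics import NormalDist


def sibling_relative_risk(h2_liability, prevalence):
    """Approximate sibling relative risk under a liability threshold model."""
    std = NormalDist()
    k = prevalence
    t = std.inv_cdf(1 - k)   # liability threshold in the population
    z = std.pdf(t)
    i = z / k                # mean liability of affected individuals
    r = h2_liability / 2     # liability correlation between full siblings
    mean_sib = r * i                     # mean liability of siblings of cases
    var_sib = 1 - r ** 2 * i * (i - t)   # their (reduced) liability variance
    t_sib = (t - mean_sib) / var_sib ** 0.5
    risk_sib = 1 - std.cdf(t_sib)        # absolute risk to a sibling
    return risk_sib / k


lambda_s = sibling_relative_risk(0.24, 1 / 2000)
```

For a heritability of 0.24 and a prevalence of 1 in 2000 this approximation gives a sibling relative risk of about 3.9 and an absolute sibling risk of about 0.2%, in line with the figures quoted in the text.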

To determine the impact of the known loci on the heritability associated with common variation we derived the receiver operator characteristic associated with 7p12.2, 9p21.3, 10q21.2 and 14q11.2 at SNPs rs4132601, rs3731217, rs7089424 and rs2239633, respectively. The area under the curve associated with the variants was 0.64, which translates into them contributing 8% of the genetic variance and 5% of the associated sibling relative risk.



Discussion

These data are compatible with polygenic susceptibility to ALL mediated through common SNPs (that is, those with MAFs>0.05) in strong LD with functional variants influencing the risk of developing ALL.

The magnitude of the estimated heritability is such that this polygenic susceptibility equates to a 3.93-fold increase in risk in siblings of BCP-ALL cases (absolute risk of ~0.2%). This accords with observations from the Swedish family-cancer database, which reported a 2.9-fold increased sibling relative risk, independent of the high concordance in monozygotic twins, which has a non-genetic, in utero explanation.16

The heritability estimated in our analysis is simply the additive variance as a proportion of the phenotypic variance. Thus, it does not include non-additive genetic variance (gene–gene interactions or dominance effects) or gene-environment interactions impacting on ALL risk. It has recently been proposed that epistatic gene–gene interactions may have a significant role in mediating the development of complex traits and underscore ‘phantom heritability’,17 that is, the apparent missing heritability from purely additive genetic effects. Moreover, given the evidence, albeit indirect, for a role for infectious exposure in relation to ALL risk it is likely that substantive gene-environment effects operate.

Our findings not only provide quantification of the impact of common variation on BCP-ALL risk, but also provide a strong rationale for continuing to search for additional novel risk variants through GWAS-based strategies. Thus far, GWAS of ALL, including the parent study on which this current analysis is based, have identified four independent loci shown conclusively to be associated with ALL, and more specifically BCP-ALL, risk: 7p12.2 (IKZF1), 9p21.3 (CDKN2A/CDKN2B), 10q21.2 (ARID5B) and 14q11.2 (CEBPE).4, 5, 6 While the risk of ALL associated with these common variants is not insignificant (RRs of 1.5–1.6), collectively they account for only ~8% of the genetic variance in BCP-ALL risk.

The power of the two reported GWA studies of ALL over a range of allele frequencies and relative risks is shown in Figure 1. The power of both GWAS to identify common alleles conferring relative risks of 1.5 or greater (such as the 7p12.2 variant) is high. Hence, there are unlikely to be many additional SNPs with similar effects for alleles with frequencies greater than 0.3 in populations of European ancestry. In contrast, both GWA studies had low power to detect alleles with smaller effects and/or MAF<0.1. Evidence for the existence of additional risk variants for BCP-ALL with such characteristics is provided by Q–Q plots of test statistics from case–control analysis of our data set (Figure 2). The genomic inflation factor of 1.03 indicates that case–control matching was adequate and renders cryptic population substructure or differential genotype calling between cases and controls unlikely. Nevertheless, the Q–Q plot clearly shows inflation of the test statistics at the upper tail of the distribution (P<10⁻⁴), even after exclusion of SNPs mapping to the 7p12.2, 9p21.3, 10q21.2 and 14q11.2 loci (Figure 2). It is, therefore, likely that additional common low-risk variants remain to be discovered and should be eminently harvestable in new larger GWAS or through pooling of existing data sets. However, most additional risk variants yet to be discovered are likely to have a more modest effect on ALL risk than the 7p12.2 and 10q21.2 variants, which are associated with relative risks of >1.3 per allele.
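A genomic inflation factor of the kind quoted above can be illustrated with a minimal sketch. It uses the common median-based definition, which is an assumption and not the procedure of this study (where λ was based on the 90% least significant SNPs); the function name is hypothetical and the input is a list of 1 d.f. chi-square association statistics.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Median-based inflation factor: observed median over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values close to 1 indicate that the bulk of the test statistics follow the null distribution, so that any excess confined to the upper tail of a Q–Q plot is more plausibly due to true associations than to systematic bias.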

Figure 1.

Power to identify risk loci for acute lymphoblastic leukemia for relative risks of 1.3, 1.5 and 1.7 at P=5.0 × 10⁻⁷ calculated using GWA power.19 (a) Study of 441 cases and 17958 controls reported by Trevino et al. using Affymetrix 500K arrays. (b) Current study based on 823 BCP-ALL cases and 2194 controls. Figures are annotated with power to detect the currently identified risk loci: IKZF1 (7p12.2), CDKN2A (9p21.3), ARID5B (10q21.2) and CEBPE (14q11.2). Green, red and blue lines represent relative risks of 1.3, 1.5 and 1.7, respectively.


Figure 2.

Q–Q plot of test statistics (χ²) for association with BCP-ALL. The plot in blue shows test statistics for the whole data set. The plot in green shows test statistics excluding SNPs mapping to the four previously identified risk loci at 7p12.2, 9p21.3, 10q21.2 and 14q11.2. The black line represents the null hypothesis of no association.


Our analysis of heritability is derived from the analysis of ~250000 SNP genotypes. It is possible that some disease-causing variants that are very rare have a substantive effect on BCP-ALL risk; however, it is unlikely that any such mutations will explain a large proportion of the variance in risk. While the genetic variance could be mediated by a large number of risk variants with MAF<0.1, there is no reason to believe that a significant component of the missing variance is solely explained by a restricted number of high-risk variants. Higher-density SNP genotyping would, however, provide a higher probability of LD with functional disease-causing variants, thus potentially capturing a higher proportion of the genetic variance, provided the characteristics of disease-causing variants do not differ systematically from the genotyped SNPs (for example, because of lower MAF). Analysis of additional ongoing GWAS of BCP-ALL, which are based on higher-density array technology, is therefore likely to be informative in refining estimates of heritability.

Further advancements are likely to be made following the establishment of large consortia,18 including the Childhood Leukemia International Consortium. Such initiatives not only provide the basis for GWA studies with increased sample size, SNP coverage and number of SNPs taken forward to large-scale replication to aid in the identification of additional novel risk variants, but also facilitate pooling of existing GWAS data, which should appreciably reduce the standard error of the point estimate of the heritability of BCP-ALL.

In this study, we have restricted our analysis to the commonest form of childhood ALL, BCP-ALL in order to focus on a relatively homogeneous subset of ALL. Thus, there are limitations on the interpretation of data obtained in our study in terms of the generalizability to other forms of ALL and to subtypes of BCP-ALL. However, our findings have provided further support for a polygenic basis to susceptibility to BCP-ALL and, moreover, have yielded additional evidence for the existence of BCP-ALL-associated SNPs that remain to be identified.


Conflict of interest

The authors declare no conflict of interest.



References

  1. Greaves M. Infection, immune responses and the aetiology of childhood leukaemia. Nat Rev Cancer 2006; 6: 193–203.
  2. Kaatsch P. Epidemiology of childhood cancer. Cancer Treat Rev 2010; 36: 277–285.
  3. Hodgson S, Maher E. A Practical Guide to Human Cancer Genetics. Cambridge University Press: Cambridge, 1999, pp 116–122.
  4. Papaemmanuil E, Hosking FJ, Vijayakrishnan J, Price A, Olver B, Sheridan E et al. Loci on 7p12.2, 10q21.2 and 14q11.2 are associated with risk of childhood acute lymphoblastic leukemia. Nat Genet 2009; 41: 1006–1010.
  5. Trevino LR, Yang W, French D, Hunger SP, Carroll WL, Devidas M et al. Germline genomic variants associated with childhood acute lymphoblastic leukemia. Nat Genet 2009; 41: 1001–1005.
  6. Sherborne AL, Hosking FJ, Prasad RB, Kumar R, Koehler R, Vijayakrishnan J et al. Variation in CDKN2A at 9p21.3 influences childhood acute lymphoblastic leukemia risk. Nat Genet 2010; 42: 492–494.
  7. Fletcher O, Houlston RS. Architecture of inherited susceptibility to common cancer. Nat Rev Cancer 2010; 10: 353–361.
  8. Lee SH, Wray NR, Goddard ME, Visscher PM. Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 2011; 88: 294–305.
  9. UK Childhood Cancer Study Investigators. The United Kingdom Childhood Cancer Study: objectives, materials and methods. Br J Cancer 2000; 82: 1073–1102.
  10. Power C, Elliott J. Cohort profile: 1958 British birth cohort (National Child Development Study). Int J Epidemiol 2006; 35: 34–41.
  11. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 2007; 39: 984–988.
  12. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
  13. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 2010; 42: 565–569.
  14. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011; 88: 76–82.
  15. Wray NR, Yang J, Goddard ME, Visscher PM. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet 2010; 6: e1000864.
  16. Hemminki K, Jiang Y. Risks among siblings and twins for childhood acute lymphoid leukaemia: results from the Swedish Family-Cancer Database. Leukemia 2002; 16: 297–298.
  17. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci USA 2012; 109: 1193–1198.
  18. Sherborne AL, Hemminki K, Kumar R, Bartram CR, Stanulla M, Schrappe M et al. Rationale for an international consortium to study inherited genetic susceptibility to childhood acute lymphoblastic leukemia. Haematologica 2011; 96: 1049–1054.
  19. Spencer CC, Su Z, Donnelly P, Marchini J. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet 2009; 5: e1000477.


Acknowledgements

Leukaemia and Lymphoma Research (UK) and the Kay Kendall Leukaemia Fund provided principal funding for the study. Support from Cancer Research UK (C1298/A8362 supported by the Bobby Moore Fund) is also acknowledged. This study made use of control genotyping data generated by the Wellcome Trust Case–Control Consortium. We acknowledge the use of genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113 and 085475. We are grateful to all the patients and individuals for their participation and we would also like to thank the clinicians, other hospital staff and study staff who contributed to the blood sample and data collection for this study.


The URLs for data presented herein are as follows:
PLINK: http://pngu.mgh.harvard.edu/purcell/plink/
British 1958 birth cohort: http://www.b58cgene.sgul.ac.uk
Illumina: http://www.illumina.com/
Genome-wide Complex Trait Analysis (GCTA): http://gump.qimr.edu.au/gcta/
Childhood Leukemia International Consortium: https://ccls.berkeley.edu/clic


Author contributions

RSH designed the study; RSH and VE-M drafted the manuscript; VE-M and FJH performed statistical analyses; EP oversaw laboratory analyses; ES and SEK performed curation and sample preparation of MRC ALL 97 trial samples; TL and ER managed and maintained UKCCS sample data; MT performed curation and sample preparation of UKCCS samples; JMA and JAEI performed ascertainment, curation and sample preparation of Northern Institute for Cancer Research case series; RSH and MG obtained funding and designed parent project.