Genome-wide association study identifies susceptibility loci for B-cell childhood acute lymphoblastic leukemia

Genome-wide association studies (GWAS) have advanced our understanding of susceptibility to B-cell precursor acute lymphoblastic leukemia (BCP-ALL); however, much of the heritable risk remains unidentified. Here, we perform a GWAS and conduct a meta-analysis with two existing GWAS, totaling 2442 cases and 14,609 controls. We identify risk loci for BCP-ALL at 8q24.21 (rs28665337, P = 3.86 × 10⁻⁹, odds ratio (OR) = 1.34) and for ETV6-RUNX1 fusion-positive BCP-ALL at 2q22.3 (rs17481869, P = 3.20 × 10⁻⁸, OR = 2.14). Our findings provide further insights into genetic susceptibility to ALL.

Statistical modeling of GWAS data indicates that much of the heritable risk of ALL ascribable to common genetic variation remains to be discovered [5][6][7][8][9] . To gain a more comprehensive insight into predisposition to ALL, we performed a meta-analysis of two previously published GWAS and a new GWAS, together totaling 2442 cases and 14,609 controls. We report two previously unidentified risk loci, providing further insights into the genetic and biological basis of this disease.

Results
Association analysis. We analyzed data from three studies of European ancestry: a new GWAS from the United Kingdom (UK GWAS II) and two previously reported GWAS (UK GWAS I and a German GWAS; Supplementary Figs. 1, 2 and Supplementary Table 1). After applying predetermined quality metrics (see "Methods") to each of the three GWAS, the studies provided genotype data on 2442 cases and 14,609 controls. To increase genomic resolution, we imputed >10 million SNPs using whole-genome reference genotype data from the 1000 Genomes Project (n = 1092) 11 and UK10K (n = 3781) 12 . Post-imputation quantile-quantile plots of SNPs with minor allele frequency (MAF) > 0.01 showed no evidence of substantive over-dispersion introduced by imputation (genomic inflation factors (λ) 13 for UK GWAS I, UK GWAS II, and the German GWAS were 1.02, 1.05, and 1.01, respectively; Supplementary Fig. 3) 6,7 .
The 8q24.21 variant rs28665337 maps 35 kb 3′ of the long intergenic non-coding RNA 977 gene (LINC00977; Fig. 2). The 8q24.21 region harbors variants associated with multiple cancers, including colorectal, prostate, and bladder cancer, as well as B-cell malignancies such as diffuse large B-cell lymphoma, Hodgkin lymphoma, and chronic lymphocytic leukemia (Supplementary Table 5). The linkage disequilibrium (LD) blocks delineating these cancer risk loci are distinct from the 8q24.21 BCP-ALL association signal (pairwise LD r² < 0.2; Supplementary Table 5), suggesting that this risk locus is unique to BCP-ALL. rs17481869 maps to an intergenic region at 2q22.3 with no candidate gene nearby (Fig. 2).
Relationship between SNP genotype and patient outcome. We examined the relationship between SNP genotype and patient outcome using data from UK GWAS II and the German GWAS. Neither rs28665337 nor rs17481869 showed a consistent association with event-free survival (EFS) or risk of relapse, even when stratified by ETV6-RUNX1 status (Supplementary Table 6).
Functional annotation of risk loci. To gain insight into the biological basis of the association signals at these and previously identified risk loci, we examined the epigenetic landscape of BCP-ALL risk loci genome-wide. For each risk locus we evaluated profiles of three histone marks of active chromatin. We used summary-level Mendelian randomization (SMR) analysis to test for concordance between GWAS and cis-eQTL-associated SNPs, considering all correlated SNPs (r² > 0.8) within 1 Mb of the lead SNP at each locus (Supplementary Tables 8 and 9), and derived b_XY statistics, which estimate the effect of gene expression on childhood ALL risk. This analysis showed that variation in the expression of CDKN2B, FAM53B, FIGNL1, and PIP4K2A was associated with risk loci (Supplementary Fig. 5, Supplementary Tables 8 and 9). Eight gene probes exceeded the P_SMR threshold of 1.3 × 10⁻⁴, of which two genes passed the HEIDI test for heterogeneity (P_HEIDI > 0.05). In whole blood-derived tissue, the 10q26.13 locus was associated with FAM53B expression and the 10p12.2 locus with PIP4K2A expression (Supplementary Table 9). Following the SMR analysis, we also investigated whether the most strongly associated SNP at each risk locus was individually associated with the expression of genes within a 2 Mb window, to capture long-range interactions. This provided evidence for a relationship between the 8q24.21 risk allele (rs28665337) and increased expression of MYC (t-test, P = 7.20 × 10⁻⁴; Supplementary Fig. 6, Supplementary Table 10), and between the 2q22.3 risk allele (rs17481869) and decreased GTDC1 expression (t-test, P = 0.037; Supplementary Fig. 6, Supplementary Table 10). Since chromatin looping interactions are fundamental to the regulation of gene expression, we interrogated physical interactions at the genomic regions defined by rs28665337 and rs17481869 in GM12878 lymphoblastoid and H1 human embryonic stem (ES) cells using Hi-C data.
Although these cell types may not fully reflect ALL biology, the regions containing rs28665337 and rs17481869 show significant chromatin looping interactions with the promoter regions of MYC in ES cells and of GTDC1 in GM12878 cells, respectively (Fit-Hi-C test 18 ; Supplementary Figs. 7 and 8).
HLA alleles and risk. A relationship between variation within the major histocompatibility complex (MHC) region and risk of ALL has long been postulated [19][20][21][22][23][24][25][26] . However, most studies have failed to address the complex LD patterns within the MHC or issues relating to population stratification. In view of the inconsistencies and limitations of published studies, we conducted a more rigorous analysis. Specifically, we investigated a possible relationship between BCP-ALL risk and HLA alleles by imputing the 6p21 region using the Type I Diabetes Genetics Consortium (T1DGC) data as reference [27][28][29] . The strongest association from a combined analysis of all three GWAS was provided by SNP rs9469021, which maps 167 kb centromeric to HLA-B (combined P = 3.5 × 10⁻³; frequentist test of association using SNPTEST); this association was, however, not significant after correction for multiple testing.
Impact on heritable risk. Using genome-wide complex trait analysis (GCTA) 30-32 , the heritability of BCP-ALL accounted for by common variants was estimated to be 0.16 (standard error (S.E.) 0.03; REML analysis P_meta = 4.25 × 10⁻⁸), with little evidence of a difference between subtypes (0.18 ± S.E. 0.05 and 0.20 ± S.E. 0.08 for hyperdiploid and ETV6-RUNX1-positive BCP-ALL, respectively). The 11 known susceptibility variants account for 34% of the familial risk (Supplementary Table 11). The effects of BCP-ALL risk SNPs are among the strongest GWAS associations of any malignancy, raising the possibility of clinical utility for risk prediction. To examine this, we generated polygenic risk scores (PRS) based on the composite effect of all risk SNPs, assuming a log-normal relative risk distribution. Using all risk SNPs, individuals in the top 1% of genetic risk had a 7.5-fold relative risk of BCP-ALL (Supplementary Fig. 9). The individual risk discrimination provided by the variants is shown by the receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.73 (Supplementary Fig. 10).

Discussion
The evidence for the two risk loci we report is based on a meta-analysis of three independent GWAS data sets. While the combined association P-values for each risk locus are genome-wide significant, with each series providing support for association, we acknowledge that we did not perform additional independent replication. For rare cancers such as childhood ALL, ascertaining case series that are appropriately ethnically matched and sufficiently powered to provide independent replication is inherently problematic. Moreover, as exemplified by the 10q21 and 10p14 risk loci, associations can be highly subtype-specific, which adds to the difficulty of obtaining appropriate replication series. Accepting such caveats, our analysis provides evidence for two additional risk loci for childhood BCP-ALL at 2q22.3 and 8q24.21.
We did not observe an association between the risk SNPs at 2q22.3 or 8q24.21 and patient survival. This is consistent with these risk variants acting at an early stage of ALL evolution rather than on disease progression per se. We acknowledge that this analysis was only powered to detect a 10% difference in patient outcome. Robustly determining the relationship between genotype and outcome will require larger patient cohorts.
Given the existence of different subtypes of BCP-ALL, presumably reflecting different etiologies and evolutionary trajectories, it is perhaps not surprising that some SNPs display subtype-specific effects. Notable in this respect are the 10q21.2 and 10p14 variants, which specifically influence high-hyperdiploid BCP-ALL 33 and Ph-like ALL 10 , respectively. As with the 7p12.2, 9p21.3, 10p12.2, and 14q11.2 loci, the currently identified 8q24.21 locus has generic effects on the risk of BCP-ALL. In contrast, the 2q22.3 association was highly specific for ETV6-RUNX1-positive BCP-ALL.
Deregulation of MYC has been reported in ALL, in some instances as a consequence of chromosomal rearrangement 34 . Studies in other cancers have shown that disease-specific risk loci at 8q24.21 lie within tissue-specific enhancers that interact with the MYC or PVT1 promoters. Furthermore, recent Hi-C analysis of this region has demonstrated a complex 3D structure, implicating various lncRNAs in mediating risk 35 . Hence, it is plausible that susceptibility to ALL at 8q24.21 has a similar mechanistic basis, possibly involving LINC00977.
Risk conferred by rs17481869 (2q22.3) was specific to ETV6-RUNX1-positive BCP-ALL. The SNP is intergenic, with no obvious candidate gene in the vicinity, which currently hinders the formulation of testable hypotheses regarding its functional basis. eQTL data do, however, provide evidence implicating GTDC1. GTDC1 encodes a glucosyltransferase whose expression is relatively high in peripheral blood leukocytes 36 . Chromosomal rearrangements of the MLL (mixed lineage leukemia) gene are associated with infant leukemia, and intriguingly, GTDC1 has been identified as a 3′ MLL fusion partner in acute leukemia 37 .
Most cancer GWAS risk loci map to non-coding regions of the genome and, insofar as they have been deciphered, their functional basis has been attributed to changes in regulatory regions influencing gene expression 33,38,39 . The finding that the current and previously identified risk SNPs tend to map within regions of B-cell active chromatin is consistent with such a model of disease susceptibility in ALL. It is therefore noteworthy that SMR analysis revealed significant relationships between 10p12.2 risk variants and PIP4K2A expression, and between 10q26.13 risk variants and FAM53B expression, suggesting a mechanism for these associations.
Our analysis sheds further light on inherited predisposition to childhood ALL. Functional characterization of the identified risk loci should provide additional insight into the biological and etiological basis of this malignancy. While the power of our meta-analysis to identify common variant loci (MAF > 0.2) associated with relative risks ≥ 1.2 was around 80%, we acknowledge that we had low power to detect alleles that confer more modest effects or are present at low frequency. By inference, these types of variant may be responsible for a larger proportion of the heritable risk of ALL, and a large number of risk SNPs may as yet be unidentified. Finally, as we have demonstrated, considering ALL subtypes individually should reveal additional subtype-specific risk variants.

Methods
Ethics. Patient samples and associated clinical information were ascertained with informed consent and in accordance with ethical board approval. Specifically, ethical committee approval was obtained for the Medical Research Council UKALL97/99 trial by UK therapy centers, and approval for UKALL2003 from the Scottish Multi-Centre Research Ethics Committee (REC:02/10/052) 40,41 . Additionally, ethical approval was granted by the Childhood Leukemia Cell Bank, the United Kingdom Childhood Cancer Study, and the University of Heidelberg.
Published GWAS samples. The United Kingdom (UK) GWAS I and German GWAS have been previously published 6,7 . In summary, UK GWAS I comprised (numbers post quality control (QC)) 824 BCP-ALL cases (360 female; mean age at diagnosis 5.5 years) genotyped using Human 317K arrays (Illumina, San Diego; http://www.illumina.com); control genotypes were obtained from 2699 individuals from the 1958 British Birth Cohort (Hap1.2M-Duo Custom array data) and 2501 individuals from the UK Blood Service, produced by the Wellcome Trust Case Control Consortium 2 (http://www.wtccc.org.uk/; 51% male) 42 40,41 (338 cases, 160 female, mean age 4.9 years) obtained from the Bloodwise Childhood Leukemia Cell Bank (www.cellbank.org). DNA was extracted from cell pellets by standard ethanol precipitation methods. Samples were then genotyped on an Infinium OncoArray-500K BeadChip (Illumina) comprising a 250K SNP genome-wide backbone and 250K custom content selected across multiple consortia within COGS (Collaborative Oncological Gene-Environment Study). OncoArray genotyping was carried out in accordance with the manufacturer's recommendations by the High-Throughput Genomics Group, Oxford Genomics Center. Prior to genotyping, DNA samples were quantified by Quant-iT PicoGreen (Thermo Fisher Scientific, MA, USA), normalized, and plated as 50 ng/μl aliquots in 96 deep-well plates. Post QC we obtained genotype data for 784 cases (365 female; mean age at diagnosis 5.3 years). Controls consisted of (1) 2976 cancer-free men ascertained by the PRACTICAL Consortium and (2) 4446 cancer-free women from the UK ascertained through the Breast Cancer Association Consortium. All controls were genotyped on Infinium OncoArray-500K BeadChip arrays.
Statistical and bioinformatic analysis of GWAS data sets. Analyses and data management were undertaken using R v3.2.3 (R Core Team 2013; http://www.R-project.org/) 72 , PLINK v1.9 43 , and SNPTEST v2.5.2 44 . GenomeStudio software (Illumina, San Diego; http://www.illumina.com) was used to extract genotypes from raw data. QC of all GWAS data sets was performed as suggested by Anderson et al 45 , using PLINK v1.9 43 for the sample and SNP QC steps. Specifically, individuals with a low call rate (<95%) were excluded, as were individuals of non-European ancestry, identified with the smartpca package, part of EIGENSOFT v4.2 46,47 , using the HapMap version 2 CEU, JPT/CHB, and YRI populations as reference. SNPs with a call rate <95% were excluded, as were those with a MAF < 0.01 or displaying significant deviation from Hardy-Weinberg equilibrium (i.e., P < 10⁻⁵). The adequacy of case-control matching and the possibility of differential genotyping of cases and controls were formally evaluated using Q-Q plots of test statistics. The inflation factor λ was calculated by dividing the median of the test statistics by the median of the expected χ² distribution with 1 degree of freedom. Q-Q plots were generated and inflation factors estimated using R. Uncorrected, pre-imputation Q-Q plots of UK GWAS I, UK GWAS II, and the German GWAS showed λ values of 1.01, 1.05, and 1.10, respectively. Prior to imputation the data sets were pre-phased by … To account for genomic inflation post imputation in the German data set, eigenvectors were inferred using the "smartpca" component of EIGENSOFT v4.2, and adjustment was carried out by including the first two eigenvectors as covariates in SNPTEST during association analysis 46,47 . The inflation factors λ and λ1000 were again calculated for all SNPs after imputation and QC 13,50 .
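The inflation factor described above can be computed directly from per-SNP 1-df χ² statistics. The following is a minimal sketch for illustration, not the R code used in the study; the function names are hypothetical. The expected median of a χ²₁ distribution (about 0.4549) is obtained as the square of the standard-normal 75th percentile, and λ1000 is computed with the commonly used rescaling to an equivalent study of 1000 cases and 1000 controls, which we assume is the definition intended here.

```python
from statistics import NormalDist, median

# If Z ~ N(0, 1) then Z^2 ~ chi2(1), so the chi2(1) median is the square of the
# 75th percentile of Z (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median chi2 / expected chi2(1) median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1000 cases and 1000 controls."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)


if __name__ == "__main__":
    # Hypothetical per-SNP z-scores; squaring gives 1-df chi-square statistics.
    z_scores = [0.3, -1.2, 0.8, 2.1, -0.5, 0.05, -0.9, 1.6, 0.2]
    lam = genomic_inflation([z * z for z in z_scores])
    print(f"lambda = {lam:.3f}")
    # UK GWAS I sample sizes from the Methods: 824 cases, 2699 + 2501 controls.
    print(f"lambda_1000 = {lambda_1000(lam, 824, 2699 + 2501):.3f}")
```

Because λ grows with sample size for a fixed level of confounding, λ1000 makes inflation comparable across data sets of different sizes such as the three GWAS here.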
The association between each SNP and risk was calculated using SNPTEST assuming an additive model, using the "-frequentist" test and applying the default genotype calling probability threshold of 0.9. Where applicable, the first two eigenvectors were used as covariates in the association analyses for that data set. ORs and 95% CIs were derived from the beta values and standard errors in the SNPTEST output. Meta-analyses were performed using META v1.7 51 , pooling the beta values and standard errors for SNPs from each GWAS data set. Association meta-analyses only included markers with info scores >0.8, imputed call rates/SNP >0.9, and MAFs > 0.01. Collectively the three GWAS provided genotype data on 2442 cases (mean age at diagnosis 5.6 years; 54% male) and 14,609 controls (45% male) for 6,755,715 SNPs 6,7,9 . We calculated Cochran's Q statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation due to heterogeneity 52 . LD metrics were calculated in PLINK 43 and vcftools 53 using UK10K genomic data. LD blocks were defined on the basis of HapMap recombination rates, using the Oxford recombination hotspots, and on the basis of the distribution of CIs 54,55 . Association plots were generated using visPIG 14 .
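Pooling per-study betas and standard errors is commonly done with an inverse-variance weighted fixed-effects model, and that is the model assumed in this sketch; it illustrates the arithmetic rather than reproducing META's implementation. The hypothetical function below shows how the pooled log-OR, the OR with its 95% CI (exp(β ± 1.96·SE)), a two-sided P-value, Cochran's Q, and I² = (Q − df)/Q (floored at 0) follow from the per-study estimates.

```python
import math
from dataclasses import dataclass


@dataclass
class MetaResult:
    beta: float        # pooled log odds ratio
    se: float          # standard error of the pooled log OR
    p_value: float     # two-sided P-value
    odds_ratio: float
    ci_low: float      # lower 95% CI bound of the OR
    ci_high: float     # upper 95% CI bound of the OR
    q: float           # Cochran's Q
    i2: float          # I^2: share of variation attributable to heterogeneity


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of per-study log ORs."""
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # stays accurate for very small P
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return MetaResult(
        beta, se, p_value, math.exp(beta),
        math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se), q, i2,
    )


if __name__ == "__main__":
    # Hypothetical per-study log ORs and SEs for one SNP in three GWAS.
    result = fixed_effects_meta([0.27, 0.31, 0.30], [0.08, 0.07, 0.09])
    print(f"OR = {result.odds_ratio:.2f} ({result.ci_low:.2f}-{result.ci_high:.2f}), "
          f"P = {result.p_value:.2e}, I2 = {result.i2:.0%}")
```

Using `math.erfc` rather than `1 - cdf` avoids the loss of precision that would otherwise round genome-wide significant P-values to zero.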
HLA imputation. Classical HLA alleles (HLA-A, -B, -C, -DQA1, -DQB1, and -DRB1), both common and rare, and coding variants across the HLA region were imputed using SNP2HLA 29 . The imputation was based on a reference panel from the T1DGC consisting of 5225 individuals of European descent with genotype data for 8961 common SNPs and indel polymorphisms across the HLA region, and four-digit genotype data for the HLA class I and II molecules. This reference panel has been used previously and showed high imputation quality for the HLA region in other studies [27][28][29] . Individual GWAS were imputed at the 6p21 region and meta-analyzed to identify significant HLA risk alleles. A significance threshold of 5.7 × 10⁻⁶ was set after Bonferroni correction for the 8654 SNPs tested.
Sanger sequencing. To assess the accuracy of imputed genotypes, a random series of samples was Sanger sequenced using the BigDye ® Terminator v3.1 Cycle Sequencing Kit (Life Technologies, CA, USA) and analyzed using an ABI 3700xl sequencer (Applied Biosystems, CA, USA). Oligonucleotide primer sequences are provided in Supplementary Table 12.
Chromatin mark enrichment analysis. To assess over-representation of active chromatin marks, the variant set enrichment method of Cowper-Sal·lari et al. was adapted 56 . For each risk locus, SNPs in LD (i.e., r² > 0.8 and D′ > 0.8) were defined and termed the associated variant set (AVS). ChIP-seq broad peak data for the H3K27ac, H3K4me1, and H3K4me3 chromatin signatures were obtained from the ENCODE project for 14 cell lines. ChIP-seq broad peak data for three AML and six childhood ALL cell types were obtained from the Blue-Print Epigenome database (www.blueprint-epigenome.eu) 15 . For each mark, the overlap between SNPs in the AVS and the ChIP peaks was derived, generating a mapping score. The null hypothesis was tested by scoring randomly chosen SNPs with the same LD structure as the risk-associated SNPs. After 10,000 iterations, approximate P-values were calculated as the proportion of permutations in which the null mapping score was at least equal to the AVS mapping score. Enrichment was calculated by normalizing scores to the median of the null model.
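The permutation logic above can be sketched in a few lines. This is a simplified, hypothetical illustration rather than the adapted method itself: it assumes all positions lie on one chromosome, scores a SNP set as the number of its SNPs falling inside any peak (the original scoring may differ), and leaves the generation of LD-matched null SNP sets to the caller. The empirical P-value and the enrichment over the null median are computed exactly as described in the text.

```python
import random
from bisect import bisect_right
from statistics import median


def mapping_score(snp_positions, peaks):
    """Number of SNPs lying inside any peak.

    peaks: sorted, non-overlapping half-open intervals (start, end) on one chromosome.
    """
    starts = [start for start, _ in peaks]
    score = 0
    for pos in snp_positions:
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and peaks[i][0] <= pos < peaks[i][1]:
            score += 1
    return score


def enrichment_test(avs_positions, null_sets, peaks):
    """Return (observed score, empirical P-value, enrichment over the null median)."""
    observed = mapping_score(avs_positions, peaks)
    null_scores = [mapping_score(s, peaks) for s in null_sets]
    # Proportion of permutations whose null score is at least the observed score.
    p_value = sum(score >= observed for score in null_scores) / len(null_scores)
    null_median = median(null_scores)
    enrichment = observed / null_median if null_median > 0 else float("inf")
    return observed, p_value, enrichment


if __name__ == "__main__":
    rng = random.Random(42)
    peaks = [(0, 5_000), (10_000, 20_000)]  # hypothetical peaks
    avs = [1_000, 2_000, 3_000, 12_000, 15_000, 18_000, 30_000, 45_000]
    # Hypothetical null sets: uniform random positions, NOT LD-matched as in the text.
    nulls = [[rng.randrange(0, 50_000) for _ in avs] for _ in range(10_000)]
    print(enrichment_test(avs, nulls, peaks))
```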
Hi-C analysis. Hi-C analysis was conducted using the HUGIn browser 57 , which is based on the analysis by Schmitt et al 58 . Specifically, we analyzed Hi-C data generated in H1 ES cells and GM12878 lymphoblastoid cells, originally described by Dixon et al. 59 and Schmitt et al. 58 , respectively. Plotted topologically associating domain (TAD) boundaries were obtained from the insulation score method at 40 kb bin resolution 57 . We searched for significant interactions (P-values generated using Fit-Hi-C 18 ) between bins overlapping the currently identified ALL risk loci and target genes (i.e., "virtual 4C").
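The idea behind insulation-score boundary calling can be illustrated on a toy contact matrix. The sketch below is a simplified, hypothetical variant and does not reproduce HUGIn's parameters: it assumes a symmetric matrix of normalized, strictly positive contacts between equal-sized bins (e.g., 40 kb), averages for each bin the contacts crossing it within w bins on either side, and log2-normalizes to the mean over all scored bins. Strongly negative values (local minima) suggest TAD boundaries.

```python
import math


def insulation_scores(matrix, w):
    """Simplified insulation score per bin (None where the window does not fit).

    matrix: symmetric list of lists of normalized, strictly positive contact counts.
    w: window size in bins on each side of the bin being scored.
    """
    n = len(matrix)
    if n < 2 * w + 1:
        raise ValueError("matrix too small for the chosen window")
    raw = [None] * n
    for i in range(w, n - w):
        # Contacts between the w bins upstream and the w bins downstream of bin i.
        block = [matrix[a][b] for a in range(i - w, i) for b in range(i + 1, i + w + 1)]
        raw[i] = sum(block) / len(block)
    scored = [s for s in raw if s is not None]
    mean_score = sum(scored) / len(scored)
    # log2 ratio to the mean: strongly negative values mark insulated positions.
    return [None if s is None else math.log2(s / mean_score) for s in raw]


if __name__ == "__main__":
    # Toy matrix: two 4-bin domains with dense internal and sparse cross contacts.
    n = 8
    toy = [[10.0 if (a < 4) == (b < 4) else 1.0 for b in range(n)] for a in range(n)]
    scores = insulation_scores(toy, w=2)
    valid = [i for i, s in enumerate(scores) if s is not None]
    print(min(valid, key=lambda i: scores[i]))  # prints 3: boundary between bins 3 and 4
```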
Functional annotation. SNPs in LD (r² > 0.8) with the top SNP at each risk locus were annotated using HaploReg 17 for histone marks in relevant tissues, bound proteins, and genomic location (Supplementary Data 1). eQTL analysis was performed by testing each sentinel SNP against genes within 1 Mb upstream and downstream, using the whole blood data available from the GTEx portal v6p 60 and the Blood eQTL browser 61 (Supplementary Data 1). Methylation quantitative trait loci (mQTL) for all known BCP-ALL risk loci were assessed using the mQTL Database (www.mqtldb.org), which reports CpG sites whose methylation is significantly associated with genotype at various stages of life, as described by Gaunt et al 62 .
SMR analysis. SMR analysis was conducted as per Zhu et al. (http://cnsgenomics.com/software/smr/index.html) 63 . Publicly available eQTL data were extracted from the whole blood eQTL, MuTHER consortium, and GTEx16 v6p release portals 60,61,64 . GWAS summary statistics files were generated from the meta-analysis of the UK GWAS I, UK GWAS II, and German GWAS data sets. Reference files were generated by merging 1000 Genomes phase 3 and UK10K (ALSPAC and TwinsUK) VCFs. Summary eQTL files for the GTEx samples were generated from the downloaded v6p "all_SNPgene_pairs" files. BESD files were generated from downloaded SNP-gene eQTL data, which were converted into the query flat file format described in the SMR online guide (http://cnsgenomics.com/software/smr) and then converted to binary format using the --make-besd command. Only probes with eQTL P < 5.0 × 10⁻⁸ were considered in the SMR analysis. A threshold of P_SMR < 1.3 × 10⁻⁴ was applied for the SMR test, corresponding to a Bonferroni correction for 38 tests covering all 23 genes within 1 Mb of the sentinel risk SNPs at each risk locus (38 gene probes with a top eQTL P < 5 × 10⁻⁸). HEIDI test P-values < 0.05 were taken to indicate significant heterogeneity, as suggested by Zhu et al. For the two genes passing both thresholds, plots of eQTL and GWAS associations and of GWAS versus eQTL effect sizes were constructed.
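At a single SNP, the SMR estimate and test statistic combine the SNP's effect on ALL risk (b_zy, from the GWAS) with its effect on expression (b_zx, from the top cis-eQTL). The sketch below follows the single-SNP formulas of Zhu et al., b_XY = b_zy / b_zx and T_SMR = z_zy² z_zx² / (z_zy² + z_zx²), which is approximately χ² with 1 df. The function name and inputs are hypothetical, and the HEIDI heterogeneity test, which needs multiple SNPs and LD information, is not included.

```python
import math


def smr_test(b_zy, se_zy, b_zx, se_zx):
    """Return (b_xy, T_SMR, P_SMR) for one SNP.

    b_zy, se_zy: SNP effect on the trait (GWAS, e.g. log OR for ALL).
    b_zx, se_zx: SNP effect on gene expression (cis-eQTL).
    """
    z_zy = b_zy / se_zy
    z_zx = b_zx / se_zx
    b_xy = b_zy / b_zx  # estimated effect of expression on the trait
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # For 1 df, P(chi2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    p_smr = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_smr


if __name__ == "__main__":
    # Hypothetical GWAS and eQTL summary statistics for one probe.
    b_xy, t, p = smr_test(0.12, 0.02, 0.45, 0.05)
    print(f"b_xy = {b_xy:.3f}, T_SMR = {t:.2f}, P_SMR = {p:.2e}, "
          f"passes 1.3e-4 threshold: {p < 1.3e-4}")
```

Note that T_SMR is always smaller than either z², so SMR is only well powered when both the GWAS and the eQTL signals are strong.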
Relationship between SNP genotype and survivorship. The relationship between SNP genotype and survival was analyzed in the German AIEOP-BFM series and in the MRC ALL97/99 and UKALL2003 series. The German series consisted of 834 patients within the AIEOP-BFM 2000 trial 65 . Patients were treated with conventional chemotherapy (i.e., prednisone, vincristine, daunorubicin, L-asparaginase, cyclophosphamide, ifosfamide, cytarabine, 6-mercaptopurine, 6-thioguanine, and methotrexate); a subset of patients with high-risk ALL also received cranial irradiation and/or stem cell transplantation. Events for EFS were defined as resistance to therapy, relapse, secondary cancer, or death. Kaplan-Meier methodology was used to estimate survival rates, with differences between groups tested using the log-rank method (two-sided P-values). Cumulative incidences of competing events were calculated using the methodology of Kalbfleisch and Prentice 66 and compared using Gray's test 67 . Cox regression analysis was used to estimate hazard ratios and 95% CIs, adjusting for clinically relevant covariates.
The full details regarding the recruitment, classification, and treatment of patients on MRC ALL97/99 (1997-2002) or UKALL2003 (2003-2011) have been published 41,[68][69][70] . In ALL97, patients were classified as standard or high risk based on the Oxford score. In ALL99 and UKALL2003, patients were initially assigned to regimen A or B according to whether they were NCI standard or high risk. Regimen A comprised a three-drug induction followed by consolidation, CNS-directed therapy, interim maintenance, delayed intensification, and continuing therapy. Regimen B patients additionally received a four-drug induction and BFM consolidation. Treatment response and cytogenetics were used to re-assign high-risk patients to regimen C, which comprised augmented BFM consolidation and Capizzi maintenance. In ALL99 and ALL2003, early treatment response was measured by marrow morphology at day 8/15 for regimen B/A-treated patients. In addition, ALL2003 patients were randomized to regimen C if their end-of-induction minimal residual disease levels (evaluated by real-time quantitative PCR analysis of immunoglobulin and T-cell receptor gene rearrangements) were >0.01%. Survival analysis considered two endpoints: EFS, defined as time to relapse, second tumor, or death, censoring at last contact; and relapse rate, defined as time to relapse for those achieving complete remission, censoring at death in remission or last contact. Survival rates were calculated and compared using Kaplan-Meier methods and log-rank tests. All analyses were performed using Intercooled Stata 13.0 (Stata Corporation, USA).
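For readers less familiar with these survival methods, the sketch below implements the Kaplan-Meier product-limit estimator and a two-group log-rank test from first principles. It is an illustration only (the study's analyses were run in Stata); the function names are hypothetical, the data are assumed to be right-censored with event indicator 1 for an event (e.g., relapse or death) and 0 for censoring, and at tied times events are counted before censorings.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event time, survival probability)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored  # censored subjects leave after time t
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in pooled if ti >= t)
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)
        d_a = sum(1 for ti, e, g in pooled if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events, group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / variance if variance > 0 else 0.0
    return stat, math.erfc(math.sqrt(stat / 2))
```

In a genotype analysis like the one described, group A and group B would be, for example, carriers and non-carriers of a risk allele; the Cox models used in the text additionally adjust for covariates, which this sketch does not attempt.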
Contribution of genetic variance to familial risk. Estimation of the risk variance associated with each SNP was performed as per Pharoah et al 71 . For an allele (i) of frequency p, relative risk R and log risk r, the risk distribution variance (V_i) is:

$$V_i = (1-p)^2(-E)^2 + 2p(1-p)(r-E)^2 + p^2(2r-E)^2$$

where E is the expected value of r, given by:

$$E = 2p(1-p)r + 2p^2 r$$

For multiple risk alleles, the distribution of risk in the population tends toward the normal with variance:

$$\sum_i V_i$$

The percentage of total variance was calculated assuming a familial risk of childhood ALL of 3.2 (95% CI 1.5-5.9) as per Kharazmi et al 4 . All genetic variance (V) associated with susceptibility alleles is given as √3.2 4 . The proportion of genetic risk attributable to a single allele is:

$$V_i / V$$

Eleven risk loci were included in the calculation of the PRS for childhood ALL by selecting the top SNP from the current meta-analysis at each previously published locus, in addition to the two risk loci discovered in this study. The eleven variants are thought to act independently, as previous studies have shown no interaction between risk loci [6][7][8] . PRS were generated as per Pharoah et al., assuming a log-normal distribution LN(μ, σ²) with mean μ and variance σ² 32 . The population μ was set to −σ²/2 so that the overall mean PRS was 1.0. The sibling relative risk was assumed to be 3.2 4 . The discriminatory value of the risk SNPs was examined by determining the AUC of the ROC curve.
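The per-allele variance and the log-normal PRS model above can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the study's code: per-allele relative risks act multiplicatively (genotypes carry log risks 0, r, and 2r under Hardy-Weinberg proportions), loci are independent so the V_i add to give σ², and μ = −σ²/2 so that the population mean relative risk is 1.0. The mean relative risk in the top fraction q of the distribution then follows from the log-normal partial expectation, Φ(σ − z₁₋q)/q, and under a rare-disease approximation the AUC is Φ(σ/√2). The study's SNP-level inputs (Supplementary Table 11) are not reproduced here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allele_variance(p, rr):
    """V_i: variance of log risk for one SNP with risk-allele frequency p and
    per-allele relative risk rr, assuming multiplicative effects and HWE."""
    r = math.log(rr)
    e = 2 * p * (1 - p) * r + 2 * p ** 2 * r  # E, the expected log risk
    return ((1 - p) ** 2 * (0 - e) ** 2
            + 2 * p * (1 - p) * (r - e) ** 2
            + p ** 2 * (2 * r - e) ** 2)


def prs_sigma(snps):
    """sigma of the log-normal PRS: sqrt of the summed V_i over independent SNPs."""
    return math.sqrt(sum(allele_variance(p, rr) for p, rr in snps))


def mean_rr_top_fraction(sigma, q):
    """Mean relative risk (population mean = 1) among the top fraction q of the PRS."""
    z = STD_NORMAL.inv_cdf(1 - q)
    return STD_NORMAL.cdf(sigma - z) / q


def auc_from_sigma(sigma):
    """AUC of a log-normal PRS under a rare-disease approximation."""
    return STD_NORMAL.cdf(sigma / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical (risk-allele frequency, per-allele relative risk) pairs.
    snps = [(0.3, 1.3), (0.1, 1.5), (0.5, 1.6), (0.4, 1.2)]
    sigma = prs_sigma(snps)
    print(f"sigma = {sigma:.3f}")
    print(f"mean RR in top 1%: {mean_rr_top_fraction(sigma, 0.01):.2f}")
    print(f"AUC: {auc_from_sigma(sigma):.3f}")
```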
GCTA to estimate heritability. Since artefactual differences in allele frequencies between cases and controls have the potential to bias estimation of genetic variance, additional QC measures advocated by Lee et al 73 were imposed on the GWAS data sets. Typed SNPs were excluded if they had a MAF < 0.01 or a HWE test P < 0.05. SNPs were also excluded if a test of differential missingness between cases and controls gave P < 0.05. In addition, individuals were excluded if their relatedness score was >0.05. Filtering left 260,127 SNPs in the UK GWAS I and 355,899 SNPs in the UK GWAS II data sets. GCTA (http://cnsgenomics.com/software/gcta/) was employed to estimate the fraction of the phenotypic variance explained by SNPs, given a prevalence of 0.0005 for ALL 30 .
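Because a case-control sample over-represents cases, SNP heritability estimated on the observed 0/1 scale is usually transformed to the liability scale using the population prevalence K and the proportion of cases in the sample P; supplying a prevalence to GCTA triggers the Lee et al. (2011) form of this transformation, h²_l = h²_o · K²(1 − K)² / (φ(t)² · P(1 − P)), where t is the liability threshold and φ the standard-normal density. The sketch below shows that transformation only; the function name is hypothetical, and the example inputs (an observed-scale estimate of 0.1 and the case fraction of the full meta-analysis, whereas GCTA was run on the UK data sets) are illustrative, not values from the study.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Transform observed-scale SNP heritability from a case-control sample to the
    liability scale, given population prevalence K and sample case proportion P."""
    k, p = prevalence, case_fraction
    std = NormalDist()
    t = std.inv_cdf(1 - k)  # liability threshold above which individuals are affected
    z = std.pdf(t)          # standard-normal density at the threshold
    return h2_obs * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


if __name__ == "__main__":
    # Illustrative inputs only: K = 0.0005 as in the Methods; P from 2442 cases among
    # 2442 + 14,609 samples; h2_obs = 0.1 is a made-up placeholder.
    p_cases = 2442 / (2442 + 14609)
    print(round(observed_to_liability(0.1, 0.0005, p_cases), 3))
```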