Genome-wide association study identifies multiple susceptibility loci for glioma

Gliomas account for ~40% of all primary brain tumours and cause around 13,000 deaths in the United States of America each year 1 . Gliomas are heterogeneous, and different tumour subtypes can be distinguished, defined in part by malignancy grade (for example, pilocytic astrocytoma World Health Organization (WHO) grade I, diffuse 'low-grade' glioma WHO grade II, anaplastic glioma WHO grade III and glioblastoma (GBM) WHO grade IV) 2 . Gliomas are typically associated with a poor prognosis irrespective of clinical care, with the most common type, GBM, having a median overall survival of only 10-15 months 1 .
While the glioma subtypes have distinct molecular profiles resulting from different aetiological pathways 3 , no environmental exposures have consistently been linked to risk except for ionizing radiation, which accounts for only a very small number of cases 1 . Direct evidence for inherited predisposition to glioma is provided by a number of rare inherited cancer syndromes, such as Turcot's and Li-Fraumeni syndromes, and neurofibromatosis 4 . Even collectively, however, these diseases account for little of the twofold increased risk of glioma seen in first-degree relatives of glioma patients 5 . Support for polygenic susceptibility to glioma has come from genome-wide association studies (GWASs) that have identified single-nucleotide polymorphisms (SNPs) at eight loci influencing glioma risk: 3q26.2 (near TERC), 5p15.33 (near TERT), 7p11.2 (near EGFR), 8q24.21 (near CCDC26), 9p21.3 (near CDKN2A/CDKN2B), 11q23.3 (near PHLDB1), 17p13.1 (TP53) and 20q13.33 (near RTEL1) (refs 6-10). Perhaps not surprisingly, there is variability in genetic effects on glioma by histology, with subtype-specific associations at 5p15.33, 20q13.33 and 7p11.2 for GBM and at 11q23.3 and 8q24 for non-GBM glioma 6,7 .
Recovery of untyped genotypes via imputation has enabled fine mapping and refinement of association signals, for example, in identification of rs55705857 as the basis of the 8q24 association signal in glioma 11 . Recently, the use of the 1000 Genomes Project and the UK10K project as a combined reference panel has been shown to improve accuracy compared with using the 1000 Genomes Project data alone, making imputation of alleles with frequencies of ~0.5% viable 12 .
Here we report a meta-analysis of four GWASs totalling 4,147 cases and 7,435 controls to identify new glioma susceptibility loci, after imputation using the 1000 Genomes and the UK10K Project data as reference. After genotyping an additional series of 1,490 cases and 1,723 controls we identified new risk loci for GBM at 12q23.3 and non-GBM at 10q25.2, 11q23.2, 12q21.2 and 15q24.2. Our findings provide further insights into the genetic basis of the different glioma subtypes.

Results
Association analysis. To identify additional glioma susceptibility loci we conducted a pooled meta-analysis of four GWASs in populations of European ancestry (the UK-GWAS, the French-GWAS, the German-GWAS and the US-GWAS), which were genotyped using Illumina HumanHap 317, 317 + 240S, 370Duo, 550, 610 or 1M arrays (Supplementary Table 1). After filtering, the studies provided genotypes on 4,147 cases and 7,435 controls of European ancestry (Supplementary Table 1, Supplementary Fig. 1). Consistent with our previous analysis 6 , quantile-quantile (Q-Q) plots for the German and the US series showed some evidence of inflation (inflation factor based on the 90% least-significant SNPs, λ90 = 1.15 and 1.11, respectively); however, after correcting for population substructure using principal-component analyses as implemented in Eigenstrat 13 , λ90 for all four studies was ≤1.05 (combined λ90 = 1.05, Supplementary Fig. 2). To achieve consistent and dense genome-wide coverage, we imputed unobserved genotypes at >10 million SNPs using a combined reference panel comprising 1,092 individuals from the 1000 Genomes Project and 3,781 individuals from the UK10K project. Q-Q plots for all SNPs (minor allele frequency (MAF) >0.5%) post-imputation did not show evidence of substantive over-dispersion introduced by imputation after Eigenstrat adjustment (combined λ90 = 1.07, λ90 for individual studies = 1.04-1.06; Supplementary Fig. 2).
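The inflation factor based on the 90% least-significant SNPs can be illustrated with a short Python sketch. It assumes 1-d.f. chi-square association statistics and takes the ratio of the mean of the smallest 90% of observed statistics to the mean of their expected null quantiles. This is one plausible reading of the estimator rather than the exact calculation used in the study, and the function names are hypothetical.

```python
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()


def chi2_1df_quantile(q: float) -> float:
    """Quantile of the 1-d.f. chi-square distribution.

    If Z is standard normal, Z**2 is chi-square with 1 d.f., so the
    q-quantile is the square of the normal (1 + q) / 2 quantile.
    """
    z = _STD_NORMAL.inv_cdf((1.0 + q) / 2.0)
    return z * z


def lambda_90(chi2_stats):
    """Inflation factor from the 90% least-significant test statistics.

    Assumption: ratio of mean observed to mean expected statistics over
    the lowest 90% of the ordered distribution.
    """
    ordered = sorted(chi2_stats)
    n = len(ordered)
    keep = (n * 9) // 10  # number of least-significant statistics retained
    if keep == 0:
        raise ValueError("need at least two test statistics")
    observed = ordered[:keep]
    expected = [chi2_1df_quantile((i + 0.5) / n) for i in range(keep)]
    return fmean(observed) / fmean(expected)


if __name__ == "__main__":
    # Synthetic statistics: exact null quantiles inflated by 7%.
    null_stats = [chi2_1df_quantile((i + 0.5) / 1000) for i in range(1000)]
    print(round(lambda_90([1.07 * s for s in null_stats]), 3))
```

Restricting the estimate to the least-significant statistics keeps true association signals in the upper tail from being counted as inflation.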
Pooling data from each GWAS into a joint discovery data set, we derived joint odds ratios (ORs) and 95% confidence intervals (CIs) under a fixed-effects model for each SNP with MAF >0.005 and associated per allele Eigenstrat-corrected P values. Overall and histology-specific ORs were derived for all glioma, GBM and non-GBM. In the pooled data set, associations at the established risk loci for glioma at 5p15.33, 7p11.2, 8q24.21, 9p21.3, 11q23.3, 17p13.1 and 20q13.33 showed a consistent direction of effect with previously reported studies (P < 5.0 × 10⁻⁸, Fig. 1 and Supplementary Table 2). In contrast, we found no significant support for the association between rs1920116 near TERC (3q26.2) and risk of high-grade glioma recently reported by Walsh et al. 10 . After filtering at P < 5.0 × 10⁻⁶ in either all glioma, GBM or non-GBM, we selected 14 SNPs for follow-up, mapping to distinct loci not previously associated with glioma risk (Fig. 1 and Supplementary Table 2). The SNPs rs141035288, rs117527984 and rs138170678 were not taken forward as there was poor concordance between imputed and sequenced genotypes (Supplementary Table 3), and rs145034266 could not be genotyped as it mapped within a highly repetitive region.
The association signal at 12q23.3 defined by rs3851634 was specific for GBM. rs3851634 maps to intron 12 of the gene encoding polymerase III, RNA, subunit b (POLR3B; Fig. 2a) within a ~350-kb block of linkage disequilibrium (LD) at 12q23.3, which also contains the genes CKAP4 and TCP1L2. The other four SNP associations, defined by rs11196067, rs648044, rs12230172 and rs1801591, were specific to non-GBM glioma. rs11196067 (10q25.2) is located in intron 7 of VTI1A (vesicle transport through interaction with t-SNAREs 1A; Fig. 2b). Similarly, rs648044 (11q23.2) is intronic, mapping within ZBTB16 (zinc finger and BTB domain-containing protein 16, alias PLZF; Fig. 2c). rs12230172 (12q21.2) maps within the lincRNA RP11-114H23.1 and is centromeric to the gene encoding PHLDA1 (pleckstrin homology-like domain, family A, member 1; Fig. 2d). rs1801591 (15q24.2) is responsible for the p.Thr171Ile substitution in ETFA (electron transfer flavoprotein, alpha polypeptide), which resides within a 500-kb region of LD to which the ISL2, TYRO3P and SCAPPER genes also map.
Relationship between the new glioma SNPs and tumour profile. To investigate the impact of the new risk SNPs on glioma subtype we examined rs11196067, rs648044, rs12230172, rs1801591 and rs3851634 genotypes in the French case series for which comprehensive histology and molecular phenotyping had been performed (Supplementary Data 1). The GBM SNP rs3851634 was associated with 10q-deleted glioma (P = 0.016).
Functional annotation of risk variants. For each of the sentinel risk SNPs at the five risk loci (as well as correlated variants, r² > 0.8) we examined published data 14,15 and made use of the online resources HaploReg v3, RegulomeDB and SeattleSeq for evidence of functionality and regulatory motifs at genomic regions (Supplementary Table 5). rs1801591, which is responsible for the ETFA p.Thr171Ile substitution, resides within a highly conserved region of the genome (genomic evolutionary rate profiling (GERP) = 5.65) and the amino-acid change is predicted to be damaging (PolyPhen = 1). Although rs648044 exhibits low evolutionary conservation (GERP = -9.32), it maps within a strong DNase hypersensitivity site and predicted enhancer/super-enhancer element for multiple tissues including the brain. The region surrounding rs648044 is also predicted to interact with the ZBTB16 promoter, which combined with alteration of a Pax-5 motif is suggestive of direct functional impact. rs12230172 localizes within a moderately conserved region (GERP = 3.41) and occupies promoter histone marks in the brain as well as enhancers predicted to associate with transcriptional start sites for PHLDA1 and GLIPR1. rs11196067 in VTI1A, while having a low conservation score (GERP = 0.719), occupies enhancer histone marks in embryonic stem cells although not in brain cells. Similarly, rs3851634 maps to a moderately conserved region (GERP = 2.37) and occupies enhancer histone marks in 18 organs including the brain.
eQTL analysis of the five new glioma SNPs. To gain further insight into the functional basis of the rs11196067, rs648044, rs12230172, rs1801591 and rs3851634 associations we performed an expression quantitative trait loci (eQTL) analysis using RNA-Seq expression data on 389 low-grade gliomas (LGGs) and 138 GBMs from The Cancer Genome Atlas (TCGA), together with lymphoblastoid cell line RNA-Seq data on 363 samples from GEUVADIS 16 .
We tested for an association between SNP genotype and expression of genes mapping within 1 Mb of the sentinel SNP (Supplementary Data 2). After adjusting for multiple testing within each region no statistically significant eQTL was seen for rs11196067, rs12230172, rs1801591 or rs3851634. The strongest association between rs648044 genotype and gene expression was with ZW10 in LGG (P = 5.7 × 10⁻⁵), with the risk allele (T) associated with lower expression; this association remained significant after adjustment for multiple testing. To explore the possibility that rs648044 is correlated with a SNP exhibiting a stronger association with ZW10, we examined associations with ZW10 expression in LGG tumours for all SNPs in LD (r² > 0.4) with rs648044. All of the proxy SNPs examined were more weakly associated with ZW10 than rs648044 (Supplementary Table 6). Following on from these analyses we made use of publicly available eQTL mRNA expression array data on adipose tissue, lymphoblastoid cell lines and skin from 856 twins (MuTHER 17 ) and 5,311 non-transformed peripheral blood samples using the blood eQTL browser 18 . The risk allele (C) of rs3851634 was associated with significantly lower levels of POLR3B (P = 7.49 × 10⁻⁶) in the peripheral blood analysis, with a nominally significant association in skin (P = 0.0052). The risk allele (T) of rs1801591 was associated with significantly lower ETFA levels in peripheral blood (P = 7.90 × 10⁻¹²); there was a nominally significant association in MuTHER lymphoblastoid cell lines (P = 0.037).
Somatic mutation of newly implicated risk genes in glioma. We examined mutation data from TCGA for evidence of recurrent mutation in genes annotated by the new GWAS signals. Collectively POLR3B, ETFA, VTI1A, ZBTB16 and PHLDA1 are altered in 8% (22/286) of LGG as compared with 3% (8/273) of GBM (P = 0.014, Supplementary Table 7), providing support for these genes having a role in glioma tumorigenesis.
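The comparison of alteration frequencies in LGG and GBM can be reproduced approximately from the counts quoted above. The text does not state which test produced the reported P value, so the two-sided Fisher's exact test in this Python sketch is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Counts from the text: 22 of 286 LGG and 8 of 273 GBM tumours altered.
    p_value = fisher_exact_two_sided(22, 286 - 22, 8, 273 - 8)
    print(f"two-sided Fisher's exact P = {p_value:.3f}")
```

A chi-square test on the same table would be a reasonable alternative and should give a P value of similar magnitude.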
Individual variance in risk associated with glioma SNPs. To explore the relative contributions of previously reported and newly described loci to glioma risk, we applied the method of Pharoah et al. 19 to eight previously reported SNPs as well as the five new risk SNPs (Supplementary Table 8). The variance in risk attributable to all 12 SNPs is 26%, 27% and 43% for all glioma, GBM and non-GBM, respectively.
Pathway enrichment of glioma GWAS SNPs. To gain further insights into the biological basis of associations we performed a pathway analysis on GWAS associations in all glioma, GBM and non-GBM. Applying a false discovery rate (FDR) threshold of <0.1 revealed enrichment for 14 pathways in all glioma, 8 in GBM and 9 in non-GBM tumours (Supplementary Table 9).
Pathways implicated in GBM tumours primarily include DNA repair and Notch-signalling, whereas for non-GBM tumours pathways were primarily associated with cell-cycle progression and energy metabolism (Supplementary Table 9).

Discussion
To our knowledge we have performed the largest GWAS of glioma to date, identifying five novel glioma susceptibility loci at 12q23.3, 10q25.2, 11q23.2, 12q21.2 and 15q24.2 and taking the total count of risk loci to 12. By making use of a combined reference panel from the UK10K and the 1000 Genomes Projects we were able to recover genotypes for ~8 million SNPs for association analysis, a substantial increase over using array SNPs alone. In addition, we have provided further evidence that genetic susceptibility to glioma can be subtype specific, emphasising the importance of searching for histology-specific risk variants. While deciphering the functional impact of these SNP associations on glioma development requires additional analyses, a number of the genes implicated have relevance to the biology of this cancer a priori. As well as participating in regulating insulin-stimulated trafficking of secretory vesicles 20 , VTI1A plays a key role in neuronal development and in selectively maintaining spontaneous neurotransmitter release 21 . Intriguingly, recent GWAS have identified associations between the VTI1A SNP rs7086803 and lung cancer 22 and between rs12241008 and colorectal cancer 23 ; rs7086803 and rs12241008 are not
ZBTB16 is highly expressed in undifferentiated, multipotential progenitor cells and its expression has been shown to influence resistance to retinoid-mediated re-differentiation in t(11;17)(q23;21) acute promyelocytic leukaemia 24 . The BTB domain of ZBTB16 has transcriptional repression activity and interacts with components of the histone deacetylase complex, thereby linking the transcription factor with regulation of chromatin conformation 25 .
Although rs648044 lies within an enhancer active in brain and is predicted to interact with the ZBTB16 promoter, providing an attractive functional basis for the 11q23.2 association through differential ZBTB16 expression, we found a strong association between rs648044 and ZW10 expression in LGG (P = 5.7 × 10⁻⁵). Since ZW10 plays a role in chromosome segregation 26 , it also represents a plausible candidate for the 11q23.2 association.
We also observed a strong association between ETFA expression and rs1801591 in peripheral blood (P = 7.90 × 10⁻¹²). ETFA participates in mitochondrial fatty acid beta oxidation, shuttling electrons between flavoprotein dehydrogenases and the membrane-bound electron transfer flavoprotein ubiquinone oxidoreductase 27
features gliosis. While the p.Thr171Ile change is reported to decrease thermal stability of ETFA 30 , thereby providing evidence for a direct functional effect, the strong eQTL data are consistent with the functional basis for the 15q24.2 association being mediated through differential expression. RNA polymerase III (POLR3B) is involved in the transcription of small noncoding RNAs and short interspersed nuclear elements, as well as all transfer RNAs 31 . Although mutations in POLR3B have been shown to cause recessive hypomyelinating leukoencephalopathy 32 , thus far there is no evidence implicating the gene in the development of glioma. There was, however, a strong association between POLR3B expression and rs3851634 (P = 7.49 × 10⁻⁶), albeit in peripheral blood, providing a possible functional basis for the 12q23.3 association.
At 12q21.2 rs12230172 maps within RP11-114H23.1, a lincRNA of currently unknown function. Although rs12230172 only lies adjacent to PHLDA1, it is notable that the known 11q23.3 association, which is also specific to non-GBM tumours, maps to the related gene PHLDB1 7 . Although a role for PHLDA1 in glioma has yet to be established, downregulation of PHLDA1 in neuronal cells has been shown to enhance cell death without Fas induction 33 ; additionally, PHLDA1 expression may be involved in regulation of the anti-apoptotic effects of IGF1 (ref. 34).
Intriguingly, across the four GWAS data sets we analysed we did not replicate the association between rs1920116 (near TERC) at 3q26.2 and risk of high-grade glioma recently reported by Walsh et al. 10 (P = 8.3 × 10⁻⁹, OR = 1.30 versus P = 0.18, OR = 1.06 relative to the G-allele in our GBM data set), despite our study having similar power to demonstrate a relationship (1,783 GBM cases, 7,435 controls in our study as compared with 1,644 cases, 7,736 controls). It is, however, noteworthy that Walsh et al. analysed both anaplastic astrocytoma and GBM. While we could not demonstrate a significant association with either subtype, we did see an association between rs1920116 and TP53-mutated glioma (P = 0.016, Supplementary Data 1), suggesting that the association might be restricted to a specific molecularly defined subtype of glioma.
Our findings provide further evidence for an inherited genetic susceptibility to glioma. Future investigation of the genes targeted by the risk SNPs we have identified is likely to yield increased insight into the development of this malignancy. We estimate that the risk loci so far identified for glioma account for 27% and 43% of the familial risk of GBM and non-GBM tumours, respectively, of which 0.8% and 7.6% can be explained by the loci newly reported in this study (Supplementary Table 8). Although the power of our study to detect the major common loci (MAF > 0.2) conferring risk ≥1.2 was high (~80%), we had low power to detect alleles with smaller effects and/or MAF < 0.1. By implication, variants with such profiles probably represent a much larger class of susceptibility loci for glioma because of the truly small effect sizes or submaximal LD with tagging SNPs. Thus, it is probable that a large number of variants remain to be discovered. In addition, as we have recently shown, stratified analysis of glioma by molecular profile may lead to the discovery of additional subtype-specific risk variants. However, such subtype analyses can increase the statistical burden of adjusting for multiple testing. For example, if applying an additional Bonferroni correction for GBM and non-GBM subtypes, the rs11196067 (VTI1A) association at P = 8.64 × 10⁻⁸ would not be declared genome-wide significant. An issue in future subtype analyses of glioma will therefore be to have sufficient study power to mitigate type II error given the additional constraints of multiple testing. Further efforts to expand the scale of GWAS meta-analyses through international consortia and increasing the number of SNPs taken forward to large-scale replication will be required to address this challenge.

Methods
Ethics. Collection of blood samples and clinico-pathological information from patients and controls was undertaken with informed consent and relevant ethical review board approval in accordance with the tenets of the Declaration of Helsinki. Ethical committee approval for this study was obtained from relevant study centres (UK: South East Multicentre Research Ethics Committee (MREC) and the Scottish Multicentre Research Ethics Committee; France: APHP Ethical Committee-CPP (comité de Protection des Personnes); Germany: Ethics Commission of the Medical Faculty of the University of Bonn; and USA: University of Texas MD Anderson Cancer Institutional Review Board).
Genome-wide association studies. We used GWAS data previously generated on four non-overlapping case-control series of Northern European ancestry, which have been the subject of previous studies 6,7 ; summarized in Supplementary Table 1. Briefly, the UK-GWAS was based on 636 cases (401 males; mean age 46 years) ascertained through the INTERPHONE study 35
Replication genotyping. For replication we made use of DNA from 1,490 glioma cases recruited to an ongoing UK study of primary brain tumours (National Brain Tumour Study). Controls were healthy individuals who had been recruited to the National Study of Colorectal Cancer Genetics 42 and the GEnetic Lung CAncer Predisposition Study 43 . All cases and controls were UK residents and had self-reported European ancestry. Controls reported no personal history of cancer at the time of ascertainment. Genotyping of rs76178334, rs4432939, rs182521816, rs12780046, rs11196067, rs648044, rs12230172, rs3851634, rs1801591 and rs78543262 was performed using competitive allele-specific PCR KASPar chemistry (LGC, Hertfordshire, UK; primer sequences detailed in Supplementary Table 10). Conditions used are available on request. Call rates for SNP genotypes were >95%. To ensure quality of genotyping in all assays, at least two negative controls and 1-10% duplicates (showing a concordance >99%) were genotyped. For SNPs with MAF < 5%, at least two known heterozygotes were included per genotyping plate, to aid clustering.
Statistical and bioinformatic analysis. Data were imputed for all scans for over 10 million SNPs using IMPUTE2 v2.3.0 (ref. 44) software and the 1000 Genomes Project (Phase 1 integrated release 3, March 2012 (ref. 45)) and the UK10K data (ALSPAC, EGAS00001000090/EGAD00001000195, and TwinsUK, EGAS00001000108/EGAD00001000194, studies only) as reference panels (Supplementary Table 1). Genotypes were aligned to the positive strand in both imputation and genotyping. Imputation was conducted separately for each scan; before imputation, each GWAS data set was pruned to a common set of 425,190 SNPs. Poorly imputed SNPs, defined by an information score (Is) < 0.70, and SNPs with Hardy-Weinberg equilibrium P < 1.0 × 10⁻⁵ were excluded from the analyses. Tests of association between imputed SNPs and glioma were performed under a probabilistic dosage model in SNPTEST v2.5 (ref. 46).
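The post-imputation exclusion criteria can be sketched as a simple filter. This Python example assumes that a SNP is dropped if it fails either criterion (information score below 0.70 or Hardy-Weinberg P below 1.0 × 10⁻⁵); the record layout, SNP identifiers and function names are hypothetical and do not correspond to IMPUTE2 or SNPTEST output formats.

```python
from dataclasses import dataclass

INFO_MIN = 0.70      # information score threshold quoted in the text
HWE_P_MIN = 1.0e-5   # Hardy-Weinberg equilibrium P threshold quoted in the text


@dataclass(frozen=True)
class ImputedSnp:
    """Hypothetical per-SNP summary record used only for this sketch."""
    snp_id: str
    info: float
    hwe_p: float


def passes_qc(snp: ImputedSnp) -> bool:
    """Keep a SNP only if it passes both the info and HWE criteria."""
    return snp.info >= INFO_MIN and snp.hwe_p >= HWE_P_MIN


def filter_snps(snps):
    """Return the SNPs retained for association testing."""
    return [snp for snp in snps if passes_qc(snp)]


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        ImputedSnp("snp_a", info=0.95, hwe_p=0.40),   # retained
        ImputedSnp("snp_b", info=0.55, hwe_p=0.40),   # poorly imputed
        ImputedSnp("snp_c", info=0.90, hwe_p=2.0e-7), # fails HWE
    ]
    print([snp.snp_id for snp in filter_snps(example)])
```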
Eigenvectors for the GWAS data sets were inferred using smartpca (part of EIGENSOFT v2.4 (refs 13,47)) using ~100,000 LD-pruned SNPs. Eigenstrat adjustment was carried out in SNPTEST by including the first 10 eigenvectors as covariates. The adequacy of the case-control matching and possibility of differential genotyping of cases and controls was evaluated using Q-Q plots of test statistics. The inflation factor λ was based on the 90% least-significant SNPs as previously advocated 48 . Testing for secondary signals was carried out in SNPTEST, adjusting for the sentinel SNP using the '-condition_on' option. Visualization of population ancestry was carried out in smartpca by projecting query samples onto eigenvectors inferred from the 1000 Genomes Project populations (Supplementary Fig. 1). Meta-analysis of GWAS data sets under a fixed-effects model was undertaken in META v1.6 (ref. 49) using the inverse-variance approach. Cochran's Q-statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation due to heterogeneity were calculated 50 . P_het values < 0.05 are considered characteristic of large heterogeneity 50 . In addition, analyses stratified by glioma tumour histology and molecular characteristics were performed. All statistical P values were two sided.
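The inverse-variance fixed-effects combination, together with Cochran's Q and I², can be sketched in a few lines of Python. This illustrates the general technique only and is not the META implementation; the function name and the per-study estimates in the usage example are hypothetical. The heterogeneity P value is omitted because it needs a chi-square survival function that the standard library does not provide.

```python
from math import erfc, exp, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis of per-study log ORs.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns the pooled OR, 95% CI, two-sided P, Cochran's Q and I2 (%).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty effect and SE lists")
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = sqrt(1.0 / w_sum)
    z = beta / se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal P value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": exp(beta),
        "ci95": (exp(beta - 1.96 * se), exp(beta + 1.96 * se)),
        "p": p_value,
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Made-up log ORs and standard errors for four studies.
    result = fixed_effects_meta([0.20, 0.15, 0.25, 0.18], [0.08, 0.10, 0.12, 0.07])
    print({k: result[k] for k in ("or", "p", "i2")})
```

Weighting each study by the inverse of its variance gives larger, more precise studies more influence on the pooled estimate, which is why the pooled standard error is never larger than that of the most precise study.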
Estimation of the individual variance in risk associated with glioma-risk SNPs was carried out using the method described in Pharoah et al. 19 assuming the familial risk of glioma to be 1.77 (ref. 51). Briefly, for a single allele (i) of frequency p, relative risk R and ln risk r, the variance (V_i) of the risk distribution due to that allele is given by:
$$V_i = (1-p)^2 E^2 + 2p(1-p)(r-E)^2 + p^2(2r-E)^2$$
where E is the expected value of r given by:
$$E = 2p(1-p)r + 2p^2 r$$
For multiple risk alleles the distribution of risk in the population tends towards the normal with variance:
$$V = \sum_i V_i$$
The total genetic variance (V) for all susceptibility alleles has been estimated to be $\sqrt{1.77}$. Thus the fraction of the genetic risk explained by a single allele is given by:
$$V_i / V$$
15 , annotating by ubiquitous enhancers as well as enhancers specifically expressed in astrocytes, neurons, neuronal stem cells and brain tissue. Similarly, we searched for overlap with 'super-enhancer' regions as defined by Hnisz et al. 14 , restricting analysis to U87 GBM cells, astrocyte cells and brain tissue. We additionally made use of 15-state chromHMM data from H1-derived neuronal progenitor cells available from the Epigenome roadmap project 62 . Mutation data in LGG and GBM tumours from TCGA were assessed using the cBioPortal for cancer genomics 63 .
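Returning to the per-allele variance calculation described earlier in this section, the following Python sketch shows how the contribution of a risk allele could be computed. It assumes an additive model on the log-risk scale, in which genotypes carrying 0, 1 or 2 risk alleles have log risks 0, r and 2r at Hardy-Weinberg frequencies. The total variance is left as a parameter, defaulting to the square root of 1.77 on the reading that this is the value stated in the text. The function names and the allele frequencies and relative risks in the usage example are hypothetical.

```python
from math import log, sqrt


def allele_risk_variance(p: float, relative_risk: float) -> float:
    """Variance of log risk due to one allele under an additive model.

    p: risk allele frequency; relative_risk: per-allele relative risk R.
    """
    r = log(relative_risk)
    # Expected log risk over the three genotypes.
    e = 2.0 * p * (1.0 - p) * r + 2.0 * p * p * r
    return (
        (1.0 - p) ** 2 * e ** 2
        + 2.0 * p * (1.0 - p) * (r - e) ** 2
        + p ** 2 * (2.0 * r - e) ** 2
    )


def fraction_explained(alleles, total_variance: float = sqrt(1.77)) -> float:
    """Fraction of the total genetic variance explained by a set of alleles.

    alleles: iterable of (frequency, relative_risk) pairs.
    total_variance: assumed total genetic variance (see lead-in).
    """
    return sum(allele_risk_variance(p, rr) for p, rr in alleles) / total_variance


if __name__ == "__main__":
    # Made-up allele frequencies and per-allele relative risks.
    example = [(0.30, 1.25), (0.10, 1.40)]
    print(f"fraction explained: {fraction_explained(example):.4f}")
```

Under this model the per-allele variance simplifies algebraically to 2p(1-p)r², which gives a quick consistency check on the longer expression.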
To search for biological pathways enriched for glioma SNP associations we made use of Improved Gene Set Enrichment Analysis for Genome-wide Association Study (i-GSEA4GWAS v1.1) (ref. 64). SNPs up to 5 kb upstream and downstream of a given gene were mapped to that gene, with the maximum P value of all SNPs mapping to a gene used to represent the gene. Gene sets used were: canonical pathways, gene ontology (GO) biological process, GO molecular function and GO cellular component. As recommended, we applied an FDR cutoff of <0.10 on all reported gene sets. In the case of multiple identical pathways, that with the lower FDR value was retained.
Imputation concordance assessment. The fidelity of imputation, as assessed by the concordance between imputed and directly genotyped SNPs, was examined in 192 cases and 187 controls from the UK-GWAS discovery series (Supplementary Table 3). Targeted sequencing for the SNPs rs141035288, rs117527984, rs76178334, rs4432939, rs182521816, rs138170678, rs145034266, rs12780046, rs11196067, rs648044, rs12230172 and rs78543262 was performed by Sanger sequencing on an ABI3700 analyser (Applied Biosystems; Supplementary Table 10, conditions are available on request). For SNPs with MAF < 0.05, samples were included to ensure at least 10 predicted heterozygotes were sequenced. Imputed genotypes were considered for concordance assessment if exhibiting probability >0.9.
Tumour genotyping. Tumour samples were available from a subset of the patients ascertained through the Service de Neurologie Mazarin, Groupe Hospitalier Pitié-Salpêtrière Paris. Tumours were snap frozen in liquid nitrogen and DNA was extracted using the QIAmp DNA minikit, according to the manufacturer's instructions (Qiagen, Venlo, LN, USA). DNA was analysed for large-scale copy number variation by CGH array as previously described 65,66 . In the cases not analysed by CGH array, 9p, 10q, 1p and 19q status was assigned using PCR microsatellites, and EGFR amplification and CDKN2A-p16-INK4a homozygous deletion by quantitative PCR. IDH1, IDH2 and TERT promoter mutation status was determined by sequencing as previously described 67,68 .
Expression quantitative trait loci analysis. To examine the relationship between SNP genotype and gene expression, we made use of tumour RNA sequence data and blood Affymetrix 6.0 SNP Array data for 389 low-grade and 138 GBM tumours of European ancestry from TCGA (accession number phs000178.v9.p8), as well as RNA sequence data from lymphoblastoid cells (GEUVADIS project 16 ) and genotype data for 363 European individuals from the 1000 Genomes Project 45 . Sequence reads from downloaded FASTQ files were aligned to the human hg19 reference genome and GRCh37 Ensembl transcriptome using TopHat v2.0.7 and Bowtie v2.0.6. Read counts per gene were generated for 62,069 Ensembl genes using featureCounts 69 as part of the Rsubread Bioconductor package 70 . For TCGA samples, European ancestry was assessed through visualization of clustering with CEU samples after principal components analysis (data not shown). Untyped genotypes were imputed from the Affymetrix 6 array using methods similar to those described above. Genotypes with probability >0.9 were taken forward for eQTL analysis. The association between SNP and gene expression was quantified using the Kruskal-Wallis trend test.
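As an illustration of a rank-based genotype-expression test, the Python sketch below implements the standard Kruskal-Wallis H test across three genotype groups, with average ranks and a tie correction. This is an omnibus test that ignores the ordering of genotypes, so the "trend" variant named in the text may differ in detail. The function name and the expression values in the usage example are hypothetical.

```python
from math import exp


def kruskal_wallis_by_genotype(expr_by_genotype):
    """Kruskal-Wallis H test across exactly three genotype groups.

    expr_by_genotype: three lists of expression values, one per genotype.
    Returns (H, P). With three groups the reference chi-square
    distribution has 2 d.f., whose survival function is exp(-H / 2).
    """
    if len(expr_by_genotype) != 3 or any(len(g) == 0 for g in expr_by_genotype):
        raise ValueError("expected three non-empty genotype groups")
    pooled = sorted(
        (value, idx) for idx, grp in enumerate(expr_by_genotype) for value in grp
    )
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end][0] == pooled[start][0]:
            end += 1
        # Tied values share the average of ranks start+1 .. end.
        avg_rank = (start + 1 + end) / 2.0
        for k in range(start, end):
            rank_sums[pooled[k][1]] += avg_rank
        t = end - start
        tie_term += t ** 3 - t
        start = end
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(grp) for rs, grp in zip(rank_sums, expr_by_genotype)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:  # all values identical: no evidence of association
        return 0.0, 1.0
    h /= correction
    return h, exp(-h / 2.0)


if __name__ == "__main__":
    # Made-up expression values for genotypes with 0, 1 and 2 risk alleles.
    h_stat, p_value = kruskal_wallis_by_genotype(
        [[9.1, 8.7, 9.4, 8.9], [8.2, 8.6, 8.0, 8.4], [7.5, 7.9, 7.2]]
    )
    print(f"H = {h_stat:.2f}, P = {p_value:.4f}")
```

A rank-based test is a natural choice for read-count data because it does not assume normally distributed expression values.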
We additionally queried publicly available eQTL mRNA expression data using MuTHER and the Blood eQTL browser. MuTHER contains expression data from adipose tissue, lymphoblastoid cells and skin from 856 healthy twins 17 . rs500629 was used as a proxy for rs648044 (r² = 0.52, D′ = 0.85). The blood eQTL browser contains expression data from 5,311 non-transformed peripheral blood samples 18 . Putative eQTLs were thresholded at FDR < 0.1.