Genome-wide association study of classical Hodgkin lymphoma identifies key regulators of disease susceptibility

Several susceptibility loci for classical Hodgkin lymphoma have been reported. However, much of the heritable risk is unknown. Here, we perform a meta-analysis of two existing genome-wide association studies, a new genome-wide association study, and replication totalling 5,314 cases and 16,749 controls. We identify risk loci for all classical Hodgkin lymphoma at 6q22.33 (rs9482849, P = 1.52 × 10−8) and for nodular sclerosis Hodgkin lymphoma at 3q28 (rs4459895, P = 9.43 × 10−17), 6q23.3 (rs6928977, P = 4.62 × 10−11), 10p14 (rs3781093, P = 9.49 × 10−13), 13q34 (rs112998813, P = 4.58 × 10−8) and 16p13.13 (rs34972832, P = 2.12 × 10−8). Additionally, independent loci within the HLA region are observed for nodular sclerosis Hodgkin lymphoma (rs9269081, HLA-DPB1*03:01, Val86 in HLA-DRB1) and mixed cellularity Hodgkin lymphoma (rs1633096, rs13196329, Val86 in HLA-DRB1). The new and established risk loci localise to areas of active chromatin and show an over-representation of transcription factor binding for determinants of B-cell development and immune response.

C lassical Hodgkin lymphoma (cHL) is a lymphoid malignancy of germinal centre (GC) B-cell origin 1 , which is characterised by Hodgkin and Reed-Sternberg (HRS) cells with a dominant background population of reactive inflammatory cells 1 . Of the four major subtypes of cHL, nodular sclerosis Hodgkin lymphoma (NSHL) and mixed cellularity Hodgkin lymphoma (MCHL) account for 65% and 20% of cHL, respectively 2 . While Epstein-Barr virus (EBV) infection is causally associated with a subset of cHL cases, proportionally higher in MCHL, no other environmental factor has thus far been robustly linked to cHL risk 3 .
Evidence for inherited genetic influence on susceptibility to cHL is provided by the familial risk and the high concordance between monozygotic twins 4,5 . A strong HLA association for cHL risk is well established; however, our understanding of cHL heritability has been transformed by recent genome-wide association studies (GWAS), which have identified singlenucleotide polymorphisms (SNPs) at seven non-HLA loci influencing risk [6][7][8][9] . Although projections indicate that additional risk variants for cHL can be discovered by GWAS 10 , the statistical power of published studies is limited.
To gain a more comprehensive insight into cHL predisposition, we performed a meta-analysis of two previous GWAS 7, 8 and a new GWAS, thereby more than doubling study power to discover risk SNPs. With replication, our study has allowed us to identify six new non-HLA risk loci. Additionally, by conducting regionspecific imputation we have defined the specific HLA associations underlying NSHL and MCHL risk.

Results
Association analysis. We analysed GWAS data from three studies of European ancestry: a new GWAS from the UK National Study of Hodgkin Lymphoma Genetics (NSHLG) and two previously reported GWAS (Supplementary Table 1) 7,8 . After quality control the three studies provided SNP genotypes on 3,077 cases and 13,680 controls (Supplementary Tables 2, 3, 4; Supplementary Fig. 1). To increase genomic resolution, we imputed >10 million SNPs using the 1000 Genomes Project and the UK10K data as reference 11,12 . Quantile-quantile (Q-Q) plots for SNPs with minor allele frequency (MAF) > 0.05% post imputation did not show evidence of substantive over-dispersion (λ = 1.03-1.09; Supplementary Fig. 2). An overview of the analysis strategy is outlined in Supplementary Fig. 3. Meta-analysing the association test results from the three GWAS into a joint discovery set, we calculated joint odds ratios and 95% confidence intervals for each SNP and associated per-allele P-value for all cHL, NSHL and MCHL cases vs. controls ( Supplementary Fig. 4). In this analysis, associations for the established non-HLA risk loci at 2p16.1, 3p24.1, 5q31.1, 6q23.3, 8q24.21, 10p14 and 19p13.3 were consistent in direction and magnitude of effect with previously reported studies (Supplementary Fig. 4; Supplementary Table 5) [6][7][8] .
Relationship between the new risk SNPs and phenotype. A hallmark of cHL epidemiology is the bimodal age-specific incidence and it has been argued that the disease in young adults and older adults is aetiologically different; in particular there is a low prevalence of EBV-positive disease in NSHL patients aged 16-35 3 . Case-only analysis did not provide evidence of sex differences at newly identified risk SNPs (Supplementary Table 10) or a relationship between age in the NSHL subgroup. Albeit not significant after correction for multiple testing, we observed an association between EBV-positive disease and cHL at 6q23.3 in 796 cases analysed (rs6928977, P = 0.03, Supplementary Table 10).
Biological inference. Five of the six new risk SNPs localise in or near genes which have either been previously implicated in the development of cHL or have established roles in B-cell development and are therefore strong candidates for cHL susceptibility. Specifically, the 6q22.33 association marked by rs9482849 maps intergenically to THEMIS (thymocyte-expressed molecule involved in selection) and PTPRK (receptor-type tyrosine protein phosphatase kappa) (Fig. 1). Downregulation of PTPRK by the EBV-encoded EBNA1 contributes to the growth and survival of HRS cells 13 . The 6q23.3 association defined by rs6928977 localises to intron 3 of AHI1 (abelson helper integration site-1) ( Fig. 1) which has been implicated in the development of both Band T-cell lymphoma 14,15 . The 13q34 association marked by rs112998813 is located in intron 5 of UPF3A (Fig. 1), a regulator of nonsense transcripts 16 . The LD region of association also harbours CDC16 (cell division cycle protein 16). CDC16, a subunit of the anaphase-promoting complex 17 , targets cell cycle regulatory proteins for proteasome degradation, thereby allowing cell cycle progression, and is downregulated in HRS cells 18 . At 16p13.13, the rs34972832 association for NSHL maps to intron 18 of CLEC16A (C-type lectin domain family 1, Fig. 1) whose loss of function affects both B-cell number and function 19 . The 10p14 association marked by rs3781093 maps intronic of GATA3 (Fig. 1). Transcriptional repression of GATA3 is essential for early B-cell commitment, and aberrant GATA3 expression has been observed in HRS cells 20,21 . Intriguingly, the rs3781093 risk allele for NSHL has previously been demonstrated to be protective for paediatric B-cell acute lymphoblastic leukaemia (ALL) 22 .
To the extent that they have been deciphered, many GWAS risk loci map to non-coding regions of the genome and influence gene regulation. Hence, to gain insight into the biological mechanisms for the associations of the newly identified risk SNPs, we interrogated publicly accessible expression data on lymphoblastoid cell lines (LCLs) 23,24 . We used the summary data-based Mendelian randomisation (SMR) analysis to test for pleiotropy between GWAS signal and cis-expression quantitative trait (eQTL) for genes within 1 Mb of the sentinel SNP at each locus to identify a causal relationship 25 . At 6q23.3 and 10p14, significant eQTLs were observed with AHI1 (P SMR = 8.63 × 10 −6 ; Supplementary Table 11 and Supplementary Fig. 5) and GATA3 (P SMR = 4.70 × 10 −8 ; Supplementary Table 11 and Supplementary  Fig. 5).
Since spatial proximity between specific genomic regions and chromatin looping interactions are central for the regulation of gene expression, we identified patterns of chromatin interactions at candidate causal SNPs by analysing promoter capture Hi-C data on GM12878 as a source of B-cell information 26 . Looping chromatin interactions were shown at 3q28 (rs4459895), 6q23.3 (rs6928977), 10p14 (rs3781093) and 16p13.13 (rs34972832). While no significant eQTL was shown for these chromatin looping interactions they involved a number of genes with biological relevance to cHL development ( Fig. 1). At 3q28, the looping interaction implicates BCL6 and mir-28, which have well documented roles in B-cell tumour biology and GC B-cell development 27,28 . At 6q23.3, we observed interactions with promoter sequences upstream in MYB and ALDH8A1. At 10p14, both risk SNPs encompass a region that interacts with TAF3, which encodes transcription initiation factor TFIID subunit 3. TAF3 forms part of the transcription initiation factor TFIID and is necessary for haematopoiesis 29 . Finally, we observed interactions at the 16p13.13 risk locus with RMI2 (encoding RecQ mediated genome instability 2) ( Fig. 1). RMI2 is an essential component of the Bloom helicase-double Holliday junction dissolvasome and is responsible for genomic stability 30 .
Across the new and established risk loci for cHL we confirmed a significant enrichment of DNase hypersensitivity in GM12878 cells (false discovery rate (FDR) adjusted P-value = 0.0035), as well as enhancer elements in primary B-cells (FDR adjusted P-value = 0.00064) and GM12878 cells (FDR adjusted P-value = 0.015) 31 . Analysis of ChIP-seq data on 82 transcription factors (TFs) showed an over-representation of the binding of TFs that play a central role in B-cell signalling-networks such as RELA (nuclear factor NF-kappa-B p65), EBF1 (early B-cell factor 1), RUNX3 (runt-related transcription factor 3) and BATF (basic leucine zipper transcription factor, ATF-like) (Fig. 2). Collectively, these observations support the assertion that risk loci for cHL mediate their effects through B-cell developmental networks, and are strongly involved in transcriptional initiation and enhancement.
Heritability of cHL. By fitting all SNPs from GWAS simultaneously using Genome-wide Complex Trait Analysis 33 , the estimated heritability of cHL, NSHL and MCHL attributable to all common variation is 24.0% (±2.3%), 25.2% (±3.4%) and 21.9% (±2.4%), respectively. This estimate represents the additive    variance, and therefore does not include the potential impact of dominance effects or gene-environment interactions having an impact on cHL risk. The currently identified non-HLA risk SNPs thus only account for around 12% of the additive heritable risk.
Co-heritability with autoimmune disease. Although not universal, some epidemiological studies have reported associations between cHL and various autoimmune diseases, raising the possibility of common genetic susceptibility and hence common biological pathways 34 . Variation at a number of the cHL risk loci, including 3p24.1, 5q31.1 and 6q23.3 has previously been implicated as determinants of autoimmune disease risk supporting such an assertion (Supplementary Data 1).
To investigate co-heritability globally between cHL and autoimmune disease, we implemented cross-trait LD score regression 35 . Using summary-level GWAS data we estimated genetic correlations between cHL and six autoimmune diseases curated by ImmunoBase; specifically rheumatoid arthritis 36 , systemic lupus erythematosus 37 , multiple sclerosis (MS) 38 , primary biliary cirrhosis 39 , ulcerative colitis (UC) 40 and coeliac disease 41 GWAS data (Supplementary Table 13). We observed a positive genetic correlation between cHL and MS (r g = 0.35, P = 0.04) and a negative correlation between cHL and UC (r g = −0.23, P = 0.01).

Discussion
To our knowledge, we have performed the largest GWAS of cHL to date, identifying six new non-HLA risk loci. The availability of comprehensive reference panels for the HLA region has allowed us to delineate class I and class II associations for NSHL and MCHL, substantiating recent documented differences between these cHL histologies 9 .
Although functional analyses are required to determine the biological basis of cHL association signals, we have demonstrated that these risk loci are enriched for regulatory elements in B-cells. Moreover, they feature an over-representation of key B-cell TF binding, notably RELA, RUNX3, EBF1 and BATF. RELA is a TF involved in NF-κB heterodimer formation. HRS cells show high constitutive activity of NF-κB (both canonical and non-canonical pathways) 42 , which promotes cell survival and growth through inducing anti-apoptotic and pro-proliferative gene programs 43,44 . Inhibition of NF-κB in HRS cells leads to caspase-independent apoptosis 43 . EBF1 cooperates with E2A and PAX5 to regulate B-cell maturation 45 . Its expression in HRS cells is low 46 , which is thought to contribute to the loss of normal B-cell phenotype 47 . RUNX3 has important roles in B-cell maturation 48 and downregulation of RUNX1 by RUNX3 is required for EBV-driven LCL  49 . BATF also appears to co-ordinate B-cell maturation 50 , and is highly expressed in HRS cells 51 .
The strong HLA associations we identified for NSHL and MCHL support recent observations for distinct class I and class II relationships for these cHL subtypes 9 . Specifically, the class II NSHL association marked by rs9269081 is in strong LD with the previously identified risk SNP rs6903608 (r 2 = 0.92, D' = 1.0) for EBV-negative NSHL 9 . For MCHL the class I association rs1633096 shows correlation with the previously identified marker SNP rs2734986 (pairwise r 2 = 0.41, D' = 0.97) for EBV-positive cHL 9 . A class I association for MCHL is consistent with a high EBV positivity and supports the notion of defective cytotoxic T-cell lymphocyte responses in EBV-infected HRS cells 52 . Variation within the class II HLA region alters the risk of autoimmune diseases 53 , but the underlying biological mechanism of these associations has yet to be fully defined. The class II HLA association for NSHL and MCHL risk, comprising both coding variants and non-coding SNPs, may explain the importance of CD4+ T follicular helper (T FH ) cells in cHL pathogenesis. In the GC, there is a requirement for CD4+ T FH cells to interact with GC B-cells through the T-cell antigen receptor (TCR) and HLA class II proteins for normal plasma and memory cells formation 54 . It is therefore plausible that variation in peptide binding and expression of the HLA class II proteins contributes to cHL pathogenesis through interaction with CD4+ T FH cells. Such a model is supported by the observation of variation at position 86 of HLA-DRB1 influencing TCR Vα gene expression 55 , the predominance of CD4+ T-cells in cHL tumours 56 , the reliance of HRS cells on the micro-environment for survival 1 , and the loss of MHC class II expression on HRS cells 57 , the last of which is associated with adverse prognosis. An alternative explanation for the class II HLA association in cHL is the involvement of an unidentified pathogen playing a causative role in cHL. Aminoacid variants and SNPs within HLA-DRB1 modulate humoral immune responses to common viruses, such as influenza A and JC polyomavirus 58 . Consistent with such a model is dimorphic variation at position 86 of HLA-DRB1 59 , which we identify as influencing risk of NSHL and MCHL, modulating the anchoring pocket of the antigen binding site, and influencing the conformation of peptide-DR protein complexes while maintaining a T-cell response 60 .
In our analysis we noted a reciprocal relationship between NSHL risk and ALL risk at 10p14 (GATA3) 22 . Since GATA3 plays a key role in B-cell development and both ALL and NSHL are malignancies derived from B-cells at different stages of maturation, our observation leads to speculation of a significant temporal effect of genetic variation at this locus in response to an environmental or mutational insult.
Although supported by a contemporaneous study and requiring further validation 61 , we found evidence for common genetic susceptibility between cHL and MS, thus raising the possibility of shared environmental risk factors. A potential biological basis for such a relationship may encompass aberrant immune activation and cell proliferation.
In conclusion, our study provides further evidence for inherited susceptibility to cHL and supports a model whereby risk loci influence disease through effects on B-cell regulatory networks, providing a mechanistic link between susceptibility and biology. Our findings also emphasise the differences between the major subtypes, which are reflective of differences in tumour aetiology. Genome-wide association studies. Primary study: We analysed constitutional DNA from 1,717 cases ascertained through the NSHLG (http://www.public.ukcrn. org.uk) from 2010 to 2013. These are detailed in Supplementary Table 1. Cases were genotyped using the Illumina Oncoarray (Illumina Inc.). Controls which were also genotyped using the oncoarray comprised: (1) 2,976 cancer-free men recruited by the PRACTICAL Consortium-the UK Genetic Prostate Cancer Study (UKGPCS) (age < 65 years), a study conducted through the Royal Marsden NHS Published studies: We used GWAS data generated on two non-overlapping case-control series of Northern European ancestry, which have been the subject of previous analyses that are summarised in Supplementary Table 1 GWAS and meta-analysis. Standard quality control measures were applied to each of the three GWAS (Supplementary Tables 2, 3 and 4) 7,8,63 . Specifically, individuals with a low call rate (< 95%) as well as all individuals evaluated to be of non-European ancestry (using the HapMap version III CEU, JPT/CHB and YRI populations as a reference, Supplementary Fig. 1) were excluded. For apparent first-degree relative pairs, we excluded the control from a case-control pair or the individual with the lower call rate. SNPs with a call rate < 95% were excluded as were those with a MAF < 0.01 or displaying deviation from Hardy-Weinberg equilibrium (HWE) (i.e., P < 10 −6 , Supplementary Table 4). GWAS data were imputed to >10 million SNP with IMPUTE2 v2.3 64 software, using a merged reference panel consisting of data from 1000 Genomes Project (phase 1 integrated release 3, March 2012) 11 and UK10K (ALSPAC, EGAS00001000090/ EGAD00001000195 and TwinsUK EGAS00001000108/EGAS00001000194 studies) 12 . HLA imputation was conducted using SNP2HLA and the Type I Diabetes Genetics Consortium reference panel of 5,225 individuals of European descent 32 . The number of variants in the HLA imputation recovered with an information measure of > 0.80 were 8,436 (94% of total variants), 8506 (95% of total variants) and 8599 (96% of total variants) in the UK-GWAS, German-GWAS and UK-NSHLG-GWAS data sets, respectively. Imputation was conducted separately for each study, and in each, the data were pruned to a common set of SNPs between cases and controls prior to imputation. Poorly imputed SNPs defined by an information measure <0.80 were excluded. Tests of association between SNPs and cHL were performed using logistic regression under an additive genetic model in SNPTESTv2.5 65 . The adequacy of the case-control matching was evaluated using Q-Q plots of test statistics ( Supplementary Fig. 2). The inflation factor λ was based on the 90% least-significant SNP 66 . Where appropriate, principal components, generated using common SNPs, were included in the analysis to limit the effects of cryptic population stratification that otherwise might cause inflation of test statistics. Eigenvectors for the GWAS data sets were inferred using smartpca (part of EIGENSOFT) by merging cases and controls with Phase III HapMap samples. LD metrics were calculated in vcftools v0.1.12b 67 , using UK10K merged 1000 Genomes Project data and plotted using visPIG 68 .

Methods
Replication studies and technical validation. The eight SNPs in the most promising loci (  [69][70][71] . Briefly, SNEHD involved ascertainment of incident cases from Scotland and Northern England during 1993-1997. YHCCS was based on newly diagnosed cases aged 16-24 years from Northern England during 1991-1995. ELCCS comprised cases residing in the north or parts of southwest of England aged 16-69 years with newly diagnosed, non-human immunodeficiency virus-related cHL during 1998-2003. UK population controls matched to cases on age, sex and area of residence were obtained from SNEHD, YHCCS and ELCCS. The EBV status of cHL tumours in the UK replication 2 series was determined by immunohistochemical staining for EBV latent membrane antigen-1 and/or EBV EBV-encoded RNA in situ hybridisation using sections of paraffin-embedded material 72,73 . The fidelity of GWAS imputation was assessed by the concordance between imputed and directly genotyped SNP in a subset of samples (Supplementary  Table 7). Replication genotyping of UK samples was performed using competitive allele-specific PCR KASP chemistry (LGC, Hertfordshire, UK). Primers, probes and conditions are detailed in Supplementary Table 14. Call rates for SNP genotypes were > 95% in each of the replication series. To ensure the quality of genotyping in assays, at least two negative controls and a set of duplicates were genotyped (concordance > 99%).
Meta-analysis. Meta-analyses were performed under a fixed-effects model using META v1.6 74 . Cochran's Q-statistic to test for heterogeneity and the I 2 statistic to quantify the proportion of the total variation due to heterogeneity were calculated; an I 2 value ≥ 75% is considered to be characteristic of large heterogeneity 75 . We used the test-based method of Higgins et al. 76 to derive 95% CIs for I 2 values (Supplementary Table 9). To estimate study power of the discovery GWAS phase, we made use of the CaTS online calculator 77 , assuming a risk allele frequency of 0.2 and genotype relative risk of 1.20.
Expression quantitative trait locus analysis. To examine the relationship between SNP genotype and gene expression, we carried out SMR analysis as per Zhu et al., 2016 25 . Briefly, if b xy is the effect size of x (gene expression) on y (slope of y regressed on the genetic value of x), b zx is the effect of z on x, and b zy is the effect of z on y. Therefore b xy (b zy /b zx ) is the effect of x on y. To distinguish pleiotropy from linkage where the top associated cis-eQTL is in LD with two causal variants, one affecting gene expression the other affecting trait, we tested for heterogeneity in dependent instruments, using multiple SNPs in each cis-eQTL region. Under the hypothesis of pleiotropy b xy values for SNPs in LD with the causal variant will be identical. Thus testing against the null hypothesis that there is a single causal variant is equivalent to testing heterogeneity in the b xy values estimated for the SNPs in the cis-eQTL region. For each probe that passed significance threshold for the SMR test, we tested the heterogeneity in the b xy values estimated for multiple SNPs in the cis-eQTL region using HEIDI.
We used publicly available LCL expression data from the MuTHER (n = 825) 23 and GTEx consortium (n = 114) 24 . Briefly, GWAS summary statistics files were generated from the meta-analysis. Reference files were generated from merging 1000 Genomes Project phase 3 and UK10K (ALSPAC and TwinsUK) vcfs 11,12 . As previously advocated, only probes with at least one eQTL P-value of < 5.0 × 10 −8 were considered for SMR analysis 25 . We set a threshold for the SMR test of P SMR < 5.49 × 10 −4 corresponding to a Bonferroni correction for 91 tests (91 probes with a top eQTL P < 5.0 × 10 −8 across the 12 loci and two LCL eQTL data sets). For all genes passing this threshold, we generated plots of the eQTL and GWAS associations at the locus, as well as plots of GWAS and eQTL effect sizes (i.e., corresponding to input for the HEIDI heterogeneity test). HEIDI test P-values < 0.05 were considered as being reflective of heterogeneity. This threshold is conservative for gene discovery because it retains fewer genes than when correcting for multiple testing. SMR plots for significant eQTLs are shown in Supplementary  Fig. 5.
Chromatin state dynamics. Enrichment of cHL risk SNPs with DNAse and enhancers is conducted using Haploreg v4 31 . The overlap of cHL risk SNPs with enhancers in GM12878 cell is compared to a background model of all 1000 Genomes Project variants with a frequency above 5% in any population. The enrichment relative to these background frequencies was performed using a binomial test and a FDR P-value was subsequently calculated; we considered an FDR < 0.05 as being significant.
To examine enrichment in specific TF binding across risk loci, we adapted the variant set enrichment method of Cowper-Sal lari et al. 78 . For each risk locus, a region of strong LD (defined as r 2 > 0.8 and D′ > 0.8) was determined, and these SNPs were termed the associated variant set (AVS). TF ChIP-seq uniform peak data were obtained from ENCODE for the GM12878 cell line, and included data for 82 TFs. For each of these marks, the overlap of the SNP in the AVS and the binding sites was determined to produce a mapping tally. A null distribution was produced by randomly selecting SNP with the same LD structure (generated from 1000 Genomes Project and UK10K data) as the risk associated SNP, and the null mapping tally calculated. This process was repeated 10,000 times, and approximate P-values were calculated as the proportion of permutations where the null mapping tally was greater or equal to the AVS mapping tally. An enrichment score was calculated by the tallies to the median of the null distribution. Thus the enrichment score is the number of standard deviations of the AVS mapping tally from the mean of the null distribution tallies.
Promoter capture Hi-C data. To map risk SNPs to interactions involving promoter contacts and identify genes involved in cHL susceptibility, we analysed promoter capture Hi-C data on the LCL cell line GM12878 as a model B-cell 26 . Reads from technical replicates (E-MTAB-2323) were combined before processing with HiCUP 79 . Significant interactions (i.e., score ≥ 5) on two biological replicates were determined using CHiCAGO 80 .
Co-heritability of Hodgkin lymphoma with autoimmune disease. We utilised LD regression to estimate genetic correlation between individual autoimmune diseases and cHL, NSHL and MCHL 35 . Summary statistics for published studies of coeliac disease 41 , systemic lupus erythematosus 37 , primary biliary cirrhosis 39 , rheumatoid arthritis 36 , MS 38 and UC 40 were downloaded from the ImmunoBase website (http://www.immunobase.org/). Heritability analysis. We used genome-wide complex trait analysis to estimate the polygenic variance (i.e., heritability) ascribable to all genotyped and imputed GWAS SNPs 33 . SNPs were excluded based on low MAF < 0.01, poor imputation (info score < 0.9) and evidence of departure from HWE (P < 0.05). Individuals were excluded for poor imputation and where two individuals were closely related. A genetic relationship matrix of pairs of samples was used as input for the restricted maximum likelihood analysis to estimate the heritability explained by the selected set of SNPs. To transform the estimated heritability to the liability scale, we used the lifetime risk, for cHL, which is estimated to be 0.002 by SEER (https://seer. cancer.gov/statfacts/html/hodg.html).
Data availability. Genotype data that support the findings of this study have been deposited in the European Genome-phenome Archive (EGA) under accession codes EGAD00000000022 and EGAD00000000024.
Sequencing data, which forms the reference panel for imputation, have been deposited in the European Genome-phenome Archive (EGA) under accession codes EGAS00001000090, EGAD00001000195, EGAS00001000108.
Transcriptional profiling data from the MuTHER consortium that support the findings of this work have been deposited in the European Bioinformatics Institute (Part of the European Molecular Biology Laboratory, EMBL-EBI) under accession code E-TABM-1140.
Transcriptional profiling data from the Genotype-Tissue Expression (GTEx) project, that support the findings of this work are available here: https://www. gtexportal.org/ Transcription factor binding data that support the findings of this work are available here: http://genome.ucsc.edu/ENCODE/downloads.html.
Promoter capture Hi-C data in GM12878 cells that support the findings of this work have been deposited in the European Bioinformatics Institute (Part of the European Molecular Biology Laboratory, EMBL-EBI) under accession code E-MTAB-2323.
The remaining data contained within the paper and supplementary files are available from the author upon request.