Functional characterization of a multi-cancer risk locus on chr5p15.33 reveals regulation of TERT by ZNF148

Genome wide association studies (GWAS) have mapped multiple independent cancer susceptibility loci to chr5p15.33. Here, we show that fine-mapping of pancreatic and testicular cancer GWAS within one of these loci (Region 2 in CLPTM1L) focuses the signal to nine highly correlated SNPs. Of these, rs36115365-C associated with increased pancreatic and testicular but decreased lung cancer and melanoma risk, and exhibited preferred protein-binding and enhanced regulatory activity. Transcriptional gene silencing of this regulatory element repressed TERT expression in an allele-specific manner. Proteomic analysis identifies allele-preferred binding of Zinc finger protein 148 (ZNF148) to rs36115365-C, further supported by binding of purified recombinant ZNF148. Knockdown of ZNF148 results in reduced TERT expression, telomerase activity and telomere length. Our results indicate that the association with chr5p15.33-Region 2 may be explained by rs36115365, a variant influencing TERT expression via ZNF148 in a manner consistent with elevated TERT in carriers of the C allele.

Philadelphia, Pennsylvania 19104, USA. * These authors contributed equally to this work. ** These authors jointly supervised the work. Correspondence and requests for materials should be addressed to K.M.B. (email: kevin.brown3@nih.gov) or to L.T.A. (email: amundadottirl@mail.nih.gov). † The members of the PanScan Consortium are listed at the end of the paper. ‡ The members of the TRICL Consortium are listed at the end of the paper. § The members of the GenoMEL Consortium are listed at the end of the paper.
Risk variants across a small genomic region on chromosome 5p15.33 have been reported in genome wide association studies (GWAS) for at least eleven cancer types including bladder, breast, glioma, lung, melanoma, non-melanoma skin cancer, ovarian, pancreas, prostate, testicular germ cell cancer and chronic lymphocytic leukaemia [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. Fine-mapping studies, either within a specific cancer type or across different cancers, have characterized up to seven independent loci in this region with either risk-enhancing or protective effects across a dozen cancers [16][17][18]. Notably, the effect is pleiotropic at nearly every locus. This genomic region contains two plausible candidate genes, TERT and CLPTM1L. The former encodes telomerase reverse transcriptase (TERT), the catalytic subunit of telomerase, which in combination with an RNA template (TERC) adds nucleotide repeats to chromosome ends 19. Although telomerase is active in germ cells and in early development, it is repressed in most adult tissues. Telomeres shorten with each cell division, and when they reach a critically short length, cellular senescence or apoptosis is triggered. However, cancer cells can continue to divide despite critically short telomeres, by upregulating telomerase or by alternative lengthening of telomeres (ALT) (refs 20-22).
The CLPTM1L gene encodes the cleft lip and palate associated transmembrane 1-like protein, and is overexpressed in lung and pancreatic cancer where it promotes growth and survival and is required for KRAS driven lung cancer [23][24][25][26][27] .
One of the multiple risk loci in this genomic region lies within the CLPTM1L gene and has been termed Region 2 (ref. 18). It was originally reported to be associated with risk of pancreatic, lung and bladder cancer, and melanoma, and is marked by either rs401681 or rs402710 (refs 1,4,5,11,28). By conducting fine-mapping across multiple cancers and subsequently investigating the functional consequences of the subset of genetic variants most strongly associated with cancer risk, we find that risk of pancreatic, testicular and lung cancer conferred by this locus may predominantly be explained by a single SNP. This variant, rs36115365, exhibited preferred protein-binding and enhanced regulatory activity for the C-allele, which is associated with increased pancreatic and testicular but decreased lung cancer and melanoma risk.
Transcriptional gene silencing of the regulatory region encompassing this variant resulted in repression of TERT but not CLPTM1L expression in an allele-specific manner. Proteomic analysis identified allele-preferred binding of Zinc finger protein 148 (ZNF148) to rs36115365-C, a finding supported by binding of purified recombinant ZNF148 specifically to the C-allele, as well as by ChIP analysis showing allele-preferential binding of endogenous ZNF148 to rs36115365-C. Knockdown of ZNF148 resulted in reduced TERT expression, telomerase activity and telomere length. Taken together, these results indicate that the association with chr5p15.33-Region 2 may be explained by rs36115365, a variant influencing TERT via ZNF148 in a manner consistent with elevated TERT expression in carriers of the C allele.

Results
Fine-mapping the chr5p15.33 Region 2 risk locus. We performed imputation and fine-mapping of the multi-cancer risk locus in the CLPTM1L gene (Region 2, originally marked by rs401681 and rs402710) using GWAS data for four cancers previously shown to have associations with this locus, namely pancreatic 11, testicular 28 and lung cancer 7, and melanoma 29. For pancreatic cancer, fine-mapping identified SNPs with P values significantly lower than the previously published association signal marked by rs401681, with rs451360 having the smallest (P = 2.0 × 10⁻¹⁰ for rs451360; P = 3.7 × 10⁻⁷ for rs401681; Supplementary Table 1) 18. This SNP is highly correlated with eight other SNPs (r² > 0.60, 1000G EUR population) that collectively mark Region 2 in pancreatic cancer (Fig. 1). Fine-mapping of Region 2 for testicular germ cell tumours (TGCT) and lung cancer revealed that the strongest SNP for each was among this group of nine SNPs (rs35953391 for TGCT, P = 1.08 × 10⁻⁹; and rs37004 for lung cancer, P = 1.18 × 10⁻¹³; Supplementary Table 1). Conditional analysis for the most significant SNP across each cancer resulted in a substantial loss of the signal for the other eight SNPs in pancreatic (P_Conditional = 0.47-0.91), testicular (P_Conditional = 0.21-0.92) and lung cancer (P_Conditional = 0.09-0.45). In contrast, for melanoma none of the nine SNPs were significantly associated with risk in an unconditional analysis. However, upon conditioning on the most significant SNP in Region 2 (rs2447853, P = 5.7 × 10⁻¹²) (ref. 29), all nine SNPs became more significantly associated with melanoma risk (P_Conditional = 5.77 × 10⁻⁵ to 4.45 × 10⁻³), consistent with the possibility that these SNPs may mark one or more risk variants independent of rs2447853.
We also noted in the 1000G Phase 3, version 1 reference dataset an insertion/deletion variant that was highly correlated with these nine SNPs (rs3030832, r² = 0.96 with rs451360 in EUR) that had not been included in the imputation reference based on an earlier version (1000G Phase 1, version 3). We therefore re-imputed the pancreatic cancer GWAS with the newer 1000G reference set and observed an association signal similar in strength and significance to that of the other nine variants (rs3030832, P = 8.25 × 10⁻¹⁰, OR = 1.28, 95% CI 1.18-1.39; Supplementary Table 2), indicating that this indel variant should likewise be considered a candidate functional risk variant. Overall, these ten variants extend across the entire length of CLPTM1L, from the promoter to ~6 kb downstream of the gene (Fig. 1). Three variants, rs36115365, rs380145 and rs27071, are located within potential gene regulatory regions, annotated by the ENCODE project (Fig. 1, Supplementary Fig. 1).
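As a reading aid for the association statistics quoted above, the following minimal sketch shows how a log odds ratio, its standard error and an approximate Wald statistic can be recovered from a reported OR and 95% confidence interval. It assumes a symmetric interval on the log scale and a normal approximation; the function name is hypothetical. Because the published OR and CI are rounded, the recovered P value only approximates the reported one.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, standard error, Wald z and two-sided P from OR and 95% CI.

    Assumes the CI is symmetric on the log scale (a normal approximation).
    """
    beta = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Values reported in the text for rs3030832 in pancreatic cancer.
beta, se, z, p = wald_from_or_ci(1.28, 1.18, 1.39)
print(f"log-OR={beta:.3f}  SE={se:.3f}  z={z:.2f}  P~{p:.1e}")
```

The same calculation applies to protective alleles (OR below 1), where the log odds ratio and z are negative.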
Allele-specific regulatory effects mediated by rs36115365. We sought to assess whether any of the ten highly correlated sequence variants influence differential protein binding via electrophoretic mobility shift assays (EMSA) in the PANC-1 and/or MIA PaCa-2 pancreatic cancer cell lines (Fig. 2, Supplementary Fig. 2). Only rs36115365 exhibited allele-specific binding (Fig. 2), where the pancreatic cancer risk-associated minor C-allele (MAF 0.19 in 1000G EUR) displayed selective protein binding as indicated by greater loss of C-allele-specific banding upon addition of unlabelled C-allele competitor compared to unlabelled G-allele probe. EMSA assays for rs36115365 in seven additional cancer cell lines, including pancreatic cancer (MIA PaCa-2, Supplementary Fig. 3a), testicular germ cell cancer (NTERA-2 and 2102Ep, Supplementary Fig. 3b), lung cancer (A549, Fig. 2; NCI-H460, Supplementary Fig. 3c), and melanoma lines (UACC903 and UACC1113; Supplementary Fig. 3d) showed a similar pattern of allele-preferential binding to the C allele of rs36115365.
This SNP is located between the 5′ end of TERT (~18 kb upstream) and the 3′ end of CLPTM1L (~5 kb downstream), a region that overlaps active histone modification marks and multiple transcription factor binding sites according to ENCODE data (Fig. 1, Supplementary Fig. 1). The region harbouring rs36115365 demonstrated an allele-specific increase in luciferase reporter activity as compared to empty vector that was consistent across all eight cancer cell lines tested (Fig. 3, Supplementary Fig. 4), including those from pancreas (PANC-1 and MIA PaCa-2, average fold change for C versus G allele 1.38, range 1.05-2.82), testis (NTERA-2 and 2102Ep, average fold change for C/G allele 1.95, range 1.12-4.83), lung (A549 and NCI-H460, average fold change for C/G allele 1.33, range 1.05-1.95), and melanoma (UACC903 and UACC1113, average fold change for C/G allele 1.35, range 1.30-1.42). Transcriptional activity of the genomic region surrounding rs36115365 (240 bp) was higher in the forward (plasmids FG and FC) as compared to the reverse (plasmids RG and RC) orientation. Across all cancer cell lines, the C-allele on average showed an approximately 44% higher luciferase activity than the G-allele in the forward orientation, and 23% higher activity in the reverse orientation (P = 4.2 × 10⁻⁵ to 0.031).
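The relationship between the fold changes and the percentage differences quoted above can be made explicit with a short calculation. The sketch below is illustrative only: the function names are hypothetical and the replicate values are invented placeholders, not measurements from this study. It assumes each reading has already been normalized to the empty-vector control.

```python
from statistics import mean

def allelic_fold_change(c_activity, g_activity):
    """Ratio of mean reporter activity for the C allele over the G allele."""
    return mean(c_activity) / mean(g_activity)

def percent_higher(fold_change):
    """Express a fold change as a percentage increase (1.44 -> 44%)."""
    return (fold_change - 1.0) * 100.0

# Hypothetical replicate activities relative to empty vector (not study data).
c_reps = [2.9, 3.1, 3.0]
g_reps = [2.0, 2.1, 2.15]

fc = allelic_fold_change(c_reps, g_reps)
print(f"C/G fold change = {fc:.2f}, i.e. {percent_higher(fc):.0f}% higher for C")
```

Under this convention, a 44% higher activity corresponds to a C/G fold change of 1.44, and 23% to 1.23.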
Analysis of imputed GWAS data from pancreatic and testicular cancers conditioned on rs36115365 is consistent with rs36115365 accounting for the majority of the Region 2 signal (P_Conditional = 0.03-0.99 and P_Conditional = 0.22-0.92, respectively, for the grouping of eight SNPs highly correlated with rs36115365; Supplementary Table 1), with the minor C allele being positively associated with risk. In lung cancer and melanoma, however, fine-mapping data suggest that the genetic architecture underlying risk in Region 2 may be more complex, but are nonetheless consistent with a functional role for rs36115365. For lung, in contrast to pancreatic and testicular cancers, the C allele of rs36115365 is negatively associated with risk. Conditioning on rs36115365 revealed a possible secondary signal for lung cancer risk within the eight highly correlated SNPs (P_Conditional = 3.74 × 10⁻⁵ to 0.11; Supplementary Table 1). For melanoma, rs36115365 was not significant in single-SNP analysis (P = 0.70), but became more significant after conditioning on the best Region 2 SNP (rs2447853, P_Conditional = 1.09 × 10⁻⁴; Supplementary Table 1), with the C allele also being negatively associated with risk (OR = 0.86; 95% CI 0.80-0.93). After conditioning on rs36115365 for melanoma, rs2447853 also becomes more significant (P_Conditional = 3.01 × 10⁻¹⁵ versus P = 5.7 × 10⁻¹²). These data suggest rs36115365 may influence gene expression within the TERT-CLPTM1L region and may account for either some or the entire association signal in this region, depending on the cancer type.
Figure 1: Recombination hotspots in the CEU population (red line), as well as 1000G combined recombination rate (blue line) across the TERT/CLPTM1L region are shown relative to the CLPTM1L and TERT genes, as well as the grouping of ten highly correlated sequence variants strongly associated with risk of pancreatic, testicular, and lung cancers in the region closest to CLPTM1L. (a) Chromatin interaction analysis paired-end (ChIA-PET) sequencing data from the K562 chronic myeloid leukaemia cell line using an antibody against RNA polymerase II generated by the ENCODE project (https://www.encodeproject.org/) is shown. (b) For each of the ten strongly associated variants, layered H3K4Me1, H3K4Me3, and H3K27Ac chromatin immunoprecipitation (ChIP-seq), DNase I hypersensitivity sequencing (DNase) and transcription factor ChIP-seq (TF-ChIP-Seq) data from the ENCODE project are shown as displayed by the UCSC Genome Browser (lower panels).
Silencing the region harbouring rs36115365. To interrogate whether the putative gene regulatory region harbouring rs36115365 influences expression of TERT and/or CLPTM1L, siRNA-mediated transcriptional gene silencing (TGS) (refs 30,31) was used to target this region and evaluate effects on gene expression. This mechanism of gene silencing is different from the well-known siRNA-mediated post-transcriptional gene silencing (PTGS) in that it targets a genomic regulatory region that mediates gene expression rather than messenger RNA (mRNA) (refs 30,31). Eight siRNAs were designed to span the region (Fig. 4a, Supplementary Table 3) and were separately transfected into cancer cell lines from pancreas (PANC-1), lung (A549), testis (NTERA-2), and melanoma (UACC903). Three of the eight (siRNA3, siRNA5 and siRNA8; Fig. 4b, Supplementary Fig. 5) showed significant inhibition of TERT mRNA expression by RT-qPCR in all four cell lines tested compared to a scrambled siRNA control, suggesting a role for the targeted region in the regulation of TERT expression. Inhibition of TERT by the three siRNAs ranged from 24 to 74% in PANC-1, 44 to 77% in A549, 33 to 49% in NTERA-2 and 54 to 84% in UACC903. The remaining five siRNAs showed little effect on expression of TERT. In contrast, expression of CLPTM1L and of the GAPDH and ACTB housekeeping genes was not affected by any of the eight siRNAs. In addition, four siRNAs randomly designed to target non-genic regulatory regions on chromosome 8q24.21 were used as negative controls; none altered expression of TERT, CLPTM1L, or either housekeeping gene (Supplementary Fig. 6). The three siRNAs altered TERT expression in four additional cancer cell lines from pancreas (MIA PaCa-2), testis (2102Ep), lung (NCI-H460) and melanoma (UACC1113). Both siRNA3 and siRNA8 consistently reduced expression of TERT, but not CLPTM1L or housekeeping gene expression in all four lines, while siRNA5 resulted in specific down-regulation of TERT in some but not all lines (Supplementary Fig. 7). These data suggest that the genomic region harbouring rs36115365 plays a key role in the regulation of TERT, but not CLPTM1L, expression.
Allele-specific TERT gene-regulatory activity by rs36115365. We next sought to test whether the effect of TGS by siRNA targeting this putative gene-regulatory element on TERT expression was influenced by the genotype at rs36115365 by assessing allele-specific TERT mRNA expression. The human TERT gene harbours a synonymous SNP in exon 2 (rs2736098), linked to rs36115365 (r² = 0.14, D′ = 1.0 in 1000G CEU), allowing for assessment of expression of TERT from chromosomes harbouring the C and G alleles of rs36115365 in cell lines heterozygous for both SNPs. We screened genomic DNA and complementary DNA (cDNA) from 55 pancreatic cell lines, as well as the melanoma, lung and testis cancer cell lines from the NCI60 panel to identify cell lines that are both heterozygous for rs36115365 and express two different alleles of rs2736098, yielding two assayable pancreatic cancer cell lines (Panc 05.04, IMIM-PC-1) and one lung cancer cell line (A549). The two pancreatic cancer cell lines express higher levels of TERT from the C as compared to the G allele (2.3 and 9.8 fold, respectively) whereas A549 cells express higher levels from the G allele (Supplementary Fig. 8). However, after adjusting for DNA copy number, all three cell lines express higher levels of TERT from the C allele (1.2 fold for A549 cells). We evaluated allele-specific levels of inhibition of TERT expression by siRNA3 (which is both closest to rs36115365 and most consistently inhibits TERT expression across the cell lines previously tested) in these three cell lines using a TaqMan allelic-discrimination assay for rs2736098. Inhibition by the siRNA on the C versus the G allele of rs36115365 was 60.2 versus 49.1% in Panc 05.04 cells (P = 0.007; t-test), 70.0 versus 63.6% in A549 cells (P = 0.003; t-test) and 28.3 versus 16.4% in IMIM-PC-1 cells (P = 0.002; t-test) (Fig. 4c).
These results indicate that rs36115365 lies in a gene-regulatory element that influences TERT expression in an allele-specific manner.
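The use of rs2736098 as a readout for rs36115365 relies on the two SNPs being in complete linkage disequilibrium by D′ even though their r² is low. A minimal sketch of the standard two-locus LD definitions shows how this combination arises when the rarer allele of one SNP always travels on the same allele of the other. The haplotype and allele frequencies below are hypothetical round numbers chosen for illustration, not the 1000G CEU values, and the function name is an assumption.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical case: allele A (frequency 0.10) is found only on B haplotypes
# (frequency 0.50), so only three of the four possible haplotypes exist.
d, d_prime, r2 = ld_stats(p_ab=0.10, p_a=0.10, p_b=0.50)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.3f}")
```

With D′ = 1, each allele of the regulatory SNP sits on a predictable allele of the exonic SNP in doubly heterozygous cells, which is what permits allele-specific expression to be assigned even though the low r² makes either SNP a poor statistical proxy for the other.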
Zinc-finger transcription factor 148 binds rs36115365-C. To investigate the underlying mechanism of the differential gene regulation by genotypes at rs36115365, and to identify transcription factors potentially mediating this effect, we performed pull-down with oligonucleotides corresponding to the C or the G allele of rs36115365 incubated with nuclear extracts from PANC-1 and UACC903 cell lines, followed by quantitative mass spectrometry 32 . While most proteins identified bind both variants equally well, we noted outliers that bound the C-allele preferentially over the G-allele, as demonstrated by their location on the two-dimensional interaction plot (lower left quadrant of each, Fig. 5a and Supplementary Fig. 9), consistent with the EMSA data and suggesting preferential protein binding to this allele. Three proteins (ZNF148, VEZF1/ZNF161 and ZNF281) were identified as binding the C variant of rs36115365 preferentially in label-swapping experiments performed across both PANC-1 and UACC903 cell lines using a poly-dAdT competitor (Fig. 5a). A fourth protein, ZNF740, was also found to preferentially bind the C variant in both cell lines using mixed poly-dAdT and poly-dIdC competitors ( Supplementary Fig. 9, bottom panels). We sought to verify whether any of these four proteins differentially bound the C-allele by using antibodies against these proteins in conjunction with EMSAs for rs36115365 ( Fig. 5b, Supplementary Figs 10 and 11). Only the antibody against ZNF148 consistently resulted in loss of C allele-specific banding in pancreatic (PANC-1; Fig. 5b), as well as testis (NTERA-2) and lung cancer (A549) lines ( Supplementary  Fig. 10). Furthermore, EMSAs using recombinant purified ZNF148 protein demonstrated specific binding of ZNF148 to the C allele of rs36115365 (Fig. 5c). 
Notably, the resulting band had similar mobility characteristics to both those from EMSAs of ZNF148 bound to a known binding site in the CDKN1A/p21 promoter 33,34 and the C allele-specific band for rs36115365 using PANC-1 nuclear extracts (Fig. 5c). Consistent with these data, ZNF148 (also named ZBP-89), a zinc-finger transcription factor of the Krüppel-like family 35, is predicted to bind to a consensus DNA-recognition motif created by the C-allele of rs36115365 (Fig. 5d). To further establish the binding of ZNF148 to rs36115365 and surrounding genomic region, we performed chromatin-immunoprecipitation (ChIP) for ZNF148 followed by quantitative PCR, noting an enrichment of binding at rs36115365 in pancreatic and lung cancer cell lines homozygous and heterozygous for rs36115365-C as compared to background and the surrounding area (Fig. 5e, Supplementary Fig. 12a-f). We also assessed allelic enrichment in the immunoprecipitates and noted a significant enrichment of the C allele as compared to the G allele in A549 cells (1.51 fold, P = 0.01; t-test; Supplementary Fig. 12g), with Panc 05.04 cells showing a non-significant trend in the same direction (1.12 fold, P = 0.06; t-test; Supplementary Fig. 12g).
Figure 2: Electrophoretic mobility shift assays (EMSA) with biotin-labelled oligonucleotides containing either rs36115365-C or rs36115365-G in pancreatic (PANC-1) and lung (A549) cancer cell line nuclear extracts. Two specific protein complexes bind the C allele of rs36115365 preferentially in both cell lines and are more strongly competed with unlabelled C probe as compared to unlabelled G probe. Unlabelled competitor was used at ×10 and ×100 (as indicated by gradient symbol). Arrows denote specific protein complexes bound by the C allele of rs36115365.
ZNF148 knockdown reduces TERT mRNA and telomerase activity. To determine the effect of ZNF148 depletion on expression of TERT and CLPTM1L in pancreatic, lung, testicular, and melanoma cell lines (n = 8 total), we used siRNA-mediated PTGS. We observed that while depletion of ZNF148 resulted in little change in expression of CLPTM1L, expression of TERT was significantly decreased in most of the cell lines, with an average expression of 0.50 relative to a scrambled siRNA control (range 0.27-0.89, P = 2.0 × 10⁻⁴ to 0.012; t-test; Fig. 6a, Supplementary Figs 13 and 14), consistent with a role for ZNF148 in regulating TERT expression. In contrast, siRNA-mediated knockdown of VEZF1 (ZNF161), ZNF281 and ZNF740 showed no effect on expression of either TERT or CLPTM1L (Supplementary Fig. 15). We next sought to assess if ZNF148-mediated regulation of TERT expression was accompanied by effects on telomerase activity and telomere length. Knockdown of ZNF148 via PTGS resulted in reduced telomerase activity in A549 and MIA PaCa-2 cells (Fig. 6b), as well as in NTERA-2 and UACC903 cells (Supplementary Fig. 16). This reduction was similar to that observed via siRNA-mediated depletion of TERT itself, or by transcriptional gene silencing (TGS, siRNA3) to target the gene regulatory element encompassing rs36115365. To further assess the role of ZNF148 in regulating TERT expression and activity, we performed rescue experiments after depletion of endogenous ZNF148 using an siRNA targeting the 3′-UTR of ZNF148. Overexpression of exogenous ZNF148 lacking the 3′-UTR indeed rescued both TERT expression and telomerase activity in A549 and MIA PaCa-2 cells (Supplementary Fig. 17). Consistent with these data, depletion of either ZNF148 or TERT, or alternatively targeting the rs36115365 regulatory region in both A549 and MIA PaCa-2 cells all resulted in similar reductions of telomere length (Fig. 6c).
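To make the "expression relative to a scrambled siRNA control" figures concrete, the sketch below computes relative expression from RT-qPCR cycle thresholds. This section does not state how expression was quantified, so the sketch assumes the widely used 2^-ΔΔCt calculation with a single housekeeping reference gene and ideal amplification efficiency; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt calculation.

    Assumption: amplification efficiency is close to 100% for both the
    target and the reference (housekeeping) gene.
    """
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, knockdown sample
    delta_control = ct_target_control - ct_ref_control   # dCt, scrambled control
    return 2.0 ** -(delta_treated - delta_control)

# Hypothetical Ct values (not study data): the target crosses threshold one
# cycle later after knockdown while the reference gene is unchanged.
rel = relative_expression(29.0, 18.0, 28.0, 18.0)
print(f"relative expression = {rel:.2f}, inhibition = {(1 - rel) * 100:.0f}%")
```

Under these assumptions, a one-cycle delay corresponds to a relative expression of 0.50, the same scale on which the average TERT level after ZNF148 knockdown is reported.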

Discussion
A small genomic region on chr5p15.33 that harbours the TERT and CLPTM1L genes has been reported to influence risk of multiple cancers and may contain up to seven or more independent susceptibility loci [16][17][18]. The complexity of this locus is highlighted by the fact that the same alleles confer susceptibility to some cancers while they are protective for others. One of these susceptibility loci termed Region 2, initially marked by rs401681 and rs402710 in CLPTM1L, was fine-mapped in a subset-based meta-analysis across multiple cancer types 18 and is the focus of the current study. The ten variants that mark Region 2 span the whole length of CLPTM1L to ~17 kb upstream of the transcriptional start site of TERT. Here, we identify rs36115365 as a functional SNP in this region and provide a plausible biological explanation underlying risk, featuring altered TERT, but not CLPTM1L, expression. Fine-mapping of Region 2 using GWAS data from pancreatic, lung and testicular cancer confirmed significant association with this small set of tightly linked SNPs (Fig. 1). Little signal remained within Region 2 after accounting for rs36115365 or alternatively the respective most significant SNP in pancreatic and testicular cancer, consistent with the notion that one or more of these variants (and/or an as-of-yet unidentified variant tightly linked with these SNPs) is responsible for mediating cancer risk attributable to this locus. For lung cancer, residual signal was seen after conditional analysis on rs36115365 (P_Conditional = 3.74 × 10⁻⁵ to 7.71 × 10⁻⁴), indicating that this SNP may not explain the entire Region 2 signal for lung cancer. For melanoma, a SNP (rs2447853) highly correlated to the original GWAS SNP reported for these cancers (rs401681, r² = 0.97) represents the most significant SNP in Region 2 (ref. 29).
Although rs36115365 was non-significant in single-SNP analyses, it became more significant after conditioning on rs2447853 (P_Conditional = 1.09 × 10⁻⁴). The LD structure between these SNPs and conditional analyses suggest that in melanoma both may mark independent functional variants, with the signal at rs2447853 masking the association between rs36115365 and melanoma risk in single-SNP analysis.
In contrast with the other variants, preferred protein binding was seen on the minor (C) allele of rs36115365 across multiple cell lines representing all four cancer types. Luciferase reporter assays consistently showed differential gene regulatory activity between alleles across the cancer cell lines assayed. These data suggested rs36115365 as a strong candidate for a functional multi-cancer risk variant but did not specifically implicate which gene(s) may be influenced by this SNP.
While a suite of tools is commonly used to interrogate potential gene-regulatory GWAS loci and link regulatory variants to a specific gene or genes 36 , their application was challenging for this locus. Expression quantitative trait locus analysis proved problematic for TERT given the relatively low expression of this gene in normal tissues. Likewise, the utility of chromosome conformation capture (3C) methods to establish a physical association between risk variants and specific target genes is greatly limited by the relatively short distances between rs36115365 and the TERT promoter. To establish a relationship between this element and regulation of gene expression, we targeted the intergenic risk region using siRNAs. This method has previously been used to inhibit promoter function via small RNA duplexes by a process termed TGS (refs 30,31,37-41). We applied this methodology to our study of an intergenic GWAS susceptibility variant, and established a role for the regulatory element in driving TERT (but not CLPTM1L) gene expression across multiple cancer types. These data suggest that the method may be of broader utility in the functional interrogation of GWAS loci.
Our results suggest that the binding of one or more proteins to the C-allele of rs36115365 is likely to play a key role in regulating TERT expression. Through quantitative mass spectrometry, we identified preferential binding of zinc finger protein 148 (ZNF148, also named ZBP-89) to the C-allele of rs36115365 in multiple cancer cell lines, and ChIP data confirmed binding of ZNF148 over rs36115365. We observed a subtle but significant preference for ZNF148 binding to the C-allele in ChIP experiments using A549 lung cancer cells, with a non-significant trend in the same direction in Panc 05.04 pancreatic cancer cells. These subtle differences in transcription factor binding are consistent with the very small effects this locus confers on cancer risk over a person's lifetime. Consistent with a central role for ZNF148 in regulating expression of TERT, siRNA-mediated gene knockdown of ZNF148 consistently resulted in reduced expression of TERT. Furthermore, both knockdown of ZNF148 as well as TGS of the gene regulatory element in which rs36115365 resides reduced telomerase activity and telomere length, to a degree similar to knockdown of TERT itself. After depletion of ZNF148, this effect was rescued by exogenous ZNF148.
ZNF148 is a transcriptional regulator of the Krüppel-like family that binds GC-rich DNA sequences in a variety of promoters to either activate or repress gene expression (reviewed by Zhang et al. 42). Overexpression of ZNF148 promotes growth arrest and apoptosis in gastrointestinal cancer cell lines in vitro and suppresses adenoma formation in the ApcMin/+ mouse model in vivo 43. This may be due, at least in part, to ZNF148 binding of p53 that prevents nuclear export and results in elevated levels of nuclear p53 (ref. 44). ZNF148 has also been shown to be important in regulating CDKN1A gene expression, binding a GC-rich element in the promoter of the gene and recruiting both ataxia-telangiectasia mutated kinase and histone acetyltransferase p300 into a complex that drives histone deacetylase inhibitor (HDACi) mediated induction of this gene 33,34.
Our results indicate that ZNF148 may regulate TERT expression in pancreatic, testicular, lung, and melanoma tumour cell lines via a regulatory element that is disrupted by the G allele at rs36115365. As some of these cell lines have TERT promoter mutations (UACC903, UACC1113) whereas others do not (PANC-1, MIA PaCa-2, unpublished data), our results indicate that regulation by ZNF148 is important even in the presence of these presumably activating mutations.
In summary, our work has uncovered a likely causal variant in the TERT-CLPTM1L Region 2 susceptibility locus and identified ZNF148 as a potential effector of a gene-regulatory element that mediates increased TERT expression in an allele-specific manner. Furthermore, our fine-mapping results highlight the complexity of this region and indicate that Region 2 may, in some cancers, consist of more than one underlying functional signal. Our results are remarkably consistent in eight cell lines across four different cancer types and explain, at least in part, the biological underpinnings of risk for rs36115365. Notably, our data suggest that the mechanism by which ZNF148 influences TERT is similar for cancer types in which the C-allele of rs36115365 contributes to increased risk, or alternatively to disease protection. Although TERT expression and ensuing effects on telomere length may be the crucial underlying mechanism in mediating inverse risk for different cancers, studies of surrogate tissue telomere length and cancer risk have been contradictory and shown associations with short or long telomeres, or no effect [45][46][47][48][49][50][51][52][53] . TERT could also mediate risk through its telomere-independent functions that include transcriptional regulation and mitochondrial RNA polymerase activity (for review see Martinez et al. 54 ). Other factors may contribute to the pleiotropic effects observed for rs36115365, including differential environmental exposures, regulatory effects through genes beyond TERT, interaction with additional risk variants and/or somatic mutations both within Region 2 and the larger TERT/CLPTM1L locus, or tissue-specific regulation of ZNF148 and other transcription factors mediating TERT expression. 
Our findings represent the first steps in unravelling the complex functional consequences of carrying risk variants in Region 2 of chr5p15.33 and strongly indicate a major role for expression of TERT in influencing risk of multiple cancer types.

Methods
Studies. Subjects were drawn from GWAS of four cancers: pancreatic cancer: PanScan I and II (3,525 cases and 3,642 control subjects; dbGaP Study Accession: phs000206.v5.p3) (refs 11,18); testicular germ cell tumours: NCI (581 cases and 1,055 control subjects) and PENN (477 cases) (ref. 28) and PLCO controls (178 control subjects) 55 29 . All participants provided informed written consent and all studies were reviewed and approved by institutional ethics review committees at the involved institutions. Participation of subjects in the PanScan GWAS was also reviewed by the NCI Special Studies Institutional Review Board. Each participating study obtained approval from its institutional review board (IRB) permitting data sharing in accordance with the NIH policy for Sharing of Data obtained in NIH-Supported or NIH-Conducted Genome Wide Association Studies. Analysis of the melanoma GWAS was reviewed by The Northern and Yorkshire Research Ethics Committee; each participating study obtained informed consent from study participants and approval from its local IRB as previously described 29. The meta-analysis of data for the Transdisciplinary Research in Cancer of the Lung (TRICL) was approved under protocol numbers STUDY00023900 and STUDY00023602 by the Committee for the Protection of Human Subjects under the auspices of the Trustees of Dartmouth College Dartmouth-Hitchcock Medical Center. Analysis of the testicular germ cell tumour GWAS was reviewed by the NCI Special Studies Institutional Review Board and the University of Pennsylvania IRB #4.
Fine-mapping. Imputation across 2 Mb of chr5p15.33 (250,000 to 2,250,000 bps, hg19) was performed using phased haplotypes from the 1000G reference set (Phase 1 integrated release 3, March 2012) and IMPUTE2 for pancreatic cancer 11,57 and testicular germ cell tumours 28 . Imputed SNPs with low MAF (<0.01) or low-quality scores (IMPUTE2 information score <0.5) were removed before the association analysis. Association analysis between SNPs and case-control status was performed using the score test of the log additive genetic effect with covariate adjustment using SNPTEST as previously described 18 . Imputation and association analysis for melanoma was performed using 1000G (Phase 1 integrated release 3, March 2012) as previously described 29 . Imputation for lung cancer 7,56 was performed using 1000G (Phase 1 integrated release 3, March 2012) with the same quality thresholds as described, followed by association analysis and conditional analysis using summary statistics from a meta-analysis of the six studies of TRICL with GCTA 58 .
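The quality filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `ImputedSnp` record, the function names and the example SNPs are hypothetical, and only the two thresholds (MAF 0.01, IMPUTE2 information score 0.5) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImputedSnp:
    """Hypothetical record for one imputed variant."""
    rsid: str
    maf: float   # minor allele frequency
    info: float  # IMPUTE2 information score


def passes_qc(snp, min_maf=0.01, min_info=0.5):
    """Return True unless the SNP has MAF < min_maf or INFO < min_info."""
    return snp.maf >= min_maf and snp.info >= min_info


def filter_snps(snps, min_maf=0.01, min_info=0.5):
    """Keep only the SNPs that pass both thresholds, preserving order."""
    return [s for s in snps if passes_qc(s, min_maf, min_info)]


# Made-up values for illustration only; these are not study data.
example_snps = [
    ImputedSnp("snp_a", maf=0.20, info=0.92),
    ImputedSnp("snp_b", maf=0.005, info=0.95),  # dropped: MAF below 0.01
    ImputedSnp("snp_c", maf=0.10, info=0.40),   # dropped: INFO below 0.5
]
kept = filter_snps(example_snps)
print([s.rsid for s in kept])
```

Variants sitting exactly on a threshold are retained here, since the text removes only those strictly below it.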
Overall, Region 2 was well-imputed. Within the pancreatic cancer GWAS data, all common 1000G variants (n = 195, MAF ≥ 0.01) in Region 2 (defined as the genomic region between the two recombination hotspots at 1,306,281-1,367,281 in NCBI build Hg19) had imputation accuracy (INFO) scores above 0.3 (the lowest quality score was 0.48). The imputation quality for the set of nine Region 2 variants most significantly associated with pancreatic cancer risk was high in the PanScan GWAS studies, with quality scores (INFO) ranging from 0.82 to 0.96 (average 0.92). Similar imputation quality scores were observed for these SNPs in the lung cancer, TGCT, and melanoma GWAS (INFO range 0.82 to 0.98; average 0.94). In addition, imputation quality was high for all SNPs that were statistically correlated with rs36115365 in 1000 Genomes CEU data (r² > 0.2). In PanScan, only a single such 1000 Genomes variant had an imputation quality score (INFO) below 0.8 (rs186156459; INFO = 0.79), suggesting that poor imputation quality did not lead to the exclusion of additional strong functional candidates from consideration. Similar imputation quality was likewise observed for the other cancer GWAS.
For completeness we assessed the newer 1000G (Phase 3, October 2014) reference dataset and noted an insertion/deletion variant (rs3030832) that was highly correlated with rs36115365 (r² = 0.87 in EUR). We therefore re-imputed the pancreatic cancer GWAS dataset 11,57 with the newer 1000G reference set to reassess the association signal across Region 2 (defined as the genomic region between the two recombination hotspots at 1,306,281-1,367,281 in NCBI build Hg19) including this variant. rs36115365 became non-significant when analysis was conditioned on rs3030832, as did rs3030832 when analysis was conditioned on rs36115365 (Supplementary Table 2), indicating that this variant is among the highly correlated variants representing Region 2 and thus represents an additional strong functional candidate. We also observed seven additional variants with similar or slightly higher ORs as compared to rs36115365 (OR_MAX = 1.42). To formally test if these seven variants represented potential functional variants in Region 2 we performed a series of conditional analyses. After the analysis was conditioned on rs36115365 we noted a large drop in significance for these seven variants, while conditional analysis for each of the seven variants did not dramatically influence the significance of rs36115365 (Supplementary Table 2).  59 . The cells were routinely tested for Mycoplasma and were negative on each occasion. None of the cell lines used are on the NCI or ICLAC lists of misidentified cells.
RNA and genomic DNA isolation. RNA was extracted using an RNeasy Plus Mini Kit (Qiagen). Quality and quantity of RNA were assessed in an Agilent 2100 Bioanalyzer (Agilent Technologies); only samples with RIN scores >9.0 were used. Genomic DNA was isolated using the ZR genomic DNA (D3050, ZYMO Research) and assessed by Nanodrop 8000 (Thermo Scientific).
EMSAs and ChIP. Nuclear extracts were purchased from Active Motif (PANC-1, MIA PaCa-2) or alternatively generated using a Nuclear Extraction Kit (A549, NCI-H460, UACC1113, UACC903, NTERA-2 and 2102Ep) (10009277, Cayman) according to the manufacturer's instructions. Recombinant human ZNF148 protein was purchased from Origene (TP602963, Origene). Oligos (30-36 nt, Invitrogen, listed in Supplementary Table 3) Supplementary Table 4. A TaqMan genotyping assay for rs36115365 (C_470504_10, Life Technologies) was used to quantify the C and G alleles in immunoprecipitated DNA samples in seven independent experiments (Supplementary Fig. 12g). A paired two-sided T-test was applied to C- and G-allele signals (normalized to input DNA) in order to assess significance of enrichment of the C versus G allele at rs36115365. The specificity of the ZNF148 antibody was tested by western blot analysis with and without siRNA-mediated knockdown of ZNF148. GAPDH (ab37168, 1 mg ml⁻¹, Abcam) was used as a loading control (Supplementary Fig. 18).
Proteome-wide analysis of disease-associated SNPs. Nuclear extract collection and DNA pulldowns were performed essentially as described previously for both PANC-1 and UACC903 cell lines, using biotin-tagged oligo probes consisting of 20 bp on either side of rs36115365 (refs 60,61). After PBS washes, beads were resuspended in 50 ml 100 mM TEAB buffer, reduced, alkylated and digested with trypsin overnight. Then, digested peptides were labelled using dimethyl chemical labelling as described previously 62,63 . Experiments were performed in duplicate using label-swapping, and separately conducted using poly-dAdT competitor only, as well as using both poly-dAdT and poly-dIdC competitor. Data analysis was performed using MaxQuant (version 1.3.0.5) as described previously, using dithiomethane instead of carbamidomethylation as a fixed modification 32,64 .
Luciferase cloning and expression analysis. The genomic region containing and surrounding rs36115365 (240 bps) was PCR-amplified (primers listed in Supplementary Table 5) from HapMap CEU DNA samples with the appropriate genotypes to obtain clones with each genotype, and cloned into the NheI and BglII sites of the pGL4.23[luc2/minP] (Promega) luciferase vector in the 5′-to-3′ or 3′-to-5′ orientation. Plasmid inserts were sequence-verified to contain the correct inserts and genotypes. The forward (F) orientation of the insert is the same as the genomic orientation. The Firefly reporter plasmids (and a Renilla luciferase control vector) were co-transfected into pancreatic (PANC-1, MIA PaCa-2), melanoma (UACC903, UACC1113), lung (A549, NCI-H460) and testicular (NTERA-2, 2102Ep) cancer cell lines at ~70% confluence using Lipofectamine 2000 (Life Technologies). Luciferase activity was measured 36 h after transfection with the Dual Luciferase Reporter Assay System (Promega). Firefly luciferase activity was normalized to Renilla luciferase activity, and graphed as compared to the empty luciferase vector. Experiments were performed in triplicate and repeated at least three times. A T-test was used to assess significance for differences in luciferase activity.
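The normalization described above (Firefly signal divided by Renilla signal, then expressed relative to the empty vector) can be illustrated with a short sketch. The function names and the luminescence readings below are hypothetical and serve only to show the arithmetic; averaging the triplicate wells before taking the ratio is an assumption, as the text does not specify the order of operations.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly signal divided by the co-transfected Renilla control signal."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_over_empty(construct_wells, empty_wells):
    """Mean normalized activity of a construct relative to the empty vector.

    Each argument is a list of (firefly, renilla) readings, one per well.
    """
    construct = mean(normalized_activity(f, r) for f, r in construct_wells)
    empty = mean(normalized_activity(f, r) for f, r in empty_wells)
    return construct / empty


# Made-up triplicate readings for illustration only; these are not study data.
c_allele_wells = [(3000.0, 1000.0), (3300.0, 1000.0), (2700.0, 1000.0)]
empty_vector_wells = [(1000.0, 1000.0), (1100.0, 1000.0), (900.0, 1000.0)]
fold = fold_over_empty(c_allele_wells, empty_vector_wells)
print(f"fold over empty vector: {fold:.2f}")
```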
Region targeted siRNAs to rs36115365 regulatory locus. On-target antisense enhanced siRNAs targeting the locus encompassing rs36115365 were designed by using an siRNA design tool (http://dharmacon.gelifesciences.com/) and ordered from Dharmacon RNAi and Gene Expression in GE Healthcare and listed in Supplementary Table 6. No siRNAs were designable to directly overlap with rs36115365; the location of the nearest siRNA (siRNA3) was 8 bp from this variant. The siRNAs were introduced to cell lines by using RNAiMAX (Life Technologies) at a final concentration of 15 nM. RNA was extracted 48 h after transfection and reverse transcribed to cDNA by the SuperScript III First-Strand Synthesis System for RT-PCR (Life Technologies). Expression of target genes was determined on the cDNA by RT-qPCR TaqMan gene expression assays as described below. Experiments were performed in triplicate and repeated at least three times. We first tested 8 siRNAs in 4 cell lines. Three of the 8 siRNAs inhibited TERT expression in all four cell lines (PANC-1, A549, NTERA-2 and UACC903) whereas none of the 8 siRNAs inhibited CLPTM1L, ACTB or GAPDH expression. Thus, the inhibition of TERT expression by 3 out of 8 siRNAs versus 0 out of 8 for the other three genes gives rise to a Fisher's Exact test P value of 0.011, indicating that the inhibition of TERT is specific.
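The reported P value can be reproduced with a small calculation, sketched below using only the Python standard library. One assumption is made that the text does not state explicitly: the three control genes are pooled, giving 0 inhibiting siRNAs out of 24 siRNA-gene tests, against 3 out of 8 for TERT. Under that reading the two-sided Fisher's exact test gives approximately 0.011. The function names are illustrative.

```python
from math import comb


def hypergeom_pmf(k, row1, row2, col1):
    """Probability of k counts in the top-left cell with all margins fixed."""
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables that are no more likely than
    the observed one, given the observed row and column totals.
    """
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for k in range(lowest, highest + 1):
        p = hypergeom_pmf(k, row1, row2, col1)
        if p <= p_observed * (1 + 1e-9):  # tolerance for float rounding
            total += p
    return total


# Rows: TERT versus pooled CLPTM1L/ACTB/GAPDH (pooling is an assumption).
# Columns: siRNAs that inhibited expression versus those that did not.
p_value = fisher_exact_two_sided(3, 5, 0, 24)
print(f"P = {p_value:.3f}")
```

With these counts the observed table is the most extreme one possible, so the one-sided and two-sided P values coincide (56/4960).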
siRNA-mediated knockdown of ZNF148 and TERT mRNA. ON-TARGETplus Human SMARTpool siRNAs to ZNF148 (cat# L-012658-00-0005), VEZF1 (ZNF161; cat# L-019623-00-0005), ZNF281 (cat# L-006958-00-0005), ZNF740 (cat# L-030075-02-0005), and TERT (cat# L-003547-00-0005) were purchased from Dharmacon RNAi and Gene Expression in GE Healthcare. To assess possible off-target effects for the ZNF148 siRNAs we also purchased each of the four siRNAs from the SMARTpool separately and tested their effects on ZNF148 and TERT expression. All four siRNAs inhibited both ZNF148 and TERT expression indicating that off-target effects are not likely to explain our findings ( Supplementary Fig. 14). Transfection, RNA purification, cDNA generation and expression analysis procedures were as described above for the region-targeted siRNA assay, except that RNA was isolated 72 h after transfection. Experiments were performed in triplicate and repeated at least three times.
Real-time quantitative PCR. Gene expression levels were quantified by quantitative real-time PCR using TaqMan assays for TERT (Hs00972656_m1), CLPTM1L (Hs00363947_m1), ACTB (cat# 4333762), ZNF148 (Hs01070570_m1), and GAPDH (cat# 4333764) from Life Technologies. Gene expression levels of TERT, CLPTM1L and ACTB were normalized to GAPDH, while expression of GAPDH was normalized to ACTB. Allele-specific TERT expression was determined using an allelic discrimination TaqMan assay for rs2736098 (assay C_26414916_20, Life Technologies), and the gene expression of each allele of TERT was also normalized to the gene expression of GAPDH. Each experiment was performed in triplicate and repeated three times. Significance was assessed using a Student's two-tailed T-test (labelled significant if P<0.01).
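Normalization of a target gene to a reference gene is commonly computed with the comparative Ct (2^-ΔΔCt) approach; whether that exact calculation was used here is an assumption, as the text states only that expression was normalized to GAPDH (or ACTB). The sketch below also assumes 100% amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Relative expression of a target gene by the comparative Ct method.

    Delta Ct is target minus reference within a sample; delta-delta Ct is
    the treated-sample delta Ct minus the control-sample delta Ct.
    Assumes perfect doubling per cycle (100% efficiency).
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values for illustration only; these are not study data.
# TERT normalized to GAPDH, siRNA-treated versus scrambled control.
fold = relative_expression(ct_target=30.0, ct_reference=18.0,
                           ct_target_control=29.0, ct_reference_control=18.0)
print(f"TERT expression relative to control: {fold:.2f}")
```

A one-cycle increase in normalized Ct corresponds to a halving of the estimated transcript level.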
Telomerase activity and telomere length. The telomeric repeat amplification protocol (TRAP) (ref. 20) was used to evaluate telomerase activity according to the manufacturer's guidelines (Millipore, #S7700). siRNAs targeting the gene regulatory region (siRNA3), ZNF148 (cat# L-012658-00-0005), TERT (cat# L-003547-00-0005) and a scrambled control siRNA (sequence listed in Supplementary Table 6) were administered to MIA PaCa-2, A549, UACC903 and NTERA-2 cells at a final concentration of 15 nM for 72 h. At that time the cells were harvested and whole cell extract prepared using CHAPS (3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonate) solution. The Bradford assay kit (Bio-Rad) was used to determine total protein concentration. Equal amounts of protein extracts were used to add telomeric repeats (GGTTAG) onto the 3′ end of the substrate oligonucleotide (TS) at 37°C for 30 min, followed by 30 cycles of TRAP PCR and separation of PCR products on 12.5% non-denatured PAGE gels. The gels were stained with SYBR Gold Nucleic Acid Gel Stain (Life Technologies, #S-11494).
For rescue experiments, we attempted to create cell lines devoid of ZNF148 using CRISPR. Extensive screening of clones revealed none with homozygous loss of ZNF148, consistent with an essential role for ZNF148, further supported by the observed embryonic lethality of homozygous ZNF148 knock-out mice (International Mouse Phenotyping Consortium, IMPC Data Coordination Centre, MRC Harwell Institute, Biocomputing, Harwell Campus, https://www.mousephenotype.org/data/charts?accession=MGI:1332234&allele_accession_id=MGI:5636955&zygosity=homozygote&parameter_stable_id=IMPC_VIA_001_001&pipeline_stable_id=BCM_001). Rescue experiments were instead performed by depletion of endogenous ZNF148 expression using an siRNA targeting the 3′-UTR of ZNF148 (designed using the 3′-UTR sequences of ZNF148; sense: AUGGAGAACUUGAUGCAAU; antisense: AUUGCAUCAAGUUCUCCAU) and reintroduction of exogenous ZNF148 expression. Human ZNF148 ORF (Origene TrueORF RC222687) was cloned into the pDest-663 (derived from pFUGW) lentiviral expression vector and sequence verified. For lentivirus production, lentiviral vectors were co-transfected into HEK293FT cells with packaging vectors psPAX2, pMD2-G and pCAG4-RTR2. Virus was collected two days after transfection and concentrated by Vivaspin, before infecting MIA PaCa-2 and A549 cells. Seventy-two hours after delivery of the siRNA, ZNF148 and TERT expression, and telomerase activity, were assessed as described above.
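As a consistency check on the siRNA duplex given above, the antisense strand should be the reverse complement of the sense strand. The short sketch below confirms this for the two sequences listed in the text, ignoring any whitespace within the sequences; the helper name is illustrative.

```python
# Watson-Crick pairing for RNA: A-U and G-C.
RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement_rna(sequence):
    """Return the reverse complement of an RNA sequence, ignoring whitespace."""
    cleaned = "".join(sequence.split()).upper()
    unexpected = set(cleaned) - set("AUGC")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned.translate(RNA_COMPLEMENT)[::-1]


# Sequences as given in the text for the 3'-UTR-targeting ZNF148 siRNA.
sense = "AUGGAGAACUUGAUGCAAU"
antisense = "AUUGCAUCAAGUUCUCCAU"

is_duplex = reverse_complement_rna(sense) == antisense
print(f"antisense is reverse complement of sense: {is_duplex}")
```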
To assay effects on telomere length, siRNA3, ZNF148 siRNA (cat# L-012658-00-0005), TERT siRNA (cat# L-003547-00-0005) and a scrambled siRNA (Supplementary Table 6) were administered to MIA PaCa-2 and A549 cells at a final concentration of 15 nM. The cells were re-transfected with siRNAs every four days and genomic DNA extracted 20 days later using the DNeasy Blood and Tissue Kit (Qiagen, #69506). Telomere length was then determined by qPCR by comparing telomere repeat sequence copy number to a single-copy gene (RPLP0) copy number in a given sample using telomere repeat-specific primers as previously described 65 . The assays were performed in triplicate and repeated three times.
Data availability. The authors declare that all data supporting the findings of this study are available within the article and its supplementary information files or from the corresponding authors upon reasonable request. Pancreatic cancer GWAS data are available from dbGaP (phs000206.v5.p3). Testicular germ cell tumour fine-mapping data are available from the corresponding authors upon reasonable request. The lung cancer fine-mapping data that support the findings of this study are available from the corresponding authors upon reasonable request and are currently being processed for availability through dbGaP. The melanoma fine-mapping data that support the findings of this study are available from MMI (M.M.Iles@leeds.ac.uk) on reasonable request subject to specific consent for contributing cohorts.