INTRODUCTION

In recent years, genome-wide association studies (GWAS) have identified many genes associated with type 2 diabetes (T2D).1, 2, 3, 4, 5 The most strongly associated T2D locus in multiple ethnicities resides within the transcription factor 7-like 2 (TCF7L2) gene.6, 7, 8

TCF7L2 (formerly known as TCF4) is a member of the high-mobility group box-containing family of transcription factors. It operates at the last key stage of the canonical Wnt signaling transduction cascade and is therefore one of its main effectors,9 regulating the expression of downstream target genes. In the absence of β-catenin, TCF7L2 binds to Wnt-responsive elements to repress target gene transcription, whereas β-catenin binding to TCF7L2 activates gene expression.

How genetic variation in TCF7L2 influences the risk of T2D has remained elusive, and the primary site(s) of action for the gene product in the context of the disease is still unclear. It was initially speculated that TCF7L2 operated in tandem with insulin to influence blood glucose homeostasis through the alteration of levels of glucagon-like peptide 1 in the gut,6, 10 with knockout mouse models suggesting that TCF7L2 has an indispensable role in intestinal epithelium development.11 However, other studies have shown that the TCF7L2 variant is associated with increased TCF7L2 expression and decreased insulin secretion, possibly implicating the pancreatic β-cell,12, 13 although an indirect effect on insulin secretion through TCF7L2 action in another tissue, such as adipose14 or liver,15 cannot be excluded.

Resolving the functional mechanism underlying a given genetic association in the post-GWAS era has proven extremely challenging. However, the discovery of the TCF7L2 locus in the context of T2D presents a specific opportunity for translational analyses, as studies in multiple ethnicities16, 17 and with Bayesian modeling18 have now strongly implicated the intron 3 SNP rs7903146 (NG_012631.1:g.53341C>T) as the causal variant within this gene.

We therefore hypothesized that protein factors that bind to the immediate intronic region harboring rs7903146:C>T modulate TCF7L2 function and thus have an impact further downstream, where it exerts its effect. We elected to carry out oligo pull-down combined with mass spectrometry (MS) to elucidate the transcriptional machinery across this intronic SNP.

MATERIALS AND METHODS

Cell culture and nuclear extracts preparation

Human HCT116 cells, in which TCF7L2 is abundantly expressed, were cultured in Dulbecco’s Modified Eagle Medium (DMEM; 4.5 g/l glucose, 10% FCS, 100 U/ml penicillin and 100 μg/ml streptomycin). For the SILAC experiments, cells were grown according to standard culture procedures in SILAC-light and SILAC-heavy labeled DMEM (ThermoFisher, Waltham, MA, USA) supplemented with 10% dialyzed fetal bovine serum, ¹³C₆-lysine and ¹³C₆-arginine. Cells were washed with cold phosphate-buffered saline and then harvested by centrifugation for 5 min at 5000 rpm at 4 °C. In all, 10⁸ cells were lysed in 250 μl low salt buffer (10 mM HEPES, 1.5 mM MgCl2, 10 mM KCl, 0.5% NP-40, proteasome inhibitor, phosphatase inhibitor, 10% glycerol, pH 7.9) by incubation on ice for 20 min, with vortexing for 10 s every 5 min. The nuclei were separated from the cytoplasmic fraction by centrifugation at 10 000 rpm for 1 min at 4 °C. The nuclei pellets were washed once with cold low salt buffer and harvested by centrifugation. The nuclear proteins were then released into high salt buffer (20 mM HEPES, 1.5 mM MgCl2, 420 mM NaCl, 0.2 mM EDTA, proteasome inhibitor, phosphatase inhibitor, 25% glycerol, pH 7.9), and the supernatant was subsequently collected by centrifugation at 12 000 rpm for 10 min at 4 °C.

Oligonucleotide pull-down

The 5′ dual biotin-modified oligonucleotides for rs7903146 were synthesized by Integrated DNA Technologies, Inc. (Coralville, IA, USA). The oligonucleotide sequences are as follows:

C allele:

For: 5′-ACAATTAGAGAGCTAAGCACTTTTTAGATACTATATAATTTAATTGCCGTATGAGGCACCC-3′

Rev: 5′-GGGTGCCTCATACGGCAATTAAATTATATAGTATCTAAAAAGTGCTTAGCTCTCTAATTGT-3′

T allele:

For: 5′-ACAATTAGAGAGCTAAGCACTTTTTAGATATTATATAATTTAATTGCCGTATGAGGCACCC-3′

Rev: 5′-GGGTGCCTCATACGGCAATTAAATTATATAATATCTAAAAAGTGCTTAGCTCTCTAATTGT-3′
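As a quick consistency check on the sequences listed above, the following minimal Python sketch confirms that each reverse oligonucleotide is the reverse complement of its forward partner and locates the single base that distinguishes the C and T alleles. The function names (`reverse_complement`, `mismatch_positions`) are illustrative only and are not part of the original protocol.

```python
# Oligonucleotide sequences copied from the list above (written 5' to 3').
OLIGOS = {
    "C": (
        "ACAATTAGAGAGCTAAGCACTTTTTAGATACTATATAATTTAATTGCCGTATGAGGCACCC",
        "GGGTGCCTCATACGGCAATTAAATTATATAGTATCTAAAAAGTGCTTAGCTCTCTAATTGT",
    ),
    "T": (
        "ACAATTAGAGAGCTAAGCACTTTTTAGATATTATATAATTTAATTGCCGTATGAGGCACCC",
        "GGGTGCCTCATACGGCAATTAAATTATATAATATCTAAAAAGTGCTTAGCTCTCTAATTGT",
    ),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatch_positions(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Each reverse strand should anneal perfectly to its forward strand.
strands_pair = {
    allele: reverse_complement(fwd) == rev
    for allele, (fwd, rev) in OLIGOS.items()
}

# The two forward strands should differ only at the SNP itself.
snp_positions = mismatch_positions(OLIGOS["C"][0], OLIGOS["T"][0])

print(strands_pair)
print(snp_positions)
```

A check of this kind is useful before ordering or annealing oligonucleotides, because a single transcription error in either strand would introduce a mismatch in the duplex used for the pull-down.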

In all, 10 μM of forward and reverse oligonucleotides were annealed before use. Four picomoles of the annealed oligonucleotides were then combined with 1 mg of nuclear extract in binding buffer (20 mM HEPES, 1.5 mM MgCl2, 150 mM NaCl, 0.2 mM EDTA, proteasome inhibitor, phosphatase inhibitor, 25% glycerol, pH 7.9) in a total volume of 1 ml. These solutions were incubated for 1 h at 4 °C on a rotator. A total of 60 μl of streptavidin-agarose beads was added to each tube, and the mixture was then further incubated for 30 min at 4 °C on a rotator. The beads were spun for 30 s at 800 g and then washed five times with 500 μl binding buffer. The supernatant was discarded, and proteins were eluted by boiling in SDS sample buffer. The eluates were then separated by SDS-PAGE followed by Coomassie blue R-250 staining. The bands of interest were subsequently excised from the gel and digested with trypsin for MS.

MS

The sample was digested with trypsin and analyzed by nanoLC/MS/MS at the University of Pennsylvania Proteomics Core. The data were analyzed with Sequest (ThermoFinnigan, San Jose, CA, USA; version SRF v. 5). Scaffold (version Scaffold_3.6.0, Proteome Software Inc., Portland, OR, USA) was used to validate MS/MS-based peptide and protein identifications. For the SILAC experiments, Zoom Scan was added to the MS analytical method to monitor the labeled and unlabeled peptides. The quantification of the SILAC peaks was performed with ProteoIQ, with a customized modification so that the software could use the zoom-scanned SILAC peaks for quantification.

Immunoprecipitation and western blotting

Co-immunoprecipitation was performed using IP buffer (50 mM Tris, 150 mM NaCl, proteasome inhibitor, phosphatase inhibitor, 10% glycerol, pH 8.0) on a rotator at 4 °C overnight. Western blotting was performed according to standard procedures. The following antibodies were used: anti-TCF7L2 (Millipore, Billerica, MA, USA; 05–511), anti-PARP-1 (Cell Signaling, Danvers, MA, USA; 46D11), anti-DNA topoisomerase 1 (Abcam, Cambridge, MA, USA; Ab28432), and anti-RNA helicase A (Abcam, Ab26271).

Cloning, transfection and luciferase assay

The 501-bp fragments harboring rs7903146 were amplified by PCR using genomic DNA from HCT116 cells. Both the C and T alleles were cloned into the pGL4.26 vector (Promega, Fitchburg, WI, USA) in the forward strand orientation and also in the reverse orientation using the restriction enzymes KpnI and XhoI. The sequences of the oligonucleotides were as follows:

Forward 5′ KpnI: 5′-cccGGTACC TCCAATTTTTTCACATGTGAAGACATACAC-3′

Forward 3′ XhoI: 5′-cccCTCGAG CATTACAAATTATTAGAACTTTCACTATGTATTG-3′

Reverse 5′ KpnI: 5′-cccGGTACC CATTACAAATTATTAGAACTTTCACTATGTATTG-3′

Reverse 3′ XhoI: 5′-cccCTCGAG TCCAATTTTTTCACATGTGAAGACATACAC-3′

All constructs were verified by sequencing. HEK-293T cells, which are widely used because they are readily transfected, were seeded at 2 × 10⁵ cells per well in opaque 48-well plates 24 h before transfection. Each well was transfected with 100 ng of the pGL4.26-Luc firefly reporter construct. Firefly luciferase constructs harboring each genomic element were co-transfected with an internal control vector, pGL4.74 (hRluc/TK, Promega), plus 100 ng TCF7L2 and/or 100 ng β-catenin. β-Catenin is an essential component of the experimental design, as it is a well-established mediator of Wnt signaling via activation of TCF7L2. These transient co-transfections were carried out in four independent transfection experiments. Twenty-four hours after transfection, the activities of firefly and renilla luciferase were measured using the Dual-Luciferase Assay Kit (Promega), according to the manufacturer’s instructions. The luciferase activity ratio (firefly/renilla) for each transfection condition was normalized to the ratio obtained with the control vector.
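The normalization described above can be sketched in a few lines of Python. This is an illustrative outline only: the function names and the example readings are hypothetical, and summarizing replicates by their mean and sample SD is an assumption about how the independent transfections would be combined.

```python
from statistics import mean, stdev


def relative_activity(firefly, renilla, control_firefly, control_renilla):
    """Firefly/renilla ratio for one condition, divided by the same ratio
    obtained with the control vector."""
    return (firefly / renilla) / (control_firefly / control_renilla)


def summarize(values):
    """Mean and sample SD across replicate transfections."""
    return mean(values), stdev(values)


# Hypothetical raw luminescence readings (arbitrary units) from four
# replicate transfections: (firefly, renilla) pairs.
construct_wells = [(5200.0, 1000.0), (4700.0, 950.0), (5600.0, 1100.0), (4900.0, 1020.0)]
control_wells = [(1000.0, 980.0), (1050.0, 1010.0), (990.0, 1000.0), (1020.0, 1005.0)]

normalized = [
    relative_activity(f, r, cf, cr)
    for (f, r), (cf, cr) in zip(construct_wells, control_wells)
]
avg, sd = summarize(normalized)
print(f"relative luciferase activity: {avg:.2f} ± {sd:.2f}")
```

Dividing by the renilla signal corrects for well-to-well differences in transfection efficiency and cell number, and dividing by the control-vector ratio expresses each condition as a fold change over basal reporter activity.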

RESULTS

PARP-1 and other peptides interact specifically with an oligonucleotide harboring rs7903146

We used the combination of oligo pull-down and MS in triplicate to identify the transcriptional machinery interacting with the genomic element within the third intron of TCF7L2 harboring the SNP rs7903146. Nuclear lysates from HCT116 cells, where TCF7L2 is abundantly expressed, were incubated with biotin-labeled, double-stranded 60-bp oligonucleotides spanning this SNP. The DNA–protein complexes were precipitated with streptavidin-agarose beads, and the bound proteins were isolated by denaturing SDS-PAGE, followed by staining with Coomassie blue R-250.

As shown in Figure 1, several bands were visible in the pull-down samples, including one band unique to the rs7903146 oligo. An oligo spanning rs12255372, a SNP in strong linkage disequilibrium with rs7903146 in Caucasians but not in other ethnicities and widely rejected as the causal variant,19 did not yield this extra band to any extent. Furthermore, this band was also negligible with a scrambled oligo (Supplementary Figure S1).

Figure 1

Oligo pull-down for protein identification. Proteins from nuclear lysates of HCT116 cells (2 mg total protein each) that bind to biotin-labeled, double-stranded oligonucleotides, stained with Coomassie blue R-250. The protein band unique to the rs7903146 oligo pull-down, as compared with the non-functional proxy SNP, rs12255372, is identified by the arrow. The specific band in this run was also confirmed by MS to be the same protein complex as seen in previous runs.

This specific band was cut from the gel, digested with trypsin and submitted for LC-MS/MS analysis. We set a cutoff of N=10 for the number of identified peptides required for a protein to be considered. The identified proteins from the initial run above the cutoff are listed in Table 1, in order of peptide abundance, with PARP-1 showing the greatest abundance by far, followed by DNA topoisomerase I and ATP-dependent RNA helicase A; comparable results from subsequent runs are shown in Supplementary Table S1.

Table 1 Identity of the most abundant proteins determined by mass spectrometry

rs7903146-associated self-regulation of TCF7L2

As PARP-1, DNA topoisomerase I and ATP-dependent RNA helicase A were identified as the most abundant proteins binding at this location, we attempted to characterize their interaction with each other and with TCF7L2 at the protein level to understand how these four factors may be involved in TCF7L2 regulation. Informed by a previous study, where it was suggested that the PARP-1 and TCF7L2 proteins physically interact,20 the obvious hypothesis was that these main proteins interact to form a complex bound across the rs7903146 region.

In order to test this hypothesis, we performed co-immunoprecipitation experiments. As shown in Figure 2, these four factors exhibited interaction with each other. This supports the notion that TCF7L2, PARP-1, DNA topoisomerase I and ATP-dependent RNA helicase A form a complex to regulate TCF7L2 itself.

Figure 2

HCT116 nuclear extracts were used for co-immunoprecipitation to determine protein interaction. TCF7L2, PARP-1, DNA topoisomerase I and RNA helicase A interact with each other and form a complex. The first lane represents the negative control.

To test for further evidence of a feedback mechanism for TCF7L2 expression operating at the rs7903146 location, we cloned the genetic elements in both orientations upstream of the luciferase reporter gene in pGL4.26. Promoter activity was measured in 293T cells. Transfections were carried out in quadruplicate.

TCF7L2 lies at the end of the Wnt signaling pathway and is thus the effector of this cascade, but it must be activated before it can exert its effect on the set of genes it regulates. Overexpressing TCF7L2 alone is therefore not sufficient in our reporter system. β-Catenin is the key mediator of this activation and lies immediately upstream of TCF7L2: it travels to the nucleus, where it physically interacts with TCF7L2 to activate downstream gene expression, which makes its overexpression necessary for our luciferase experiments. Consistent with this, we observed that overexpression of TCF7L2 or β-catenin alone is not sufficient to activate transcription; however, the combination of TCF7L2 and β-catenin overexpression produced a striking increase of up to fivefold in transcription levels (Figure 3). Furthermore, we also observed differences between rs7903146 alleles (Figure 3).

Figure 3

Outcome of firefly luciferase constructs harboring each genomic element being co-transfected with an internal control vector, plus TCF7L2 and/or β-catenin, into 293T cells. Twenty thousand 293T cells were plated per well of a 48-well dish in 0.5 ml of complete growth medium 24 h before transfection. In all, 100 ng of firefly luciferase constructs harboring each genomic element were co-transfected with 1 ng internal control vector hRluc/TK, plus 100 ng TCF7L2 and/or 100 ng β-catenin, into each well. The pGL4.26 empty vector was used as the control for the basal level of luciferase activity. The Dual-Luciferase assay was performed 24 h after transfection, and promoter activity values are expressed as arbitrary units using the hRluc/TK reporter for internal normalization. Experiments were carried out in quadruplicate, and the SD is indicated. Significant differences in expression levels are indicated.

‘X-ray repair cross-complementing protein 5’ (Ku80) and ‘replication protein A 70 kDa DNA-binding subunit’ preferentially bind to the T allele of rs7903146

We investigated whether there was allele-specific preferential binding for some of the proteins detected across rs7903146 in order to gain further insight into the regulation of TCF7L2. To test this, we performed ‘two-way’ oligo pull-down experiments. The nuclear extracts from the ‘light’ cells that were isotopically labeled in SILAC medium were used in the C allele oligo pull-down, while the nuclear extracts from the ‘heavy’ cells that were isotopically labeled in SILAC medium were used for the pull-down of the T allele oligo. The SILAC-labeled extracts were then switched for a reverse experiment in order to assess the reliability and reproducibility of the approach. Interestingly, we identified two less abundant proteins, X-ray repair cross-complementing protein 5 (XRCC5; also known as Ku80) and replication protein A 70 kDa DNA-binding subunit, that preferentially bound to the T allele over the C allele (Figure 4).
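The logic of the two-way (label-swap) comparison can be illustrated with a short Python sketch. It assumes that, in the reverse experiment, the light extract is paired with the T allele oligo and the heavy extract with the C allele oligo; the function names, the threshold and the peak intensities are hypothetical and do not reproduce the ProteoIQ quantification used in this study.

```python
def t_over_c_ratio(light_intensity, heavy_intensity, t_allele_label):
    """Return the T-allele/C-allele abundance ratio for one peptide.

    t_allele_label states which SILAC label was paired with the T allele
    oligo in that experiment: "heavy" (forward) or "light" (reverse).
    """
    if t_allele_label == "heavy":
        return heavy_intensity / light_intensity
    if t_allele_label == "light":
        return light_intensity / heavy_intensity
    raise ValueError("t_allele_label must be 'light' or 'heavy'")


def prefers_t_allele(forward_ratio, reverse_ratio, threshold=1.0):
    """A protein is called T-preferring only if both labelings agree."""
    return forward_ratio > threshold and reverse_ratio > threshold


# Hypothetical peak intensities for one peptide.
forward = t_over_c_ratio(light_intensity=100.0, heavy_intensity=150.0,
                         t_allele_label="heavy")  # light = C, heavy = T
reverse = t_over_c_ratio(light_intensity=140.0, heavy_intensity=100.0,
                         t_allele_label="light")  # light = T, heavy = C
consistent = prefers_t_allele(forward, reverse)
print(forward, reverse, consistent)
```

Requiring agreement between the forward and reverse labelings guards against artifacts that track the isotope label rather than the allele, such as incomplete label incorporation or unequal mixing of the two extracts.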

Figure 4

Relative DNA binding affinity of XRCC5 and RP-A p70 for the C and T alleles of rs7903146. Peptide abundance and fold change are shown in the table. One peptide spectrum of each protein is given. (a) Forward labeling experiment. (b) Reverse labeling experiment.

DISCUSSION

There have been many efforts to resolve the underlying causative mechanism for a given GWAS signal for a complex trait. However, in the vast majority of situations, there still remains only a list of candidate variants that could represent the causal event. The situation is somewhat more advanced for the T2D locus, TCF7L2, where rs7903146 is widely regarded as the causal variant through a logical process of elimination leveraging multiple ethnicities and novel statistical approaches.16, 17, 18

Indeed, when we first reported the association of TCF7L2 with T2D, we noted that rs12255372 and rs7903146 both captured the association well,6 but subsequent studies in other ethnicities observed that rs12255372 was a less optimal tag-SNP and revealed rs7903146 to be clearly the best SNP to test across multiple populations.16, 17 As such, rs12255372 served as a good control in the current study.

As this variant is not a coding variant residing in an exon but instead resides in an intronic region, it is reasonable to presume that it is involved in a regulatory process. As such, one obvious tactic to attempt to resolve its function is to determine whether a transcriptional complex binds specifically at this location, using a method such as oligo pull-down followed by detection with MS.

Through the use of this technique, we have identified PARP-1 binding across the immediate region harboring this variant. It is well established that PARP-1 has a role in DNA damage detection and repair, and recent studies have revealed important roles for PARP-1 in chromatin and transcriptional regulation.21, 22 PARP-1 is the focus of many key oncology programs within the pharmaceutical industry, so one intriguing question raised by our findings is: will PARP-1 inhibitors also alleviate the symptoms of T2D and could rs7903146 dictate dose response? Clearly much more work is required before this can be resolved, but there are already intriguing clues in the literature to support this notion. For example, PARP activation has been shown to be present in healthy subjects at risk of developing diabetes, as well as in established type 2 diabetics.23 PARP-1 inhibitors have also been shown to alleviate symptoms of diabetic complications.24, 25, 26 Furthermore, PARP-deficient mice are protected from streptozotocin-induced diabetes.27

With respect to the connection with oncology, missense mutations in TCF7L2 have been known for many years to be strongly linked with colorectal cancer risk.28, 29 Indeed, this connection intensified following reports that the 8q24 locus revealed by GWAS of a number of cancers, including colorectal carcinomas, was due to an extreme upstream TCF7L2-binding element driving the transcription of MYC.30, 31, 32 In addition, it has been shown that recurrent fusion of TCF7L2 with its neighboring gene, VTI1A, results in colorectal adenocarcinomas. Curiously, many of the other GWAS-implicated loci associated with T2D are also associated with the risk for prostate cancer.33 Although there are apparent genetic links between cancer and T2D, the mechanism is still not well understood, but our proposed role for PARP-1 in the regulation of TCF7L2 may shed some light on this striking picture emerging from GWAS findings.

Our results suggest that TCF7L2 may also be regulated by other proteins in this PARP-1 dominated complex, including DNA topoisomerase 1 and ATP-dependent RNA helicase A. DNA topoisomerase 1 is known to interact with DNA intermediates and proteins of base excision repair;34 indeed, poly(ADP-ribose) reactivates stalled DNA topoisomerase 1 and induces DNA strand break resealing.35 RNA helicases can unwind double-stranded DNA and RNA in a 3′ to 5′ direction and thereby regulate the interactions between proteins and DNA or RNA;36 indeed, RNA helicases participate in multiple biological processes, including transcription, splicing and translation.37 The mechanism by which these three proteins operate resonates with the fact that the open chromatin in islets has been reported to differ between alleles of this key SNP.38

The fact that the three most abundant binding proteins physically interact with the TCF7L2 protein suggests a feedback mechanism. After all, TCF7L2 regulates MYC at the key cancer GWAS-implicated locus, so it is conceivable that TCF7L2 regulates TCF7L2 at the key T2D GWAS-implicated locus. Our reporter assay results are consistent with the fact that TCF7L2 and β-catenin are both required for canonical Wnt pathway signaling, and our data strongly suggest that the Wnt signaling pathway has a crucial role in the regulation of TCF7L2 expression itself, such that a possible alteration in this feedback mechanism via rs7903146 could, at least in part, explain the functional mechanism through which this variant confers T2D risk. Interestingly, our previous TCF7L2 ChIP-seq work characterized four sites within the TCF7L2 gene itself,39 three of which were in intron 3 in close single-digit kilobase proximity to rs7903146, thus resonating with these results. These findings, including the fact that TCF7L2 was not detected in our MS work but did exert an effect when using a much longer DNA context with the luciferase assays, suggest that TCF7L2 binds from a distance and interacts with this complex via DNA folding, which would need to be fully resolved in future studies using techniques such as 3C. Curiously, we observed stronger transcriptional activity in the reverse direction than in the forward direction. Importantly, we detected differences between alleles, which may partly explain the functional difference between them.

The consistent results from two-way pull-down suggest that XRCC5 and RP-A p70, additional logical members of the detected protein complex, may have a role in the regulation of TCF7L2 by an allele-specific binding preference. XRCC5, also known as Ku80, is a component of ATP-dependent DNA helicase II complex and is known to competitively regulate β-catenin and TCF7L2-mediated gene transactivation.20 The XRCC5/6 heterodimer is also involved in DNA damage and repair.40, 41 The replication protein A 70 kDa DNA-binding subunit (RP-A p70), also known as replication factor A protein 1, participates in multiple biological processes in DNA metabolism, such as recombination, replication, damage and repair.42, 43

However, it should be noted that the difference in DNA binding affinity of these two proteins between the C and T alleles of rs7903146, based on our MS results, is moderate. Indeed, it is conceivable that not all proteins in the complex mediate an effect directly, so it is possible that not all would yield an allelic difference; but it is also possible that the lack of allelic difference with the more abundant proteins could be due to the sensitivity of the method we used. In fact, the odds ratio for the T-risk allele of rs7903146 is approximately 1.4 in GWAS. Odds ratios reported in GWAS are generally relatively modest, as it is widely viewed that common diseases are caused by a large number of causal variants with small effect sizes. Therefore, the development of new approaches for sensitive, reliable and quantitative analyses in biological processes is necessary for effects of this kind. Quantitative and high-throughput proteomics may prove to be powerful tools for truly post-GWAS functional studies.

Although we ascertained this binding complex to be specific to rs7903146, where it was neither binding to a scrambled oligo nor to its strongest non-causal proxy, more work is required. More studies need to be carried out in other cell lines, tissues and settings beyond the standard culturing conditions for colorectal carcinoma cell line HCT116 used in this study. We elected to use HCT116 in this study as this colorectal carcinoma cell line has been leveraged in the past to study TCF7L2 activity,44 and this gene has long been known to have a role in colorectal cancer as well as T2D. This cell line was considered optimal for our study for a number of reasons. First, our previously published ChIP-seq analysis of TCF7L2 occupancy across the genome in HCT116 revealed that the top enriched pathways for the genes bound by this transcription factor were related to diabetes and cardiovascular traits.39 Furthermore, as TCF7L2 protein levels are abundant in non-transformed HCT116 cells, the gene is clearly being expressed in this setting. Also, this cell line divides rapidly, thus providing us with sufficient levels of total protein extract in a timely manner for us to carry out the oligo pull-down experiments.

HCT116 is heterozygous for rs7903146, but we do not believe this to be relevant to the experiments we carried out; after all, we are just leveraging the nuclear extract from this cell line to interact with our designed oligos. As such, we were not considering or engaging the genomic status of the cell at all. Furthermore, rs7903146 is intronic so it does not directly influence the amino-acid sequence of the TCF7L2 protein present in the cell.

Collectively, we propose a possible mechanism in which TCF7L2 interacts with a protein complex that includes PARP-1 to regulate TCF7L2 itself. However, more detailed studies of these proteins are required to fully characterize the functional meaning of these interactions.