Coordinated Regulatory Variation Associated with Gestational Hyperglycemia Regulates Expression of the Novel Hexokinase HKDC1

Maternal glucose levels during pregnancy impact the developing fetus, affecting metabolic health early and later in life. Both genetic and environmental factors influence maternal metabolism, but little is known about the genetic mechanisms that alter glucose metabolism during pregnancy. Here we report that haplotypes previously associated with gestational hyperglycemia in the third trimester disrupt regulatory element activity and reduce expression of the nearby HKDC1 gene. We further find that experimentally reducing or increasing HKDC1 expression reduces or increases hexokinase activity, respectively, in multiple cellular models; and that purified HKDC1 protein has hexokinase activity in vitro. Together, these results suggest a novel mechanism of gestational glucose regulation in which the effects of genetic variants in multiple regulatory elements alter glucose homeostasis by coordinately reducing expression of the novel hexokinase HKDC1.


Introduction
Regulation of glucose metabolism during pregnancy differs markedly from the non-gravid state as the mother must meet both her own and the growing fetus's energy needs. The differences are characterized by the combination of lower fasting glucose levels, increased hepatic glucose output, increased nutrient-induced insulin secretion, and significant insulin resistance by the third trimester 1 . In 5-10% of pregnancies, however, glucose homeostasis is not maintained, resulting in hyperglycemia and gestational diabetes, i.e. diabetes that is first diagnosed during pregnancy. Gestational hyperglycemia is a major health risk to the mother, and is associated with adverse fetal outcomes including increased risk of type 2 diabetes (T2D) and obesity in the offspring [2][3][4][5] . Despite the prevalence and health risks, the mechanisms leading to gestational hyperglycemia remain largely unknown.
We recently identified a strong association between genetic variants in the first intron of HKDC1 and 2 hr plasma glucose levels during an oral glucose tolerance test performed in a multiethnic cohort of 4,437 mothers at ~28 weeks gestation (F-test, p = 8.26 × 10 −13 , β = 0.167 − 0.229 √[mmol L −1 ]) 6 . The association was replicated in a cohort of 2,192 additional European mothers and two smaller independent European and Canadian cohorts (n = 228 and 606). In a separate study of ~47,000 non-gravid individuals of European ancestry, the same region had only marginal association with 2 hr plasma glucose, suggesting that the association with HKDC1 is largely pregnancy specific 7 . In the 1,000 Genomes Project database (1KGP), there are 60 variants in the coding exons of HKDC1. Seven of those variants are common, defined by a minor allele frequency > 1%. According to computational predictions of the effects of those variants on HKDC1 activity, between one and four of the common variants may impact the function of HKDC1 8,9 (Supplementary Data 1). However, none of the coding SNPs were associated with 2 hr plasma glucose at genome-wide significance (p < 1 × 10 −8 ). While the function of HKDC1 is unknown, the gene is broadly expressed including in liver and β-islet cells 6,10 ; conserved across vertebrates; and may be a novel human hexokinase based on its sequence similarity to hexokinase I (HK1) 11 .
Based on these data, we hypothesized that genetic variation that alters gene regulation may contribute to the association with gestational hyperglycemia. Here we show that there are multiple variants that alter regulatory element activity in the region. Moreover, the effects of those variants are coordinated across four enhancer elements in the associated HKDC1 locus. For each variant, alleles associated with reduced expression also associate with higher plasma glucose in mothers. We further show that modulating HKDC1 levels alters hexokinase activity in multiple cellular models, and that purified HKDC1 protein has hexokinase properties in vitro. Together, our results support a novel mechanism of gestational glucose regulation in which coordinated variation across multiple enhancer elements within regulatory haplotypes reduces expression of the novel hexokinase HKDC1.

Identification of regulatory variants in the HKDC1 locus
The strongest genetic association with 2 hr plasma glucose was located within the first intron of HKDC1 at rs4746822. That variant and variants in moderate linkage disequilibrium (LD) with rs4746822 (r 2 > 0.3) lie within a region of the genome exhibiting chromatin modifications consistent with active gene regulation across diverse tissues (Supplementary Figs. 1 and 2, Supplementary Table 1). To investigate the potential for regulatory variation contributing to the association with gestational hyperglycemia, we focused on a ~30 kb region defined both by the observed pattern of LD in the region and by evidence of gene regulation. As shown in Supplementary Fig. 2, there is little evidence of regulatory activity immediately flanking the target region, increasing our confidence that any regulatory variants contributing to the genetic association will be evaluated. There were three genotyped variants and nine imputed variants in the target region that were associated with 2 hr plasma glucose at genome-wide significance (p < 1 × 10 −8 , Supplementary Data 2). Together, those data led us to hypothesize that rs4746822 or variants in LD with rs4746822 influence maternal glucose metabolism by altering the activity of regulatory elements that control HKDC1 expression.
Within the target region, we identified 11 candidate regulatory elements based on increased chromatin accessibility across 16 tissues including the metabolically relevant liver stellate cells and pancreatic islet β cells, (Fig. 1a, Supplementary Table 1) 12,13 . The candidate regulatory regions account for 8.5 kb (28%) of the nucleotides in the 30 kb target region. Of the 425 variants in the target region identified by the 1KGP, 132 (30%) were in the candidate regulatory elements 14 . There were 203 total haplotypes of the individual regulatory elements, 60 of which were common in the 1KGP population (haplotype frequency > 1%). The 60 common haplotypes accounted for 98% of all of the haplotypes for the regulatory elements sequenced by the 1KGP (Supplementary Table 2).
To determine if genetic variation in the identified regulatory elements influences endogenous HKDC1 expression, we compared the variants associated with gestational hyperglycemia to those associated with gene expression changes in an expression quantitative trait locus (eQTL) study performed in primary human livers 15 . To do so, we imputed the tag variants from both the GWA and eQTL studies to the variants identified in the 1KGP 14 . That analysis revealed four significant eQTLs for HKDC1 [log 10 (Bayes Factor) > 2.5] within DNaseI hypersensitive sites (DHS) that were also associated with maternal 2 hr glucose levels (Fig. 1b, Supplementary Fig. 2, Supplementary Data 3). We did not find evidence that the same variants also influence expression of nearby HK1 in the same study (Supplementary Data 4), and known HK1 eQTLs in other cell types do not overlap the variants associated with maternal glucose levels 16 . When we expanded our eQTL analysis to all genes within 500 kb of HKDC1, we did not find any evidence that variants associated with 2 hr maternal glucose levels were also eQTLs for those genes ( Supplementary Fig. 3). Together, these results suggest that genetic variants associated with gestational hyperglycemia near HKDC1 alter HKDC1 expression.
To identify causal variants that control HKDC1 expression, we used luciferase reporter assays to measure allele-specific regulatory activity of the candidate regulatory elements in the region in the HepG2 liver cell line. We identified 1KGP subjects that maximized the genetic diversity in the candidate regulatory elements, and used PCR to clone the regulatory elements from those individuals upstream of a Simian Virus 40 promoter driving expression of a luciferase reporter gene. From the genomes of 19 individuals (Supplementary Table 3), we cloned 60 different naturally occurring haplotypes representing 93% of the haplotypes of the 11 regulatory elements that were identified by the 1KGP. To disentangle the effects of individual variants on expression that do not segregate in the population and to increase our power to detect regulatory effects of variants that are rare in the population, we used site directed mutagenesis to generate reporter constructs for 45 additional haplotypes (Supplementary Data 5). By balancing the number of observations of each allele, we were able to alleviate some of the loss in power due to allele frequency differences (Supplementary Table 4). In total, we assayed 105 haplotypes composed of 57 variants. Nine of the 11 regulatory elements had enhancer activity, with relative luciferase expression between 1.9 to 14-fold over control ( Supplementary Fig. 4). Enhancer strength was positively and significantly correlated with average DHS peak intensity in HepG2 cells (Spearman ρ = 0.64, p < 0.05, Fig. 1b), however, we did not find significant correlation between DHS peak width and enhancer strength ( Supplementary Fig. 5). We do not take these results as evidence that DHS peak intensity generally predicts enhancer strength, only that DHS sites are enriched for active regulatory elements.
To identify individual genetic variants in the regions that altered enhancer activity, we used a multiple linear regression model to estimate the effect of each variant acting independently on reporter expression (Supplementary Data 6, Supplementary Fig. 6). That analysis revealed 14 variants distributed across six regulatory elements that significantly altered gene expression (Bonferroni-adjusted p < 0.01, Fig. 1c, Table 1, and Supplementary Table 5). In agreement with other studies, we found that the effect of a single variant on enhancer activity is generally < 2 fold [17][18][19] . We next determined if the estimated effects of individual variants within an element behaved cumulatively. For each of the six elements containing regulatory variants, we quantified the expected luciferase values for each haplotype by adding the coefficients based on each variant within the haplotype and plotted those against the observed luciferase values. In all six elements, predicted luciferase values were positively and significantly correlated with observed luciferase values (Pearson r 2 > 0.25, p < 0.0001, Supplementary Fig. 7). Although we observe substantial cumulative effects, we cannot dismiss the possibility that more complex interactions also exist.
The results from the allele-specific reporter assays were positively validated in the liver eQTL study. For 10 out of 12 variants present in the eQTL analysis, the effect observed in our allele-specific reporter assays was in the same direction as predicted by the eQTL association (p = 0.02, binomial test). Among the validated regulatory variants were four genome-wide significant eQTLs (Fig. 1d, Table 1). The observed effects of those four variants in our reporter assays were not specific to HepG2 cells. Specifically, in seven of the eight cases in which one of the variants significantly altered regulatory element activity in a different cell type, the direction of the effect was consistent with that observed in HepG2 cells (Supplementary Table 6). Together, these results indicate that rs10762264, rs4746822, rs2394529, and rs9645501 are regulatory variants in four separate enhancers that contribute to HKDC1 expression in human liver.

The identified regulatory variants have coordinated effects
In order for the regulatory variants identified to collectively contribute to maternal glycemia, we hypothesized that there would be coordination between allele-specific regulatory element activity and maternal 2 hr glucose levels. Supporting our hypothesis, all four variants that were significant eQTLs for HKDC1 in liver were associated with maternal 2 hr glucose levels with three reaching genome wide significance in the imputed GWA data ( Table 1). All four alleles that were associated with increased 2 hr glucose also had decreased luciferase expression in the reporter assays and decreased HKDC1 expression in the eQTL analysis ( Fig. 1d, Table 1). We expanded the analysis to include 12 genome-wide significant SNPs associated with 2 hr glucose and found that each of the 12 SNPs associated with increased glucose also associated with decreased HKDC1 expression (p = 0.0002, binomial test, Supplementary Data 7). We further expanded the analysis to include all 178 tested SNPs within the 30 kb locus and found the risk allele to decrease HKDC1 expression in 131 cases (p = 1.14 × 10 −10 , binomial test, Supplementary Data 7). Together, those results support a model in which multiple genetically linked regulatory variants have a coordinated effect of reducing HKDC1 expression in women with higher gestational glucose levels. Exceptions to the coordination were limited to variants that were not significantly associated with 2 hr plasma glucose levels or HKDC1 expression and that were not in strong LD with the lead GWA variant. For rs7089277, the minor allele frequency is low (7%) and therefore the variant may have limited effects in the population, and there is only nominal association with 2 hr glucose (p = 8.02 × 10 −5 ). The 2 SNPs in region X have large luciferase β's, but were not significantly associated with 2 hr glucose (p = 0.0183 and 0.0184), were not in LD with rs4746822 (r 2 = 0.077), and did not significantly associate with HKDC1 expression. Furthermore, reporter assays in three additional cell types found cell-dependent effects for the variants in region X (Supplementary Table 7). Together, these results indicate that there are strongly coordinated regulatory effects across the genomic region associated with 2 hr plasma glucose levels, and that exceptions to that coordination likely do not contribute to the expression of HKDC1.

HKDC1 is a novel human hexokinase gene
Having demonstrated that alleles associated with higher maternal 2 hr glucose levels decrease HKDC1 expression, we next sought to understand whether HKDC1 has hexokinase activity that may explain its contribution to glucose homeostasis. Phosphorylation of glucose by hexokinase is the first step in glucose metabolism. Screens for hexokinase activity in rat cells have identified four distinct hexokinases using electrophoresis and chromatography. Those genes have been mapped, and are known as HK1, HK2, HK3, and glucokinase (GCK) [20][21][22] . Reduced hexokinase activity has previously been associated with metabolic phenotypes including diabetes 23,24 .
To determine if HKDC1 contributes to cellular hexokinase activity, we first used siRNAmediated knockdown to model the genetically reduced levels of HKDC1 expression present in women with higher 2 hr glucose concentrations, and measured the effect of that reduction on cellular hexokinase activity. Targeting HKDC1 with two different siRNAs individually and together in HepG2 cells, we reduced mRNA expression by 35-60%. Reduced HKDC1 expression resulted in a dose-dependent decrease in cellular hexokinase activity (Fig. 2a). The siRNAs were specific and did not alter expression of the HK1, HK2, or GCK (Fig. 2b). HK3 expression was not detectable in HepG2 cells, as expected 12 . The magnitude of reduction in hexokinase activity indicates that about half of the hexokinase activity in HepG2 cells is due to HKDC1. The result agrees with RNA-seq data showing that half of the hexokinase mRNA expression in HepG2 cells is HKDC1 (Supplementary Table 8). Providing further evidence that HKDC1 contributes to cellular hexokinase activity, we also found that overexpression of HKDC1 mRNA by transient transfection of an HKDC1 expression plasmid increased cellular hexokinase activity ( Fig. 2c and 2d).
To demonstrate that the activity of HKDC1 was not specific to HepG2 cells, we transduced the rat pancreatic β cell line, INS-1, with an adenovirus that expressed HKDC1 from a human cytomegalovirus promoter, and measured hexokinase activity in the cell lysates at glucose concentrations ranging from 0-50 mM 25 . The predominant hexokinase in INS-1 cells is GCK, and the cells have little or no detectable low K m hexokinase activity 25 . Cells transduced with the HKDC1 adenovirus had substantially increased HKDC1 protein (Fig.  2e, Supplementary Fig. 8), and increased hexokinase activity across the concentration range (Fig. 2f, Supplementary Fig. 9). Moreover, the shift in the dose-response curve suggests that HKDC1 has a lower K m than GCK. Adenoviral-mediated HKDC1 overexpression did not alter the expression of HK2, HK3, or GCK and decreased HK1, demonstrating that the virus specifically overexpressed HKDC1 (Supplementary Fig. 10). Transduction also did not result in decreased cell density ( Supplementary Fig. 11). We repeated the experiment with independently purified virus and independently grown INS-1 cells, with similar results ( Supplementary Fig. 12). Together, these results demonstrate that HKDC1 contributes to cellular hexokinase activity either through direct hexokinase activity or through modulating the activity of the other hexokinases.
Based on the extent of nucleotide and amino acid similarity between HKDC1 and members of the hexokinase family, Supplementary Data 8 and 9, Supplementary Table 9 and 10), we hypothesized that the effects of HKDC1 expression on cellular hexokinase activity are in part the result of the direct enzymatic activity of HKDC1. To test that hypothesis, we purified HKDC1 and compared its specific activity to that of purified HK1. Protein purity was confirmed via a Coomassie blue stain (Fig. 2g, Supplementary Fig. 8). Results of the assays indicate that HKDC1 protein alone has hexokinase activity, with 20% of the specific activity of HK1 under the same conditions (Fig. 2h). Because our results show HKDC1 hexokinase activity in vitro and in vivo, we propose renaming the HKDC1 gene to HK5.

Discussion
We have shown that regulatory variation associated with gestational glucose metabolism alters expression of a novel human hexokinase. This result adds to the growing empirical evidence that regulatory variants contribute to a variety of common complex human phenotypes [26][27][28] . Moreover, our results provide empirical evidence that regulatory variants spanning multiple enhancers have a coordinated allelic effect on HKDC1 expression. A recent analysis suggests that such coordinated effects may be a common mechanism by which regulatory variants influence gene expression 26 . Such coordinated disruption of regulatory haplotypes may also explain how modest effects of individual regulatory variants 17 can together have a sufficiently strong effect so as to be detectable in a genomewide genetic analysis. The burden of multiple genetic variants disrupting regulatory haplotypes may therefore help explain the overabundance of non-coding genetic variants in GWA studies. Notably, however, independent effects were not observed in our analysis of the liver eQTL data. This negative result may be explained by differences in effect size, limited power between closely linked variants, and tissue-specific effects ( Supplementary  Fig. 13). Together, we take the results presented here to be consistent with a model of coordinated regulatory variation, and we emphasize the need for additional investigation. In part, challenges of strong association between variants and heterogeneous effect sizes may be overcome with high-throughput empirical investigation of allele-specific regulatory activity in large populations 17,29 .
Importantly, the plasmid-based reporter assays used to detect regulatory variants here are likely biased towards detecting regulatory elements that act independent of genome context. Additional elements may be active when in the native genomic context, where interactions between from multiple clustered regulatory elements contribute to effects on nearby gene regulation. Similarly, determining the direct effects variants detected here on endogenous gene expression remains a challenge. Addressing that possibility will require additional studies that include genomic context, potentially through direct modification of the genome via genome editing 30,31 .
Studying regulatory effects in the locus associated with maternal glycemia also led us to the discovery of a novel human hexokinase that appears to have important metabolic effects during pregnancy. Previous screens to identify vertebrate hexokinases 32,33 have identified HK1, HK2, HK3, and GCK, but did not identify HKDC1. One possible reason why HKDC1 was missed previously is that a high degree of structural similarity between HKDC1 and HK1 may have obscured separation in chromatography and electrophoresis. We also found that purified HKDC1 has reduced hexokinase activity when compared to HK1 in our in vitro assays. One possibility is that additional co-factors could be required for full HKDC1 activity in vitro, but that those factors are lost during the purification process. Such diminished hexokinase activity may have prevented earlier detection of HKDC1 as a hexokinase by less sensitive methods.
The tissues that are most relevant for HKDC1's role in meeting the metabolic demands of pregnancy are not yet known, nor is it known how HKDC1 is regulated in response to pregnancy hormones. We expect that future evaluation of these questions will provide new insights into the metabolic changes that occur during pregnancy. More broadly, the results from this study demonstrate the value of identifying and pursuing the targets of noncoding phenotype-associated genetic variants for revealing novel mechanisms of human disease 34 .

Luciferase Reporter Assays
Selected regions were amplified from genomic DNA from 1KGP subjects and In-Phusion cloned at the NheI cut site into pGL4.13 luciferase expression vector (Promega). The construct was then transformed into TOP-10 competent cells and plated onto LB agar plates with ampicillin and incubated overnight at 37°C. In order to capture both haplotypes from subjects who were heterozygous in those regions, multiple colonies were selected and grown individually in LB media overnight. Plasmids were extracted using the PureYield Plasmid Miniprep System (Promega). Constructs were sequenced using Sanger sequencing and variants were confirmed in dbSNP 35 . Site directed mutagenesis was carried out by amplifying the plasmid using primers containing the allele of choice, treating with DpnI, and transforming into TOP-10 competent cells. Primers for cloning and site directed mutagenesis are listed in Supplementary Tables 11 and 12. HepG2 cells (Duke Cell Culture Facility)were plated into white flat-bottom 96-well plates at a density of 25,000 cells/well. After 48 hr, 100 ng of each construct/well was transfected with Fugene HD (Promega) at a 5.5:1 Fugene:DNA ratio. At least 16 biological replicates for each construct were transfected. After 24 hr, luciferase and renilla luciferase signal were quantified using the Dual-glo Luciferase Assay (Promega) using a Victor3 1420 plate reader (PerkinElmer). Luciferase values were normalized by dividing the luciferase signal by the renilla signal. A linear regression model implemented in the R statistical package (R function lm) was used to determine the regulatory effect of individual SNPs where (normalized luciferase intensity) = β 1 SNP 1 + β 2 SNP 2 +β 3 SNP 3 …+ ε. Raw data for luciferase reporter assays are provided as supplementary datasets.

RT-qPCR
RNA was isolated using the RNeasy kit (Qiagen). 2 μg of RNA per reaction was reverse transcribed using the SuperScript Vilo cDNA Kit (Life Technologies). All qPCR reactions were performed in biological triplicates and technical duplicate using the PerfeCTa qPCR Fastmix (Quanta) on an ABI StepOnePlus cycler. Reactions were cycled at 95°C (10 s) and the primer annealing temperature (30 s) for 40 cycles. Calculations were performed using the ΔΔC t method using β-actin as a reference control. Primer sequences are listed in Supplementary Table 13.

Western Blots
Transduced cell pellets were sonicated for 5 × 30 s pulses in 100 μL of RIPA buffer supplemented with protease inhibitors. The protein concentrations were quantified using a Bicinchoninic acid (BCA) assay (Thermo). The lysate was then mixed 1:1 with loading buffer supplemented with 5% 2-mercaptoethanol and incubated at 76°C for 15 m. 10 μg of protein along with the Precision Plus Protein Dual Color Standard (Biorad) was loaded into each lane of a Mini-Protean TGX 4-20% gel (Biorad) and run at 100 V for 1 hr at 4°C. The membrane was blocked with 5% skim milk for 30 m and washed 3 times in 0.1% Tween-20 in PBS. The membrane was incubated overnight in anti-HKDC1 antibody produced in rabbit (Sigma) diluted 1:500 in skim milk. The membrane was washed 3 more times and incubated in horseradish peroxidase (HRP) conjugated anti-rabbit antibody diluted 1:2500 in skim milk for 75 m at room temperature (Santa Cruz Biotechnology). The membrane was washed 3 times and incubated at room temperature in 10 μL SuperSignal West Pico Chemiluminescent Substrate (Thermo) for 15 m. After exposure and imaging, the membrane was stripped using Restore Western Blot Stripping Buffer (Thermo) and blotted using 1:500 anti β-actin antibody produced in mouse (Santa Cruz Biotechnology) and 1:2500 HRP conjugated anti-goat antibody (Santa Cruz Biotechnology) diluted in skim milk. After transfection, viral lysate was collected, further amplified in HEK293 cells plated in 5 × p150 tissue culture plates. Prior to complete lysis, media was removed, infected cells harvested in 2 mL Freeze/thaw buffer (10 mM Tris/Hcl, pH 8.0, 1 mM MgCl 2 ), lysed by 2 freeze/thaw cycles, and the virus purified by ultracentrifugation on a CsCl gradient. Purified virus was de-salted with a 7k MWCO column (Thermo Scientific) equilibrated with freeze/ thaw buffer, and glycerol was added to a final concentration of 10%. Viruses were titered by measuring OD 260 (1 OD 260 = 1.1 × 10 12 virions mL −1 ) as well as by plaque assay in HEK293 cells.

Adenovirus Construction and Transduction
To prepare cells for transduction, INS-1 cells were first seeded at a density of 7 × 10 5 per well into 12-well plates. INS-1 832/13 cells were kindly provided by Christopher Newgard (DMPI, Duke University) 37 . After 24 hr, the cells were transduced with HKDC1-, HK1-, or GFP-expressing adenovirus in triplicate. Cells were grown for another 24 hr and pelleted with centrifugation.

Protein Purification
Rosetta 2 (DE3) (EMD Millipore) cells were transformed with pReceiver-B01 plasmid containing either HKDC1 or HK1. A single colony was grown overnight in 100 mL TPM media (20 g L −1 Tryptone, 15 g L −1 Yeast extract, 8 g L −1 NaCl, 2 g L −1 Na 2 HPO 4 1 g L −1 KH 2 PO 4 ) augmented with 50 μg mL −1 ampicillin and 34 μg μL −1 chloramphenicol at 37°C. The pregrowth culture was added to 500 mL of the same media and grown to an OD of ~0.6. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to a final concentration of 1 mM, and the cells were grown for an additional 4 hr at room temperature.
Cultures were centrifuged at 10,000 × g for 30 m and resuspended in lysis buffer (50 mM NaH 2 PO 4 , 300 mM NaCl, 0.25% Tween-20, 5% sucrose, 5% glycerol, 2 mM imidazole, and 10 mM 2-mercaptoethanol) supplemented with protease inhibitors (1 mM phenylmethanesulfonyl fluoride [PMSF], 2 μg mL −1 aprotinin, 0.5 μg mL −1 leupeptin, 0.7 μg mL −1 pepstatin A). Cells were sonicated for 5 m followed by DNase I treatment (4 U mL −1 ) for 15 m on ice. The lysates were clarified by centrifugation at 10,000 × g for 30 m and the supernatant passed over a HisTALON column (Clontech). The column was washed with 25 mL of wash buffer (Lysis buffer pH 7.0, 25 mM imidazole). The protein was eluted from the column in 1 mL fractions using wash buffer with 500 mM imidazole. DTT was added to each fraction to a final concentration of 1 mM.

Cellular Hexokinase Assays
To measure cellular hexokinase activity in knockdown and overexpression experiments, we used an absorbance-based hexokinase assay (Sciencell #8408). Cell pellets were sonicated in 200 μL of 1% Triton at 4°C for 5 × 30 s pulses at medium intensity using a Bioruptor (Diagenode). The lysate was centrifuged at 1000 × g for 5 m to remove remaining cell debris. The assay was performed by combining 80 μL of reaction mix (42 μL assay buffer, 20 μL cofactor, 2 μL developer, 6 μL enzyme mix, and 10 μL NADP), 10 μL of cell lysate, and 10 μL of 0-500 mM glucose was in each well of clear 96-well microplates. The plates were incubated in the dark for 90 m at room temperature. Absorbance at 490 nm was quantified in technical duplicates using a Victor3 1420 plate reader (PerkinElmer).

Multiple Sequence Alignment
Multiple sequence alignments and percent identity matrices were constructed using Clustal Omega using default settings http://www.ebi.ac.uk/Tools/msa/clustalo/) 38 . Protein and DNA sequences were downloaded from UCSC genome browser 39 .

siRNA Knockdown
HepG2 cells were plated at 1 × 10 5 cells/well into 24-well plates. After 48 hr, cells were transfected with either siRNAs targeting HKDC1 (Ambion Silencer Select s37044, s37046) or siRNA not known to target any genes (Ambion Silencer AM461) using 6 pM of siRNA and 1 μL of Lipofectamine RNAiMax transfection reagent (Life Technologies) per well. Each siRNA was transfected in biological triplicate. Cells were harvested 24 hr later for RT-qPCR and hexokinase assays as described above.

Genome Wide Association with Imputed Genotypes
Genotypes from the previously published GWA study that identified the HKDC1 locus 6 were imputed separately in each of the four quality-control (QC) cleaned and filtered genotyping sets using IMPUTE2 v2.3.0 and the 1KGP reference panel (December 2013 release). A cosmopolitan reference panel of unrelated individuals of the Africa ("AFR"), Americas ("AMR"), Asia ("ASN"), and Europe ("EUR") populations was used. The strandchecking utility of SHAPEIT v2 was used to ensure consistent strand assignments between the reference data set and the QC cleaned and filtered data sets. Strand was corrected as indicated, and SNPs for which strand could not be resolved were removed. A conservative info threshold (synonymous to allelic r 2 ) of 0.9 was used to remove questionable imputed SNPs.
The genotype call probabilities from the filtered IMPUTE2 output were used in a linear regression model between each of the phenotypes and the genotypes probabilities under an additive model adjusting for age, BMI, and the first two principal components of ancestry. The frequentist approach in SNPTEST v2.4.1 was used to estimate the beta and standard errors for each regression model and assess the significance of the association between the SNP and the phenotype of interest.

Liver eQTL analyses
Liver gene expression and genotype data were generated as follows 15 . Gene expression array probes were aligned to the human reference genome (hg19) and gene models (RefSeq). Probes lacking unique genomic or transcriptomic alignments were discarded. Furthermore, probes overlapping common polymorphisms (minor allele frequency > 5%, based on 1KGP pilot release data) were discarded. Gene expression array feature intensities were extracted from arrays, background subtracted and log 2 transformed. Missing data was imputed by k nearest neighbors (R package impute, function impute.knn, k = 10). The distribution of expression measurements on each array was normalized to the average empirical distribution across all arrays (R package limma, function normalizeBetweenArrays, method quantile). For each probe and across arrays, the distribution of expression values was transformed to the quantiles of the standard normal distribution (R function qqnorm). This matrix of processed gene expression values was then subjected to PCA (R package pcaMethods, function pca). We controlled for the effect of these PCs by taking the residuals of linear models using the first 15 PCs as covariates. Values from replicate arrays were then averaged. Because this sample set included both European and African American individuals, residual expression values were then transformed to the quantiles of the standard normal distribution, within each population, and then pooled. Given the mappings of probes to gene models, if there were multiple probes that mapped to the same gene, probes were clustered [R package mclust, function Mclust(y,G=1:min(4,probe_count))] and the mean per cluster per individual was estimated.
These residuals were then transformed by gene to standard normal. These data were used as phenotypes for the eQTL scan.
Genotype data were processed and quality controlled as follows 15 . Genotyping was performed on the Illumina human 610 quad beadchip platform (GPL8887) at the Northwestern University Center for Genetic Medicine Genomics Core Facility according to the manufacturer's instructions. One sample was removed because it had a no call rate >10%. The initial marker set comprised 620,901 markers. 8,300 markers were removed because they showed significant deviation from Hardy-Weinberg equilibrium (HWE, Fischer's exact test, p < 0.001). 29,705 SNPs were removed from the analysis because they had a no call rate in more than >10% of the samples. Hence, our final marker set is comprised of 583,073 SNPs. Imputation was performed using impute2 and the full 1KGP reference panel, as per impute2 recommendations, in 5 Mb segments using the commands: The cis-eQTL scan was conducted with Bayesian regression using all SNPs within 1 Mb of the gene using the command: snptest_v2.5-beta4 -use_raw_phenotypes -bayesian 1 -method expected -pheno pheno1 -prior_qt_mean_b 0 -prior_qt_V_b 0.02 -prior_qt_a 3 -prior_qt_b 2

Supplementary Material
Refer to Web version on PubMed Central for supplementary material. . Enhancer activity was determined by dividing the relative luciferase signal from the most active haplotype by that of a control vector with the same promoter but no enhancer. The red line indicates the Pearson correlation between DNase-seq signal and enhancer activity. Error bars show standard deviation (s.d.). (c) Coordinated regulatory variation in the HKDC1 locus. SNPs that are significantly associated with gestational hyperglycemia ("GWA SNPs"), HKDC1 mRNA expression ("HKDC1 eQTLs"), or regulatory activity in allele-specific luciferase reporter assays ("Reg. Vars") are marked with an asterisk. (d) Example box plots showing allele specific regulatory activity for the four SNPs that were significantly associated with gestational hyperglycemia, HKDC1 expression, and luciferase reporter gene expression. In each example, they associated with increased 2 hr plasma glucose are shown in bold face. The bottom and top boxes are the first and third quartiles, and the band inside the box is the median. The ends of the whiskers represent the lowest and highest data points within 1.5 interquartile range of the lower and upper quartiles. Black squares represent outliers defined as 1.5 times the interquartile range above the upper quartile or below the lower quartile. The number of replicate measurements followed by each allele are as follows: 103 of rs10762264A, 79 of rs10762264G, 80 of rs4746822C, 115 of rs4746822T, 129 of rs2394529C, 47 of rs2394529G, 80 of rs9645501A, and 80 of rs9645501G.  Table 1 Association of functional genetic variants with regulatory activity (β Luciferase), HKDC1 expression (β eQTL), and gestational hyperglycemia (β GWA) * indicates parameters and p-values that were not estimated.