Blood triglyceride levels are associated with DNA methylation at the serine metabolism gene PHGDH

Efficient interventions to reduce blood triglycerides are few; newer and more tolerable intervention targets are needed. Understanding the molecular mechanisms underlying blood triglyceride levels variation is key to identifying new therapies. To explore the role of epigenetic mechanisms on triglyceride levels, a blood methylome scan was conducted in 199 individuals from 5 French-Canadian families ascertained on venous thromboembolism, and findings were replicated in 324 French unrelated patients with venous thromboembolism. Genetic context and functional relevance were investigated. Two DNA methylation sites associated with triglyceride levels were identified. The first one, located in the ABCG1 gene, was recently reported, whereas the second one, located in the promoter of the PHGDH gene, is novel. The PHGDH methylation site, cg14476101, was found to be associated with variation in triglyceride levels in a threshold manner: cg14476101 was inversely associated with triglyceride levels only when triglyceride levels were above 1.12 mmol/L (discovery P-value = 8.4 × 10−6; replication P-value = 0.0091). Public databases findings supported a functional role of cg14476101 on PHGDH expression. PHGDH catalyses the first step in the serine biosynthesis pathway. These findings highlight the role of epigenetic regulation of the PHGDH gene in triglyceride metabolism, providing novel insights on putative intervention targets.


Technical validation of DNA methylation findings
Technical validation of DNA methylation at cg14476101 by relative quantitative PCR with specific TaqMan probes was performed according to standard protocols. Briefly, sodium bisulfite conversion was performed from 300 ng genomic DNA using EZ-96 DNA Methylation kit (ZymoResearch). From the resulting bisulfite DNA, a relative quantitative PCR was performed using 2 TaqMan MGB probes (Life Technologies) specific to the methylated or unmethylated studied CpG and specific labelled with FAM or VIC reporter. The primers and probes design were performed with Primer Express software from the bisulfite converted DNA sequence. The primers and probes were chosen with no CpG island in their sequence, except the one being studied. The sequence of the probes and primers were: probe C (labelled hand, the cellular heterogeneity can introduce noise in the analysis and may induce a dependency between the methylation levels (and therefore between the statistical tests). For those reasons, the models should be adjusted for the cell type proportions.
However, the cell type proportions were not measured in the F5L family study. We used the Remove Unwanted Variation (RUV) approach 6-9 to capture the cellular heterogeneity in order to account for their effects on the associations In addition, RUV is able to capture other unwanted variation from the methylation data. Briefly, the method performs a factor analysis on the methylation levels measured at a subset of CpG sites (control probes). The factor components are then included in the regression model to adjust for the unwanted variations. For control probes, we used 473 CpG sites that have been found to be specific to different types of white blood cells 6,10 and whose methylation levels were not associated with log(triglyceride) in the discovery dataset (P-value > 0.10). The RUV components were estimated with the ruv4 method 8 in the ruv R-package. We selected the number of components based on the P-value distribution as suggested in Gagnon-Bartsch et al 7 .
Finally, the models in the F5L family study were adjusted for age, sex and the three first RUV components. In the MARTHA study, the models were adjusted for age, sex and the proportions of lymphocytes, monocytes, eosinophils and basophils. In the F5L family study, the degrees of freedom in the F-statistics were estimated with the Kenward-Roger approach as implemented in the pkbrtest Rpackage 11 .

Piece-wise linear regression
A segmented relationship between mean response µ=E[Y] and the variable X (with a breakpoint at θ) was modeling by: The model assumed a continuous relationship at X= , i.e.

% + = , +
Instead of constraining the parameters, the model can be reparameterized as: = % % + , , + with % = ≥ otherwise and , = − ≥ 0 otherwise According to this parameterization, β 1 is the left slope (slope when X < θ) and β 2 is the right slope (slope when X > θ) and the segments are connected at Z = θ. A Wald test or log-likelihood ratio test can be used to assess if the slopes (β 1 and β 2 ) are significantly different from 0. In our analyses, Y represents the methylation level at the PHGDH CpG site, X represents the log(triglyceride) and θ is the breakpoint (i.e θ = log(1.12) ). Therefore, β 1 is the effect size for triglyceride levels is below 1.12 mmol/L and β 2 are the effect size when triglyceride levels are above 1.1 mmol/L.

Adjustment of cell type heterogeneity
We first evaluated the association of the true cell type proportions of lymphocytes, monocytes and granulocytes on triglyceride levels in the MARTHA study (Supplementary Table S7). Triglyceride levels are associated with monocyte proportion: β = -5.3, P-value = 0.0017, 95% CI = (-8.5, -2.00). These results suggest that cell type proportions represent a potential confounder of the direct effect of triglyceride levels on the methylation levels at cg14476101.
We then evaluated the quality of the cell type proportion estimations by comparing the true and the estimated cell type proportions in the MARTHA study (Supplementary Table S8, Supplementary Fig S9, S10 and S11). The estimated and true cell type proportions are highly correlated, supporting that the deconvolution method works well on whole blood.
We then report the strength of association of triglyceride levels on the methylation levels at cg14476101 using different adjustments for cell type proportions (Supplementary Table S9) in the MARTHA study. The strength of associations adjusted for the estimated or true cell type proportions are similar. Thus, these results validate our approach to account for cellular heterogeneity in our investigation of the association between blood DNA methylation and triglyceride levels.
Finally, we report the strength of association between triglyceride levels and the methylation levels at cg14476101 adjusting for the estimated cell type proportions in the F5L family study (Supplementary Table S10). The strength of association estimated from the model adjusted for the RUV components is very similar to the one estimated from the model adjusted for estimated cell type proportions.

Reverse causality
The reverse causality ( Figure 1C) was assessed using a weight genetic score. The score was generated with the SNPs reported associated with triglyceride levels in the Global Lipids Genetics Consortium metaanalysis 12 . We excluded two SNPs (rs10761731 and rs11649653) due to the missing information on the strand. The SNPs were weighted using the effect sizes estimated in the meta-analysis. In total, 32 SNPs were included in the genetic score. The detailed information of the 32 SNPs can be found in Table 1 in (Johansen et al, 2011) 13 . All the models were adjusted for age and sex.
We first tested the association between the genetic score and the triglyceride levels (Supplementary Table S4). We observed a significant association in the F5L family Study (β = 0.40, P-value = 0.045) and in the MARTHA study (β = 0.39, P-value=8.7 × 10 -4 ). However, the evidence of association is not strong in the F5L family study, indicating that our discovery dataset may be underpowered to detect an association between the methylation and the triglyceride levels.
We then evaluated the association of the genetic score on the methylation levels at cg14476101 (Supplementary Table S4). Weak evidence for association was observed in MARTHA (β = -0.13, P-value = 0.11), but not in the F5L Family study (β = -0.089, P-value = 0.48). The sign and size of the estimates in both the studies were consistent with those observed for the association with triglyceride levels. The direction of association between the normalized triglyceride levels and the genetic risk score in a recent study 14 was also consistent with our results (β = -0.8, 95%CI = ( -1.7, 0.1), P-value = 0.092). Taken together, these results do not exclude the possibility of an association between the genetic score and methylation levels, but cannot unambiguously support it either. Well-powered studies are needed to answer whether the genetic score is associated with the PHGDH CpG methylation levels.

Estimation of the variance explained in the family study
We used a pseudo-R2 15 to estimate the variance explained by triglyceride levels in the F5L family study. It represents the variance explained by the fixed factors. In our model, we used the following formula: Where σ 7 , is the variance of the fixed effect components; σ 8 , and σ 9 , are the variance components of the model.  rs, reference SNP ID number; a1, reference allele Supplementary Table S4. Association between the triglyceride-associated genetic score and methylation levels at PHGDH cg14476101 in the F5L family and MARTHA studies. Associations were tested using a linear regression model (a variance components model in the F5L family study) where cg14476101 methylation levels expressed as M-value were analyzed as the outcome, and the triglycerides genetic score as a predictor. Models were adjusted for age and sex.
Supplementary Table S5. Associations of methylation levels at CPT1A cg00574958 and at ABCG1 cg06500161 with levels of triglycerides in the F5L family study and in the MARTHA study.