A multi-ethnic epigenome-wide association study of leukocyte DNA methylation and blood lipids

Here we examine the association between DNA methylation in circulating leukocytes and blood lipids in a multi-ethnic sample of 16,265 subjects. We identify 148, 35, and 4 novel associations among Europeans, African Americans, and Hispanics, respectively, and an additional 186 novel associations through a trans-ethnic meta-analysis. We observe a high concordance in the direction of effects across racial/ethnic groups, a high correlation of effect sizes between high-density lipoprotein and triglycerides, a modest overlap of associations with epigenome-wide association studies of other cardio-metabolic traits, and a largely non-overlap with lipid loci identified to date through genome-wide association studies. Thirty CpGs reached significance in at least 2 racial/ethnic groups including 7 that showed association with the expression of an annotated gene. CpGs annotated to CPT1A showed evidence of being influenced by triglycerides levels. DNA methylation levels of circulating leukocytes show robust and consistent association with blood lipid levels across multiple racial/ethnic groups.

A bnormal blood lipid levels are important risk factors for various diseases including cardiovascular disease 1 , diabetes 2 , renal disease 3 , Alzheimer's disease 4 , and cancers 5,6 . Several large-scale genome-wide 7,8 and exome-wide association studies of lipids 9,10 have identified single nucleotide polymorphisms (SNPs) involved in lipid metabolism. However, the biological mechanisms behind abnormalities in lipid metabolism are not fully understood.
Complex traits are a manifestation of not only genetic but also environmental factors which in part express themselves through the epigenetic modification of DNA in all cell types. Epigenetic modifications can explain differences in a phenotype between monozygotic twins 11 as well as changes in a phenotype within an individual over time 12 . Epigenome-wide association studies (EWAS) provide an opportunity to document differences in epigenetic marks between individuals through the quantification of the degree of methylation at thousands of CpG sites across the genome. To date, such studies have identified methylation signatures associated with a number of cardiometabolic traits including cigarette smoking 13 , BMI 14 , hepatic fat 15 , fasting insulin or HOMA-IR 16 , incident type2 diabetes 17 , renal function 18 , blood pressure 19 , and C-reactive protein 20 . In addition, several EWAS identified CpGs significantly associated with blood lipid levels in populations of European ancestry [21][22][23][24] . However, large-scale multi-ethic studies to identify epigenetic determinants of blood lipid levels are lacking 25 .
Here, we present a racial/ethnic groups specific meta-analysis of 15 EWAS of DNA from circulating white blood cells involving a total 16,265 participants from 3 racial/ethnic groups to identify CpGs with DNA methylation levels that are significantly associated with blood lipid levels. We identify 187 racial/ethnic specific novel CpG associations among Europeans, African Americans, and Hispanics and an additional 186 novel association through a transethnic meta-analysis. To aid in the interpretation of our results, we quantify the consistency of associations across racial/ethnic groups, determine overlap between our findings and previously published relevant genome wide and epigenome-wide association studies, explore for the presence of cis-methylation quantitative trait loci (cis-mQTL) and/or cis-expression quantitative trait methylation (cis-eQTM) for CpG associations found to be significant in at least 2 racial/ethnic groups, and attempt to provide evidence on the direction of these associations using bi-directional Mendelian randomization.

Results
Study population. We analyzed 12 cohorts of Europeans (EA) involving 11,114 participants, 7 cohorts of African Americans (AA) involving 4,425 participants, and 2 cohorts of Hispanics (HISP) involving 699 participants (Table 1, Supplementary methods). The TwinsUK, WHI-BA23, and WHI-EMPC cohorts were composed of female participants only while NAS was composed of male participants only. The range of mean age, body mass index (BMI), high-density lipoprotein (HDL) levels, lowdensity lipoprotein (LDL) levels, and triglyceride (TG) levels was 42.7 to 76.0 year, 26.6 to 32.6 kg per m 2 , 45.5 to 59.3 mg per dl, 104.9 to 152.6 mg per dl, 74.1 to 168.5 mg per dl, respectively ( Table 1). The percentage of study participants taking any lipid control medication at time of blood lipid measurement ranged between 0% in the Amish to 44% in FHS.
EWAS stratified by racial/ethnic group. We measured DNA methylation levels using the Illumina Infinium HumanMethylation 450 K Beadchip in peripheral blood leukocytes or whole blood, except in GOLDN where CD4 + T cells exclusively were used (Supplementary Data 1). We performed an EWAS on HDL, LDL, and TG using four linear mixed effects models in each cohort, stratified by racial/ethnic group and a random effects meta-analysis 26 with genomic control (GC) and Bonferroni correction for the number of probes tested (P < 1.09 × 10 −7 ) (Methods, Supplementary Table 1). We identified 447, 25, and 496 CpGs for HDL, LDL, and TG, respectively, among EA using the basic set of covariates (model 1 adjusted for age, sex, smoking, lipid medication, four SNP PCs, estimated cell proportions, plate, row, and column of plate) (Methods, Supplementary Data 2). When we further adjusted for BMI (model 2 additionally adjusted for BMI), the numbers of significant CpGs decreased substantially for HDL (146) and TG (206) but increased modestly for LDL (30). Among AA, we identified 34, 7, and 76 CpGs in model 1 for HDL, LDL, and TG, respectively, and the numbers decreased to 9, 7, and 55 with a further adjustment with BMI (model 2). For HISP, we identified 2, 0, and 6 CpGs in model 1 for HDL, LDL, and TG, respectively, and the number decreased to 0 for HDL. Excluding participants taking any lipid lowering medication decreased the sample size by 18% and decreased power but the effect estimates remained similar (models 3 and 4 adjusted for the same set of covariates of models 1 and 2, respectively, with the exception of adjustment for the use of lipid medications, Methods). Among EA, we identified 74, 15, and 86 CpGs significantly associated (P < 1.09 × 10 −7 ) with HDL, LDL, and TG, respectively, using this most conservative model 4 that excluded statin users and adjusted for BMI (Fig. 1, Supplementary Data 2, Supplementary Fig. 1). For AA, these numbers were 7, 5, and 43 and, for HISP, they were 2, 2, and 4 CpGs, respectively. Through trans-ethnic metaanalyses of the same model, we additionally identified 49, 24, and 119 significant (P < 1.09 × 10 −7 ) CpG-lipid level associations for HDL, LDL, and TG, respectively, of which 46, 22, and 118 were novel when compared to both our racial/ethnic specific analyses and the literature.
Comparison of results across racial/ethnic groups. A majority of significant CpG-lipid level associations in European population did not reach statistical significance in other racial/ethnic groups. For CpGs found to be significant in at least one racial/ethnic group (model 4), we found a high rate of concordance (88 to 100%) in the direction of effect observed in Europeans versus that observed in African Americans and, separately, Hispanics for all three lipid fractions (Fig. 2). We found a high correlation between effect sizes (Pearson correlation coefficient = 0.69 to 0.93) for HDL and TG but not for LDL (−0.20 to 0.20). Lastly, regression slopes between any 2 sets of betas were close to 1 for HDL (0.75, 1.03) and TG (0.68 to 1.12) but not for LDL (−0.28 to 0.26) (Fig. 2). We calculated the correlations coefficients and estimated regression slopes for a higher number of CpGs (122 CpGs with P < 1.09 × 10 −5 ) before and after natural log transformation to further explore whether the differences in correlations and regression slopes between LDL and HDL/TG could be a consequence of having much smaller number of significant CpGs and/or the use of a log transformed lipid measure. For these analyses, we found both higher correlation of betas (0.21 to 0.49) and regression slopes (0.34 to 0.47) for LDL although still not as high as those observed for HDL and TG ( Supplementary  Fig. 2).
When comparing results across all 3 racial/ethnic groups, we identified 4, 1, and 26 CpGs associated with HDL, LDL, and TG, respectively, in more than one racial/ethnic group (Table 2). Of these, 1 CpG, cg06500161, in ABCG1 was associated with both HDL and TG in opposing directions. Consistent with our findings for significant CpG-lipid trait associations overall, we found high rate of concordance of the direction of the associations of the 30 CpGs across all 3 racial/ethnic group but variable effect sizes . The 4 CpGs in the ABCG1, carnitine palmitoyltransferase 1 A (CPT1A), sterol regulatory element binding transcription factor 1 (SREBF1), and thioredoxin interacting protein (TXNIP) genes identified in HISP for TG were also significant in the EA and AA populations. We found multiple CpGs to be significantly associated with TG levels located within or near the phosphoglycerate dehydrogenase (PHGDH), CPT1A, SREBF1, and ABCG1 genes. Twelve out of the 30 replicated CpGs significant in at least 2 racial/ethnic groups have not been previously reported [21][22][23][24] (Table 2).
Methylation quantitative trait loci analysis. We searched for methylation quantitative trait loci (mQTL) influencing methylation levels of the 30 CpGs listed in Table 2. Five out of the 15 cohorts provided genetic data for this analysis including ARIC (N AA = 1,717), GOLDN (N EA = 713), KORA (N EA = 1,379), WHI-BA23 (N EA = 790, N AA = 540, and N HISP = 324), and WHI-EMPC (N EA = 494, N AA = 424, and N HISP = 221). We restricted our analysis to SNPs located within 25 kilobases up-or downstream of the CpGs with a minor allele frequency (MAF) > 0.01 in each cohort and implemented a fixed effects meta-analysis within each of the three racial/ethnic groups. A total of 11, 18, and 5 CpGs had at least one significant mQTL in EA (number of tests = 5549, Bonferroni corrected P = 9.01 × 10 −6 ), AA (number of tests=8316, Bonferroni corrected P = 6.01 × 10 −6 ), and HISP (number of tests=4,713, Bonferroni corrected P = 1.06 × 10 −5 ), respectively (Supplementary Data 3). We found 7 out of our 11 CpGs in EA to also be listed to have at least one mQTLs in the mQTL DB (http://www.mqtldb.org/) 29 (Supplementary Data 4). For the 7 CpGs, 95% of the significant mQTL SNPs in mQTL DB were also significant in our EA population and had consistent direction of effect. Out of the 190 significant mQTLs (SNP-CpG pairs) common to both our study and the mQTL DB, 51 (27%) were found to be mQTLs in datasets spanning across the life course in the mQTL DB including birth, childhood, adolescence, pregnancy, and middle age.
Mendelian randomization approach. We explored the causal relationships between methylation and blood lipid levels for the 30 CpGs in EA (Table 2) using a bi-directional Mendelian Randomization (MR) study design 31,32 . First, we used genetic risk scores (GRS) for HDL, LDL, and TG constructed from established susceptibility loci for these phenotypes as instruments to examine the relationship between blood lipids and methylation (Supplementary Table 4). We found the GRSs to be significantly associated with their respective lipid levels in the 4 cohorts (GOLDN (N EA = 713), KORA (N EA = 1,379), WHI-BA23 (N EA = 790), and WHI-EMPC (N EA = 494) participating in the MR follow-up analysis (HDL: P meta = 1.86 × 10 −37 , LDL: P meta = 1.13 × 10 −22 , TG: P meta = 1.13 × 10 −8 ). Our Mendelian randomization analysis suggested that the DNA methylation levels of three CpGs, cg00574958 (P = 4.23 × 10 −6 ), cg17058475 (P = 4.72 × 10 −4 ), and cg09737197 (P = 3.33 × 10 −3 ), located in the 5′UTR region of the CPT1A were influenced by blood TG levels (Supplementary Table 5). Next, we investigated the effect of DNA methylation on blood lipid levels. A total of 7 out of the 30 CpGs had at least one significant mQTL with available GWAS results from the Global Lipids Genetics Consortium results (Supplementary Table 6). We implemented inverse-weighted MR method and MR-egger when >2 mQTLs were available for a given CpG (3 CpGs out of 7). None of the CpGs were significantly associated with lipid levels using MR-egger.

Discussion
We report the first large-scale multi-ethnic epigenome-wide association study (EWAS) of blood lipids. Our population specific meta-analyses revealed 187 novel CpG-lipid trait associations and identified a high concordance of the direction of effects across racial/ethnic groups for all 3 lipid traits and a high correlation of Results are plotted as negative log-transformed P values (y-axis) across the genome (x-axis). Odd chromosomes are in green and even chromosomes are in orange. The red horizontal line represents the epigenome-wide significance threshold of 1.09 × 10 −7 . Linear mixed effects models were implemented adjusting for age, sex (reference = male), smoking variable (never/previous/current, reference = never), lipid medication (Yes or No, reference = No), the top four principal components from genotypes (SNPs), the proportion of 5 types of cells estimated with the Houseman method (CD8 T lymphocytes, CD4 T lymphocytes, natural killer cells, B cells, and monocytes), and random effects for plate, row, and column, and BMI (model 4). The top CpGs of each chromosome were annotated with a gene name (in blue font: identified in a racial/ethnic group; red: identified in multiple racial/ethnic groups; bold: significantly associated with multiple lipid measures).
effects sizes for associations with HDL and TG. A majority of our significant CpG-lipid associations do not implicate genes previously identified through GWAS of lipids 7-10,27,28 , but many of our associations overlap with those identified in EWAS to date of related cardiometabolic traits especially for TG and HDL [14][15][16][17][18][19]33 . Thirty CpG-lipid trait associations were significant in at least 2 racial/ethnic groups with~1/3 of these being novel. In subgroup analyses, 19 significant CpGs also harbored mQTLs and 7 were inversely associated with levels of expression of the annotated gene. Lastly, our Mendelian randomization analyses suggested that DNA methylation levels at one locus appeared to be influenced by blood TG levels.
The numbers of statistically significant CpGs decreased dramatically for HDL and TG after adjusting for BMI. These findings suggest that BMI serves as either a strong confounder or a strong mediator of a large fraction of our CpG-lipid associations for these traits. Even after adjusting for BMI, we found~1/3 of CpGs to be linked to BMI in other studies 14 . In addition to the potential of residual confounding, we hypothesize that many CpGs may be independently influenced by both BMI and blood lipid levels (akin to how diet and exercise have an independent effect on weight loss).
We found a relatively high overlap of findings from our study with previous EWAS findings for cardiometabolic traits and a  Table 2 CpGs significantly associated with lipids levels in more than one racial/ethnic group.  relative paucity of overlap with lipid loci implicated through GWAS. We hypothesize that the influence of important upstream environmental determinants of the metabolic syndrome such as diet and exercise may be responsible for these patterns although substantial additional research is needed to prove this hypothesis including studies measuring changes in epigenetic profiles of multiple relevant cell types after dietary and physical activity interventions. We could not detect methylation QTLs (mQTLs) for many of our 30 CpGs-lipid trait associations replicating across 2 racial/ ethnic groups. Among mQTLs identified, a majority were not consistently associated with the CpG over the life course 29 . Whether this observation reflects direct changes in methylation levels of CpGs associated with lipid levels that occur over a lifetime of accumulated environmental exposures, such as diet and exercise, remains unknown. Other environmental exposures or time-dependent events leading to subtle changes in white blood cell proportions may also be responsible for these observations. We highlight 3 EWAS lipid loci uncovered through our metaanalysis among the plethora of both novel and known findings. First, we found a CpG to be associated with PARP9 among our few LDL findings. PARP9 is homologous to PARP10 and both are ADP-ribosyltransferases with 30% of their catalytic domains being identical 34,35 . Both also have been previously identified as either LDL or total cholesterol loci through GWAS 28 . Second, we identified CpG associations in ABCG1 for both HDL and TG. ABCG1 is a member of the superfamily of ATP-binding cassette transporters that plays an important role in macrophage cholesterol efflux. Notably, ABCA1, another member of this gene family, has also been robustly linked to the control of HDL and TC levels through GWAS 36 . We replicate the previously reported association between ABCG1 methylation and ABCG1 expression 37 and note that expression levels of ABCG1 have also been found to be positively correlated with obesity 38 . Lastly, the methylation status of cg06500161 in ABCG1 is associated with an elevated risk of developing type 2 diabetes 39 while genetic variants mapped to this locus are linked to atherosclerosis 40,41 . These constellations of findings suggest that ABCG1 may play a role in predisposing to or mediating the effects of the metabolic syndrome. Third, we found the methylation status of CpGs in CPT1A, a gene that initiates the oxidation of long-chain fatty acids, to be influenced by blood levels of TG through our MR analysis. The same CpG (cg00574958) in CPT1A was also found to be influenced by blood pressure levels in another EWAS 19 . In other human and animal studies, DNA methylation levels of the CpGs in CPT1A have been associated with CPT1A expression 21 , plasma adiponectin levels 42 , the metabolic syndrome 43 , BMI 14 , hepatic fat 15 , and high fructose consumption 44,45 . Collectively, these findings suggest that methylation status of CPT1A may mediate the downstream effects of the metabolic syndrome.
The two major strengths of our study are its size and ethnic diversity. These strengths allowed for improved power to detect novel CpG-blood lipid trait associations and to robustly explore the generalizability of the findings across multiple racial/ethnic groups. Our study has limitations in other respects some of which are common to all EWAS studies of blood. First, it remains unclear whether epigenetic changes in blood cells serve as a good surrogate of changes in the most relevant tissues controlling blood lipid levels 46 . To establish a reliable surrogate tissue, interindividual epigenetic differences must not only correlate between blood and the tissue of interest but also exposures must induce similar changes to both tissues 46 . Such evidence either does not yet exist or is incomplete for most trait-exposure combinations [47][48][49][50] . While circulating leucocytes are likely to exert at least partial direct control over blood lipid levels 51,52 , other relevant tissues/cells that we did not examine include hepatocytes, adipocytes, and enterocytes. A second limitation common to all EWAS of circulating leucocytes includes the potential for findings to not be truly reflective of a chronic alteration of transcriptional regulation from environmental perturbations but rather residual confounding due to persistent cell subtype proportional heterogeneity despite the application of statistical deconvolution techniques 46,53,54 . Lastly, the cross-sectional design of our study makes it difficult to determine the directionality of our associations 54 . We attempted to provide additional evidence of directionality using Mendelian randomization techniques 32 , but the power of our MR analyses was limited due to the lack of availability of genetic data for many cohorts limiting sample size combined with a lack of strong genetic instruments for many of CpGs examined 55 . In addition, the known shared genetic background of HDL, LDL, and TG introduces the possibility of biases due to pleiotropy in our MR analysis.
In conclusion, we identified 373 novel CpG-lipid traits associations through the largest multi-ethnic EWAS to date. We found a high concordance of the direction of effects for all 3 lipids traits across racial/ethnic groups and a high correlation of effects for HDL and TG with 30 CpGs-including 12 novel CpGsreaching stringent statistical significance in at least 2 racial/ethnic groups. A large majority of implicated genes do not overlap with lipid loci identified to date through GWAS although many loci associated with HDL and TG in >2 racial/ethnic groups have been associated with related cardio-metabolic traits in previous EWAS. We provide some limited insights on mechanism of association through our mQTL, eQTM, and MR analyses but additional studies are needed before firm conclusions can be made on the causality and mechanisms behind a large majority of the associations we observed. Lipid measurements. High-density-lipoprotein (HDL, mg per dl) and triglycerides (TG, mg per dl) were directly measured in blood samples taken from participants after at least an 8 h fast. Low-density-lipoprotein (LDL, mg per dl) was inferred using the Friedewald's formula 56 in all cohorts except for GOLDN, HyperGEN, and KORA where LDL was measured directly. We did not infer LDL in subjects with triglycerides >400 mg per dL and we excluded lipid measure from subjects who did not fast for at least 8 h. We also excluded outliers as defined by >5 standard deviations from the mean of blood lipid in each cohort. To reduce skewness, HDL and triglycerides were natural log-transformed.

Methods
DNA methylation measurement, QC, and normalization. DNA methylation was produced by investigators from each cohort independently. Levels were measured from peripheral blood leukocytes isolated from whole blood in all studies except GOLDN where only CD4 + T cells were examined. The EZ DNA Methylation Gold Kit (Zymo Research, Orange CA) was used for bisulfite conversion. The Illumina® Infinium HumanMethylation450 BeadChip and the Illumina BeadXpress reader were used to perform the methylation assays. Either the SWAN 57 method in the minfi 58 R package, the Beta Mixture Quantile method (BMIQ) 59 , the DASEN method in the wateRmelon R package 60 , or the GenomeStudio® Methylation Module was used for pre-processing and normalization of the data in each cohort (Supplementary Data 1). For each CpG site, a beta-value was calculated representing the percent methylation at that CpG site. We used an annotation file provided on the Illumina website to annotate CpGs to genes. CpGs were annotated to genes by Illumina using the following rules: those located within 1500 bp upstream of transcription start site (TSS1500), TSS200, 5′UTR, 1 st exon, gene body, or 3′UTR of a gene were annotated to that gene. All other intergenic CpGs were not annotated to a gene. To reduce technical batch effects, plate, row, and column information were added as random effects in the association analyses. To reduce confounding from cellular heterogeneity 61 , we estimated cell proportions using Houseman's method 53 in each subject and used these proportions as covariates in the association analyses.
Any single value with a detection p-value > 0.01 was set to missing. In each cohort, we excluded probes with a detection p-value > 0.01 in greater than 5% of samples. In addition, we excluded samples with a detection p-value > 0.01 in greater than 5% of probes. To avoid spurious signals in DNA methylation data, we excluded 29,233 CpGs that co-hybridize to alternate genomic sequences (highly homologous to the intended targets) 62 .
Epigenome-wide association study. Epigenome-wide association analyses (EWAS) were performed in each cohort stratified by racial/ethnic group (European, African, and Hispanic). For Model 1, a linear mixed effects model was used to study the association between the DNA methylation level of a CpG (dependent variable) and each of the lipid measures (independent variable; HDL, LDL, or TG), adjusting for age, sex (reference = male), smoking variable (never/previous/current, reference = never), lipid medication (Yes or No, reference = No), the top four principal components from genotypes (SNPs), and the proportion of 5 types of cells estimated with the Houseman method (CD8 T lymphocytes, CD4 T lymphocytes, natural killer cells, B cells, and monocytes) 53 . We added random effects for plate, row, and column. We also included family structure as a random effect among family-based studies. For Model 2, we further adjusted for BMI in addition to Model 1 covariates. We also ran Model 3 and Model 4 which were analogous to Model 1 and 2, respectively, in the subset of individuals not taking lipid-lowering medication.
Meta-analysis. We performed meta-analyses of all the participating cohorts (N = 16,265) and also stratified by racial/ethnic group: European Americans (12 cohorts, N = 11,114), African Americans (7 cohorts, N = 4,452), and Hispanics (2 cohorts, N = 699). These meta-analyses were performed for each of the 4 models, respectively. We used a random effects meta-analysis implemented in METASOFT 26 to take into account the heterogeneity of the effect sizes of different cohorts while achieving a higher or comparable statistical power compared to fixed effects metaanalysis. To avoid spurious findings from population substructure, we applied genomic control 63 . We considered a p < 1.09 × 10 −7 to be significant, equivalent to a Bonferroni correction for the number of CpG probes. We then compared results for all CpGs across all three racial/ethnic groups to identify the subset of CpGs with significant associations across two or more racial/ethnic groups. These CpGs were prioritized and followed-up with mQTL analysis, eQTM analysis, and a Mendelian randomization approach for causal inference.
Overlap with prior related genome-wide association studies. To identify the overlap between EWAS CpGs and genome-wide association studies (GWAS) SNPs of lipids, we tried three approaches: (1) Identify a GWAS SNP and a EWAS CpG pair located within 1 Mbp; (2) Identify a GWAS SNP and a EWAS CpG pair located within 10 Mbp; and (3) Identify a GWAS SNP and a EWAS CpG pair annotated to the same gene. SNPs were annotated to a gene if it is located within the transcript boundary of a protein-coding gene or a nearest gene if it is located outside of genes. The CpGs were annotated to a gene if it is located within 1500 base pairs of the transcription start site, 5′-UTR, gene body, or 3′-UTR using the Illumina annotation file. Intergenic CpGs were not compared for this 3rd approach.
Methylation quantitative trait loci. We investigated the association between either imputed or genotyped SNPs located within 25 kb upstream or downstream of each CpG and DNA methylation levels to identify cis-acting methylation quantitative trait loci (mQTL). For imputed SNP data, we restricted the mQTL analysis to SNPs with a good quality imputation (IMPUTE info > =0.4 or MACH r^2 > =0.3). Subjects taking lipid lowering medications were excluded from this analysis. Beta-values of DNA methylation levels were inverse-normal transformed and regressed on age, sex, smoking (current/former/never), BMI, at least 4 SNP PCs, cell proportions (WBC count and/or estimated WBC proportions (granulocytes as a reference)), and technical covariates (plate, row, and column as random effects) (two-sided test). Family information was also included as a random effect if a cohort was a family-based study. We then regressed the residuals on each SNP of interests stratified by racial/ethnic group.
Expression quantitative trait methylation. We extracted associations between DNA methylation levels and gene expression in blood (expression quantitative trait methylation or eQTM) from a pre-existing investigation involving 4278 participants of the FHS (2,726 offspring cohort participants and 1552 third-generation cohort participants) 30 . For gene expression, whole blood was collected in PAX-gene™ tubes (PreAnalytiX, Hombrechtikon, Switzerland) and frozen at −80°C. RNA was extracted using the whole blood RNA System Kit (Qiagen, Venlo, Netherlands) and mRNA expression profiling was assessed using the Affymetrix NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-23899-y ARTICLE Human Exon 1.0 ST GeneChip platform (Affymetrix Inc, Santa Clara, CA), which contains more than 5.5 million probes targeting the expression of 17,873 genes. The Robust Multi-array Average (RMA) package 64 as used to normalize the gene expression values and remove any technical or spurious background variation. Linear mixed effects model was used to assess associations between residuals of DNA methylation levels and residuals of gene expression levels (two-sided test). We first regressed out the fixed effects of age, sex, white blood cell type proportions as estimated through the Houseman method 53 , technical variables (sample storage time, RNA integrity number), and the first component of a principal component analysis and the random effects of amplification batch in the gene expression levels. Next, we regressed out the fixed effects of age, sex, Houseman's white blood cell type proportions, and the first two PCs of a principal component analysis and the random effect of batch (plate) from the DNA methylation levels. We then applied an additional adjustment for 25 surrogate variables to both the gene expression and DNA methylation data before the methylation-gene expression association analysis. Surrogate variables were calculated using the gene expression data (residualized for PC1, and technical variables as fixed effects and amplification batch as random effect) and the DNA methylation data (residualized for PC1 and PC2 as fixed effects and batch as random effect) adjusting for age, sex, and Houseman's blood cell type prediction.
Mendelian randomization approach. We applied a bi-directional Mendelian randomization approach to shed light on the causal nature of associations we identified between blood lipid levels and CpGs across ≥ 2 racial/ethnic groups 31,32,[65][66][67] . First, we created genetic risk scores (GRS) for increasing lipid levels using either imputed (where IMPUTE info > =0.4 or MACH r^2 > =0. 3) or genotyped SNPs known to be robustly associated with lipids through prior large scale GWAS 7,8 . The GRSs for each subject were calculated as the sum of the lipid (HDL, LDL, or TG) increasing allele dosages of the SNPs listed in Supplementary  Table 4 divided by the number of SNPs used to calculate the specific GRS. We performed a Mendelian randomization analysis in each study among participants not taking lipid lowering medication followed by a meta-analysis of analogous results in each study. Each lipid measure was regressed on the corresponding GRS adjusting for age, sex, smoking, BMI, at least 4 SNP PCs, and a family effect if a cohort was a family-based study. The DNA methylation level of each CpG was also regressed on the corresponding GRS adjusting for age, sex, smoking, BMI, at least 4 SNP PCs, cell proportions, technical batch effects, and a family effect if a cohort was family-based study. The study-specific estimates and standard errors of the GRS-lipid and the GRS-CpG associations were used as input for a meta-analysis to obtain overall estimates for GRS-lipid and GRS-CpG associations. The overall estimates were then used to assess the causal effect of each lipid measure to a CpG with a Mendelian randomization approach.
Secondly, we implemented Mendelian randomization 65,66 method to study the effect of DNA methylation on blood lipids levels. The identified significant mQTL SNPs were used as an instrument for the CpGs of interest. Study participants taking any lipid lowering medication were excluded in the Mendelian randomization analysis. Beta-values of DNA methylation levels were inversenormal transformed and regressed on age, sex, smoking (current, former, or never), BMI, at least 4 SNP PCs, cell proportions (WBC count and/or estimated WBC proportions (gran as a reference)), and technical covariates (plate, row, and column as random effects). Family information was also included as a random effect if a cohort is a family-based study. Then the residuals were regressed on each SNP located 25 kb up and downstream of the CpG site. The mQTL analyses were performed separately for each racial/ethnic group. Each SNPs were also searched to identify associations with lipid levels in the previous GWAS results from Global Lipids Genetics Consortium (GLGC, http://csg.sph.umich.edu/willer/public/ lipids2013/) 7 . The estimates and standard errors of the SNP-CPG (obtained from our samples) and SNP-lipid associations (obtained from the GLGC GWAS results) were used as input for inverse-weighted MR and MR-egger methods to assess the causal effect of each CpG to a lipid measure.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The methylation QTL association results as well as the full summary statistics for the meta-analysis of the epigenome wide association study performed within each and across all racial/ethnic groups combined for all four models and all three lipid traits are available at [https://doi.org/10.5061/dryad.qfttdz0d8]. All other relevant data supporting the key findings of this study are available within the article and its Supplementary Information files or from the corresponding authors upon reasonable request. A reporting summary for this Article is available as a Supplementary Information file.