Introduction

Coffee is one of the most widely consumed beverages in the world and is believed to carry both potential health risks and benefits.1 Coffee consumption has been linked to a wide range of health outcomes, including cardiovascular, metabolic, and neurocognitive function. Heavy coffee consumption induces cardiovascular responses and insomnia,1 but coffee consumption has also been associated with lower risk of type 2 diabetes, endometrial cancer,2 and neurodegenerative diseases such as Parkinson’s disease (PD) and Alzheimer’s disease (AD).3 Caffeine is thought to prevent cognitive decline in AD by inhibiting the formation of beta-amyloid and by acting as an anti-inflammatory agent,4, 5 whereas in PD it is thought to reduce neuroinflammation and lipid-mediated oxidative stress.6, 7 AD and PD are slowly progressive diseases with a long prodromal phase, which makes it difficult to rule out reverse causality: at-risk individuals may decrease their coffee intake because they develop sleep problems or lose their sense of smell.8

Genomic studies have identified eight genetic loci that influence habitual coffee consumption, including some near CYP1A2 and AHR, which encode the caffeine-metabolizing enzyme cytochrome P450 1A2 and the CYP1A2 regulator aryl hydrocarbon receptor, respectively.9, 10 DNA methylation (DNAm) might act as an epigenetic mediator of caffeine’s influence on health.11 Mechanistic epigenetic studies of caffeine have mainly focused on animal models.12, 13, 14 For example, maternal prenatal caffeine intake increased methylation of the steroidogenic factor 1 promoter in fetal adrenal tissue in mice,12 whereas caffeine elicited effects similar to acute exercise in rat skeletal muscle tissue and resulted in lower DNAm levels in promoter regions of energy metabolism genes.15 Little is known about whether coffee consumption habits are associated with epigenetic changes in humans. Exploring whether coffee consumption affects DNAm can help identify epigenetic signatures, provide mechanistic insights into results from past epidemiological studies, and possibly yield new insights into the health risks or benefits of coffee consumption.

Here, for the first time, we identified DNAm sites from a genome-wide screen that relate to habitual coffee consumption in humans. We conducted a meta-analysis of DNAm levels in blood samples from two different data sets: PD-free control subjects enrolled in the first round of the Parkinson’s Environment and Genes study (PEG1, 2001–2007),16, 17 consisting of 215 non-Hispanic Caucasians, and women from the Women’s Health Initiative (WHI), consisting of 995 Caucasians, 431 Hispanics, and 674 African Americans. We also related coffee consumption to DNAm levels in saliva samples from 127 PD patients and 129 PD-free controls (age-, gender-, and ethnicity-matched) enrolled in the second round of the PEG study (PEG2, 2009-ongoing).18, 19 Detailed information for each data set can be found in Supplementary Table S1 and in the Methods section.

Methods

Description of PEG1 subjects

Study population

This data set consists of 215 Caucasian population controls with complete information on coffee consumption and blood samples for DNA. The PEG1 study is a population-based case–control study in central California (Fresno, Kern, and Tulare Counties) that recruited subjects from 2001 to 2007. To be eligible, participants had to be residents of one of these three counties, to have lived in California for at least 5 years, and to be at least 35 years of age.16 Population controls were identified from Medicare lists and from residential property tax assessor records. Potential controls were screened for eligibility by mail or telephone, and only one person per household was allowed to enroll.18, 19 The study was approved by the UCLA Institutional Review Board, and informed consent was obtained from all subjects.

Exposure assessment

Standardized interviews were conducted to obtain information on demographics, lifetime caffeinated beverage consumption, smoking, and menopausal hormone therapy (MHT) histories. In the interview, information on the frequency and amount of caffeinated beverage consumption was collected for different periods of life: young adult (<25 years), adult (25–44 years), middle-aged (45–64 years), and senior (≥65 years). We used this information to calculate weighted average daily coffee consumption. Only caffeinated coffee consumed during the 12 months prior to the date of blood draw contributed to our exposure measures.

Description of PEG2 subjects

Study population

This data set consists of data from 127 PD patients and 129 population controls with complete information on coffee consumption and saliva samples. We extracted DNA from saliva samples of participants recruited for the PEG2 study, which started in 2009 and is ongoing. PD patients were identified using the California PD Registry for the three target counties in central California. Those who lived in the study area were eligible and were mailed invitations, and those who agreed were examined by a UCLA movement disorder specialist who applied UK Brain Bank and Gelb diagnostic criteria.20, 21 Control selection was based on the same criteria as in the PEG1 study, but only tax assessor records were used to identify residents, whom we recruited at the doorstep. Saliva samples selected from PEG2 participants were matched on age, gender, and race between cases and controls.

Exposure assessment

Methods for assessing exposure were identical to the methods employed in the PEG1 study. Phenotype data and DNAm data of the PEG studies are available from the Gene Expression Omnibus (GEO) database under accession numbers GSE72775 (blood) and GSE78874 (saliva).

Description of WHI subjects

Study population

This data set consists of a subgroup of 2100 women (995 Caucasians, 431 Hispanics, and 674 African Americans) with complete information on coffee consumption as well as genome-wide DNAm data from blood drawn at baseline. The WHI is a multi-center study launched in 1993, which enrolled postmenopausal women aged 50–79 years into either one or more randomized clinical trials or an observational study.22 These women were originally selected from two WHI subcohorts for a nested genomic case–control study of coronary heart disease (CHD) with genome-wide genotype and cardiovascular disease-related biomarker data.23 Thus, 50% (n=1053) of these WHI women were eventually diagnosed with CHD; however, disease status has no effect on DNAm level measured at baseline. The two cohorts are: (1) the WHI SNP Health Association Resource (SHARe) cohort, which includes genotyping data from ~8500 African American and ~3500 Hispanic women through WHI core study M5-SHARe (www.whi.org/researchers/data/WHIStudies/StudySites/M5) as well as biomarker information through WHI core study W54-SHARe (...data/WHIStudies/StudySites/W54); (2) the two European Americans Hormonal Therapy (EA HT) trials selected for GWAS and biomarkers in core studies W58 (.../data/WHIStudies/StudySites/W58) and W63 (.../data/WHIStudies/StudySites/W63).

Exposure assessment

Information on demographics, smoking history, and MHT was obtained using a structured questionnaire at baseline. Food frequency questionnaires were used to collect information on daily coffee or tea (all types) consumption in the past 3 months prior to baseline. Our exposure measures were directly taken from answers provided in response to the questionnaire.

DNA extraction and genome-wide DNA methylation analysis

DNAm data were obtained from the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA) using DNA samples extracted from peripheral blood cells and leukocytes in saliva. Methylation β values ranging from 0 (unmethylated) to 1 (fully methylated) were used for analysis.24
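The text does not spell out how a β value is computed. A minimal sketch, assuming the definition commonly used for Illumina methylation arrays (methylated intensity divided by total intensity plus a small stabilizing offset), is given below. The function name and the default offset of 100 are assumptions for illustration, not details taken from this study.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation beta value from probe intensities.

    methylated / unmethylated: non-negative signal intensities (M and U).
    offset: stabilizing constant for low-intensity probes (assumed default 100).
    Returns a value in [0, 1): 0 means unmethylated, values near 1 mean
    (almost) fully methylated.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    total = methylated + unmethylated + offset
    if total == 0:
        raise ValueError("no signal and no offset: beta is undefined")
    return methylated / total
```

With a positive offset the value never quite reaches 1, which is why β is usually described as ranging from 0 to 1 only approximately.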

Statistical analyses

The raw methylation data were preprocessed using the background normalization method from the Genome Studio software (Illumina, San Diego, CA, USA). To assess correlations between continuous coffee consumption (cups/day) and site-specific DNAm levels, biweight midcorrelation (bicor) was applied in a genome-wide screen. In the main correlation analysis using DNAm levels from blood, potential confounders such as age at blood draw, gender, and blood cell counts were adjusted for by regressing out the effects of these factors and retaining the residuals. Smoking status (ever vs never) was further adjusted for in ancillary analyses. We used the Houseman algorithm in the minfi R package and epigenetic clock software for estimating blood cell counts.25, 26, 27 All blood analyses were stratified by ethnicity, yielding four subsets: PEG1 Caucasian PD-free controls, WHI Caucasians, WHI Hispanics, and WHI African Americans. To obtain an overall P-value across the four subsets, we conducted a meta-analysis using Stouffer’s method for combining Z-values (meta.Z), that is, meta.Z=Σz_i/√4, where z_i is the Z-value in subset i. The corresponding two-sided P-values (meta.P-value) were calculated under the assumption of a normal distribution. These approaches were also applied to identify smoking-associated CpGs and CpGs influenced by both coffee and smoking. We then ranked the coffee-associated CpGs by meta.P-value and applied functional enrichment analysis to the 2124 genes identified from the top 3000 most significant coffee-associated CpGs (meta.P-value threshold ~1.1 × 10−3) using the online bioinformatics tool DAVID (the Database for Annotation, Visualization and Integrated Discovery, v.6.7, NIAID/NIH, Bethesda, MD, USA). We further conducted an MHT-stratified meta-analysis for the top 11 coffee-associated CpGs using the WHI data in order to investigate whether MHT modifies the coffee–DNAm association.
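The screening statistic and the meta-analysis step described above can be illustrated with a short Python sketch. It is not the authors' code (the study used the WGCNA package in R): the function names are hypothetical, the bicor implementation follows the standard published definition with the raw median absolute deviation, and the example Z-values are invented for illustration only.

```python
from math import sqrt
from statistics import NormalDist, median


def _biweight_terms(values):
    """Centre values on their median and down-weight outliers (biweight)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # raw median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined for this vector")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * weight)
    norm = sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equally long numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


def stouffer_meta_z(z_values):
    """Equal-weight Stouffer combination: sum(z_i) / sqrt(k)."""
    if not z_values:
        raise ValueError("need at least one Z-value")
    return sum(z_values) / sqrt(len(z_values))


def two_sided_p(z):
    """Two-sided P-value of a Z-value under the standard normal distribution."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Invented Z-values for one CpG in the four strata (illustration only).
example_z = [2.0, 2.5, 1.5, 3.0]
example_meta_z = stouffer_meta_z(example_z)  # (2.0 + 2.5 + 1.5 + 3.0) / sqrt(4)
example_meta_p = two_sided_p(example_meta_z)
```

In the analysis described above, the inputs to `bicor` would be residuals after regressing out the confounders, and each stratum-specific correlation would first be converted to a Z-value. Note that the equal-weight form used in the text does not account for differences in sample size between the four subsets; a sample-size-weighted variant of Stouffer's method exists but is not what the formula above describes.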
In the analysis using DNAm levels from saliva in PEG2, potential confounders such as age at saliva collection, gender, and ethnicity were adjusted for as above. Analyses were performed and scatter plots were created using the WGCNA package in R v.3.1.2 (R Development Core Team 2016, Vienna, Austria), whereas Manhattan plots of epigenome-wide association study (EWAS) P-values were generated with the qqman package. QQ-plots of EWAS P-values were also generated in R, and the inflation factor lambda, that is, median(χ²)/0.454, was calculated to identify potential inflation.
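The inflation factor lambda mentioned above can be sketched as follows, assuming each two-sided P-value is converted back to a 1-degree-of-freedom chi-square statistic. The function name is hypothetical. The denominator is the median of that chi-square distribution, about 0.455, which the text quotes as 0.454.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda = median observed chi-square statistic / expected median.

    p_values: two-sided P-values, each strictly between 0 and 1 (or equal to 1).
    A value close to 1 indicates no inflation; values above 1 indicate that
    the test statistics are larger than expected under the null hypothesis.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    # A two-sided P-value p corresponds to |Z| = -inv_cdf(p / 2); chi-square = Z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Working from `p / 2` rather than `1 - p / 2` avoids losing precision for the very small P-values that are typical of the top hits in an EWAS.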

Results and discussion

Coffee consumption and DNA methylation levels in blood

In our EWAS, we analyzed methylation levels of ~486 k CpGs on the Illumina 450 K array. Since many CpGs exhibit strong pairwise correlations, the Bonferroni-corrected significance threshold of α=0.05/500 000=1 × 10−7 was considered overly conservative; we therefore used a modified threshold of P<5 × 10−6 to evaluate genome-wide significance in our study. In the PEG1 and WHI data sets, adjusting for chronological age, gender, and imputed blood cell counts, we identified one CpG that reached Bonferroni-corrected genome-wide significance: cg21566642 near the ALPPL2 gene (meta.P=3.7 × 10−10). Ten additional CpGs surpassed the modified significance threshold of P<5.0 × 10−6 (Table 1a and Figure 1a); they are located in or near genes including GPR132, BSCL2, MALRD1, GRK5, PSMD8, FSTL5, and PTHLH.
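The threshold arithmetic can be checked with a few lines of Python. This is only a worked example of the Bonferroni division; the helper name is hypothetical. The counts are the rounded 500 000 used in the text, the ~486 k CpGs on the array, and the 11 top-ranked CpGs that are followed up later.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Threshold quoted in the text, using the rounded number of tests.
ROUNDED_GENOME_WIDE = bonferroni_threshold(0.05, 500_000)  # 1 x 10^-7

# Using the approximate number of CpGs actually analyzed changes little.
ARRAY_GENOME_WIDE = bonferroni_threshold(0.05, 486_000)    # about 1.03 x 10^-7

# Threshold for follow-up analyses restricted to the 11 top-ranked CpGs.
TOP_11 = bonferroni_threshold(0.05, 11)                    # about 4.5 x 10^-3
```

Bonferroni correction assumes nothing about the dependence between tests, so with strongly correlated CpGs it controls the error rate more strictly than needed, which is the motivation given above for the modified threshold of 5 × 10−6.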

Table 1 The top-ranked CpG sites associated with coffee consumption in blood with/without smoking adjustment
Figure 1

Blood DNA methylation levels associated with coffee consumption adjusted for age, gender, and blood cell counts. (a) Manhattan plot of the meta-analysis methylation association P-values adjusted for chronological age, gender, and blood cell counts. The line indicates the P-value threshold of 10−7. One CpG on chromosome 2 passed this threshold. The y axis corresponds to the negative log10 transformed meta.P-value. The x axis refers to chromosome number, and X and Y chromosomes. (b) Distributions of CpGs relative to CpG island and gene regions for all 450 k CpGs on the microarray and the 11 most significant coffee-associated CpGs listed in Table 1a. P-values were obtained by Fisher’s test for comparing proportions. A full color version of this figure is available at the European Journal of Human Genetics journal online.

The top-ranked CpGs appear to be linked to genes involved in lipid metabolism28, 29, 30, 31 and immune response (RefSeq, July 2008). For instance, the protein encoded by GPR132 is a receptor for oxidized free fatty acids and is a treatment target for diabetes because of its role in lipid metabolism and antioxidant activity.28 In a mixed-race study of atherosclerosis, GPR132 was found to be hypomethylated among low socioeconomic status (SES) individuals with increased inflammatory activity compared with high SES individuals.29 BSCL2 encodes the transmembrane protein 'seipin', which resides in the endoplasmic reticulum. Variants in BSCL2 cause congenital generalized lipodystrophy, characterized by the loss of adipose tissue and severe insulin resistance.30 MALRD1 is another lipid-related gene; it has been shown to regulate bile acid and lipid levels in the enterohepatic system.31 Genes related to immune response include GRK5 and PSMD8. The protein encoded by GRK5 regulates polymorphonuclear leukocyte motility, whereas PSMD8 encodes an immune-proteasome component related to major histocompatibility complex (MHC) class I antigen processing and presentation (provided by RefSeq, July 2008). These coffee-associated CpGs were mostly located within 200–1500 bp upstream of a transcription start site of a gene, that is, in the promoter region (Fisher’s P=0.03, Figure 1b). Additional stratified analyses for the PEG and WHI samples, as well as comparisons of effect sizes between genders or ethnicities, are provided in Supplementary Tables S2, S3 and Table 1.

Many studies have focused on associations between DNAm and smoking, and it is well known that a subgroup of coffee consumers is more likely to smoke. The highest correlation we observed between coffee intake and smoking in any of our cohorts was r=0.31 among PEG1 Caucasian controls (P=3.4 × 10−6). Because coffee and smoking are common co-exposures, any adjustment for smoking is expected to affect associations between DNAm and coffee consumption. Indeed, after smoking adjustment cg21566642 no longer met the genome-wide significance threshold (meta.P=5.4 × 10−4, Table 2 and Figure 2a). However, associations between the 11 top-ranked CpGs and coffee consumption were still preserved after smoking adjustment (meta.P<0.05/11=4.5 × 10−3, Table 2). Further, we identified methylation differences for 135 CpGs associated with both coffee drinking and smoking (meta.P≤1.0 × 10−7, Supplementary Table S4). After smoking adjustment, the most significant differentially methylated genes were BSCL2 and GPR132, along with CNTN4 and ROBO3, which appear to be involved in axonal navigation (meta.P≤5 × 10−6, Table 1b).

Table 2 Smoking adjusted results and saliva results for the 11 top-ranked CpG sites in Table 1a
Figure 2

Blood DNA methylation levels associated with coffee consumption adjusted for age, gender, blood cell counts, and smoking. (a) Manhattan plot of the meta-analysis methylation association P-values adjusted for chronological age, gender, blood cell counts, and smoking. The line indicates the P-value threshold of 10−7; no CpG passed this threshold. The y axis corresponds to the negative log10 transformed meta.P-value. The x axis refers to chromosome number, and X and Y chromosomes. (b) Distributions of CpGs relative to CpG island and gene regions for all 450 k CpGs on the microarray and the five most significant coffee-associated CpGs listed in Table 1b. P-values were obtained by Fisher’s test for comparing proportions. A full color version of this figure is available at the European Journal of Human Genetics journal online.

It is worth noting that genes previously linked to coffee consumption in GWAS did not reach the significance threshold of P≤1.0 × 10−3 in our study, specifically AHR, CYP1A1, CYP1A2, NRCAM, and ADORA2A.9, 10, 32, 33, 34 However, our study corroborated the importance of the STK11 gene (cg24145685: meta.Z=4.54, meta.P=5.7 × 10−6; after smoking adjustment: meta.Z=3.99 and meta.P=6.6 × 10−5), which encodes a member of the serine/threonine kinase family and interacts with another gene (CAB39L) identified in a previous GWAS focused on coffee consumption.32

As mentioned above, previous studies reported reduced risks of developing PD and AD with habitual coffee consumption.35, 36 To the best of our knowledge, our blood tissue data did not include AD or PD patients. Surprisingly, we found that some CpGs located near genes linked with familial forms of PD were associated with coffee consumption: GBA (meta.P=7.9 × 10−5), PARK2/Parkin (meta.P=7.3 × 10−4), and PINK1 (meta.P=8.9 × 10−4). Similarly, some GWAS-identified loci for AD were also associated with coffee intake: PICALM (meta.P=1.3 × 10−5), CLU (meta.P=6 × 10−4), and EDC3 (meta.P=1.1 × 10−4).37, 38 The PARK2 gene encodes a ubiquitin protein ligase called Parkin that targets proteins for degradation in the proteasome. Pathways related to Parkin include oxidative stress, class I MHC antigen processing and presentation, and alpha-synuclein signaling.39 The EDC3 gene is also of interest because it is located near CYP1A1/CYP1A2, and the enzymes these genes encode may interact with coffee consumption in reducing AD risk.38

Among the 3000 most significant coffee-associated CpGs (P≤1.1 × 10−3 without smoking adjustment), which map to the 2124 genes used in the gene set enrichment analysis, 2901 CpGs (in/near 2058 genes) were hypermethylated in habitual coffee drinkers, whereas only 3% (99 CpGs in/near 66 genes) were hypomethylated. Results of the DAVID functional analysis showed that these coffee-associated genes are enriched in the functional categories of transcription factor binding (P=1.2 × 10−6, Table 3) and protein kinase activity (P=2.9 × 10−5). The enriched biological terms remained statistically significant after correcting for multiple comparisons (Benjamini-adjusted P<0.05).
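The multiple-comparison correction can be sketched as follows, assuming that the Benjamini-adjusted P reported by DAVID corresponds to the Benjamini-Hochberg false discovery rate procedure. The function name is hypothetical, and the sketch operates on whatever list of enrichment P-values is supplied; it does not reproduce the values in Table 3.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    For the P-value with rank r (1 = smallest) among m tests, the adjusted
    value is min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(p_values)
    if m == 0:
        return []
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum enforces
    # that adjusted values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A term is then called significant when its adjusted value falls below the chosen cutoff, here 0.05.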

Table 3 Functional enrichment analysis for top 3000 most significant coffee-associated CpG sites in 2124 genes (meta.P-value cutoff ~1.1 × 10−3)

It has previously been suggested that the potential protective action of coffee on PD in women may be abrogated by postmenopausal estrogen use.40, 41 Interestingly, when we stratified the participants from the WHI by MHT, we observed significant associations between coffee consumption and the 11 top CpG sites from Table 1a only in women who never used MHT and not in MHT users (Table 4).

Table 4 Stratified analysis of the 11 coffee-associated CpGs found in blood (in Table 1a) by MHT using WHI data only, adjusted for chronological age and blood cell counts

Coffee consumption and DNA methylation levels in saliva

We also evaluated associations between coffee and genome-wide DNAm levels in saliva provided by 256 participants with and without PD enrolled in the PEG2 study. After adjustment for chronological age, gender, and ethnicity, no CpGs achieved genome-wide significance (P≤10−7, Figure 3a). When we examined the 11 most significant coffee-associated CpGs previously identified in blood, none of the significant associations were preserved in saliva (Table 2). Moreover, we did not observe positive correlations between meta Z-values for blood and Z-values for saliva (Figure 3b). Further adjustment for PD status did not change these results (Table 2), suggesting that PD status did not affect DNAm levels in saliva for the coffee-related CpGs identified in blood. After adjusting for smoking, there appeared to be a significant correlation between coffee-related DNAm in blood and in saliva, but in the opposite direction of what we expected (Figure 4b and Supplementary Table S5). This can be explained by the ‘regression to the mean’ effect and suggests that DNAm data from saliva did not replicate the associations we found in blood.

Figure 3

Saliva DNA methylation levels associated with coffee consumption adjusted for age, gender, and ethnicity. (a) Manhattan plot of the methylation association P-values adjusted for chronological age, gender, and ethnicity. The line indicates the P-value threshold of 10−7; no CpGs passed this threshold. The y axis corresponds to the negative log10 transformed P-value. The x axis refers to chromosome number, and X and Y chromosomes. (b) Correlation between Z-values from biweight midcorrelations between DNA methylation and coffee consumption in blood and saliva for the 50 most hypermethylated CpGs and the 50 most hypomethylated CpGs in blood. The x axis corresponds to the meta Z-values adjusted for age, gender, and blood cell counts from PEG1 and WHI. The y axis corresponds to the Z-values adjusted for age, gender, and ethnicity from PEG2. A full color version of this figure is available at the European Journal of Human Genetics journal online.

Figure 4

Saliva DNA methylation levels associated with coffee consumption adjusted for age, gender, ethnicity, and smoking. (a) Manhattan plot of the methylation association P-values adjusted for chronological age, gender, ethnicity, and smoking. The line indicates the P-value threshold of 10−7; no CpG passed this threshold. The y axis corresponds to the negative log10 transformed P-value. The x axis refers to chromosome number, and X and Y chromosomes. (b) Correlation between Z-values from biweight midcorrelations between DNA methylation and coffee consumption in blood and saliva for the 50 most hypermethylated CpGs and the 50 most hypomethylated CpGs in blood. The x axis corresponds to the meta Z-values adjusted for age, gender, blood cell counts, and smoking from PEG1 and WHI. The y axis corresponds to the Z-values adjusted for age, gender, ethnicity, and smoking from PEG2. A full color version of this figure is available at the European Journal of Human Genetics journal online.

Potential limitations

Our study has some potential limitations. First, there is a small amount of uncertainty regarding the reported ethnicity, since we do not have genome-wide SNP data to compute genetic principal components. However, ethnicity information in the studies we included has been carefully verified using 37 Ancestry Informative Markers. Moreover, we stratified by ethnicity to address the possibility of ethnic confounding. Second, stratification by gender in the PEG1 study removed the significance of some of the associations between coffee consumption and methylation loci. However, this might be due to the decreased sample size resulting from stratification. Third, similar to other DNAm studies, lambda values in this study were inflated; therefore, P-values should be interpreted with caution. However, our findings in Caucasians were replicated in other ethnic groups, which lends some validation to the results. In addition, it remains debatable whether to present QQ-plots in DNAm studies, as CpGs are highly correlated and the distributional assumptions made in GWAS may not be met in EWAS (Supplementary Figure S1).

Conclusions

In summary, in peripheral blood mononuclear cells we identified CpGs located near 11 genes that were associated with habitual coffee consumption based on the significance threshold (meta.P≤5.0 × 10−6) while adjusting for age, gender, and blood cell composition. Moreover, these correlations remained significant after further adjustment for smoking. Furthermore, many differentially methylated CpGs are located in/near genes reported to be associated with coffee-related chronic diseases or with the common neurodegenerative diseases PD and AD, for which coffee consumption has been suggested to be protective. Our results point to possible mechanisms through which coffee consumption may have beneficial effects and may possibly confer risk reduction. The measures of habitual coffee consumption we used in this study were based on recall over a short period (the last 3 months in WHI or 12 months in PEG1); however, in PEG1, reported lifetime coffee consumption was highly correlated with coffee consumption reported for the past 12 months (cor=0.87), suggesting that coffee consumption is consistent across time. Moreover, because this is a mixed-race and mixed-gender study, the coffee associations with DNAm levels in blood appear to extend to both genders and different ethnic groups, even though in women the results also seemed to depend on MHT use. Finally, our study suggests that while coffee affects DNAm levels in blood, this does not seem to extend to saliva-derived tissue.