Genetic variation and pesticide exposure influence blood DNA methylation signatures in females with early-stage Parkinson’s disease

Although sex, genetics, and exposures can individually influence risk for sporadic Parkinson’s disease (PD), the joint contributions of these factors to the epigenetic etiology of PD have not been comprehensively assessed. Here, we profiled sex-stratified genome-wide blood DNAm patterns, SNP genotype, and pesticide exposure in agricultural workers (71 early-stage PD cases, 147 controls) and explored replication in three independent samples of varying demographics (n = 218, 222, and 872). Using a region-based approach, we found more associations of blood DNAm with PD in females (69 regions) than in males (2 regions, Δβadj| ≥0.03, padj ≤ 0.05). For 48 regions in females, models including genotype or genotype and pesticide exposure substantially improved in explaining interindividual variation in DNAm (padj ≤ 0.05), and accounting for these variables decreased the estimated effect of PD on DNAm. The results suggested that genotype, and to a lesser degree, genotype-exposure interactions contributed to variation in PD-associated DNAm. Our findings should be further explored in larger study populations and in experimental systems, preferably with precise measures of exposure.


Study inclusion and exclusion criteria
For the TERRE study, individuals aged 18-75 who applied for healthcare through the Mutualité Sociale Agricole and who were not receiving free health care for dementia were eligible (1).PD patients were defined as having parkinsonism (≥ 2 of rest tremor, bradykinesia, rigidity, impaired postural reflexes) with exclusion of more prominent nervous system involvement and of druginduced parkinsonism (2).Controls were randomly selected from individuals affiliated with the Mutualité Sociale Agricole and not receiving healthcare for PD or dementia.
For the DIGPD study, patients were recruited between May 2009 and July 2013 at 8 French hospitals, and were eligible if diagnosed with PD according to the UK Brain Bank criteria, and if disease duration was 5 years or less at time of recruitment (3,4).Controls were selected from friends and relatives of PD patients, and were required to be free of PD according to the UK Brain Bank criteria (4).
Individuals recruited for the PEG1 study were residents of Fresno, Kern, or Tulare Counties in California, USA, and had lived in California for ≥ 5 years (5).PD patients were included if they were diagnosed within 3 years of recruitment, and satisfied the UK Brain Bank and Gelb criteria (4,6).Controls free of PD and aged 65 and older were recruited from Medicare lists, and additional controls free of PD and aged 35 and older were recruited from randomly selected tax assessor residential units in each county.One control per household was eligible for the study (7).
The SGPD study included individuals from the QPP, NZBRI, and SYD cohorts (8).For the QPP cohort, PD patients were included if they satisfied the Calne diagnostic criteria, and controls were either spouses of patients, siblings of patients, or community volunteers from the same area as patients, with the same ethnicity, and of similar age (8,9).PD patients from the NZBRI cohort were recruited by the NZBRI, and were excluded if they had a history of learning disability, severe head injury, stroke, or other neurological impairment, or major psychiatric complications.Finally, patients and controls from the SYD cohort were recruited at the Parkinson's Disease Research Clinic, Brain and Mind Research Institute at the University of Sydney, with PD diagnosed according to the UK Brain Bank criteria (4).

Genotyping data generation and preprocessing
The Illumina NeuroChip array was used to generate genotype data for cases and controls from TERRE starting at 486,137 SNPs.Genotype data were processed using the RICOPILI pipeline (10).Preimputation quality control (QC) first consisted of a set of cleaning filters applied using PLINK 1.9 (11): removal of strand-ambiguous SNPs, mismatch between recorded sex and sex determined on the array, a 0.05 call rate filter for SNPs, an individual missingness filter of 0.02, a minor allele frequency (MAF) filter of 0.01, and finally a heterozygosity filter removing individuals with F > 0.2.Variants on sex chromosomes and variants with Hardy-Weinberg equilibrium p < 1 × 10 −10 were then removed.
To estimate genotype PCs, we used a set of SNPs pruned in the following manner.Samples were processed to remove SNPs in linkage by two rounds of pruning of SNPs within 200 kb and in linkage disequilibrium (squared Pearson's correlation between SNPs [r 2 ] > 0.2) in a pairwise manner (i.e., --indep-pairwise 200 0.2 in PLINK 1.9 was run twice).We next removed SNPs with MAF < 0.05, and SNPs in highly recombinant regions.The samples were then filtered to ensure that identity by descent did not exceed |FST| > 0.2 between samples within each ancestry group.Ten genotype PCs were computed using FastPCA (Supplementary Fig. 9) (12).SNPs were then aligned to 1000 Genomes Phase 3 using BCFtools (13), and imputed to the 1000 Genomes Phase 3 European reference panel using the Michigan Imputation Server.Variants were filtered to have an imputation quality R2 (squared Pearson's correlation between imputed and true genotypes for a given SNP) > 0.3 and MAF within 0.1 of the 1000 Genomes Phase 3 European population.Finally, imputed SNPs associated with their genotype batch were removed, and inflation of the association between genotype and PD was checked by logistic regression accounting for sex, age at collection, and three genotyping PCs (with genomic inflation factor, λGC = 1.06 in TERRE) (Supplementary Fig. 10), yielding 8,354,189 SNPs for analysis.

Propensity matching and effect size adjustment in PEG1, DIGPD, and SGPD (replication samples)
For the PEG1 sample (n = 539 after QC), two sets of matching were performed with the MatchIt package, using the optimal method with glm distance estimation, a probit link function, and caliper width of 0.2: 1) between female (or male) PD cases and controls, and 2) across females (or males) from both samples (14,15).For both sets, individuals were matched on age, predicted neutrophil proportion, smoking score, and ethnicity (retaining subjects of European ancestry in each sample; Supplementary Figs.17,18).This resulted in 100 females from PEG1 (45 cases, 55 controls) matched to 100 females from TERRE (33 cases, 67 controls), and 118 males from PEG1 (59 cases, 59 controls) matched to 118 males from TERRE (38 cases, 80 controls).CMR median β values were estimated separately in PEG1 females and males based on these subsets of individuals, and PD case-control CMR Δβadj values were calculated by regressing each CMR median β on disease status, age, ethnicity, smoking score, and three cell type PCs (explaining 86% of variance in cell type composition; using "rlm" from the MASS package, Huber Mestimator, 500 iterations) (16).
For SGPD (n = 1751 after QC), two sets of matching were performed as stated above, excluding ethnicity as all individuals were of European descent (Supplementary Figs.17,18).When matching between SGPD and TERRE, a 4:1 ratio was used, in order to take advantage of the large sample size in SGPD.The matched datasets consisted of 400 females from SGPD (202 cases, 198 controls) matched to 100 females from TERRE, and 472 males from SGPD (241 cases, 231 controls) matched to 118 males from TERRE.The model median CMR β ~ disease status + age + smoking score + cell type PCs 1-3 (explaining 83% of variance in cell type composition) was used to calculate Δβadj values, separately in each sex.
Finally, as DIGPD was of a similar size to TERRE (n = 222 after QC), this sample was not subset for replication analysis.In both sexes, chip was confounded with age, and was thus not included in linear regression models, and in males, row was also excluded due to a mild association with disease status (r = 0.16, p = 0.07).Thus, the model median CMR β ~ disease status + age + smoking score + cell type PCs 1-4 (explaining 89% of variance in cell type composition) + genotype PCs 1-3 + plate + row was used to calculate Δβadj values in females, and the same model without row was used in males.Additionally, the lag time between last known pesticide exposure and DNAm measurement in each subject was recorded.In this sensitivity analysis, we assessed genetic and exposure related influences on CMRs across all females using the same linear model described in Methods, coding each exposure variable based on this lag time.Specifically, we constructed a variable for recent exposure as reported exposure at less than the median lag time across samples, and a variable for past exposure as a reported exposure at greater than or equal to the median lag time across samples.We then assessed E, G+E, and G×E models including both of these variables, accounting for each covariate present in our original analysis (See Methods).Supplementary Figure 16.Propensity matching in TERRE.Absolute standardized mean differences in age, alcohol consumption ("alcohol1"; 1: never; 2: occasionally; 3: regularly; 4: daily), smoking (1: never smoker; 2: former smoker; 3: current smoker), and history of head trauma (0: no; 1: yes) before and after matching are shown for: (A) TERRE female cases (n = 33) and controls (n = 67), and (B) TERRE male cases (n = 38) and controls (n = 80).In both instances, "full" matching was performed, using "glm" distance measure and a probit link function.(18), cell type PCs 1-3 (87% of variance).DIGPD adjustment covariates: age, predicted smoking, cell type PCs 1-4 (89% of variance), genotype PCs 1-3.SGPD adjustment covariates: age, predicted smoking, cell type PCs 1-3 (83% of variance).Glasses beer per week 0 (0, 0) 0 (0, 0) 0 (0, 0.  For each row, "additional" indicates the number of subjects not already flagged in the checks listed above.All time points of the longitudinal DIGPD study were processed together ("Total" column); the numbers of subjects removed from the first time point of DIGPD presented in this manuscript are indicated in the "Visit 1" column.For each row, "additional" indicates the number of probes not already flagged in the checks listed above.

Figure 2 .Supplementary Figure 6 .
Correlation of PD-associated blood DNA methylation patterns in females and males from TERRE.CMRs that passed median |Δβadj| ≥ 0.03 and padj ≤ 0.05 in TERRE epigenome-wide association analyses in each sex are shown.x-axis: median CMR |Δβadj| in females from TERRE (n = 100); y-axis: median CMR |Δβadj| in males from TERRE (n = 118).(A) Δβadj correlations for 67 CMRs differentially methylated in females and also tested for differential DNAm in males.Pearson's correlation coefficient (r) and correlation p-value are shown.Labeled CMRs had median |Δβadj| ≥ 0.03 in the same direction in both sexes.(B) Δβadj correlations for 2 CMRs differentially methylated in females and also tested for differential DNAm in males.DM: differentially methylated.Supplementary Figure 3. PD-associated differentially methylated CMR identified with sexadjusted model.(A) Volcano plot: Adjusted PD case-control DNAm differences in TERRE (sex-combined sample, n = 218).Colored point passed thresholds of median CMR |Δβadj| ≥ 0.03 and padj ≤ 0.05.Inset: the chr3:137228231-137228637 CMR is shown.y-axis: β value (level of DNAm) in subjects from TERRE.Black: PD cases; grey: controls.(B) Δβadj correlations, Pearson's correlation coefficient (r), and correlation p-value for 17 CMRs that passed padj ≤ 0.05 in the sex-adjusted analysis are shown.x-axis: median CMR |Δβadj| in females from TERRE (n = 100); y-axis: median CMR |Δβadj| in males from TERRE (n = 118).DM: differentially methylated.Confirmation of biological sex in TERRE.Raw methylation data are shown on the left; processed methylation data (following outlier removal, normalization, and probe filtering) are shown on the right.Red: individuals self-reported as female; blue: individuals self-reported as male.(A, B) PCA of X chromosome β values.(C, D) X and Y chromosome intensities as calculated using getSex from the minfi R package.(E, F) Proportion of missing β values on the Y chromosome (expected to be near zero for males and above zero for females before data processing, indicating detection of the Y chromosome in males only).Supplementary Figure 11.Robust principal components of cell type composition in TERRE and their correlation with disease status.(A) The first 10 PCs representing the variance in cell type composition among the 218 TERRE participants included in this study.x-axis: principal component of cell type; y-axis: proportion of variance in cell type composition explained.(B) Case-control differences in loadings for cell type PCs 1-6, explaining 89% of variance in cell type composition in TERRE.x-axis: PD status; y-axis: PCA loadings.p-values were adjusted for multiple comparisons using the Benjamini-Hochberg method.Supplementary Figure 12.Correlation between principal components (PCs) of normalized and probe-filtered β values in TERRE, demographic data, technical variables, predicted cell type proportions, and genotype PCs.PCA on β values for 803,777 EPIC probes and 219 subjects from TERRE passing quality control is shown. 1 subject was later removed due to missing self-reported smoking status.SentrixBarcode_A: sample chip; SentrixPosition_A: position on chip; PD: Parkinson's disease status; alcohol1: alcohol consumption; alcohol2: weekly wine consumption; alcohol3: weekly beer consumption; alcohol4: weekly aperitif consumption; alcohol5: change in alcohol consumption over time; MMS: Mini-Mental State Examination score; time_sto: sample storage time; bmi: body mass index; CTP: cell type proportion; PC: principal component.Supplementary Figure 15.Correlation between principal components (PCs) of median reference CMR β values in TERRE, demographic data, technical variables, predicted cell type proportions, and genotype PCs for CMRs variable in (A) only females, (B) only males, or (C) the sex-combined sample.SentrixBarcode_A: sample chip; SentrixPosition_A: position on chip; PD: Parkinson's disease status; alcohol1: alcohol consumption; alcohol2: weekly wine consumption; alcohol3: weekly beer consumption; alcohol4: weekly aperitif consumption; alcohol5: change in alcohol consumption over time; MMS: Mini-Mental State Examination score; time_sto: sample storage time; bmi: body mass index; CTP: cell type proportion; PC: principal component.

Table 2 . Standardized Mean Difference (SMD) in Overall Distance Before and After Propensity Matching Replication Samples
"distance" calculated by "glm" using a probit link function.Two rounds of matching were performed: 1) between cases and controls (on predicted smoking, predicted neutrophil proportion, and age, separately within each sex), followed by 2) between TERRE and each replication sample (on predicted smoking, predicted neutrophil proportion, and age, separately within each sex)."Before"indicates SMD before any matching; "after" indicates SMD after both rounds of matching.Supplementary