Differential methylation of the TRPA1 promoter in pain sensitivity

Chronic pain is a global public health problem, but the underlying molecular mechanisms are not fully understood. Here we examine genome-wide DNA methylation, first in 50 identical twins discordant for heat pain sensitivity and then in 50 further unrelated individuals. Whole-blood DNA methylation was characterized at 5.2 million loci by MeDIP sequencing and assessed longitudinally to identify differentially methylated regions associated with high or low pain sensitivity (pain DMRs). Nine meta-analysis pain DMRs show robust evidence for association (false discovery rate 5%) with the strongest signal in the pain gene TRPA1 (P=1.2 × 10−13). Several pain DMRs show longitudinal stability consistent with susceptibility effects, have similar methylation levels in the brain and altered expression in the skin. Our approach identifies epigenetic changes in both novel and established candidate genes that provide molecular insights into pain and may generalize to other complex traits.


Supplementary Figures
Supplementary Figure S1 50  Supplementary Figure S4. DNA methylation levels in genomic regions that overlap histone modification peaks for 5 histone marks in one lymphoblastod cell line sample from the Encode project. Boxes show the 25% and 75% quantiles and whiskers extend to 1.5 times the IQR.  Figure S6. Follow-up unrelated-sample EWAS results. (a) Association results from a linear model regressing normalized methylation levels on pain scores, compared to the permuted data (permutation-based 95% confidence band shaded). (b) Association results from a linear model regressing normalized methylation levels on pain scores, including the first three autosomal principal components as covariates. The first three PCs together explained 25.7% of the variance in methylation.  Figure S7. Discovery MZ twin EWAS results, controlling for MeDIPfragment size variability. Supplementary Figure S8. Discovery MZ twin EWAS results, controlling for CpG-density using MEDIPs AMS methylation quantification scores.  Figure S9. Discovery MZ twin Illumina 450k EWAS results, compared to the permuted data (permutation-based 90% confidence band shaded).  Figure S10. DMR effects in 73 known candidate genes for pain in the discovery set of 50 MZ twins. Figure S11. Differential methylation in TRPA1 in pain sensitivity. Association results in the TRPA1 gene region (negative strand) in the meta-analysis (black solid line) and in the discovery (blue dotted) and follow-up (red dotted) samples. Bisulfite pyrosequencing was performed in the pain-DMR promoter region (red ticks), and data for one gene expression probe was available located near the TES (green tick).  Figure S12. Within-pair DMRs in the discovery MZ twin set. P values are the rank correlation P-values comparing the HPST differences to methylation differences within MZ twin pairs.  Table S6. Longitudinal results at 18 MAP-DMRs in Figure 3B. HPST-MethStable refers to the mean longitudinal HPST change per individual, in the group of individuals where DNA methylation changed by less than 30% over time. HPST-MethVar refers to the mean longitudinal HPST change per individual, in the group of individuals where DNA methylation changed by greater than 30% over time. The P-value is the T-test p-value comparing the mean HPST in the two groups (MethStable and MethVar).

Sample selection and phenotyping
The individuals included in the study were unselected twin volunteers from the TwinsUK cohort, who were recruited through national media campaigns and from other twin registers. Twins from this cohort have been shown to be comparable to the age-matched general population singletons for a broad variety of medical and behavioural traits 62 . Subjects were invited to attend St Thomas Hospital where they completed questionnaires gathering demographic information, clinical history and current medications. Exclusions for this study included volunteers who had consumed analgesic medication within 12 hours of the study visit, and those with likely impaired upper limb neurology, for example, known neuropathy, previous stroke or chemotherapy. Subjects with common painful conditions such as osteoarthritis were not excluded.
Twins underwent quantitative sensory testing (QST) individually, with the co-twin absent from the room while the test was performed, as previously described 61 . Briefly, each subject was seated at a table onto which the test arm was placed. A 25mm2 x 50mm2 probe connected to a Modular Sensory Analyzer Thermal Stimulator (Somedic, Sweden) was secured with a fabric-covered band on the volar surface of the forearm. Subjects received standardized instructions before assessment. The heat pain supra threshold (HPST) represents the temperature at which the sensation evoked by a thermal stimulus changes from "painful" to "unbearable". HPST was measured by heating the probe (at 10C/s) from an adaptation temperature of 320C until the subject perceived the stimulus as changing from painful to unbearable and stopped the experiment, at which point the temperature (equivalent to HPST) was automatically logged. If the probe reached 500C the machine automatically returned to adaptation temperature to prevent thermal burn. Blood was taken for DNA extraction after QST had been performed.
In the discovery sample we selected twenty-five monozygotic (MZ) female twin-pairs (age range 46-76, median age 62) from TwinsUK. The 25 discovery MZ twin pairs were selected as the most discordant MZ twin pairs for HPST in the TwinsUK cohort at the time of the study ( Supplementary Fig. 1). Discordance was defined as a difference of at least 20C in HPST scores, and one twin in each pair fell in the upper tail of the HPST distribution. The HPST MZ intraclass correlation coefficient was -0.144 in these data, which are selected for discordance. The follow-up sample consisted of 50 unrelated individuals (age range 42-86, median age 63.6) was ascertained separately 2-3 years after the discovery sample. The follow-up unrelated sample was unselected for HPST score distribution (Supplementary Figure 1).
White blood cell (WBC) sub-type counts were obtained in 48 individuals from discovery sample using fluorescence activated cell sorting of peripheral blood 3. WBC sub-type cell counts were calculated by multiplying the proportion of the WBC count comprised by each cell type by the total WBC cell count (estimated in thousands of cells per ml), for four cell types in our sample: neutrophils, eosinophils, monocytes, and lymphocytes.

MeDIP-seq Data Quality Control
We used Maq (v0.7.1) in the discovery MZ sample and BWA (v0.5.9) in the follow-up unrelated sample to align the sequence reads to the human genome (hg18), excluding clones that are not yet finished or cannot be placed with certainty at a specific place on the chromosome (files that were labelled "random"). The alignment was performed with the default parameters in Maq and BWA. We then assessed the quality of the read alignments, fragment location, and depth of coverage. We only considered reads that mapped at stringent alignment mapping thresholds (Maq q > 10 in discovery set; BWA q >10 in follow-up set) and excluded duplicated reads. We also only considered reads that were assigned as properly paired in the discovery set.

Quantifying MeDIP-seq DNA methylation levels
In the discovery set, we used the post-QC paired-end reads to determine the size and location of the corresponding MeDIP fragment, where the average fragment size was 236bp. We first obtained the per base-pair coverage across the genome, where each base that overlapped a mapped MeDIP fragment was given a score of +1. This constitutes the read-depth distribution. We obtained the read-depth distribution per individual including both lanes for that individual. We then binned coverage sum in overlapping windows of 1kb (overlap of 500bp).

Illumina 450k DNA methylation
We used Illumina 450K methylation profiles for 44 individuals from the discovery sample. We initially obtained Illumina 450k unprocessed beta values for 46 individuals from the discovery sample for 485,837 Illumina 450k probes. At each CpG-site for each individual the beta value represents the ratio of methylated bead signals over the sum of the methylated and unmethylated bead signals from the probe on the array. Illumina beta values range between 0 (unmethylated) and 1 (methylated).
We mapped the 50bp Illumina 450K probe sequences to the human genome (hg18) to confirm probe locations and check for sequence mismatches. We excluded all probes that mapped to multiple locations in the genome with up to 2 mismatches. Altogether, we excluded 17,651 probes. Furthermore, we only considered autosomal probes with no missing data in our sample, which resulted in a final number of 424,368 autosomal probes for further analysis.
We examined the individual-level Illumina 450k profiles for the presence of outliers and to identify potential confounders. The distributions of the unprocessed beta profiles across the genome highlighted two individuals who were outliers, and these were subsequently removed from the analyses.We then performed principal component analysis (PCA) of the Illumina 450k data and compared the first PC loadings (the first PC explained 22% of the variance) to potential confounders, and found associations with methylation chip, the concentration of the bisulfite converted DNA, and a weak association with position of the sample on the chip. Therefore, we included these three terms as covariates in the phenotype analyses.

Bisulfite pyrosequencing
Bisulfite pyrosequencing of 250-300bp regions was performed by EpigenDx laboratory service. The methylation status of each CpG site was analyzed individually as an artificial TC single nucleotide polymorphism using Q CpG software (Qiagen Pyrosequencing, Valencia, CA). The methylation level at each CpG site for each sample was calculated as the percentage of the methylated alleles over the sum of the methylated and unmethylated alleles. The mean methylation level was calculated using methylation levels of all measured CpG sites within the PCR region of a gene. The methylation level at each CpG site ranges from 0 (unmethylated) to 1 (fully methylated). EpigenDx performed the assay validation, bisulfite conversion, and pyrosequencing. We included high, medium, and low methylated DNA as controls in each run.
Controls included low methylated control DNA and in vitro methylated DNA were mixed at different ratios (0, 20, 40, up to 100%) followed by bisulfite modification, PCR, and pyrosequencing analysis. The percent methylation obtained from the mixing study was highly correlated with an R2 value of 0.99. The assay was also validated by sensitivity and reproducibility testing. Different amounts of genomic DNA were used for the analysis in triplicates. Reproducible results were obtained by using as low as 5 ng of genomic DNA in the bisulfite modification reaction.

Discovery MZ-twin MeDIP-seq pain-DMRs
In the discovery sample we used a linear mixed effects model where we fit the temperature-sensitivity of each individual as a function of methylation at a given 1kb locus. At each locus the methylation data were quantile normalised (to N(0,1)) prior to the analyses. We also ran the analyses by including age as a fixed effect covariate, but the top-ranked results were very similar to the non-corrected results (ranks varied by at most 5 units in either direction). We also ran the analyses by normalising the RMQ scores by the overall number of post-QC reads assigned to each chromosome, and observed that the top-ranked results were also very similar.
To further assess the sensitivity of pain-DMRs to methylation quantification of MeDIP-seq data we also performed analyses controlling for MeDIP-seq fragment size and CpG density (using MEDIPS AMS) in the discovery EWAS ( Supplementary Figures 7-8). The majority of the top-ranked pain-DMRs remained nominally significant in the additional analyses, although the relative rank could change. For example, the TRPA1 pain-DMR (originally ranked 50th, P = 2.6 x 10-6) was nominally significant when controlling for MeDIP-seq fragment size (ranked 49th, P = 2.6 x 10-6) and CpG-density (ranked 36,921st (in top 0.07%), P = 7.8 x 10-3). Further detail on these additional analyses is provided in the following two paragraphs.
The first additional analysis corrected for MeDIP-fragment size variation across the 50 individuals (Supplementary Figure 7). We compared the distributions of the proportion of the MeDIP-fragments at each given size from 50 to 350bps. There were 4 outliers in the data, which we removed. We then took the minimum proportion of MeDIP-fragments per size category and used this distribution as the distribution to normalize each individual to. For each individual we then dropped fragments at random within size category, until the MeDIP-fragment by size distribution matched the normalisation distribution. We used this subset of MeDIP-fragments in the downstream quantifications and analyses, which were performed as described for the main (RMQ) analyses.
The second additional DMR analysis in the discovery set took into account local CpGdensity in the DNA methylation quantifications (Supplementary Figure 8). We used MEDIPs to calculated absolute methylation scores (AMS), which we then used in downstream analyses as described for above. We excluded all bins where MEDIPS was not able to estimate discrete AMS (score was Infinity), or where fewer than 10% of individuals had methylation levels > 0. In the regression model, we also quantile normalised (to N(0,1)) the AMS scores in each 1kb bin. In general, the RMQ analyses were more similar to the analyses correcting for fragment-size, than the CpG-density correction. However, in the CpG-density weighted results the ranks of the top ranked RMQ DMRs showed greater variability, although the majority of the top 100 RMQ DMRs remained nominally significant.
We also compared the within-pair signed differences between pain-sensitivity and degree of methylation at a given locus within each MZ twin pair. The phenotype data here represented the difference in temperature-sensitivity between the pain-insensitive and pain-sensitive twin (therefore, the phenotype differences were always positive). The methylation data represented the corresponding difference in methylation quantification, assessed in tiles of 1000bp, between the pain-insensitive and pain-sensitive twin. To compare the methylation to phenotype differences, we applied a moderated t-test and a non-parametric Mann-Whitney U test to these data.

Follow-up unrelated MeDIP-seq pain-DMRs
In the follow-up unrelated sample, we ran the initial analyses by comparing normalised (to N(0,1) RMQ scores against HPST for each bin in the genome, using linear models. We also performed these analyses including age as covariate, without normalising (to N(0,1)) the RMQ scores and observed that the top ranked results had very similar ranks, effects and P-values in the additional analysis. We observed an inflation of test statistics overall in the follow-up results, and to explore this further we also repeated the analyses by including the first up to 3 principal components in the autosomal methylation data in the follow-up samples, which together explained 25.7% of the variance.

Discovery MZ-twin BiSeq pain-DMRs
We performed BiSeq validation of the TRPA1 region. In the pain-DMR analyses with the BiSeq data, we validated the MeDIP-seq signal at two CpG-sites at chr8:73151235bp (Pain-DMR rank correlation = 0.22, and association taking into account twin structure P = 0.03), and at chr8:73151380bp (Pain-DMR rank correlation = 0.67, and association taking into account twin structure P = 0.025).

Allele specific methylation (ASM) analysis
ASM analyses were performed only in the discovery set of MZ twins because MeDIP-seq coverage was greater in these data (about 50mln paired-end reads per individual). Briefly, we obtained a subset of heterozygous genetic variants from genotype and exome sequencing data in 48 MZ twins, and for each individual at each genetic variant we scored the number of MeDIP-seq reads spanning the reference (R) and non-reference alleles. We used the frequency of the reference allele in the MeDIP-seq data as a measure of ASM at each heterozygous genetic variant.
We merged exome sequencing data and directly genotyped and imputed (to hapmap2) genotype data to obtain a set of heterozygous SNPs for ASM analysis. For exome sequencing data we used genotype calls according to the analysis described in reference 61. For hapmap2 imputed genotypes, we required that the Impute2 info threshold of the SNP surpassed 0.8 and that the probability of a heterozygous genotype surpassed 0.5. We then selected all SNPs where at least 50% of individuals were heterozygous (13 of 24 MZ pairs). According to these criteria, there were 3,099 heterozygous genetic variants in the exome sequencing data and 278,304 heterozygous SNPs in the hapmap2 imputed data. We merged these into a final set of 279,885 unique candidate heterozygous genetic variants in our dataset, at which we estimated MeDIP-seq read coverage using maq pileup.
We considered variants where MeDIP-seq coverage depth surpassed 8 reads and used the allele frequency of the reference allele (calculated in VarAllele) as a measure of ASM; excluding all variants that were not bi-allelic. We called a site within an individual as ASM only if the reference allele frequency was < 0.25 or > 0.75. Altogether, ASM calls were obtained for 17,261 variants. We excluded all SNPs in genomic regions reported to be problematic in alignment according to reference 58.

Blood-Brain comparison
We used MeDIP-seq levels from blood and multiple brain regions in two female donors from Davies et al. 37 . The brain regions included inferior frontal gyrus, middle frontal gyrus, left frontal gyrus, entorhinal cortex, superior temporal gyrus of the temporal cortex, visual cortex, and cerebellum. To assess tissue-specificity of methylation effects at the DMRs we first considered regions to have tissue-shared methylation if the methylation difference between blood and one brain region was less than 10% of the overall blood methylation level in each of the two individuals profiled. Five of the 9 significant MAP-DMRs (and 63% of 100 top-ranked MAP-DMRs) met these criteria, including: MTMR12, chr4:165,977,000-165,978,000bp, MICAL2, ST6GALNAC3, and NFU1. Under more stringent criteria, if the mean difference between blood and all brain region was less than 10% of the overall blood methylation level, one MAP-DMR on chromosome 4 (and 8% of 100 top-ranked MAP-DMRs) had tissue-shared methylation levels. We also computed the correlation between brain and blood methylation levels, by using the mean methylation levels across both individuals in all brain regions and also in blood.

Genome Annotation and Gene Ontology
Gene annotations and DNA sequence features information were obtained from the UCSC genome browser. CpG Islands track information 63 was obtained from UCSC. Each methylation region was assigned to the gene with the nearest TSS from Refseq, and restricted analyses to genes that were within 50kb and up to 100kb of the TSS.
Histone modification ChIP-seq data were obtained from the Encode project from the CEPH HapMap LCL (GM12878) in the UCSC genome browser. We obtained data for genome-wide distribution of reads from ChIP-seq histone modifications for 7 modifications: H3K9ac, H3K27ac, H3K27me3, H3K4me1, H3K4me2, H3K4me3, H4K20me1. Each histone genome-wide distribution of reads was smoothed using a bandwidth of 100 and peaks were assigned using a stringent threshold of 1.0 as previously described in reference 64 .