A cell-type deconvolution meta-analysis of whole blood EWAS reveals lineage-specific smoking-associated DNA methylation changes

Highly reproducible smoking-associated DNA methylation changes in whole blood have been reported by many epigenome-wide association studies (EWAS). These epigenetic alterations could have important implications for understanding and predicting the risk of smoking-related diseases. To this end, it is important to establish whether these DNA methylation changes occur in all blood cell subtypes or are cell-type specific. Here, we apply a cell-type deconvolution algorithm to identify cell-type-specific DNA methylation signals in seven large EWAS. We find that most of the highly reproducible smoking-associated hypomethylation signatures are more prominent in the myeloid lineage. A meta-analysis further identifies a myeloid-specific smoking-associated hypermethylation signature enriched for DNase hypersensitive sites in acute myeloid leukemia. These results may guide the design of future smoking EWAS and have important implications for our understanding of how smoking affects immune-cell subtypes and how this may influence the risk of smoking-related diseases.


Andrew E Teschendorff, Aug 10, 2020

Software and code

Data collection
No particular software was used to collect data.

Data analysis
All statistical analyses were performed using R version 3.6.2, freely available from cran.r-project.org. The following R/BioC packages were used: EpiDISH v2.0.2, locfdr v1.

Data

[...] by submitting data requests to mrclha.swiftinfo@ucl.ac.uk; see full policy at http://www.nshd.mrc.ac.uk/data.aspx. Managed access is in place for this 73-year-old study to ensure that use of the data is within the bounds of the consent previously given by participants, and to safeguard against any potential threat to anonymity, since the participants were all born in the same week. The Illumina EPIC DNAm data for the TZH cohort can be viewed at NODE (https://www.biosino.org/node) under accession number OEP000260, or directly at https://www.biosino.org/node/project/detail/OEP000260, and accessed by submitting a data-access request.
In our study, most of the data analysed are already in the public domain, and sample sizes were therefore pre-determined. For the TZH cohort, we analysed over 700 samples, which is similar in size to some of the largest EWAS in blood performed to date. We included only fairly large EWAS datasets, with the smallest study still containing over 450 samples. Power calculations showing that these sample sizes are adequate to identify blood cell-type-specific differentially methylated cytosines were provided in Zheng SC et al., Nat Methods 2018.
Results contained in the current study support the view that we had adequate power. The meta-analysis performed over the 6-7 large EWAS sets further confirmed that we are adequately powered.
In general, all samples and probes that passed our QC criteria, as detailed in the Methods section, were used.
In our study, we perform a meta-analysis over 6-7 large EWAS precisely in order to assess the reproducibility of our findings. The results are highly consistent across all 7 studies, and also consistent with independent small-scale EWAS that assessed small panels of smoking-associated loci in purified blood cell subtype samples.
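As an illustration of how per-study estimates can be combined to assess reproducibility, the following is a minimal sketch of a fixed-effect, inverse-variance weighted meta-analysis for a single CpG. It is not necessarily the meta-analysis procedure used in this study; the function name `fixed_effect_meta` and the example effect sizes and standard errors are hypothetical and chosen purely for illustration.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates for one CpG.

    betas: per-study effect sizes (e.g. regression coefficients for smoking)
    ses:   per-study standard errors, in the same order as betas
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se

    # Two-sided P value under a standard normal null: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical per-study estimates for one CpG (illustrative values only).
    example_betas = [-0.020, -0.030, -0.025]
    example_ses = [0.010, 0.010, 0.010]
    print(fixed_effect_meta(example_betas, example_ses))
```

In such a scheme, a locus whose direction of effect agrees across studies yields a pooled estimate with a smaller standard error than any single study, whereas inconsistent directions pull the pooled estimate towards zero.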
For the TZH cohort, blood samples were randomized in relation to beadchip ID, beadchip position, sample well, plate, year of sample collection, subcohort (Han vs Zhuang ethnicities), and all major epidemiological variables, including age, smoking, gender and BMI.
Blinding is not relevant to this study, as we perform a meta-analysis of smoking EWAS, where the phenotype (i.e. smoking status) needs to be known in advance in order to conduct the supervised analyses in each EWAS.