Introduction

Almost one in ten mothers smoke during pregnancy, with state-specific prevalence ranging from as low as 1.8% to as high as 27.1% in the USA1, while in Europe, the prevalence of maternal smoking during pregnancy (MSDP) ranges between 4.2 and 18.9% (ref. 2). The numerous health effects of MSDP therefore represent a significant public health concern. The impact of this exposure on fetal development has been the subject of extensive investigation, and MSDP is now recognized as a cause of multiple adverse pregnancy and birth outcomes3.

The mechanisms that underlie this reproductive and developmental toxicity are partially understood and include molecular and anatomical changes in the placenta4,5. In addition, experimental mouse models have recently highlighted the critical role of proper placental function in ensuring successful pregnancy outcomes6. Epigenetic responses to prenatal exposures have emerged as potential intermediate links between early-life exposures and developmental health outcomes, and epidemiologic studies of DNA methylation (DNAm), particularly multi-cohort collaborative efforts, are powerful approaches for investigating these types of research questions7. Most studies of MSDP and epigenetics have focused on DNAm in cord blood, although some studies of placenta, peripheral blood, and lung tissues have also been performed8. The Pregnancy and Childhood Epigenetics (PACE) consortium9 published a large meta-analysis identifying thousands of MSDP-associated variations in DNAm in cord blood and child peripheral blood10. However, the placental epigenome has not been studied as thoroughly, although the placenta is likely a critical target organ of MSDP-associated toxicity. A handful of prior studies have examined the relationships between MSDP and DNAm in human placenta11,12,13,14, identifying MSDP-associated CpGs, some of which have been suggested to partially mediate the effects of MSDP on lower birth weight (BW)15.

These studies have begun to characterize the impact of MSDP on the human placental epigenome but have been limited by small sample sizes. We aimed to address this gap by performing a fixed-effects meta-analysis of the relationships between MSDP and variation in the placental methylome across seven independent studies that are members of the PACE consortium. We also aimed to gain insight into the biological processes that might be affected, and into the relationships with birth outcomes, by performing additional analyses of nearby mRNA expression; functional, regulatory, and phenotypic enrichment; and a secondary meta-analysis of the associations between placental DNAm and birth outcomes. Here, we demonstrate that placental DNAm is associated with MSDP throughout the genome, and that these differences in DNAm are associated with gestational age (GA) and birth size metrics. We also show that many of these CpGs are correlated with nearby gene expression and are enriched for genes involved in environmental response, growth factor signaling, and inflammation. Further studies are needed to assess potential causality and mediation.

Results

Study population

Seven North American, Australian, and European studies (N = 1700) contributed to the epigenome-wide association study (EWAS) linking MSDP to placental DNAm: Asking Questions about Alcohol in pregnancy (AQUA)16, Study on the prenatal and early postnatal determinants of child health and development (EDEN)17, Genetics of Glucose regulation in Gestation and Growth (Gen3G)18, Genetics, Early Life Environmental Exposures, and Infant Development in Andalusia (GENEIDA), Environment and Childhood Project (INMA)19, New Hampshire Birth Cohort Study (NHBCS)20, and Rhode Island Child Health Study (RICHS)21. For this meta-analysis, 344 (20.2%) mothers reported any MSDP, defined as any cigarette smoking during any trimester of pregnancy. Any MSDP tended to be less prevalent in the cohorts from Canada and the USA than in those from Australia and Europe (Table 1). Three cohorts (EDEN, GENEIDA, and INMA; N = 795) contributed to the EWAS of sustained MSDP, defined as maternal smoking throughout pregnancy, and 163 (20.5%) of these mothers reported sustained MSDP. Distributions of covariates by cohort are provided in the Supplementary Materials (Supplementary Data 1). The comparison group for models of any or sustained MSDP included all mothers who did not report smoking cigarettes during any trimester.

Table 1 Frequencies of any and sustained MSDP within participating cohorts.

Genome-wide DNAm meta-analyses

We fit four statistical models for each CpG site, regressing DNAm on any and on sustained MSDP, each with and without adjustment for putative cellular heterogeneity, which was estimated with a reference-free deconvolution algorithm, RefFreeCellMix22. All models were adjusted for maternal age, parity, and maternal education. Genomic inflation factors from the cohort-specific models (λ ranging from 0.824 to 2.478) and meta-analyses (λ ranging from 2.002 to 2.839; Supplementary Data 2 and Supplementary Fig. 1) indicated potential residual confounding and inflation of test statistics. All EWAS results with Bonferroni-corrected p values < 0.05 are included in the Supplementary Materials (Supplementary Data 3–6). Heterogeneity in the associations across cohorts was lowest for the models adjusted for RefFreeCellMix (Supplementary Data 7), and we therefore used the results from these models for all downstream analyses. To correct for residual bias and inflation, we then applied BACON, which estimates an empirical null distribution from the data23. This substantially reduced inflation among cohort-specific results (λ ranging from 0.957 to 1.228), and meta-analyses of these BACON-corrected estimates yielded 443 CpGs that were associated with any or sustained MSDP after Bonferroni correction (Supplementary Data 8 and Fig. 1). Although 443 CpGs is a large number of sites for an EWAS, it is similar to the 568 CpGs identified in a previous PACE meta-analysis of cord blood DNAm associated with MSDP10. Sustained MSDP yielded similar but stronger effects for 93% of these 443 CpGs (Supplementary Fig. 2).
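To make the inflation diagnostics concrete, the sketch below computes the genomic inflation factor λ as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). It then applies a deliberately simplified empirical-null rescaling that estimates bias as the median z statistic and inflation as the scaled median absolute deviation. This is not BACON, which fits a Bayesian three-component Gaussian mixture; the function names and input values are hypothetical and are meant only to illustrate what "bias" and "inflation" refer to.

```python
import statistics
from statistics import NormalDist

# Upper quartile of the standard normal (~0.6745). Its square is the median of a
# 1-df chi-square distribution (~0.4549), and it is also the MAD of N(0, 1).
Q75 = NormalDist().inv_cdf(0.75)
CHI2_1DF_MEDIAN = Q75 ** 2


def genomic_inflation(z_scores):
    """Genomic inflation factor: median observed chi-square / expected null median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def rescale_to_empirical_null(z_scores):
    """Simplified empirical-null correction (NOT the BACON mixture model).

    Bias is estimated robustly as the median z statistic and inflation as the
    median absolute deviation scaled to match a standard normal; each statistic
    is then centred and rescaled.
    """
    bias = statistics.median(z_scores)
    mad = statistics.median(abs(z - bias) for z in z_scores)
    inflation = mad / Q75
    corrected = [(z - bias) / inflation for z in z_scores]
    return corrected, bias, inflation


# Hypothetical z statistics from one cohort-specific EWAS.
z = [0.3, -1.2, 2.5, 0.8, -0.4, 1.9, -2.2, 0.1, 3.1, -0.9]
print("lambda before:", round(genomic_inflation(z), 3))
corrected, bias, inflation = rescale_to_empirical_null(z)
print("bias:", round(bias, 3), "inflation:", round(inflation, 3))
print("lambda after:", round(genomic_inflation(corrected), 3))
```

In a real EWAS the z statistics would number in the hundreds of thousands; the robust median and MAD estimates are used here because a small fraction of true associations should not distort the null estimates much.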

Fig. 1: Volcano and Manhattan plots of the inverse-variance fixed-effect meta-analysis results for any and sustained MSDP.

A Placental DNAm associations with any MSDP (total N = 1700 independent samples from seven independent studies; exposed = 344). B Placental DNAm associations with sustained MSDP (total N = 795 independent samples from three independent studies; exposed = 163). For both analyses, models were adjusted for maternal age, parity, maternal education, putative cellular heterogeneity, and residual bias. In the volcano plots, the x-axes show the estimated mean difference in DNAm (effect size) when comparing mothers who smoked during pregnancy (MSDP) with those who did not, which can range from −1 to 1, while the x-axes in the Manhattan plots represent genomic location; both plots share the same y-axes, showing −log10(p values). Bonferroni thresholds for statistical significance are shown as blue dots and a blue horizontal line in the volcano and Manhattan plots, respectively. The y-axes were truncated at a minimum p value of 1 × 10−30 (a maximum −log10(p) of 30) to allow better visualization of the majority of our results.

We did not adjust for GA because we hypothesized that it was likely downstream of the epigenetic response to MSDP, or possibly on the causal pathway between MSDP and changes in DNAm; in either case, adjusting for it would likely result in overadjustment24. However, we did explore whether additional adjustment for GA altered the observed relationships in four of our cohorts (EDEN, INMA, NHBCS, and RICHS). These additional adjustments had almost no effect on the estimates of differential DNAm, with no apparent attenuation of effects toward the null (Supplementary Fig. 3). Thus, MSDP-associated differences in GA did not appear to explain the observed relationships between DNAm and MSDP.

The most notable association was observed at cg27402634, located upstream of the LEKR1 gene and the noncoding RNA LINC00886, which showed the largest differential DNAm and the smallest p values in all meta-analyses. Placentas exposed to any MSDP had 23.33% lower DNAm (95% CI: 21.17–25.51% lower DNAm; inverse-variance fixed-effect meta-analysis p value = 8.85E−99), and those exposed to sustained MSDP had 25.08% lower DNAm (95% CI: 23.06–27.11% lower DNAm; inverse-variance fixed-effect meta-analysis p value = 2.20E−130), compared with placentas from mothers who did not smoke at all during pregnancy. Although all cohorts observed substantial hypomethylation with MSDP at this CpG, the estimates were highly variable between cohorts for models of any MSDP (Cochran’s Q test p value = 1.74E−15) but relatively consistent for models of sustained MSDP (Cochran’s Q test p value = 1.61E−01; Fig. 2A). Overall, associations were consistent across cohorts for the vast majority of the 443 CpGs: 93% and 96% of them yielded heterogeneity p values > 0.01 for any and sustained MSDP, respectively. In addition to cg27402634, we highlight the relationships with the largest magnitudes of association (|βAny MSDP| > 0.05): cg26843110 (EDC3), cg20340720 (WBP1L), and cg17823829 (KDM5B; Fig. 2B–D). We identified numerous other noteworthy relationships, but given the large number of genome-wide significant associations, we focus on the 20 most statistically significant CpGs from the primary meta-analysis of any MSDP going forward (Table 2); results for all BACON-adjusted genome-wide significant CpGs, along with detailed annotations, are included in the Supplementary Materials (Supplementary Data 8).
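For readers less familiar with the quantities reported above, the following sketch shows how an inverse-variance fixed-effect estimate, its 95% CI and p value, and Cochran's Q heterogeneity statistic can be computed from cohort-specific coefficients and standard errors. It is a minimal standard-library illustration, not the software used in this study; the input values are hypothetical, and the Q p value uses the closed-form chi-square upper tail for integer degrees of freedom (number of cohorts minus one).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: two-sided standard normal tail plus a finite series.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term = root
    for i in range(1, (df - 1) // 2 + 1):
        total += 2.0 * pdf * term
        term *= x / (2 * i + 1)
    return total


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis with Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    return {
        "beta": beta,
        "se": se,
        "ci95": (beta - 1.959964 * se, beta + 1.959964 * se),
        "p": p,
        "Q": q,
        "Q_p": chi2_sf(q, df) if df >= 1 else float("nan"),
    }


# Hypothetical cohort-specific mean differences in DNAm (proportion scale) and SEs.
result = fixed_effect_meta([-0.25, -0.21, -0.27, -0.18], [0.03, 0.02, 0.04, 0.05])
print(result)
```

Using `math.erfc` rather than `1 - cdf` keeps very small p values (such as those near 1E−99) from underflowing to zero.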

Fig. 2: Forest plots of cohort-specific and inverse-variance fixed-effect meta-analysis estimates of associations between MSDP and placental DNAm.

Estimated differential DNAm (Mean Diff.) and 95% confidence intervals (95% CI) at A cg27402634, B cg26843110, C cg20340720, and D cg17823829 with any MSDP and sustained MSDP. All models were adjusted for maternal age, parity, maternal education, putative cellular heterogeneity, and residual bias. The mean difference represents the estimated difference in the proportion of DNAm at each CpG, from linear regression, when comparing mothers who smoked during pregnancy (MSDP) with those who did not smoke during pregnancy.

Table 2 Top 20 meta-analysis results from models of any and sustained MSDP.

Expression quantitative trait methylation (eQTM) analyses

We then performed expression quantitative trait methylation (eQTM) analyses, testing whether DNAm levels at MSDP-associated CpGs were associated with the expression of nearby mRNA (within 250 kb of the CpG) in 194 placental samples from the RICHS cohort. We mapped each CpG to the gene whose expression was most strongly associated with its DNAm level, provided that the association had a p value < 0.05, and then used a Bonferroni-corrected threshold to identify the eQTMs with the strongest evidence of a relationship. Among the 421 CpGs within 250 kb of a transcription start site (TSS), 258 were mapped to eQTM genes (p values < 0.05), 40 of which were significant after Bonferroni correction (α = 1.43E−05; Supplementary Data 9). The majority of mapped eQTMs exhibited inverse associations (65%), and statistical significance was strongest for CpGs closest to the TSS (Supplementary Fig. 4). Among the top 20 CpGs from our meta-analysis, 15 mapped to eQTM genes (Table 3), 5 of which were significant at the Bonferroni-adjusted threshold: SH3D21, TBC1D8, USP46, CRAT, and TGFB1.

Table 3 Results from eQTM models, DNAm versus GA at birth, and DNAm versus BW.

Functional and regulatory enrichment analyses

Enrichment analyses were performed to gain insight into the biological processes in placenta that may be affected by MSDP through altered epigenetic regulation. We performed gene-set enrichment analyses using two different gene lists (Supplementary Data 10): first, the genes annotated to the 443 MSDP-associated CpGs according to the Illumina annotation file (284 genes), and second, the genes mapped to these CpGs via the eQTM analyses (211 genes). The first list makes use of as much as possible of the data generated by our best-powered analysis. Because DNAm is a relatively stable epigenetic feature, this list could represent a record of the genes whose methylation levels were perturbed by MSDP across the pregnancy period. Gene expression, on the other hand, is more dynamic than DNAm and is affected by multiple stimuli. Thus, the second, eQTM-mapped list likely represents the genes that are influenced by DNAm at these MSDP-associated CpGs at, or close to, the time of birth.

We found that 46 and 9 biological pathways were significantly (q value < 0.05) enriched among the Illumina annotated and eQTM-mapped genes, respectively (Supplementary Data 11 and 12). Overall, the eQTM genes were involved in inflammatory activity (the aryl hydrocarbon receptor (Ahr) pathway, Th17 cell differentiation, neutrophil degranulation, and platelet degranulation), tyrosine kinase signaling (TYROBP causal network and signaling by receptor tyrosine kinases), carcinogenesis (pathways in clear cell renal cell carcinoma), adipogenesis, and platelets. These pathways included multiple eQTM-mapped genes that met the Bonferroni-corrected significance threshold; most notably, TGFB1 was involved in almost all of these pathways, and TNFRSF1B and ACLY were each involved in two. The Illumina annotated gene set, which included a larger number of genes annotated to more MSDP-associated CpGs, was additionally enriched for numerous pathways involving growth factor signaling (FGFR, EGF-EGFR, and PDGFR), hormones (aldosterone, insulin, and TSH), immune and inflammatory signaling (IL2 and IL6), MAPK signaling, myometrial and vascular smooth muscle contraction, signal transduction, and cancer. Although only one pathway, signaling by receptor tyrosine kinases, was enriched among both the annotated and eQTM genes (annotated genes q value = 0.025 and eQTM-mapped genes q value = 0.032, via hypergeometric test), genes from both lists were involved in numerous inflammatory signaling pathways.
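The pathway tests above are over-representation tests based on the hypergeometric distribution. As a minimal sketch, the one-sided hypergeometric p value for a pathway, followed by Benjamini–Hochberg q values across pathways, can be computed as below. Two details are assumptions not stated in the text: the background gene universe (20,000 genes here is hypothetical) and the use of the Benjamini–Hochberg procedure to obtain q values. The pathway counts are also invented.

```python
from math import comb


def hypergeom_enrichment_p(overlap, pathway_size, list_size, universe_size):
    """One-sided P(X >= overlap), X ~ Hypergeometric(universe, pathway genes, list size)."""
    upper = min(pathway_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, list_size)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step up from the largest p value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical example: a 284-gene list tested against three pathways
# (overlap, pathway size) in an assumed universe of 20,000 genes.
pathways = {"pathway_1": (12, 150), "pathway_2": (3, 200), "pathway_3": (9, 90)}
p = {name: hypergeom_enrichment_p(k, size, 284, 20_000) for name, (k, size) in pathways.items()}
q = dict(zip(p, benjamini_hochberg(list(p.values()))))
print(p)
print(q)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.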

To further understand the regulatory landscape of these differentially methylated CpG sites, we used EnrichR to test for enrichment of transcription factor (TF) targets from the ENCODE/ChEA databases25. The genes annotated to our MSDP-associated CpGs were enriched for targets of GATA1 and GATA2, the androgen receptor, TP63, SMAD4, RUNX1, and ZBTB7A (Supplementary Data 13). Although the eQTM gene list was not significantly enriched for targets of any TF, its top-ranked TF was RUNX1 (Supplementary Data 14). We then examined whether the MSDP-associated CpG sites were enriched for allele-specific germline differentially methylated regions (gDMRs)26, regulatory features from the placenta-specific 15-state chromatin annotation from ROADMAP27, or placenta-specific partially methylated domains (PMDs)28, which contain placenta-specific repressed genes (these annotations are included in the results files in Supplementary Data 8). Most notably, the MSDP-associated CpGs were substantially depleted in PMDs (Supplementary Fig. 5) and highly enriched in placental enhancers (Supplementary Fig. 6); both observations indicate that our placental MSDP-associated CpGs occur within highly active regions of the placental methylome. Only one of the 443 MSDP-associated CpGs was within a candidate maternal gDMR (cg05211790, annotated to RAI14), suggesting that gDMRs are not substantially affected by MSDP.
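Enrichment or depletion of CpGs in a genomic feature, such as placental enhancers or PMDs, is commonly summarized with a 2 × 2 table that compares associated CpGs with the remaining tested CpGs. The sketch below computes an odds ratio with a Woolf (log-scale) 95% CI: an odds ratio above 1 indicates enrichment, and one below 1 indicates depletion. The test actually used for Supplementary Figs. 5 and 6 is not stated in this section, so treat this as an illustrative assumption; the counts are hypothetical.

```python
import math


def feature_odds_ratio(hits_in, hits_out, background_in, background_out):
    """Odds ratio (with Woolf 95% CI) that associated CpGs fall in a feature.

    hits_*: associated CpGs inside / outside the feature.
    background_*: non-associated tested CpGs inside / outside the feature.
    A 0.5 continuity correction is added to every cell if any cell is zero.
    """
    cells = [hits_in, hits_out, background_in, background_out]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.959964 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.959964 * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical counts: 443 associated CpGs versus other tested CpGs.
print(feature_odds_ratio(120, 323, 60_000, 390_000))  # OR > 1: enrichment
print(feature_odds_ratio(10, 433, 90_000, 360_000))   # OR < 1: depletion
```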

Phenotype enrichment analyses

We used EnrichR to test for enrichment of phenotypes from the database of Genotypes and Phenotypes (dbGaP), to understand the types of health outcomes that have been associated with these genes. We again tested the list of genes annotated to our CpGs (n = 284 genes) and the list of genes mapped via eQTM analyses (n = 211 genes). The Illumina annotated genes were enriched for cell adhesion molecules (CAM), asthma, body mass index (BMI), blood pressure, and antipsychotic agents, whereas the eQTM-mapped gene list did not yield any significantly enriched phenotypes (Supplementary Data 15 and 16). The CAM designation in dbGaP, which was the most significantly enriched phenotype (Fisher’s exact p value = 6.51E−05), describes a broad array of molecular functions related to cellular mobility and integration, wound healing, and metastasis, which relate to many of the biological functions identified in the pathway enrichment tests above. The other phenotypes relate to health outcomes that have been associated with prenatal tobacco smoke exposure: asthma, cardiometabolic effects (BMI and blood pressure), and psychiatric effects (antipsychotic agents).

Proximity to genetic variants linked to birth outcomes

We aimed to understand whether the regions of the placental genome in which DNAm responds to MSDP are also important in fetal growth regulation. We therefore explored whether genetic variants previously associated with birth outcomes in genome-wide association studies (GWAS) lie in close proximity to our identified CpGs. Specifically, we examined whether MSDP-associated CpGs were within ±0.5 Mb (a 1 Mb window) of single-nucleotide polymorphisms (SNPs) that have been associated with BW, birth length (BL), head circumference (HC), and GA29,30,31,32,33,34 (Supplementary Data 8). Of the 330 birth outcome SNPs on autosomal chromosomes, 61 SNPs were within 0.5 Mb of 51 CpGs (Supplementary Data 17), including cg27402634 (LEKR1), cg26843110 (EDC3), and cg20340720 (WBP1L), suggesting that some genomic regions responsive to MSDP may also be involved in growth regulation. We also explored whether our MSDP-associated CpGs may be biased by methylation quantitative trait loci (mQTLs), in which SNPs influence the methylation levels at nearby CpGs. Two studies have examined this question in human placenta, identifying 866 (ref. 35) and 4342 (ref. 36) placental mQTLs. Our findings did not appear to be biased by genetic variation, as only 5 of the 443 MSDP-associated CpGs were previously characterized placental mQTLs.
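The ±0.5 Mb proximity check amounts to a windowed lookup on each chromosome. A minimal sketch follows, assuming CpG and SNP positions use the same genome build; all identifiers and coordinates are hypothetical. Sorting SNP positions per chromosome and using binary search keeps the lookup fast even for large numbers of CpGs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def cpgs_near_snps(cpgs, snps, flank=500_000):
    """Return (cpg_id, snp_id) pairs on the same chromosome within +/- flank bp.

    cpgs and snps are iterables of (identifier, chromosome, position) tuples.
    Window boundaries are inclusive.
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))
    positions = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        positions[chrom] = [pos for pos, _ in entries]

    pairs = []
    for cpg_id, chrom, pos in cpgs:
        if chrom not in positions:
            continue
        lo = bisect_left(positions[chrom], pos - flank)
        hi = bisect_right(positions[chrom], pos + flank)
        pairs.extend((cpg_id, by_chrom[chrom][k][1]) for k in range(lo, hi))
    return pairs


cpgs = [("cg_A", "chr3", 1_000_000), ("cg_B", "chr1", 5_000_000)]
snps = [("rs_1", "chr3", 1_400_000), ("rs_2", "chr3", 1_600_000),
        ("rs_3", "chr1", 5_500_000), ("rs_4", "chr2", 1_000_000)]
pairs = cpgs_near_snps(cpgs, snps)
print(pairs)
print(len({s for _, s in pairs}), "SNPs near", len({c for c, _ in pairs}), "CpGs")
```

Counting the distinct SNPs and CpGs among the returned pairs gives summaries of the same form as "61 SNPs were within 0.5 Mb of 51 CpGs".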

DNAm associated with smoking-related birth outcomes

We then performed secondary meta-analyses to examine the relationships of DNAm with GA at birth, preterm birth, and BW, BL, and HC z-scores. Of the 443 CpGs tested, 142 (32.1%) were related to at least one birth outcome after Bonferroni adjustment (0.05/443). The majority of birth outcome associations were with GA at birth (121 CpGs; Supplementary Data 18). Preterm delivery produced similar associations, although fewer CpGs were statistically significant (Supplementary Data 19). Numerous CpGs were also associated with birth size z-scores, most of them with BW (25 CpGs; Supplementary Data 20), followed by BL (11 CpGs; Supplementary Data 21) and HC (2 CpGs; Supplementary Data 22). Some of the CpGs associated with GA were also associated with birth size measurements, even though BW, BL, and HC were standardized for GA, suggesting independent associations with both gestational duration and fetal growth (Supplementary Fig. 7). Four CpGs (annotated to KDM5B, TTC7B, SFRS1, and DUSP6) were associated with both GA and BW, two (annotated to TMEM51 and MYO7A) with both GA and BL, and one (annotated to KIAA1211) with all three of GA, BW, and BL. Among the CpGs associated with at least one of these birth outcomes, those positively associated with BW, BL, HC, or GA were typically hypomethylated with MSDP, whereas CpGs inversely associated with birth outcomes tended to be hypermethylated with MSDP.

Among our top 20 CpGs associated with any MSDP, 5 were associated with GA at birth and 6 with BW z-scores (Table 3). DNAm at cg27402634 (LEKR1) and cg20340720 (WBP1L), both located close to BW SNPs and both hypomethylated with MSDP, was associated with larger BW (p value = 6.71E−07 and p value = 2.42E−07, respectively). In contrast, DNAm at cg26843110 (EDC3; hypomethylated with MSDP and also close to BW SNPs) and at cg17823829 (KDM5B; hypermethylated with MSDP) was associated with longer and shorter GA at birth, respectively (p value = 5.09E−12 and p value = 9.11E−06, via inverse-variance fixed-effect meta-analysis). Forest plots of BW z-scores and GA for these four CpGs are shown in Fig. 3.

Fig. 3: Forest plots of the cohort-specific and inverse-variance fixed-effect meta-analysis estimates of association between higher levels of placental DNAm and gestational age at birth and birth weight.

Estimated differences in gestational age at birth and birth weight z-scores (slope) and 95% confidence intervals (95% CI) associated with increasing levels of DNAm at A, B cg27402634, C, D cg26843110, E, F cg20340720, and G, H cg17823829. All models were adjusted for maternal age, parity, maternal education, and putative cellular heterogeneity. The slopes and 95% CIs represent the regression coefficients from linear models in which gestational age (inverse normal transformed) and birth weight (z-scores) were regressed on DNAm at each CpG, adjusting for confounders.
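Gestational age was inverse normal transformed before modeling. The text does not state which variant was used; a common choice is the rank-based transformation with Blom's offset, sketched below under that assumption. It replaces each value with the standard normal quantile of its rank (ties receive the average rank), which makes a skewed distribution such as gestational age approximately normal while preserving its ordering.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8):
    """Rank-based inverse normal transform (Blom offset c = 3/8); ties get the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    dist = NormalDist()
    return [dist.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical gestational ages (weeks) for eight births.
ga_weeks = [39.1, 40.0, 37.2, 41.3, 40.0, 35.6, 38.4, 39.9]
print([round(v, 3) for v in rank_inverse_normal(ga_weeks)])
```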

Comparison with CpGs associated with MSDP in cord blood

We then assessed whether the DNAm signatures of MSDP in placenta were consistent with the MSDP associations in cord blood previously reported by the PACE consortium10. Only four CpGs (annotated to CYP1A1, GNG12, RNF122, and ZBTB4) yielded significant associations in both tissues (Table 4), and the CpGs within CYP1A1 and RNF122 showed opposite directions of association with MSDP in cord blood and placenta. There was no overall genome-wide correlation (r² < 0.1) between the regression coefficients from the two tissues (Supplementary Fig. 8). We also explored whether consistency improved with a relaxed significance threshold. Of the 6073 CpGs within a 5% FDR in the cord blood analysis, 115 also yielded associations with FDR < 5% in placenta, 70 (61%) of which had consistent directions of effect in the two tissues (Supplementary Data 23). Among these, one CpG within AHRR, cg21161138, exhibited consistent hypomethylation with MSDP in both studies; this is notable because CpGs within the AHRR gene have been the most consistently identified in studies of MSDP and cord blood DNAm.

Table 4 Comparing MSDP-associated CpGs from placenta to those from cord blood.

Discussion

We identified 443 CpG sites whose placental methylation levels were associated with any or sustained MSDP. Differential DNAm was greater for the majority of these CpGs when we restricted the exposure to sustained MSDP, and a large proportion of the MSDP-associated CpGs were related to birth outcomes. CpGs with higher DNAm in association with MSDP tended to be inversely associated with GA and birth size, while CpGs with lower DNAm in association with MSDP tended to be positively associated with GA and birth size.

The most statistically significant association (cg27402634), also identified in prior EWAS of MSDP in placental tissues15, showed dramatically lower DNAm with MSDP exposure. This effect size (~25% difference for sustained MSDP) is much larger in magnitude than what has generally been observed in exposure-focused EWAS, although it is within the same range as a CpG site in AHRR (cg05575921; 18% difference between exposed and unexposed) from a prior EWAS of current smoking and blood DNAm37. In addition, lower placental DNAm at cg27402634 correlates with increased expression of LEKR1 and is associated with smaller BW and BL. Thus, MSDP-associated hypomethylation at this CpG would be consistent with the well-known effects of maternal smoking, resulting in shorter gestation and smaller birth size.

The functional roles of cg27402634, or of the corresponding LEKR1 gene, in human placental tissues are not known. However, GWAS findings provide evidence that genetic variants within this region (3q25.31) might be involved in fetal growth and possibly metabolic programming. For instance, the SNP rs1482852 or its proxies (rs900400; rs13322435) have been associated with fetal growth38, adiposity in newborns39,40, maternal adiponectin levels, cord blood leptin40, and insulin release after an oral glucose challenge41. Together with our current study, these genetic findings suggest that this locus on chromosome 3 (3q25.31) contains active determinants of fetal growth regulation and metabolic activity, and that placental DNAm at cg27402634 is highly responsive to maternal smoking. Future mechanistic work is needed to investigate whether placental epigenetic regulation at this locus specifically influences placental function and/or the overall growth and metabolic function of the developing fetus.

We identified numerous other notable MSDP-associated CpGs and highlight those yielding the largest magnitudes of effect (cg20340720, cg26843110, and cg17823829). MSDP was associated with lower DNAm at cg20340720, located within WBP1L (also annotated as C10orf26), and lower DNAm at this CpG correlated with lower BW and BL z-scores. Genetic variants near this CpG have been related to BW34 and blood pressure42. We also observed lower DNAm with MSDP at cg26843110, which is within the body of the EDC3 gene and near SNPs associated with BW (rs3784789, ref. 34). Lower DNAm at cg26843110 was associated with shorter GA at birth and with decreased expression of CSK, which is involved in trophoblast differentiation43, as well as in blood pressure and aldosterone regulation44. Finally, cg17823829 (annotated to KDM5B) was hypermethylated with MSDP. Higher DNAm at this CpG correlated with shorter GA at birth and with lower expression of the PPFIA4 gene, which can be induced in response to hypoxia45.

Our enrichment analyses identified numerous interrelated pathways that are critical to placental growth and development. The Ahr pathway was significantly enriched among the eQTM-mapped genes and is well recognized for its roles in responding to environmental exposures and influencing immune activity, particularly among Th17 cells46. Multiple pro-inflammatory immune cell pathways, such as Th17 cell differentiation and neutrophil and platelet degranulation, were also enriched among these MSDP-associated eQTM genes. Th17 cells can induce inflammation and oxidative stress in the placenta47, while neutrophils are involved in the inflammatory cascades that are thought to contribute to preterm birth48. Platelets can recognize damaged tissues and coordinate T-cell-mediated inflammatory responses, including those of Th17 cells49. Receptor tyrosine kinase pathways, which were enriched among both the eQTM and Illumina annotated gene sets, describe a broad array of cell surface receptors that can bind cytokines, hormones, and growth factors, and include the epidermal growth factor receptor (EGFR). The EGFR is highly expressed in placental tissues, EGF is involved in protecting trophoblasts from hypoxia-induced apoptosis50, and perturbed EGF-EGFR signaling has been associated with placental pathologies and growth restriction51. In addition, when activated, the EGFR initiates multiple signaling pathways involving MAPK/JAK kinases or STAT TFs52, and the MAPKAPK3, JAK1, and STAT5A genes had MSDP-associated CpGs and were involved in numerous pathways identified in our enrichment analyses. Thus, these functional enrichment analyses characterized numerous interrelated pathways that contribute to sensing and responding to environmental stressors, regulating placental inflammatory activity, and influencing growth factor signaling.

We aimed to better understand the overall regulatory landscape of these CpGs by testing for enrichment of regulatory features (TSSs, enhancers, gene bodies, and untranslated regions), PMDs, and allele-specific gDMRs, and by using EnrichR to test for enrichment of TF targets and phenotypes. These CpGs were enriched in placental enhancers and depleted in PMDs28, suggesting that they are located within active regulatory regions. The genes annotated to MSDP-associated CpGs were enriched for genes regulated by specific TFs, most notably GATA1, GATA2, and RUNX1. Together with PPARG and TP63, GATA factors are part of the core transcriptional regulatory circuit that guides and maintains proper trophoblast differentiation53,54, and angiogenic activity is reduced in placentas lacking GATA255. RUNX1, on the other hand, is a driver of hematopoietic development56, and although it has not been thoroughly studied in human placenta, there is some evidence that placental inflammation is associated with upregulation of placental RUNX1 and hematopoiesis57. Genes annotated to MSDP-associated CpGs have also been linked to human health and disease traits via dbGaP, including a number of conditions related to cardiometabolic health (BMI and blood pressure), which have previously been linked to MSDP58,59. This may indicate that MSDP affects placental genes involved in energy uptake and expenditure, lipid and glucose metabolism, blood pressure regulation, and inflammation, which are some of the key physiological processes disrupted in the pathogenesis of metabolic syndrome60. The genes were also enriched for asthma and psychiatric phenotypes, which are known to be related to MSDP61. Thus, the epigenetic response to MSDP predominantly occurs at CpGs in active regions of the placental genome, at genes targeted by TFs involved in angiogenesis and hematopoiesis, and within genes that have been linked to cardiometabolic, respiratory, and psychiatric health.

We also aimed to understand whether the regions of the epigenome associated with MSDP are involved in fetal growth regulation. Many of these CpGs, including some of our strongest hits from the meta-analysis, were within 0.5 Mb of SNPs previously associated with birth size or GA at birth29,30,31,32,33,34. In addition, our secondary meta-analyses demonstrated that DNAm levels at about one third of our CpGs (142 of 443) were also associated with GA and/or birth size. Thus, the CpG and gene lists produced in this study are robustly associated with MSDP and with birth outcomes across multiple independent populations, making them compelling candidates for future causal mediation analyses. We did not pursue mediation in this study because measurement error in self-reported smoking, combined with the fact that smoking-related methylation is an excellent biomarker of exposure, can lead to overestimation of mediation62. Mediation studies are therefore most suitable for cohorts that collect objective biomarkers of smoking exposure, such as cotinine or 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL). Some studies, however, have demonstrated that placental DNAm is an interesting candidate mediator between MSDP and lower BW15,63, and we encourage further investigation of this possibility (while also considering other birth outcomes) using the CpGs we identified as candidates and Mendelian randomization approaches. It is also possible that MSDP, genetic variation, and placental DNAm in these regions have additive or interactive effects on birth outcomes. Prior studies have addressed this question in blood64,65, but future research is needed to explore potential interactions between genetic variants and MSDP in relation to placental DNAm, or between genetic variants and MSDP-associated DNAm in relation to birth outcomes.
In addition, although many of our MSDP-associated CpGs fall within the same 1 Mb windows as birth outcome SNPs, very few of them (4 of 443) are known placental mQTLs35,36, so our findings are unlikely to have been substantially biased by nearby genetic variation.

We compared our findings to those of a previous PACE meta-analysis of MSDP and cord blood DNAm10. The CpG most strongly associated with MSDP in placenta (cg27402634, LEKR1) was not associated with MSDP in the cord blood meta-analysis. Only two CpG sites, annotated to GNG12 and ZBTB4, were differentially methylated at the genome-wide significance threshold in both placenta and cord blood with the same direction of association in both tissues. Two further CpG sites, within CYP1A1 and RNF122, were genome-wide significant in both meta-analyses but with opposite directions of association in cord blood and placenta. Interestingly, CYP1A1 was hypomethylated in placentas exposed to MSDP, consistent with studies of adipose, skin, and lung tissues66, whereas this CpG was hypermethylated in cord blood10. Even at a relaxed FDR significance threshold, only 70 CpGs exhibited consistent effects between the two tissues. One CpG within AHRR (cg21161138) showed consistent hypomethylation with MSDP in both tissues at this relaxed threshold. These observations suggest that there are placenta-specific CpG methylation responses to this exposure. However, as mentioned above, the AhR pathway was significantly enriched among our eQTM genes, and the cord blood study did identify some genes involved in pro-inflammatory responses and growth factor signaling. Thus, although there was little overlap in specific MSDP-associated CpGs and genes between the two tissues, there was some overlap in the broader biological processes.

The above findings should be interpreted within the context of this study’s limitations. MSDP was self-reported and therefore subject to misclassification, although non-differential misclassification would most likely have biased our findings toward the null. We found that sustained MSDP produced larger magnitudes of association, as has been previously observed in studies of blood DNAm67, but we did not explicitly compare mothers who smoked throughout pregnancy with those who quit during early pregnancy, nor did we assess dose–response patterns (i.e., number of cigarettes or biomarker concentrations); both should be the focus of future investigations. Our study consisted predominantly of mother–infant pairs of European ancestry, so additional studies involving diverse ancestries and ethnic backgrounds are needed to improve the generalizability of these findings. The observed associations between DNAm levels and reproductive outcomes could also reflect reverse causation, which is one of the reasons we did not pursue formal mediation analyses. In addition, placenta is a heterogeneous tissue composed of multiple cell types68 that serve different functions and thus have different epigenetic states69. To account for this, we estimated and adjusted for putative cellular heterogeneity using a data-driven approach, RefFreeCellMix22, because no reference methylomes for placental cell types were available at the time. It is also possible that cohort-specific sampling protocols or other differences in data generation resulted in placental samples from some cohorts being more heterogeneous than others.
Residual confounding from cellular heterogeneity or other unmeasured factors may have contributed to the inflation observed in our original meta-analyses; we therefore applied BACON23 and present only findings that remained statistically significant at a Bonferroni-corrected threshold after BACON inflation and bias correction.

Despite these limitations, our study had numerous strengths, including a large sample size across seven independent studies, harmonized definitions of exposure variables and covariates, and standardized protocols for quality control, pre-processing, and analysis of DNAm data. We performed secondary analyses involving mRNA expression, functional and phenotype enrichment, overlap with GWAS hits for reproductive outcomes, and meta-analyses of DNAm variation with birth outcomes to provide biological and health-related interpretations of our findings. Overall, we identified a placental DNAm signature of MSDP that differs substantially from that observed in cord blood, most notably at cg27402634, which is intergenic between LINC00886 and LEKR1 and where exposed placentas had ~25% lower DNAm than unexposed placentas. The MSDP-associated CpGs lie within active regions of the placental epigenome, and their associated genes are involved in responding to environmental stressors, regulating inflammatory activity, and growth factor signaling, and have previously been related to cardiometabolic outcomes. In addition, many of these CpGs were in close genomic proximity to birth outcome SNPs, and DNAm at these CpGs was also associated with GA at birth or birth size. These CpGs, genes, and biological pathways provide compelling candidates for follow-up studies aimed at testing causal mediation or elucidating mechanisms.

Methods

Participating cohorts

Cohorts that are members of the PACE consortium were identified for participation in the current study if they had existing DNAm data quantified from placental tissue with the Illumina Infinium HumanMethylation450 BeadChip and had obtained information on self-reported smoking during pregnancy. The seven cohorts that contributed to the meta-analysis of any MSDP were AQUA16, EDEN17, Gen3G18, GENEIDA, INMA19, NHBCS20, and RICHS21. EDEN, GENEIDA, and INMA also contributed to the sustained MSDP stratified analyses. RICHS contributed RNA-seq data for analyses with mRNA expression. All cohorts obtained ethics approval from local ethics committees and informed consent from participants prior to data collection. Exclusion criteria for this study were non-singleton births, preeclampsia, and DNAm not assessed on the fetal side of the placenta. All participants were of European ancestry, except 1.85% of EDEN mothers. Detailed methods for each cohort are provided in the Supplementary Material (Supplementary Methods File).

Tobacco smoking definitions

Any MSDP was defined as the mother reporting smoking cigarettes at any time during pregnancy. Sustained MSDP was defined as the mother reporting smoking cigarettes in both the first and third trimesters of pregnancy. For both exposure variables, the comparison group comprised mothers who reported no smoking at any time during pregnancy.
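To make the two contrasts concrete, the following minimal sketch codes one mother's exposure from per-trimester smoking reports. The function name `classify_msdp` and its three boolean inputs are hypothetical, since each cohort recorded smoking in its own format. Following the definitions above, a mother who smoked only in early pregnancy counts as exposed for any MSDP but belongs to neither group of the sustained contrast.

```python
from typing import Dict, Optional


def classify_msdp(smoked_t1: bool, smoked_t2: bool, smoked_t3: bool) -> Dict[str, Optional[int]]:
    """Hypothetical helper: exposure codes for the 'any' and 'sustained' contrasts.

    1 = exposed, 0 = unexposed comparison group, None = not part of that contrast.
    """
    never = not (smoked_t1 or smoked_t2 or smoked_t3)
    any_msdp = 0 if never else 1

    sustained: Optional[int]
    if never:
        sustained = 0  # same comparison group as for any MSDP
    elif smoked_t1 and smoked_t3:
        sustained = 1  # smoked in both the first and third trimesters
    else:
        sustained = None  # smoked at some point, but not in both T1 and T3
    return {"any": any_msdp, "sustained": sustained}
```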

DNAm data quality control and normalization

All DNAm data processing and analyses were conducted in R, except the meta-analyses, which were performed with METAL. Placental DNAm from the fetal side was assessed with the Infinium HumanMethylation450 array (Illumina, San Diego, CA, USA). See the Supplementary Methods File for further details on placenta collection, DNA extraction, and DNAm acquisition in each cohort. Quality control of DNAm data was standardized across all cohorts. Low-quality samples were filtered out, and probes with detection p values > 0.01 were excluded. Beta values were normalized via functional normalization70, and beta-mixture quantile normalization71 was applied to correct for probe type bias. Each cohort examined its data for batch effects and applied ComBat when applicable; all but one cohort (GENEIDA) identified batch effects and used ComBat to remove this source of variation (Supplementary Data 2). Probes that hybridize to the X/Y chromosomes, cross-hybridizing probes, and probes with SNPs at the CpG site, at the extension site, or within 10 bp of the extension site with an average minor allele frequency > 0.01 were filtered out72. Overall, 418,658 and 415,396 probes were available for modeling any MSDP and sustained MSDP, respectively. Finally, extreme DNAm outliers (<25th percentile − 3 × IQR or >75th percentile + 3 × IQR across all samples) were trimmed.
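The outlier rule in the final step can be sketched for a single CpG as follows. This is an illustration only: the function name is hypothetical, and it assumes that "trimmed" means setting the offending beta values to missing for that CpG. Quartiles use linear interpolation between order statistics (`method="inclusive"`), which matches R's default `quantile()` rule.

```python
import statistics
from typing import List, Optional


def trim_extreme_outliers(betas: List[float], k: float = 3.0) -> List[Optional[float]]:
    """Hypothetical helper: mask beta values outside [Q1 - k*IQR, Q3 + k*IQR].

    `betas` holds one CpG's values across all samples; masked values become None.
    """
    q1, _, q3 = statistics.quantiles(betas, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [b if lower <= b <= upper else None for b in betas]
```

For example, among ten samples with betas between 0.50 and 0.58 plus one at 0.99, only the 0.99 value falls above Q3 + 3 × IQR and is masked.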

Estimates of putative cellular heterogeneity

Placental putative cellular heterogeneity was estimated from DNAm data using a reference-free cell-mixture decomposition method (RefFreeCellMix)73. The number of components ranged from two to five across cohorts. This variation could reflect different sampling protocols, resulting in differential heterogeneity across cohorts, or, because the approach is data driven, the components could be capturing other major sources of variation in the array data, such as residual technical artifacts. To reduce multicollinearity, models for differential DNAm were adjusted for all but one of the estimated components (surrogate variables).
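The reason for dropping one component can be illustrated with a small sketch, under the assumption (made here for illustration) that each sample's K estimated proportions sum to one. In that case the intercept column equals the sum of all K proportion columns, so including all of them makes the design matrix rank deficient; keeping K − 1 removes that exact dependency. Both function names are hypothetical, and which component is dropped (here, the last) is an arbitrary choice.

```python
from typing import List


def build_cell_covariates(omega: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: keep K-1 of the K estimated proportions per sample.

    Each row of `omega` is one sample; the last component is dropped.
    """
    return [row[:-1] for row in omega]


def is_collinear_with_intercept(rows: List[List[float]], tol: float = 1e-9) -> bool:
    """True if every row sums to 1, i.e. the columns add up to the intercept column."""
    return all(abs(sum(r) - 1.0) < tol for r in rows)
```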

Genome-wide differential DNAm analyses

Within each cohort, robust linear regression from the MASS package74 in R was used to account for potential heteroskedasticity while testing the associations between normalized DNAm beta values at each CpG and any MSDP or sustained MSDP. Models were adjusted for maternal age, parity, maternal education, and cohort-specific variables, first without and then with adjustment for RefFreeCellMix estimates of putative cellular heterogeneity. Because of inflation and potential residual or unmeasured confounding, we applied BACON to the cohort-specific results before performing meta-analyses. BACON was specifically developed for EWAS and estimates an empirical null distribution to correct for residual bias and inflation23. We performed inverse-variance weighted fixed-effect meta-analyses using METAL75. The meta-analysis was performed independently by two groups to ensure consistent results. Both groups detected errors in some cohort-specific results, and the affected cohorts were asked to redo their analyses. After these were addressed, results from the two groups were completely consistent. A third group then performed the meta-analysis with an independent R script using the metafor package and again reproduced identical results. CpGs not retained in at least two cohorts were filtered out. We used the Bonferroni adjustment to control for multiple testing. To examine whether differential methylation associated with sustained smoking yielded stronger magnitudes of effect than models of any smoking, we calculated the percent change in the coefficients between the two models as (|β_sustained| − |β_any|)/|β_any| × 100. Secondary analyses were performed only on CpGs that passed a Bonferroni-corrected threshold for association with any or sustained MSDP in models adjusted for RefFreeCellMix and corrected for residual bias via BACON.
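The pooling step for a single CpG can be sketched as below: each cohort's (BACON-corrected) coefficient is weighted by the inverse of its squared standard error, and the pooled estimate is tested against a normal distribution. This is a minimal illustration of inverse-variance fixed-effect weighting, not the METAL or metafor code used in the study; the function names are hypothetical, and the two-cohort minimum mirrors the filtering rule above.

```python
import math
from typing import List, Tuple


def ivw_fixed_effect(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled coefficient, pooled standard error, two-sided p value).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matched estimates from at least two cohorts")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p value from a standard normal
    return pooled, pooled_se, p


def percent_change(beta_sustained: float, beta_any: float) -> float:
    """Percent change in absolute effect size, sustained vs any MSDP."""
    return (abs(beta_sustained) - abs(beta_any)) / abs(beta_any) * 100.0
```

As a worked example, if all 418,658 probes available for any MSDP were tested, the Bonferroni threshold would be 0.05/418,658 ≈ 1.19 × 10⁻⁷; a CpG with β_any = −0.02 and β_sustained = −0.03 would show a 50% larger magnitude under sustained exposure.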

Expression quantitative trait methylation

We performed eQTM76 analyses in the RICHS cohort. Transcription was measured via RNA-seq in 194 placentas. Sample collection, assay, and QC for the RNA-seq data are described in detail elsewhere77 and summarized in the Supplementary Material (Supplementary Methods File). In this dataset, we identified 2567 unique transcripts annotated to an Ensembl ID (GRCh37/hg19) with a TSS within 250 kb upstream or downstream of 423 of the 445 candidate CpGs. The association between DNAm and expression levels was assessed with 3507 linear regression models using the MEAL package78 in R; these models were adjusted for RNA-seq batch, DNAm batch, the RICHS selection factor (small, appropriate, or large for GA), and self-reported maternal ancestry. In addition, principal component analysis showed that the first four components explained 40% of the total variation in the expression data, so these components were included as covariates in the model. We annotated CpGs with eQTM genes that yielded p values < 0.05, whereas statistically significant eQTMs were determined at a Bonferroni-corrected threshold (α = 0.05/3507 ≈ 1.43E−05).
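The construction of the CpG–transcript test set can be sketched as follows: every transcript whose TSS lies within ±250 kb of a candidate CpG on the same chromosome forms one pair (one regression model), and the Bonferroni threshold divides 0.05 by the number of pairs. The function names and the dictionary input format are hypothetical, and this brute-force pairing is for illustration rather than the MEAL implementation.

```python
from typing import Dict, List, Tuple


def eqtm_pairs(cpgs: Dict[str, Tuple[str, int]],
               tss: Dict[str, Tuple[str, int]],
               window: int = 250_000) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each CpG with transcripts whose TSS is within +/- window bp.

    Inputs map IDs to (chromosome, position).
    """
    pairs = []
    for cpg_id, (cpg_chr, cpg_pos) in cpgs.items():
        for tx_id, (tx_chr, tx_pos) in tss.items():
            if cpg_chr == tx_chr and abs(tx_pos - cpg_pos) <= window:
                pairs.append((cpg_id, tx_id))
    return pairs


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests
```

With 3507 CpG–transcript models, `bonferroni_alpha(3507)` gives about 1.43 × 10⁻⁵, the threshold reported above.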

CpG site annotation

We annotated CpGs to genes and CpG islands using the Illumina HumanMethylation450K annotation file, and to several regulatory features using publicly available data: the placental 15-state chromatin model79 released by the ROADMAP Epigenomics Mapping Consortium27 (ChromHMM v1.10), placental PMDs28, and placental gDMRs26.

Enrichment analyses

Functional enrichment analyses were performed at the gene level via ConsensusPathDB80 using KEGG, Reactome, Wikipathways, and Biocarta as reference gene sets, restricting results to gene sets that included at least four genes from our gene lists. ConsensusPathDB performs a hypergeometric test and controls for multiple testing with FDR. Enrichment for TFs and for phenotypes was assessed at the gene level with EnrichR, using the ENCODE and ChEA Consensus TFs from ChIP-X library and the dbGaP library, respectively. EnrichR results were ranked using the combined score (the p value computed using Fisher’s exact test combined with the z-score of the deviation from the expected rank)25. Enrichment for regulatory features was assessed with the hypergeometric test, and p values were Bonferroni corrected for 15 (placental 15 chromatin states) and 6 (relation to CpG island) tests, respectively.
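The hypergeometric test for a regulatory feature can be written out directly, as in this sketch. It assumes, as an illustration, that the background is the full set of CpGs tested in the meta-analysis; the function names are hypothetical. Enrichment (e.g., in enhancers) uses the upper tail, depletion (e.g., in PMDs) the lower tail, and the Bonferroni step multiplies each p value by the number of features tested (15 chromatin states or 6 CpG island categories), capped at 1.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Hypothetical helper: P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background CpGs, K: background CpGs in the feature,
    n: MSDP-associated CpGs, k: MSDP-associated CpGs in the feature.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def hypergeom_depletion_p(k: int, n: int, K: int, N: int) -> float:
    """Hypothetical helper: P(X <= k), the lower tail used to test for depletion."""
    lower = max(0, n - (N - K))
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, k + 1)) / comb(N, n)


def bonferroni_adjust(p: float, m: int) -> float:
    """Bonferroni-adjusted p value for m tests, capped at 1."""
    return min(1.0, p * m)
```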

Overlap of MSDP-sensitive CpG sites and birth outcome SNPs

Overlap between MSDP-associated CpGs in placenta and previously identified BW, BL, HC, and GA SNPs from the largest GWAS to date29,30,31,32,33,34 was assessed using the GenomicRanges package in R81. We identified CpGs located within 1 Mb windows (±0.5 Mb) surrounding each of the 324 autosomal SNPs, which correspond to 280 potential unique loci. Unique loci were defined using the criteria of Warrington et al.34 and linkage disequilibrium in Europeans (r² > 0.1 within 2 Mb).
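The window search is an interval-overlap query. The sketch below is a Python analogue of that step, not the GenomicRanges code used in the study: CpG positions are sorted per chromosome, and binary search finds every CpG within ±0.5 Mb of each SNP. The function name and the (ID, chromosome, position) tuple format are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def cpgs_near_snps(cpgs: List[Tuple[str, str, int]],
                   snps: List[Tuple[str, str, int]],
                   half_width: int = 500_000) -> Dict[str, List[str]]:
    """Hypothetical helper: map each SNP ID to CpG IDs within +/- half_width bp (a 1 Mb window)."""
    by_chr: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for cpg_id, chrom, pos in cpgs:
        by_chr[chrom].append((pos, cpg_id))
    positions: Dict[str, List[int]] = {}
    for chrom, sites in by_chr.items():
        sites.sort()  # sort by position so binary search works
        positions[chrom] = [p for p, _ in sites]

    hits: Dict[str, List[str]] = {}
    for snp_id, chrom, pos in snps:
        if chrom not in positions:
            hits[snp_id] = []
            continue
        lo = bisect_left(positions[chrom], pos - half_width)   # first CpG at or after window start
        hi = bisect_right(positions[chrom], pos + half_width)  # one past last CpG at or before window end
        hits[snp_id] = [cpg_id for _, cpg_id in by_chr[chrom][lo:hi]]
    return hits
```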

Association between DNAm and birth outcomes

Within each cohort, robust linear regression models were used to test the association between normalized DNAm beta values at each CpG (independent variable) and GA at birth (inverse normal transformation of sex residuals), BW z-scores, BL z-scores, and HC z-scores (dependent variables). Logistic regression was used to examine the relationships between DNAm and preterm birth (defined as <37 weeks of gestation). Birth size z-scores were calculated using international references from the INTERGROWTH-21st Project82 and standardized by both GA and newborn sex. Models were adjusted for maternal age, parity, maternal education, cohort-specific variables (see Supplementary Methods), and putative cellular heterogeneity. Inverse-variance weighted fixed-effect meta-analyses75 were again used to estimate pooled associations. Multiple testing was controlled with the Bonferroni adjustment (α = 0.05/443).
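The GA outcome passes through a rank-based inverse normal transformation. The sketch below shows one common variant (Blom's offset, c = 3/8, with tied values given their average rank) applied to values that are assumed to already be residuals from regressing GA on sex. The source does not state which offset was used, so the choice of Blom's formula and the function name are assumptions.

```python
from statistics import NormalDist
from typing import List


def rank_inverse_normal(values: List[float], c: float = 3.0 / 8.0) -> List[float]:
    """Hypothetical helper: Blom-type rank-based inverse normal transform.

    Each value is replaced by the normal quantile of (rank - c) / (n - 2c + 1);
    tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

The transform preserves ordering but forces an approximately normal shape, which limits the influence of extreme GA residuals on the regression. With 443 CpGs tested, the Bonferroni threshold above is 0.05/443 ≈ 1.13 × 10⁻⁴.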

Comparison with CpGs associated with MSDP in cord blood

We examined the consistency between MSDP-sensitive CpGs in placenta and in cord blood10. First, we checked whether MSDP-sensitive CpGs identified in placenta were also reported in cord blood with the same direction of the effect. Then, we compared the coefficients from the models for sustained MSDP in cord blood, unadjusted for cellular heterogeneity, to results for both any and sustained MSDP in placenta, adjusted for cellular heterogeneity, using Pearson correlation coefficients.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.