Introduction

Intrauterine exposures, such as maternal smoking, pre-pregnancy body mass index (BMI), hyperglycaemia, hypertension, folate and famine are associated with fetal growth and hence birthweight1,2,3,4,5,6. Observational studies show that birthweight is also associated with later-life health outcomes, including cardio-metabolic and mental health, some cancers and mortality7,8,9,10,11. In these long-term associations, birthweight may act as a proxy for potential effects of intrauterine exposures12,13. Several mechanisms may explain the associations of intrauterine exposures with birthweight and later-life health as we illustrate in Fig. 1. Our overall conceptual framework in this study was that the intrauterine environment induces epigenetic alterations, which influence fetal growth and hence correlate with birthweight. This is partly supported by previous large-scale epigenome-wide association studies (EWAS) that have reported associations of relevant maternal pregnancy exposures, including smoking, air pollution and BMI, with DNA methylation in offspring neonatal blood14,15,16. However, whilst four previous EWAS have observed associations of DNA methylation with birthweight17,18,19,20, the evidence to date has been limited in scale and power with sample sizes ranging from approximately 200 to 1000.

Fig. 1

Hypothetical paths that might link intrauterine exposures to DNA methylation, birthweight and later-life health outcomes. Red arrows summarise the paths that have motivated the analyses undertaken in this study (i.e. that maternal environmental exposures influence DNA methylation that in turn influences fetal growth and hence birthweight). The EWAS meta-analysis undertaken sought to identify methylation associated with birthweight. Blue arrows summarise other plausible paths, including that maternal exposures influence fetal growth first and it then influences DNA methylation or that maternal exposures may influence fetal growth/birthweight and later-life health outcomes through other pathways than DNA methylation

In this study, we hypothesised that there are associations between DNA methylation and birthweight. We further aimed to explore if these epigenetic alterations are associated with later disease outcomes (Fig. 1). If birthweight is a proxy for a range of adverse prenatal exposures, we might expect neonatal blood DNA methylation to be associated with birthweight. However, we acknowledge that any associations of DNA methylation with birthweight may be explained by confounding21 or reflect fetal growth influencing DNA methylation.

Here we present a large meta-analysis of multiple EWAS to explore associations between neonatal blood DNA methylation and birthweight. In further analyses, we explore whether any birthweight-associated differential methylation persists at older ages. To aid functional interpretation, we (i) explore the overlap of identified cytosine-phosphate-guanine sites (CpGs) that are differentially methylated in relation to birthweight with those known to be associated with intrauterine exposure to smoking, famine and different levels of BMI and folate; (ii) associate DNA methylation at identified CpGs with gene expression and (iii) explore potential causal links with birthweight and later-life health using Mendelian randomization (MR)22. We show that DNA methylation in neonatal blood is associated with birthweight and some of the differential methylation is also observed in childhood and adolescence, but not in adulthood. Also, we show overlap between birthweight-related CpGs and CpGs related to intrauterine exposures. Potential causality of the associations needs to be studied further.

Results

Participants

We used data from 8825 neonates from 24 studies in the Pregnancy And Childhood Epigenetics (PACE) Consortium, representing mainly European, but also African and Hispanic ethnicities with similar proportions of males and females. Details of participants used in all analyses are presented in Table 1, Supplementary Data 1 and study-specific Supplementary Methods.

Table 1 Characteristics for the participating studies in the main meta-analysis for the association between neonatal blood DNA methylation and birthweight

Meta-analysis

Primary, secondary and follow-up analyses are outlined in the study design in Fig. 2. Methylation at 8170 CpGs, measured in neonatal blood using the Illumina Infinium® HumanMethylation450 BeadChip assay and adjusted for cell-type heterogeneity23,24,25, was associated with birthweight (false discovery rate (FDR) <0.05), of which 1029 located in or near 807 genes survived the more stringent Bonferroni correction (p < 1.06 × 10−7, Supplementary Data 2). We observed both positive (45%) and negative (55%) directions of associations between methylation levels of these 1029 CpGs and birthweight (Fig. 3) and these CpGs were spread throughout the genome (orange track (1) in Fig. 4 and Supplementary Fig. 1). We found evidence of between-study heterogeneity (I2 > 50%) for 115 of the 1029 sites (Supplementary Data 2), thus we prioritised 914 CpGs, located in or near 729 genes, based on p < 1.06 × 10−7 and I2 ≤ 50% for further analyses (Fig. 3 and orange track (1) in Fig. 4). The CpG with the largest positive association was cg06378491 (in the gene body of MAP4K2). For each 10% increase in methylation at this site, birthweight was 178 g higher (95% confidence interval (CI): 138, 218 g). The CpG with the largest negative association was cg10073091 (in the gene body of DHCR24), which showed a 183 g decrease in birthweight per 10% increase in methylation (95% CI: −225, −142 g). The CpG with the smallest P-value and I2 ≤ 50% was cg17714703 (in the gene body of UHRF1), which showed a 130 g increase in birthweight for 10% increase in methylation (95% CI: 109, 151 g).
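The two multiple-testing corrections reported here can be compared in a minimal Python sketch. It assumes the FDR was controlled with the Benjamini-Hochberg procedure, which this text does not name explicitly. The function names and the example p-values are hypothetical and are not taken from the study data.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per test: True if it passes BH FDR control at level q.

    Assumption: the Benjamini-Hochberg step-up procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff_rank = rank
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


def bonferroni(pvalues, alpha=0.05):
    """Return one boolean per test: True if p < alpha / number of tests."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]


# Hypothetical p-values, for illustration only.
example = [0.001, 0.02, 0.03, 0.5]
fdr_hits = benjamini_hochberg(example)
bonferroni_hits = bonferroni(example)
```

With these illustrative values the FDR criterion retains three tests and the Bonferroni criterion retains one, mirroring the way the Bonferroni-significant CpGs form a subset of the FDR-significant CpGs.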

Fig. 2

Design of the study. Schematic representation of the main meta-analysis, secondary meta-analyses, follow-up analyses and exploration of persistence at older ages. *We removed multiple births from all analyses and excluded preterm births (<37 weeks) and offspring of mothers with pre-eclampsia or diabetes (three major pathological causes of differences in fetal growth). **For sufficient power in the low vs normal birthweight analyses, we only included nine studies with >10 low birthweight cases

Fig. 3

Volcano plot showing the direction of associations of DNA methylation with birthweight in 8825 neonates from 24 studies. The X-axis represents the difference in birthweight in grams per 10% methylation difference, the Y-axis represents the −log10(P). The red line shows the Bonferroni-corrected significance threshold for multiple testing (p < 1.06 × 10−7). Highlighted in orange are the 914 CpGs with p < 1.06 × 10−7 and I2 ≤ 50% and highlighted in blue are the 115 CpGs with p < 1.06 × 10−7 and I2 > 50%

Fig. 4

Circos plot showing the (Bonferroni-corrected p < 1.06 × 10−7) results for associations of DNA methylation with birthweight. Results are presented as CpG-specific associations (−log10(P), each dot represents a CpG) by genomic position, per chromosome. From outer to inner track: [1, orange] Main analysis results for associations between DNA methylation and birthweight as a continuous measure (n = 8825), [2, blue] Results from participants of European ethnicity only, DNA methylation and birthweight as a continuous measure (n = 6023), [3, red] Results from analysis without exclusion for preterm births, pre-eclampsia and maternal diabetes, DNA methylation and birthweight as a continuous measure (n = 5414), [4, purple] Results from logistic regression analysis without exclusion for preterm births, pre-eclampsia and maternal diabetes, for low (n = 178) vs normal (n = 4197) birthweight, [5, yellow] Results from logistic regression analysis for associations between DNA methylation and high (n = 1590) vs normal (n = 6114) birthweight, [6, green] Results from look-up analysis in methylation samples taken during childhood and its association with birthweight as a continuous measure (n = 2756). Track 1: highlighted in red are 115 CpGs with I2 > 50%. Tracks 2–6: highlighted in red are CpGs that were not found in the 914 main meta-analysis hits (though note differences in sample size and hence statistical power for different analyses presented in the different tracks)

Findings were consistent with results from our main analyses when restricted to participants of European ethnicity, with a Pearson correlation coefficient for effect estimates of 0.99 for the 914 birthweight-associated CpGs (Supplementary Fig. 2, blue track (2) in Fig. 4 and Supplementary Data 3) and 0.90 for all 450k CpGs. Comparing the main meta-analysis to the four Hispanic cohorts and the two African cohorts revealed that 94.9% and 74.0% of the 914 CpGs showed consistent direction of association, with Pearson correlation coefficients for point estimates of 0.82 and 0.48, respectively (Supplementary Data 3). In leave-one-out analyses, in which we reran the main meta-analysis repeatedly with one of the 24 studies removed each time, there was no strong evidence that any one study influenced findings consistently across the 914 differentially methylated CpGs that passed Bonferroni correction and for which between-study heterogeneity had an I2 ≤ 50%. For 139/914 CpGs (15.2%) the difference in mean birthweight for a 10% greater methylation at that site varied by ≥20% with removal of a study, but the study resulting in the change was different for different CpGs. Supplementary Fig. 3.1-3.20 show the results for a random 10 plots where removal of one study changed the result by 20% or more and a random 10 where this was not the case; full results are available on request from the authors. Findings were broadly consistent when birthweight was categorised to high (>4000 g, n = 1593) versus normal (2500–4000 g, n = 6377) (Supplementary Data 4, yellow track (5) in Fig. 4) and when we did not exclude neonates born preterm or to women with pre-eclampsia or diabetes (Supplementary Fig. 4 and Supplementary Data 5A and 5C, and red track (3) in Fig. 4). Without these exclusions, we were able to examine associations with low (<2500 g, n = 178) versus normal (2500–4000 g, n = 4197) birthweight, though statistical power was still limited. 
Four CpGs were associated with low versus normal birthweight (Bonferroni-corrected threshold), none of which overlapped with the 914 CpGs from the main analysis (Supplementary Data 5B, purple track (4) in Fig. 4). We identified that 161 of the 914 differentially methylated CpGs potentially contained a single-nucleotide polymorphism (SNP) at cytosine or guanine positions (i.e. polymorphic CpGs; Supplementary Data 6). Polymorphic CpGs may affect probe binding and hence measured DNA methylation levels26,27. We used one of the largest studies (ALSPAC; n = 633) to explore this. We found no indication of bimodal distributions for any of the 161 CpGs suggesting SNPs had not markedly affected methylation measurements at these sites (dip test p-values: 0.299–1.00)28,29,30.
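The leave-one-out procedure described above can be sketched for a single CpG as follows. This is an illustration under stated assumptions rather than the consortium's own code: it re-pools the per-study estimates with inverse-variance weights after dropping each study in turn, and flags a study when the pooled estimate changes by 20% or more. The function names and the example estimates are hypothetical.

```python
def pooled_estimate(betas, ses):
    """Inverse-variance weighted (fixed effect) pooled estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * b for w, b in zip(weights, betas)) / sum(weights)


def leave_one_out(betas, ses, rel_change=0.20):
    """Re-pool with each study removed and flag relative changes >= rel_change.

    Returns the full pooled estimate and a list of
    (index of removed study, pooled estimate without it, flagged) tuples.
    """
    full = pooled_estimate(betas, ses)
    if full == 0:
        raise ValueError("relative change is undefined for a pooled estimate of 0")
    results = []
    for i in range(len(betas)):
        est = pooled_estimate(betas[:i] + betas[i + 1:], ses[:i] + ses[i + 1:])
        flagged = abs(est - full) / abs(full) >= rel_change
        results.append((i, est, flagged))
    return full, results


# Hypothetical per-study effects (g per 10% methylation) and standard errors.
full, results = leave_one_out([10.0, 10.0, 10.0, 30.0], [1.0, 1.0, 1.0, 1.0])
```

In this toy example only removal of the fourth study shifts the pooled estimate by more than 20%; repeating the check per CpG shows whether the same study is influential across many sites.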

Analyses at older ages

We took the 914 neonatal blood CpGs that were associated with birthweight at Bonferroni-corrected statistical significance and with I2 ≤ 50% and examined their associations with birthweight when measured in blood taken in childhood (2–13 years; n = 2756 from 10 studies), adolescence (16–18 years; n = 2906 from six studies) and adulthood (30–45 years; n = 1616 from three studies). Only participants from ALSPAC, CHAMACOS and Generation R had also contributed to the main neonatal blood EWAS. In childhood, adolescence and adulthood, we observed 87, 49 and 42 of the 914 CpGs to be nominally associated with birthweight (p < 0.05). All these CpGs showed consistent directions of association. Ten CpGs showed differential methylation across all four age periods. However, only a minority survived Bonferroni correction for 914 tests (p < 5.5 × 10−5): 12 (1.3%), 1 (0.1%) and 0 CpGs in childhood, adolescence and adulthood, respectively (Supplementary Data 7; the 12 CpGs that persisted in childhood are presented in the green track (6) in Fig. 4). Of the 914 CpGs, 50, 52 and 49% showed consistency in direction of association in childhood, adolescence and adulthood, but correlations of the associations of DNA methylation and birthweight between methylation measured in infancy and that measured in childhood, adolescence and adulthood were weak (Pearson correlation coefficients: 0.15, 0.06 and 0.02, respectively).

Intrauterine factors

We observed enrichment of previously published maternal smoking-related CpGs in the birthweight-associated CpGs14 (55/914 (6.0%), enrichment p = 6.12 × 10−74; of these, cg00253658 and cg26681628 also showed persistent methylation differences in the look-up in childhood). We additionally found enrichment of maternal BMI-related CpGs in the list of birthweight-related CpGs15 (3/914 (0.3%), enrichment p = 1.13 × 10−3). All directions of association were consistent with the birthweight-lowering influence of maternal smoking or the positive association of maternal BMI with birthweight (Supplementary Data 8). We did not find evidence for overlap with plasma folate31. For famine, we were unable to explore overlap with DNA methylation at the Bonferroni-significant level as the previous EWAS of famine only reported results that reached a FDR level of statistical significance32. In additional analyses for overlap between all FDR hits from the birthweight EWAS with those FDR hits presented in the smoking, maternal BMI, folate and famine EWAS, we found an overlap of 430/8170 CpGs (5.3%, enrichment p = 7.38 × 10−132) for smoking, 584/8170 CpGs (7.1%, enrichment p = 3.34 × 10−62) for maternal BMI and 14/8170 (0.2%, enrichment p = 0.02) for folate. For famine we did not observe overlap.

Metastable epialleles and imprinted genes

We tested the birthweight-associated CpGs for enrichment of metastable epialleles (loci for which the methylation state is established in the periconceptional period33,34). We additionally tested for enrichment of CpGs annotated to imprinted genes (loci that depend on the maintenance of parental-origin-specific methylation marks in the pre-implantation embryo, some of which are known to regulate fetal growth35,36). We did not find evidence of enrichment for metastable epialleles (3/1936 metastable epialleles overlap a birthweight-associated CpG), imprinting control regions (0/741) or imprinted gene transcription start sites (5/1728) (Supplementary Data 9).

Comparison with GWAS for birthweight

To compare these EWAS results to those from genetic studies, we used the 60 recently published fetal SNPs associated with birthweight in a GWAS meta-analysis of 153,781 newborns37 and mapped the CpG sites identified in the EWAS to these SNPs to seek evidence of co-localisation of genetic and epigenetic variation (Supplementary Data 10). We repeated this for the 10 recently published maternal SNPs associated with birthweight in a GWAS meta-analysis of 86,577 women38 (Supplementary Data 11). We observed that one or more of the 914 birthweight-associated CpGs were within ±2 Mb of 34/60 fetal and all 10 maternal birthweight-associated SNPs. Of the 34 fetal SNPs, three were located in the same gene as the CpG, as was one of the ten maternal SNPs. Ten fetal and four maternal SNPs were within 100 kb of identified CpGs. In a look-up of the fetal and maternal SNPs from GWAS of birthweight in an online cord blood methylation quantitative trait loci (mQTL) database (mqtldb.org; ref. 39), 35 fetal and four maternal SNPs affected methylation at some CpG(s), but none at the 914 birthweight-associated CpGs specifically.

Functional analyses

We compared the 914 birthweight-related CpGs with a recently published list of 18,881 expression quantitative trait methylation sites (cis-eQTMs; CpG sites known to correlate with gene expression, within ±250 kb of the transcription start site) identified in whole blood samples of 2101 Dutch adults. We found that 82 of the 914 birthweight-associated CpGs were associated with gene expression of 98 probes (cis-eQTMs)40 (enrichment p < 1.73 × 10−11, Supplementary Data 12). Additionally, in 112 Spanish 4-year-olds41, we observed that 19 CpGs were inversely associated with whole blood mRNA gene expression and four CpGs were positively associated with gene expression (FDR < 0.05, Supplementary Data 13). Of these 23 CpGs, 13 were also found in the publicly available cis-eQTM list40. In 84 Gambian children (age 2 years)42, we found two CpGs that were inversely associated with whole blood mRNA gene expression, but neither was found in the Spanish results or the publicly available cis-eQTM list. The 914 birthweight-associated CpGs showed no functional enrichment of Gene Ontology (GO) terms or Kyoto Encyclopedia of Genes and Genomes (KEGG) terms (FDR < 0.05).

Mendelian randomization

We aimed to explore causality using MR analysis, in which genetic variants associated with methylation levels (methylation quantitative trait loci (mQTLs)) are used as instrumental variables to appraise causality. For 788 (86%) of the 914 birthweight-associated CpGs, no mQTLs were identified in a publicly available mQTL database39. For 108 (86%) of the remaining 126 CpGs, only one mQTL was identified and for the remainder none had more than four mQTLs (Supplementary Data 14 provides a complete list of all mQTLs identified for these 126 CpGs). Many of the currently available methods that can be used as sensitivity analyses to explore whether MR results are biased by horizontal pleiotropy (a single mQTL influencing multiple traits) require more than one genetic instrument (here mQTLs) and even with two or three this can be difficult to interpret43. Having determined that it was not possible to undertake MR analyses of 86% of the birthweight-related differentially methylated CpGs (because we did not identify any mQTLs), and for the majority of the remaining CpGs we would not have been reliably able to distinguish causality from horizontal pleiotropy (because only one mQTL could be identified), we decided not to pursue MR analyses further.

Discussion

This large-scale meta-analysis shows that birthweight is associated with widespread differences in DNA methylation. We observed some enrichment of birthweight-associated CpGs among sites that have previously been linked to smoking during pregnancy14 and pre-pregnancy BMI15, consistent with the hypothesis that epigenetic pathways may underlie the observational associations of those prenatal exposures with birthweight21,44,45. However, the actual overlap in this analysis was modest, likely explained by the adjustments for maternal smoking and BMI in the EWAS analyses. The overlap that we observed with pregnancy smoking-related CpGs may reflect the possibility that smoking-related CpGs capture smoking better than self-report46,47, in line with expectations of pregnant women underreporting their smoking behaviour. Adjustment for maternal smoking and BMI may have masked a greater level of overlap between our results and EWAS of these two maternal exposures. The fact that we find an association of DNA methylation across the genome with birthweight provides some support for our conceptual framework shown in Fig. 1. However, we acknowledge that the associations that we have observed may also be explained by causal effects of maternal pregnancy exposures on both DNA methylation and fetal growth, as well as subtle inflammatory responses in cell-type proportions associated with maternal smoking that might not have been completely captured with the currently available cell type estimation methods.

The differential methylation associated with birthweight in neonates persisted only minimally across childhood and into adulthood. Larger (preferably longitudinal) studies are needed to explore persistent differential methylation in more detail and with better power at older ages. It is possible that inclusion of the Gambia study in the childhood EWAS (which was the only non-European study in these analyses and was not included in the main meta-analyses with neonatal blood) might have impacted these results, although this study made up just 7% of the total child follow-up sample. A rapid attenuation of differential methylation in relation to birthweight in the first years after birth has previously been reported19, but our sample size for these analyses may have been too small to detect persistence. This rapid decrease, if real, may indicate a reduction in the dose of the child’s exposure to maternal factors such as smoking once the offspring is delivered, with that reduction continuing as the child ages. Persistence of birthweight-related differential DNA methylation may not necessarily be a prerequisite for long-term effects, as transient differential methylation in early life may cause lasting functional alterations in organ structure and function that predispose to later adverse health effects.

Methylation is known to be associated with gene expression48. However, we found no consistent associations between birthweight-related methylation and gene expression in two childhood studies. This could be due to the relatively small sample sizes, differences in ethnicities, age, or platforms to measure gene expression. The use of blood, which is likely only a possible surrogate tissue for fetal growth phenotypes, for gene expression analysis might also explain the lack of findings. We did find multiple cis-eQTMs among the birthweight-related CpGs at which methylation was related to gene expression in blood when using a publicly available database from a larger adult sample40, providing some evidence that birthweight-related differentially methylated CpGs may be associated with gene expression. These initial in silico association analyses need further exploration to establish any underlying causal mechanisms.

In observational studies, birthweight has repeatedly been associated with a range of later-life diseases. Change in DNA methylation has been hypothesized as a potential mechanism linking early exposures, birthweight and later health (Fig. 1). We originally aimed to explore this using MR analysis. For the vast majority of the birthweight-associated CpGs (788 of 914), no genetic instrumental variables were available. For most of the remaining 126 CpGs (108; 86%), only one mQTL was available, which would make it impossible to disentangle causality from horizontal pleiotropy. To ensure a strong basis for future MR analyses on this topic, there is a clear need for a more extensive mQTL resource.

Strengths of this study are its large sample size and the extensive analyses that we have undertaken. In a post hoc power calculation based on the sample size of 8825 with a weighted mean birthweight of 3560 g (weighted mean standard deviation (SD): 483 g) and with an alpha set at the Bonferroni-corrected level of P < 1.06 × 10−7 we had 80% power, with a two-sided test, to detect a minimum difference of 0.13 SD (63 g) in birthweight for each SD increase in methylation. The difference in methylation corresponding to a 1 SD increase differs per CpG, as it depends on the distribution of the methylation values. We acknowledge that smaller differences which might be clinically or biologically relevant may not have been identified in the current analysis. Nonetheless, to our knowledge this analysis has brought together all studies currently available with relevant data and is the largest published study of this association. DNA methylation patterns in neonatal blood, whilst easily accessible in large numbers, may not reflect the key tissue of importance in relation to birthweight. DNA methylation and gene expression in placental tissue may be important targets for future studies. DNA methylation varies between leucocyte subtypes49 and we used an adult whole blood reference to correct for this in the main analyses23,24, as the study-specific analyses were completed before the widespread availability of specific cord blood reference datasets50,51. However, we observed very similar findings in two studies (Generation R and GECKO) when we compared the results with those using one of the currently available cord blood references50. Although we adjusted for potential major confounders that may affect both methylation and fetal growth, we acknowledge that the main results cannot ascertain causality. 
That is, whilst we have hypothesised that variation in fetal DNA methylation influences fetal growth and hence birthweight, and undertaken the analyses accordingly, we cannot exclude the possibility that differences in neonatal blood DNA methylation are caused by variation in fetal growth itself, or that the association is confounded by factors, including maternal smoking and BMI, that independently influence both fetal growth and DNA methylation (as suggested in Fig. 1). The 450k array that was used to measure genome-wide DNA methylation only covers 1.7% of the total number of CpGs present in the genome and specifically targets CpGs in promoter regions and gene bodies52. We removed the CpGs that were flagged as potentially cross-reactive, as the measured methylation levels may represent methylation at either of the potential loci. Also, although we did not find evidence for polymorphic effects for the 161 potentially polymorphic CpGs in ALSPAC, we cannot completely exclude these potential polymorphic effects in the meta-analysed results. The majority of participants were of European ethnicity and when analyses were restricted to those of European ethnicity the results were essentially identical to those with all studies included. Direct comparisons of the main analysis with analyses in those of Hispanic or of African ethnicity for the 914 hits suggested strong correlations with Hispanic but weaker with African ethnicity. However, these results need to be treated with caution. First, we had very few studies of Hispanic and African populations. Second, we only compared the initial hits from the main meta-analysis with all ethnicities included. A detailed exploration of ethnic differences would require similar large samples for each ethnic group and within ethnic EWAS, which is beyond the scope of the data currently available.

Neonatal blood DNA methylation at many sites across the genome is associated with birthweight. Further research is required to determine if these are causal and if so whether they mediate any long-term effect of intrauterine exposures on future health.

Methods

Participants

In the main EWAS meta-analysis we explored associations of neonatal blood DNA methylation with birthweight using data from 8825 neonates from 24 studies in the PACE Consortium53 (Table 1). We removed multiple births from all analyses and excluded preterm births (<37 weeks) and offspring of mothers with pre-eclampsia or diabetes (three major pathological causes of differences in fetal growth). In follow-up analyses, we explored whether any sites found in the main analysis were discernible in relation to birthweight when examined in DNA from blood drawn during childhood (2–13 years; 2756 children from 10 studies), adolescence (16–18 years; 2906 adolescents from six studies) or adulthood (30–45 years; 1616 adults from three studies), see Supplementary Data 1B. Informed consent was obtained from all participants, and all studies received approval from local ethics committees. Study-specific methods and ethical approval statements are provided in Supplementary Methods.

Birthweight, DNA methylation and covariates

Our primary outcome was birthweight on a continuous scale (grams), adjusted for gestational age, and measured immediately after birth or retrospectively reported by mothers in questionnaires. In secondary analyses, we categorised and compared associations with high (>4000 g, n = 1593) versus normal (2500–4000 g, n = 6377) birthweight. We also explored all associations with (continuous and categorical) birthweight in analyses that did not exclude women with pre-eclampsia, diabetes or preterm delivery, which also resulted in enough cases to explore low (<2500 g, n = 178) versus normal (2500–4000 g, n = 4197) birthweight (Supplementary Data 1C shows the characteristics of participants). Primary, secondary and follow-up analyses are outlined in the study design in Fig. 2. DNA methylation was measured in neonatal blood samples using the Illumina Infinium® HumanMethylation450 BeadChip assay. All participants had cord blood samples except for three studies with heel stick blood spots (n = 1254 [14.2%]). After study-specific laboratory analyses, quality control, normalisation, and removal of control probes (n = 65) and probes that mapped to the X (n = 11,232) and Y (n = 416) chromosomes, we included 473,864 CpGs. DNA methylation is expressed as the proportion of cells in which the DNA was methylated at a specific site and hence takes values from zero to one. We converted this to a percentage and present differences in mean birthweight per 10% higher DNA methylation level at each CpG. All analyses were adjusted for gestational age at delivery, child sex, maternal age at delivery, parity (0/≥1), smoking during pregnancy (no smoking/stopped in early pregnancy/smoking throughout pregnancy), pre-pregnancy BMI, socio-economic position, technical variation, and estimated white blood cell proportions (B-cells, CD8+ T-cells, CD4+ T-cells, granulocytes, NK-cells and monocytes)23,24,25.
In studies with participants from multiple ethnic groups, each group was analysed separately and results were added to the meta-analyses as separate studies. Further details are provided in the study-specific Supplementary Methods.
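The rescaling of methylation from a proportion to "per 10% higher methylation" can be illustrated with a short Python sketch. It assumes, purely for illustration, a regression coefficient expressed per unit of methylation proportion (a change from 0 to 1); the function names and the example coefficient are hypothetical and do not describe how any individual study coded its exposure.

```python
def proportion_to_percent(beta_value):
    """Convert a methylation proportion (0 to 1) to a percentage (0 to 100)."""
    return beta_value * 100.0


def effect_per_10_percent(effect_per_unit_proportion):
    """Rescale a coefficient per unit of methylation proportion (0 to 1)
    to a coefficient per 10 percentage points of methylation."""
    return effect_per_unit_proportion * 0.10


# Hypothetical coefficient: 1780 g per full unit of methylation proportion
# corresponds to 178 g per 10% higher methylation.
example_effect = effect_per_10_percent(1780.0)
```

Reporting per 10 percentage points keeps effect sizes on a scale closer to the methylation differences that are plausible between individuals than a full 0-to-1 contrast would be.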

Statistical methods

Robust linear (birthweight as a continuous outcome) or logistic (binary birthweight outcomes) regression EWAS were undertaken within each study according to a pre-specified analysis plan. Quality control, normalisation and regression analyses were conducted independently by each study. After confirming comparability of study-specific summary statistics54, we combined results using a fixed-effects inverse-variance-weighted meta-analysis55. The meta-analysis was done independently by two study groups and the results were compared in order to minimise the likelihood of human error. We show (two-sided) results after correcting for multiple testing using both FDR < 0.05 (ref. 56) and the Bonferroni correction (p < 1.06 × 10−7). We completed follow-up analyses for differentially methylated CpGs that reached the Bonferroni-adjusted threshold and did not show large between-study heterogeneity57 (I2 ≤ 50%). We annotated the nearest gene for each CpG using the UCSC Genome Browser build hg19 (refs. 58,59). We explored whether between-study heterogeneity might be explained by differences in ethnicity between studies, by repeating the meta-analysis including only participants of European ethnicity, which was by far the largest ethnic subgroup (n = 6023 from 17 studies) (Fig. 2). Ethnicity was defined using maternal or self-report, unless specified otherwise in study-specific Supplementary Methods. We also did meta-analyses only including the Hispanic studies and only including the African American studies and present those results for illustrative purposes only, given the much smaller sample size. All analyses were performed using R (ref. 60), except for the meta-analysis, which was performed using METAL55. We removed CpGs that co-hybridised to alternate sequences (i.e. cross-reactive sites), because we cannot distinguish whether the differential methylation is at the locus that we have reported or at the one that the probe cross-reacts with.
We compared the birthweight-related CpGs to lists of CpGs that are potentially influenced by a SNP (polymorphic sites)26,27. For these CpGs, we determined if DNA methylation levels were influenced by nearby SNPs, by assessing whether their distributions deviated from unimodality using Hartigans’ dip test28,29 and visual inspection of density plots in n = 742 cord blood samples in the ALSPAC study.
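The fixed-effects inverse-variance-weighted pooling and the I2 heterogeneity filter described above can be sketched for one CpG in Python. This is a textbook formulation written for illustration, not the METAL implementation; the function name, the returned keys and the example estimates are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study estimates for one CpG with inverse-variance weights.

    Returns the pooled estimate, its standard error, a two-sided p-value
    (normal approximation), Cochran's Q and I^2 in percent.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q measures dispersion of study estimates around the pooled one.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 is the share of variation beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": pooled, "se": pooled_se, "p": p_value, "q": q, "i2": i_squared}


# Hypothetical estimates from three studies (g per 10% methylation).
result = fixed_effect_meta([120.0, 140.0, 130.0], [20.0, 25.0, 15.0])
keep_for_follow_up = result["p"] < 1.06e-7 and result["i2"] <= 50.0
```

The final line mirrors the prioritisation rule stated in the text: a CpG is carried forward only if it passes the Bonferroni threshold and shows I2 of 50% or less.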

Analyses at older ages

Analyses of the associations with DNA methylation in blood collected in childhood, adolescence and adulthood followed the same covariable adjustment and methods as the main analyses (Bonferroni-corrected threshold p < 5.5 × 10⁻⁵ for 914 tests). None of the participants or studies in these analyses at older ages had been included in the main meta-analysis in neonatal blood, except for ALSPAC (n = 633 in neonatal analyses, n = 605 in childhood and n = 526 in adolescence), CHAMACOS (n = 283 in neonatal analyses and n = 191 in childhood) and Generation R (n = 717 in neonatal analyses and n = 372 in childhood). Characteristics are shown in study-specific Supplementary Methods and Supplementary Data 1B.
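
The threshold quoted above is consistent with a Bonferroni correction over 914 tests. As a short worked check, assuming the conventional family-wise alpha of 0.05 (the alpha level is our assumption; the function name is hypothetical):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# 914 tests as stated in the text; alpha = 0.05 is assumed
threshold = bonferroni_threshold(0.05, 914)
print(f"{threshold:.2e}")  # prints 5.47e-05, which rounds to 5.5e-05
```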

Intrauterine factors

We used a hypergeometric test to explore the extent to which any of the birthweight-related CpGs overlapped with those previously associated with intrauterine exposure to smoking14 (n = 568 CpGs), BMI15 (n = 104 CpGs) and plasma folate31 (n = 48 CpGs), using the same (Bonferroni-corrected) cut-off for statistical significance. No CpGs reached the Bonferroni-corrected cut-off for famine32. We additionally appraised this overlap using the FDR<0.05 cut-off for all traits (n = 8170 birthweight-related CpGs, n = 6703 smoking-related CpGs, n = 16,067 BMI-related CpGs, n = 443 folate-related CpGs, n = 7 famine-related CpGs). These FDR results were available from the publications for smoking, folate and famine, and we obtained them from the corresponding author for BMI.
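
A hypergeometric test of this kind asks how likely an overlap at least as large as the one observed would be if the birthweight-related CpGs were drawn at random from all tested CpGs. The sketch below is illustrative only: the function name is hypothetical, and the background size and overlap count in the example are placeholders rather than values from the study.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_reference, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: number of CpGs tested in total
    n_reference:  CpGs previously associated with the exposure
    n_selected:   birthweight-related CpGs
    n_overlap:    CpGs present in both lists
    """
    upper = min(n_reference, n_selected)
    favourable = sum(
        comb(n_reference, k) * comb(n_background - n_reference, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic, converted to float only in the final division
    return favourable / comb(n_background, n_selected)

# Placeholder numbers (hypothetical, not taken from the study)
p_value = hypergeom_enrichment_p(
    n_background=10_000, n_reference=568, n_selected=100, n_overlap=15
)
```

Exact integer binomial coefficients avoid floating-point underflow in the tail, at the cost of speed for array-scale backgrounds.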

Metastable epialleles and imprinted genes

We tested the birthweight-associated CpGs for enrichment of metastable epialleles and CpGs associated with imprinted genes. The metastable epialleles were derived from a recently published study that identified 1936 putative metastable epialleles34. For imprinted genes, we first identified a set of CpGs falling within a curated set of imprinting control regions, that is, differentially methylated regions controlling the parental-specific expression of one or more imprinted genes36. Second, we extracted the set of genes controlled by imprinting control regions from the same source and identified all 450k CpGs within +/−10 kb of the gene transcription start site (TSS), including all known alternative TSSs identified in grch37.ensembl.org using biomaRt61,62.

Comparison with GWAS for birthweight

We compared the birthweight-associated CpGs with the 60 SNPs from the most recent GWAS meta-analyses of fetal genotype associations with birthweight in >150,000 newborns37 and with the 10 SNPs from the most recent GWAS meta-analysis of maternal genotype associations with birthweight in >86,000 women38. For this comparison, we checked whether the EWAS top hits were located within a 4 Mb window (+/−2 Mb) surrounding these SNPs. We additionally checked whether SNPs and CpGs were located in the same gene.
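
The positional comparison can be sketched as a simple same-chromosome distance check. The identifiers and coordinates below are hypothetical, and treating the +/−2 Mb boundary as inclusive is our assumption; the same logic would apply to other window sizes used in this study.

```python
def cpgs_near_snps(cpgs, snps, flank_bp=2_000_000):
    """Return (cpg_id, snp_id) pairs lying within flank_bp on one chromosome.

    cpgs, snps: iterables of (identifier, chromosome, position) tuples,
    with positions given on the same genome build.
    """
    hits = []
    for cpg_id, cpg_chrom, cpg_pos in cpgs:
        for snp_id, snp_chrom, snp_pos in snps:
            same_chrom = cpg_chrom == snp_chrom
            if same_chrom and abs(cpg_pos - snp_pos) <= flank_bp:
                hits.append((cpg_id, snp_id))
    return hits

# Hypothetical identifiers and coordinates, for illustration only
example_cpgs = [("cpg_A", "1", 1_000_000), ("cpg_B", "1", 5_000_000)]
example_snps = [("snp_1", "1", 2_500_000)]
example_hits = cpgs_near_snps(example_cpgs, example_snps)
```

With 914 CpGs and 70 SNPs a nested loop is fast enough; an interval index would only be needed for much larger inputs.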

Functional analyses

To explore the association of methylation with gene expression, we compared birthweight-related CpGs with a recently published list of 18,881 cis-eQTMs from whole blood samples of 2101 Dutch adult individuals40. With a hypergeometric test, we calculated enrichment of cis-eQTMs in the list of birthweight-associated CpGs. We further explored methylation of birthweight-associated CpGs in relation to whole blood mRNA gene expression (transcript levels) within a 500 kb region of the CpGs (+/−250 kb, FDR<0.05) in 112 Spanish 4-year-olds41 and 84 Gambian 2-year-olds42 (Supplementary Methods). To better understand the potential mechanisms linking DNA methylation and birthweight, we explored the potential functions of the birthweight-associated CpGs using GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. We used the missMethyl R package63, which enabled us to correct for the number of probes per gene on the 450k array, with the November 2018 versions of the GO and KEGG source databases. To filter out the large, general pathways, we restricted gene sets to those containing between 15 and 1000 genes. We calculated FDR-corrected P-values for enrichment and applied a 5% threshold.
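
As an illustration of FDR correction at 5%, the sketch below computes adjusted P-values with the Benjamini-Hochberg step-up procedure. The section does not name the FDR method, so the choice of Benjamini-Hochberg is our assumption, and the function name and example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values are monotone non-decreasing in the raw p-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical enrichment p-values for four gene sets
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adj_p]
```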

Mendelian randomization

Mendelian randomization (MR) uses genetic variants as instrumental variables to study the causal effect of exposures on outcomes64,65. We aimed to use two-sample MR22,66 to explore (a) evidence of a causal association of methylation levels at the identified CpGs with birthweight and (b) evidence of a causal association of these CpGs with later-life health outcomes (i.e. to explore our hypothesised causal mechanisms shown in Fig. 1). We did this by first searching a publicly available mQTL database39 to identify cis-mQTLs within 1 Mb of each of the birthweight-related differentially methylated CpGs that passed the Bonferroni-corrected threshold with I² ≤ 50%. These mQTLs could then be used as genetic instrumental variables for methylation levels of the birthweight-related CpGs. We then aimed to determine the association of these mQTLs with birthweight and later-life health outcomes from publicly available summary GWAS results66.
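
The section does not state which MR estimator was intended. As a hedged illustration of how a single cis-mQTL could serve as an instrument in two-sample MR, the sketch below uses the Wald ratio, a common single-instrument estimator, with a first-order standard error that ignores uncertainty in the SNP-methylation association. The estimator choice, the function name and the example values are all our assumptions.

```python
import math

def wald_ratio(beta_snp_outcome, se_snp_outcome, beta_snp_exposure):
    """Wald ratio estimate of the effect of the exposure on the outcome.

    beta_snp_exposure: mQTL effect on methylation (sample 1)
    beta_snp_outcome:  effect of the same SNP on the outcome (sample 2)
    Returns (estimate, standard error, two-sided p-value).
    """
    estimate = beta_snp_outcome / beta_snp_exposure
    # First-order approximation: treats the SNP-exposure effect as known
    se = abs(se_snp_outcome / beta_snp_exposure)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, se, p

# Hypothetical summary statistics, for illustration only
estimate, se, p = wald_ratio(
    beta_snp_outcome=0.10, se_snp_outcome=0.02, beta_snp_exposure=0.50
)
```

Both SNP effects must refer to the same effect allele, so summary statistics from the two samples need to be harmonised before the ratio is taken.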

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.