Introduction

Birth order, or the ordinal position of a child within their family, is associated with a wide variety of health outcomes. First-borns are at a higher risk for type 1 diabetes1, high blood pressure2, synovial sarcoma3, metabolic diseases4, and immune diseases (including allergy5, eczema6, acute lymphoblastic leukemia7,8, and lymphoma9). First-born children are at lower risk for other diseases, including acute myeloid leukemia10 and non-Hodgkin lymphoma11. These risk associations are robust, having been replicated in populations worldwide. The proportion of first-born children relative to later-born children is increasing due to decreasing birth rates worldwide12, suggesting that some disease trends may be related to this changing demographic. Notably, most of the diseases listed above, for instance allergies and type 1 diabetes13,14, have increased in incidence over the same period in which family size has decreased, so some proportion of the observed disease incidence trends may be attributable to this change.

Importantly, first-borns experience different gestational environments than their later-born siblings, as indexed by a variety of biomarkers. These differences support a biological, prenatal basis for birth order effects and may impact later disease risk. First-borns experience less sufficient placentation, higher estrogen levels, and lower insulin sensitivity, which could all contribute to subsequent post-birth disease risks15,16,17,18. The means and mechanisms by which these birth order related factors impact childhood outcomes are not currently understood but may be crucial to efforts at understanding disease etiology and prevention. Using a genome-wide correlation analysis of array-based DNA methylation marks, Li et al. reported that DNA methylation was more highly correlated between sibling pairs born after a twin birth than between sibling pairs born before a twin birth to the same mother19. This study suggests that DNA methylation tends to be more consistent and stabilized for later-born infants, subsequent to prior deliveries; however, it did not examine the directionality of DNA methylation alterations after twin pregnancies. In another study, which examined candidate genes20, the DNA methylation of genes in T-cell pathways was reported to be associated with birth order, which could in turn affect immune functions of the newborn. However, these studies had small sample sizes and could not detect DNA methylation changes with birth order on a wider genomic scale.

Here we aimed to investigate associations between neonatal DNA methylation and birth order on a genome-wide scale, to our knowledge for the first time, combining results from 16 cohorts from the Pregnancy and Childhood Epigenetics Consortium (PACE). The large number of studies allowed extensive replication and assessment of the consistency of findings, yielding a catalog of birth order associations. Investigating differentially methylated probes (DMPs) as well as differentially methylated regions (DMRs) in infants with different birth orders may provide mechanistic insights into how birth order could impact associated developmental differences and disease risks.

Methods

Participating cohorts

Sixteen cohorts from 12 countries (Germany, South Africa, Belgium, United Kingdom, Norway, Italy, Greece, Finland, Gambia, Spain, Netherlands, United States of America), comprising 8164 participants, were included in this study (Table 1). All studies used neonatal blood; for most cohorts this was derived from the umbilical cord, and for some from heel-prick blood spots. For a detailed description of each cohort, including DNA methylation extraction and data preprocessing steps, see Supplementary Note 1. Additional details on key birth characteristics, particularly birthweight, were published previously21. Each cohort acquired individual site-specific ethics approval as well as informed consent. The overall analysis was approved by the University of Southern California Institutional Review Board in Health Science.

Table 1 Description of participating cohorts.

Definition of birth related variables

Birth order refers to the number of deliveries the mother had at the time of the subject’s birth. It was coded as an ordinal variable (1, 2, 3, …). Only singletons whose older siblings were also singletons were included in this project, where such information was available. If multiple participants within a sample set were from the same family, only one of them, chosen at random, was included in this study to maintain independence of all study subjects. Miscarriages and abortions were not counted as delivery events. Stillbirth refers to fetal death at 20 weeks of pregnancy or later. If stillbirth information was available, a stillbirth was counted as a previous delivery.
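The coding rule above can be illustrated with a minimal Python sketch. The record type, its field names, and the `count_losses` switch are hypothetical and are not taken from any cohort's pipeline; the switch only mirrors the sensitivity analysis in which miscarriages and abortions are counted as deliveries.

```python
from dataclasses import dataclass


@dataclass
class PregnancyHistory:
    """Hypothetical record of a mother's pregnancies before the index birth."""
    live_births: int = 0
    stillbirths: int = 0   # fetal deaths at 20 weeks of pregnancy or later
    miscarriages: int = 0
    abortions: int = 0


def birth_order(history: PregnancyHistory, count_losses: bool = False) -> int:
    """Return the ordinal birth order of the index child (1 = first-born).

    Prior live births and stillbirths count as deliveries. Miscarriages and
    abortions are counted only when count_losses is True (sensitivity setting).
    """
    prior = history.live_births + history.stillbirths
    if count_losses:
        prior += history.miscarriages + history.abortions
    return prior + 1
```

Under these assumptions, a child born after one live birth, one stillbirth and two miscarriages is coded as birth order 3 in the main definition and 5 in the sensitivity setting.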

DNA methylation measurement

Extraction of blood samples, isolation of genomic DNA, and DNA methylation array measurements were done separately by each cohort. See Supplementary Note 1 for a detailed description for each cohort. The Illumina 450K array was used by 14 cohorts and the EPIC array by 2 cohorts.

Statistics and Reproducibility

Epigenome-wide association (EWAS) models were run in each cohort independently, with a prespecified pipeline using robust linear regression. If participants from a cohort included multiple ancestries, each ancestry was run separately. In total, there were 23 datasets, each including one ancestry from a specific cohort.

Briefly, the winsorized DNA methylation beta value for each CpG was modeled as the dependent variable, with birth order (coded as 1, 2, 3, …) as a discrete independent variable. Covariates included child sex (male as 0, female as 1), technical variables to address potential batch effects, cell type proportion estimates based on the Salas et al. cord blood reference panel22, selection factor, maternal age (years), gestational age (weeks), birthweight (grams), and maternal smoking status (nonsmoker as 0, smoker as 1). The selection factor applies when there was selection on a phenotype to create the original DNA methylation dataset for an individual study, for instance leukemia status (case/control) in the CCLS study. Note that, despite the selection factor, none of the children had been identified as cases at birth; any conditions or diseases used for selection were diagnosed or developed later in childhood. The main model is as follows:

$$\begin{aligned}
\mathrm{Methylation}\ \beta\ \mathrm{value} \sim{} & \mathrm{birth\ order\ (ordinal)} + \mathrm{sex} + \mathrm{gestational\ age\ (weeks)} \\
& + \mathrm{batch} + \mathrm{selection\ factor} + \mathrm{maternal\ age} + \mathrm{birthweight} \\
& + \mathrm{maternal\ smoking} + \mathrm{deconvoluted\ cell\ proportion}
\end{aligned}$$
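The dependent variable in this model is a winsorized beta value. A minimal Python sketch of percentile-based winsorization is given below for illustration. The cutoffs used by the cohorts are not stated here, so the 1st and 99th percentile defaults and the function name are assumptions.

```python
def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values outside the given percentiles to the percentile values.

    The default percentiles are illustrative assumptions, not the study's cutoffs.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two closest ranks
        rank = (p / 100.0) * (n - 1)
        lo = int(rank)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]
```

Winsorizing limits the influence of extreme beta values on the regression while keeping every sample in the analysis, which complements the use of robust linear regression.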

In the meta-analysis, CpGs on sex chromosomes, as well as probes overlapping SNPs with a minor allele frequency >5% in the entire population, were excluded. “IlluminaHumanMethylationEPICanno.ilm10b4.hg19”23 was used to annotate CpGs, including their locations, overlapping or closest genes, and their regulatory regions.

Meta-analysis of all cohorts was conducted using METAL24 weighted by inverse of standard errors, assuming fixed effects. There were 754,340 CpGs in the final analysis that were included in at least 1 cohort. A differentially methylated probe (DMP) was defined as a CpG with false discovery rate (FDR) adjusted P value < 0.05. Heterogeneity between cohorts was measured using the heterogeneity P (P_het) value output by METAL. Differentially methylated regions (DMRs) were identified using the “ipdmr”25 function from the ENmix26 R package, with default parameters. The meta-analysis and a shadow meta-analysis were run independently at two institutions, one in the USA (USC) and the other in France (IARC). Comb-p27 was also used to identify DMRs, using meta-analyzed DMPs, to test the robustness of the “ipdmr” function.
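For illustration, the two steps that define a DMP can be sketched in plain Python: a fixed-effects pooled estimate for one CpG and an FDR adjustment across CpGs. This is not the METAL implementation. The sketch assumes inverse-variance weights (1/SE²), which is our understanding of METAL's standard-error scheme, and it assumes the Benjamini-Hochberg procedure, since the FDR method is not named here. Function names are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one CpG.

    betas, ses: per-dataset coefficients and standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In this sketch a CpG would be called a DMP when its adjusted P value falls below 0.05, after applying `benjamini_hochberg` to the pooled P values of all tested CpGs.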

Gene pathway enrichment analyses of DMPs were performed with the “methylGSA” R package28, using all FDR-significant CpGs from the meta-analysis as inputs. The Gene Ontology (GO)29 and Kyoto Encyclopedia of Genes and Genomes (KEGG)30 databases were both used, and pathways with FDR-corrected P value < 0.05 were considered significant. Enrichment analyses of DMRs were carried out using the Database for Annotation, Visualization and Integrated Discovery (DAVID)31,32, with genes overlapping DMRs as input, focusing on GO and KEGG results.

We investigated whether top hits overlapped transcription factor (TF) binding sites significantly more or less often than expected. TF data for 161 transcription factors from 91 cell types were downloaded from the ENCODE project (wgEncodeRegTfbsClusteredV3.bed). The number of CpGs among significant hits overlapping TF binding sites was compared to that of array-wide CpGs using Fisher’s exact test33.
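As a sketch of this comparison for a single TF, the following pure-Python function computes a two-sided Fisher's exact test on a 2×2 table. The layout of the table is an assumption: the background row is taken to be array CpGs that are not significant hits, so the two rows are disjoint. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: significant CpGs overlapping the TF's binding sites
    b: significant CpGs not overlapping
    c: background CpGs overlapping
    d: background CpGs not overlapping
    Returns (odds_ratio, p_value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x overlaps among the significant CpGs
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(pr for pr in (prob(x) for x in range(lo, hi + 1))
            if pr <= observed * (1 + 1e-7))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, min(p, 1.0)
```

An odds ratio above 1 with a small P value would indicate enrichment and one below 1 would indicate depletion. For array-scale counts, a library routine such as `scipy.stats.fisher_exact` is the more practical choice.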

EWAS Open Platform34 was used to conduct trait enrichment analysis, in order to identify if significant CpGs from our study were associated with other phenotypes included in EWAS Atlas35. Associations of DNA methylation levels in the blood and brain were inferred using BECon36. Expression levels of genes related to DMPs in different tissues were queried on the GTEx portal37.

Finally, the associations between methylation levels of significant CpGs and expression of nearby genes (cis-expression quantitative trait methylation, cis-eQTMs) in the blood were queried from published results38 from the Human Early Life Exposome (HELIX) project39.

Sensitivity analyses

We conducted several sensitivity analyses to test the robustness of our results.

For the top 20 CpGs from the meta-analysis, we conducted leave-one-out (LOO) analyses, excluding one dataset at a time, to determine whether the results were driven by one specific dataset. Forest plots showing LOO results were plotted with the ‘forestplot’ function in the ‘forestplot’ R package40.
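A leave-one-out analysis for a single CpG can be sketched as follows. The pooling again assumes inverse-variance fixed-effects weights, and the 95% confidence interval uses the normal approximation (±1.96 standard errors); the function names and the input layout are hypothetical.

```python
import math


def pooled_estimate(betas, ses):
    """Inverse-variance weighted fixed-effects estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def leave_one_out(results):
    """Re-pool one CpG's estimates, omitting each dataset in turn.

    results: dict mapping dataset name -> (beta, se); needs at least 2 datasets.
    Returns dict mapping omitted dataset -> (beta, ci_low, ci_high).
    """
    out = {}
    for omitted in results:
        kept = [v for name, v in results.items() if name != omitted]
        beta, se = pooled_estimate([b for b, _ in kept], [s for _, s in kept])
        out[omitted] = (beta, beta - 1.96 * se, beta + 1.96 * se)
    return out
```

A result is robust in this sense when every re-pooled estimate keeps the same sign and no confidence interval crosses zero.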

To investigate whether the associations between DNA methylation and birth order were different in different ancestries, we repeated the meta-analyses in European participants (n = 7484) and African participants (n = 378) separately. Ancestry specific analysis was not run in Latinos because Latino participants had the smallest sample size, and they all came from one cohort.

In addition, miscarriages and abortions have arguably smaller physiological effects than full term pregnancy; however, their effects on neonatal DNA methylation of future babies are unclear. Therefore, for cohorts with miscarriage/abortion information available, we carried out sensitivity analysis counting miscarriages and abortions as a delivery.

Lastly, maternal weight gain was reported to be associated with placental DNA methylation alterations41, which in turn could affect neonatal methylation. To test this, we adjusted for maternal weight gain as an additional sensitivity analysis in cohorts with this information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Meta-analysis identified significant CpG probes associated with birth order

Our meta-analysis, which included all 23 datasets from 16 cohorts, identified 341 CpGs differentially methylated at FDR adjusted P value < 0.05 (Fig. 1, Supplementary Data 1). In these and all data presented, positive coefficients refer to higher (hyper-) DNA methylation with later birth order compared to earlier, and negative coefficients refer to lower (hypo-) methylation with later birth order compared to earlier. The most significant CpG (cg09249800, FDR adjusted P value = 7.24 × 10−6) was in a CpG island in the gene body of ACOT7. The second most significant CpG was located in the transcription start site (TSS) of LOC650226, within a Chromosome 7 peak overlapping shore and island regions of the LOC650226 and ZNF727 genes (Fig. 1, Supplementary Data 2). CpG sites in the promoter regions of FAM169A (cg04654716, FDR adjusted P value = 4.40 × 10−4) and LIF (cg19539004, FDR adjusted P value = 4.40 × 10−4) were also among the top hits. See Table 2 for annotation of all significant hits including their genomic coordinates, meta-analysis I2 values and additional outputs. We also computed the top associations with a statistical model comparing first births with all subsequent births as a group (bivariate analysis) (Supplementary Data 3).

Fig. 1: Bi-directional Miami plot showing associations between DNA methylation and birth order.

Bi-directional Miami plot showing the results of the meta-analysis of the association between DNA methylation and birth order, adjusting for sex, maternal age, gestational age, birthweight, maternal smoking, batch effects, selection factor, and cell proportions. Directions of the associations are shown on the Y-axis, with positive associations above Y = 0 and negative associations below. The threshold of significance after false-discovery rate (FDR) correction is shown as a dashed horizontal line. A total of 341 CpGs were significant after FDR correction.

Table 2 Top 25 significant CpGs associated with ascending birth order from meta-analysis.

A total of 1 KEGG and 43 GO pathways were enriched among these 341 DMPs (Supplementary Data 4), including those involved in cell growth and development (germ cell development, multicellular organism reproduction, growth factor activity, etc.) and leukocyte activation and migration (leukocyte transendothelial migration, positive regulation of B cell activation, regulation of leukocyte chemotaxis, etc.).

We collected data on all 161 transcription factors (TFs) from the ENCODE ChIP-seq database and tested whether birth order related CpGs were more or less likely to overlap with TF binding sites. Binding sites of 10 TFs (MAZ, CTCF, POLR2A, RAD21, EZH2, ZBTB7A, GATA3, GATA2, TAL1, POU5F1) were enriched, while those of 13 (ATF1, CREB1, NFYA, GTF2F1, CEBPD, ELK1, RFX5, TAF7, RELA, KDM5B, E2F4, PML, SIN3AK20) were depleted (i.e., significantly under-represented) among these CpGs.

Trait enrichment analysis suggested that birth order-associated hits were also associated with 69 other traits (Supplementary Data 5), the top 4 of which were all immune-related phenotypes: allergic sensitization (P value = 5.90 × 10−96), fractional exhaled nitric oxide (P value = 1.53 × 10−69), childhood asthma (P value = 3.42 × 10−60), and atopy (P value = 3.42 × 10−60). Smoking (P value = 1.98 × 10−38), maternal smoking (P value = 5.28 × 10−23), Down syndrome (P value = 2.58 × 10−20) and neurodevelopmental presentations and congenital anomalies (ND/CAs) (P value = 8.31 × 10−18) were also among the top enriched traits.

Forty out of the 341 significant CpGs (11.73%) were previously reported to be cis-eQTMs in blood (Supplementary Data 6), with some of the CpGs associated with multiple transcripts. This proportion was much higher than that for all CpGs on the 450K array (2.37%). For example, the methylation level of cg04654716 was reported to be positively associated with NSA2 expression level (eQTM P value = 6.24 × 10−7).

DNA methylation levels are often tissue-specific. Because we analyzed DNA from blood, and because the trait hits above appeared relevant to neural functions, we evaluated whether DNA methylation levels of these 341 birth order related CpGs in the brain could be inferred from blood. In the published dataset of Edgar et al.36, which reported concordance of DNA methylation between blood and brain, 277 birth order related CpGs had blood-brain association data available (Supplementary Data 7), and 113 (40.79%) of them had an absolute Spearman correlation coefficient greater than 0.2. Interestingly, most of the genes we mentioned as top birth order-associated hits (including PRRT1, PLEKHB1, ACOT7, FAM169A, ZBED9) exhibited enhanced gene expression in brain tissues compared with other tissues (Supplementary Fig. 15).

Differentially methylated regions associated with birth order

We identified 1,107 DMRs associated with birth order (Table 3, Supplementary Data 8). Functional annotation of genes overlapping these DMRs using DAVID31 identified 17 significant pathways (adjusted P value < 0.05). Eleven (64.70%) of them were related to DNA transcription regulation, 3 (17.65%) were likely related to transcription regulation, and only 2 (11.76%; GO:0098978 glutamatergic synapse, and GO:0005887 integral component of plasma membrane) were not related to this function (Supplementary Data 9).

Table 3 Top 10 DMRs associated with birth order from meta-analysis.

Sensitivity analyses

We ran several sensitivity analyses to test the robustness of our results. We did leave-one-out analyses for the top 20 hits from our analysis, each time excluding one dataset from the meta-analysis, to test whether results were heavily influenced by any one dataset. For all top 20 CpGs, leaving datasets out one by one did not change the significance of our results. Effect sizes were all in the same direction as the main model, and none of the 95% confidence intervals (CIs) of the meta-analysis estimates crossed zero (Supplementary Figs. 6–9).

Since our participants were from different ancestral groups but predominantly European, we conducted meta-analyses in European (n = 7484) and African (n = 378) ancestries separately to identify ancestry-specific birth order related CpGs (Supplementary Data 1, Supplementary Data 10). In European participants, there were 316 significant CpGs after multiple-testing correction, while in African participants alone only 1 CpG remained significant (Supplementary Data 2), likely due to the small sample size. 117 of the 341 significant CpGs from the main model were also significant in European participants, and all 341 CpGs had the same direction of effect as in the main model (Supplementary Data 1). In participants of African ancestry, however, 273 of the 341 CpGs (80.06%) were in the same direction as the main model, and none of these 341 CpGs were significant in African participants alone (Supplementary Data 1).
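The direction-of-effect comparison reported above amounts to counting sign agreement between two sets of coefficients. A minimal sketch is shown below; the function name and the dictionary inputs (CpG identifier mapped to coefficient) are hypothetical.

```python
def direction_concordance(main_effects, subgroup_effects):
    """Fraction of shared CpGs whose subgroup effect has the main model's sign.

    Both arguments map CpG identifiers to regression coefficients.
    CpGs with a zero coefficient in either model count as discordant.
    """
    shared = [cpg for cpg in main_effects if cpg in subgroup_effects]
    if not shared:
        raise ValueError("no CpGs in common")
    same = sum(1 for cpg in shared
               if main_effects[cpg] * subgroup_effects[cpg] > 0)
    return same / len(shared)
```

With the counts given in the text, 273 concordant CpGs out of 341 correspond to a concordance of 273/341 ≈ 0.8006.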

We also controlled for maternal weight gain as an additional variable, and results were highly consistent, including for the 341 significant CpGs from the original model (Fig. 2a). We also counted abortions/miscarriages as birth events, and results were similar to those of our main models (Fig. 2b).

Fig. 2: Comparisons of results from sensitivity models to the main model.

a Effect sizes from the main models (X-axis) plotted against effect sizes from models including maternal weight gain as an additional covariate (Y-axis). The Y = X line is shown in red. The 341 significant CpGs from the main models are colored blue, and all non-significant CpGs are colored yellow. b Effect sizes from the main models (X-axis) plotted against effect sizes from models counting miscarriage/abortion as a birth event (Y-axis), with the same conventions as in (a).

Discussion

In this study, we combined multiple cohorts from 12 countries, including participants of European, African and Latino self-reported ancestries, and identified 341 CpGs whose DNA methylation levels were associated with birth order. To our knowledge, this was the first multi-cohort, large-scale EWAS investigating the associations between neonatal DNA methylation and birth order. As no single cohort was specifically designed to examine DNA methylation and birth order, our results may be considered exploratory; however, the strength of the PACE Consortium allows confirmatory replication and validation.

Birth order has been associated with multiple diseases and does not have a genetic cause. Therefore, it is of interest to investigate whether epigenetic alterations, especially DNA methylation observable at birth, are associated with birth order. These epigenetic alterations may mediate the impact of birth order on disease risk, and can serve as a roadmap of candidate biomarkers to investigate such risk. To establish a robust set of birth order-associated biomarkers, we conducted an EWAS meta-analysis including multiple datasets from cohorts around the world. We found numerous CpGs differentially methylated in relation to birth order, with some associated with gene expression in tissues that have birth order disease associations, such as the brain, immune system, and cardiovascular system. The dramatic fall in fertility rates worldwide over the preceding decades, which is projected to continue, is leading to a higher proportion of first-born individuals; in addition, the contribution of birth order to variance in DNA methylation and to its associated diseases is of strong interest to the Developmental Origins of Health and Disease (DOHaD) community.

The most significant CpG was cg09249800 (adjusted P value 7.24 × 10−6), in the gene body region of ACOT7. The encoded protein hydrolyzes palmitoyl-coenzyme A (palmitoyl-CoA), and the gene was reported to be associated with mesial temporal lobe epilepsy42. Interestingly, a previous GWAS43 identified a SNP (rs11121611) within ACOT7 as associated with “asthma exacerbation measurement, response to corticosteroid”. However, cg09249800 (chr1:6341287, hg19) was about 25 kb upstream of rs11121611 (chr1:6367119, hg19), and it was not reported to be a cis-eQTM of ACOT7 (Supplementary Data 6).

While cg09249800 was the most statistically significant association, its effect size (−2.8 × 10−3) was nearly 4 times smaller in magnitude than that of the CpG with the strongest effect, cg26865747 (coefficient = 0.0105), proximal to the SCAND3 gene, a zinc finger transcription factor implicated in tumor proliferation and invasion44. Effect sizes of significant individual CpG sites ranged in magnitude from 0.0001 to 0.01, that is, over two orders of magnitude, and 70 of the 341 CpGs had larger effect sizes than the most significant single CpG site at ACOT7. Other significant hits were also of interest. For example, there was a prominent cluster of 10 CpGs overlapping the LOC650226 and ZNF727 genes. In addition, all significant CpGs overlapping ZNF727 were reported to be cis-eQTMs, meaning their DNA methylation levels were associated with expression levels of the ZNF727 gene. The reasons for their associations with birth order require further investigation, although it is interesting that all 10 CpGs were also reported by Håberg et al.45 to have significantly lower DNA methylation levels in babies born with assisted reproductive technology (ART) than in naturally conceived babies (FDR adjusted P value < 9.86 × 10−5). In our study, later-borns were more methylated than first-borns in this region. It is not clear why DNA methylation patterns vary in this manner. A potential explanation is that the later in the birth order a child is born, the more established the pregnancy process, including placentation, has become, leading to a more stable nutritional status that promotes physiologic, homeostatic DNA methylation patterns. We did not evaluate the relationship of ZNF727 DNA methylation to postnatal outcomes, but such an effort would be valuable, particularly in the modern era as family size in some countries has decreased compared to historical trends. Either way, additional data are needed to determine whether DNA methylation alterations may mediate some of the disease associations ascribed to birth order.

The CpG site cg04654716 (effect size −0.0023, FDR adjusted P value 4.40 × 10−4) in the transcription start site of FAM169A is also of interest. Similar to the CpGs overlapping ZNF727, cg04654716 was also reported to be a cis-eQTM, whose DNA methylation level was positively related to NSA2 expression level. SNPs in NSA2 were also associated with metabolism-related traits in many GWAS (low density lipoprotein cholesterol measurement46,47, total cholesterol measurement46,47, linoleic acid measurement48, omega-6 polyunsaturated fatty acid measurement48,49, and HMG CoA reductase inhibitor use measurement50). Interestingly, metabolic function was also related to birth order4, and the causal pathway is worth further investigation.

Most of the participants in this study were of European ancestry. Unsurprisingly, in European participants alone the effect sizes of all significant CpGs were in the same direction as in the main model, while in African participants about 80% of significant CpGs were in the same direction (Supplementary Data 2). Further investigation of how these CpGs relate to birth order in other ancestries, including Asians and Latinos, is required.

In trait enrichment analyses, birth order related CpGs were also associated with 69 traits (Supplementary Data 5), especially allergy- and immune-related features including allergic sensitization, childhood asthma, atopy, serum immunoglobulin E levels, allergic asthma, wheeze, respiratory allergies, primary Sjögren’s syndrome, systemic lupus erythematosus, multiple sclerosis and cow’s milk allergy. Interestingly, first-borns were also reported to have a higher risk of allergy51, and eczema was previously reported to be associated with birth order6,51. Other immune-related features were also significantly associated with birth order related CpGs, for example psoriasis, B acute lymphoblastic leukemia (B-ALL) and acute chorioamnionitis (aCA). Among those, B-ALL was reported to be more common in first-borns7.

Several other traits previously reported to be related to birth order were also identified, including blood pressure2 (cardiac autonomic responses (deceleration capacity), diastolic blood pressure, systolic blood pressure, atherosclerosis, preeclampsia, maternal hypertensive disorders in pregnancy), metabolism4 (serum liver enzyme levels (alanine aminotransferase, ALT), serum liver enzyme levels (gamma-glutamyl transferase, GGT), metabolic trait, hepatic steatosis, hepatic fat), birth weight52 and body mass index (BMI)53. The top association in our DMR analysis was ZBED9, recently identified as a regulatory gene for blood pressure54.

Additional traits were associated with birth order related CpGs in our study but have not previously been reported to be associated with birth order. These include abnormal karyotype related traits (Down syndrome, Klinefelter syndrome) and several neural function-related traits (soluble tumor necrosis factor receptor 2 levels in plasma, neurodevelopmental presentations and congenital anomalies (ND/CAs), schizophrenia, myalgic encephalomyelitis/chronic fatigue syndrome, leukoaraiosis). More investigation could reveal whether these traits are also related to birth order, and how birth order related epigenetic changes might contribute to such relationships.

Neurological traits previously assessed in relationship to birth order include intelligence, in which first-borns tended to display higher levels55,56,57. As our samples were collected from blood, these enriched neural related traits also led us to investigate the consistency of methylation levels of birth order associated CpGs in blood and brain. Of all the significant CpGs whose blood-brain association data were available, 40.79% had modest to strong associations (absolute Spearman correlation coefficient >0.2). When we confined our trait enrichment analysis to CpGs whose methylation levels were highly correlated in blood and brain only, similar traits were enriched, and neurodevelopmental presentations and congenital anomalies (ND/CAs) became the second most enriched phenotype (Supplementary Data 5).

We identified 1,107 DMRs associated with birth order. Enrichment analysis of the genes overlapping these DMRs showed that almost all significant pathways were related to regulation of gene transcription. More work is needed to understand which proteins are regulated, and in which direction, to obtain more specific information on how this could impact human development. Only 2 enriched pathways were not transcription related, one of them being the glutamatergic synapse pathway. Glutamatergic synapses are involved in neural network development and are essential for transferring and processing information58, which may contribute to the association between birth order and intelligence.

There were some shortcomings with this study. While investigating birth order, we examined unrelated individuals instead of same-family siblings. This inevitably introduced noise and decreased the reliability of our findings, due to genetic, socioeconomic, and cultural differences between study participants both within and across study cohorts. The choice to remove any genetically related study subjects (by family relationships) maintains the independence of every study subject and randomizes unmeasured confounders, while we statistically controlled for key known confounders, including sex, cell type distributions, maternal age, gestational age, birthweight, and maternal tobacco use (when available). We were not able to control for socioeconomic characteristics (such as maternal education or family income), which will impact disease risk and potentially DNA methylation profiles. Moreover, our study was designed to identify associations only, and is not capable of demonstrating causality or mediation by DNA methylation of the relationship between birth order and disease. Instead, the data provide a catalog of candidates for future research. Also, despite the worldwide scope and large number of studies in the PACE Consortium, we lack extensive data on non-European groups, including Latinos, Africans and Asians. It would be of interest to investigate how birth order is associated with DNA methylation changes in these groups, and how the results differ from the conclusions of this study. Another weakness is the variability of average family size among the PACE Consortium studies (which differ with regard to fertility rates), which might affect power to detect effects of higher birth orders. Finally, our exclusive focus on neonatal blood may limit discovery relevant to other tissues, as does the limited coverage of the epigenome afforded by the Illumina array platforms used.

Strengths of this study include the large sample size afforded by the PACE Consortium on a common analysis platform, and the presumed consistency of our main predictor variable (birth order), which should have a universal definition worldwide and similar physiologic impacts across countries. The use of cord blood is also a major strength, in that a molecular phenotype was captured before the onset of many of the associated traits. The consistency of our top CpG hits across cohorts argues for true and meaningful associations, which may prove to be a further resource for the maternal/fetal health research community.

We note that first-borns, compared to their later-born siblings, experience a variety of postnatal environmental differences which may also impact disease risk. First-borns are typically exposed to infectious agents later in their childhood development; indeed, birth order was often used as a proxy for infection timing59,60,61. Such postnatal experiences, including child rearing practices and postnatal infections, are commonly conjectured to be mediators of the health and disease impacts of birth order. These factors are not likely to be related to pre-birth environments and are not a subject of the current analysis, but they are important postnatal mediators. Future studies should robustly evaluate both the prenatal and postnatal mediators of the effect of birth order on disease risk, including DNA methylation at birth.

In conclusion, our results from multiple datasets showed with high confidence that birth order has a widespread and consistent association with DNA methylation in the cord blood of newborns. These differences provide a catalog of associations which can be assessed as causal mediators in the etiology of health conditions related to birth order.