The early-life exposome modulates the effect of polymorphic inversions on DNA methylation

Polymorphic genomic inversions are chromosomal variants with intrinsic variability that play important roles in evolution, environmental adaptation, and complex traits. We investigated the DNA methylation patterns of three common human inversions, at 8p23.1, 16p11.2, and 17q21.31 in 1,009 blood samples from children from the Human Early Life Exposome (HELIX) project and in 39 prenatal heart tissue samples. We found inversion-state specific methylation patterns within and nearby flanking each inversion region in both datasets. Additionally, numerous inversion-exposure interactions on methylation levels were identified from early-life exposome data comprising 64 exposures. For instance, children homozygous at inv-8p23.1 and higher meat intake were more susceptible to TDH hypermethylation (P = 3.8 × 10−22); being the inversion, exposure, and gene known risk factors for adult obesity. Inv-8p23.1 associated hypermethylation of GATA4 was also detected across numerous exposures. Our data suggests that the pleiotropic influence of inversions during development and lifetime could be substantially mediated by allele-specific methylation patterns which can be modulated by the exposome.

I nversions are segments of DNA that run in the opposite direction to a reference genome. They are balanced mutations of different sizes, from a gene's exon to a chromosome's portion 1 . Because of their role in adaptation to the environment, chromosome evolution, and sex-determination systems in multiple species, polymorphic inversions have traditionally displayed a great interest in evolutionary biology 2,3 . Recent studies have shown that they are important contributors to the genetic basis of common complex diseases in humans, such as obesity, diabetes, asthma, cancer, and neurological conditions such as depression or neuroticism [4][5][6][7][8][9][10][11] . By capturing multiple functional variants, inversions can confer simultaneous risks to different diseases, and, as such, increase the frequency of the diseases' comorbidities. Human inversions at 8p23.1, 16p11.2, and 17q21.31 are large, common, and associate with multiple diseases, including those co-occurring with obesity 5,8 . In addition, they have been strongly correlated with the expression of the several genes they encapsulate across multiple tissues 8,[12][13][14] . There are different mechanisms from which inversions can modulate gene expression. First, inversions can break genes or displace regulatory elements with important functional and phenotypic consequences 10,12,15 . Second, recombination is suppressed in the inverted region in heterokaryotypes. As such, inverted and noninverted alleles accumulate different genetic variants that support differences of gene expression between alleles 2,16,17 . Although several studies have demonstrated the effect of inversions on gene expression, it is unknown the extent to which inversions are also characterized by specific methylation patterns.
DNA methylation, the addition of a methyl group in a CpG DNA site, plays an important and complex role in the regulation of gene expression 18 . Depending on the relative position of the CpG site within the gene, its methylation can increase or decrease the gene's expression 19 . Methylated promoters are often associated with deactivation of transcription, while methylation within the gene's body avoids alternative start sites 20 . Methylation is often strongly correlated across contiguous CpG sites, a fact that is used to determine differentially methylated regions (DMR) of kilobase-pair lengths 21 . At larger distances, coherent methylation patterns may be supported by genomic variants such as copy number variants 22 . However, it is unknown if methylation patterns in inverted regions can also be detected. We, therefore, hypothesized that the common human inversions at 8p23.1, 16p11.2, and 17q21.31 are correlated with the methylation of multiple CpG sites within and surrounding the inverted region, creating allele-specific methylation patterns. In support of this hypothesis, some studies have already reported associations between inversion and phenotypes likely modulated by specific methylation changes 6,23,24 . Besides, since CpG methylation is involved in regulating chromatin structure 25 , these methylation patterns could be associated with different tridimensional (3D) DNA structures for each allele. This would be in line with the influence on 3D DNA structure by large structural variants reported by Shanta et al. 26 .
The epigenetic landscape of genes can be altered due to environmental exposures, leading to disease [27][28][29] . In 2005, Wild introduced the term "exposome" that encompasses all the environmental exposures to which an individual was subjected, from conception to death 30 . This concept has evolved and now it does not only include environmental exposures but also exposures to diet, behavior, and endogenous processes 31 . Common exposures, like air pollution, stress, and heavy metals, among many others, have been associated with distinct epigenetic marks in relevant genes. For example, psychosocial stressors early in life, even in utero, can induce methylation changes on specific genes in the brain 32 . Studies have demonstrated, for instance, that abnormal DNA methylation can lead individuals to be more sensitive to stressful stimuli, increasing the stress burden and anxiety over the life course 33 . More generally, Teh et al. demonstrated that only 25% of the interindividual variation in neonatal DNA methylation was explained by genetic variants, while the 75% was better explained by the interaction of genotype with different in utero environments (considering maternal smoking, maternal BMI, and maternal depression, among others) 34 . Therefore, given its strong link with exposome and genetic variation, methylation is currently considered an important target of gene-environment interactions 35 .
Here, we first evaluated whether three common polymorphic inversions in humans affect the methylation patterns of their encapsulated and surrounding DNA sequences in blood cells from children and in prenatal heart tissue. Second, using a large set of 64 early-life exposures, we then asked which of these exposures had a different impact on DNA methylation according to the inversion status at 8p23.1, 16p11.2, and 17q21.31.

Results
Frequency of inversions at 8p23.1, 16p11.2, and 17q21.31. We analyzed data from the Human Early Life Exposome (HELIX) project, a multicenter European cohort (Spain, United Kingdom, France, Lithuania, Norway, and Greece). This project comprises 1301 children with genomic, transcriptomic, epigenomic, and exposome data 36 . HELIX has the goal of characterizing the exposome during early life and evaluating its relationship with molecular signatures and child health outcomes. The genomewide blood DNA methylation and blood cell transcriptome were measured at the ages between 6 and 11. From this dataset, we selected children with genetic and methylation data. We used Peddy 37 to estimate major population ancestry groups and individuals of European ancestry were kept in the analysis, resulting in a total of 1009 children included in the analyses.
Inversions as eQTLs in blood cells. We first evaluated the inversion status as expression quantitative trait loci (eQTL) of the genes within the inversion regions ±1 Mb. We performed the association analyses of the inversions in each separate cohort adjusting by sex, age, cell-type proportions (inferred from methylation data), and 10 genome-wide principal components of genomic SNP variation (N = 790). We then combined the results with a meta-analysis across cohorts. The results were considered significant when they passed Bonferroni's correction for multiple comparisons. We confirmed that the inv-8p23.1 and inv-16p11.2 were eQTLs for the numerous neighboring genes and the genes they encapsulate (see Supplementary Data 1 and Supplementary  Fig. 3). We observed 12 genes that were significantly associated with inv-8p23.1. We detected significant upregulation of BLK, SLC35G5/SLC35G4, FAM86B1/FAM86B2, and FAM86B3P, and downregulation of FDFT1, FAM167A, FAM66D, SGK223, XKR6, and LOC100506990 for the inverted allele. In the case of the polymorphic inversion at 16p11.2, we observed 10 significant associations, including upregulation of TUFM, MIR4721, EIF3C/ EIF3CL, LAT, SPNS1, and NPIPB9/NPIPB8/NPIPB7 for the inverted allele and downregulation of SGF29, SBK1, LOC388242, and SULT1A1. Finally, for inv-17q21.31, we did not observe eQTL effects, perhaps because single-copy genes within this inversion are mostly expressed in the brain 14 . We thus confirmed the effect of the inversions 8p23.1 and 16p11.2 on the gene expression in blood in 6-11-year-old children, as previously observed in adults across different tissues 8,[12][13][14] .
Inversions as mQTLs in blood cells. We then studied the associations of the genotypes of each of the three inversions with the differential methylation of CpG sites within the ±1-Mb regions containing the inversions (Supplementary Data 2). We removed CpG sites with single-nucleotide polymorphic (SNP) variation. We performed the analyses in each separate cohort adjusting by the same covariates likewise the transcription analyses. We combined the results with a meta-analysis across cohorts (N = 1009). As illustrated in Fig. 1a-c, all three inversions were significantly associated with differences in methylation across multiple CpG sites after Bonferroni's correction for multiple comparisons. We also observed that the most significant associations were in CpG sites within the inversion region or close to the breakpoints. In particular, we observed that 15.21% (129 of 848) CpG sites within and around inv-8p23.1 had significant differences in methylation levels according to to the inversion status (min. P = 63.1 × 10 −147 , Fig. 1a), with 49 significant CpG sites hypermethylated and 80 hypomethylated in the inverted concerting the noninverted allele. For this inversion, we observed 24 genes with at least one significant differentially methylated CpG site and five genes with more than five differentially methylated sites; namely MSRA, MFHAS1, BLK, RP1L1, and XKR6. For inv-16p11.2, we found 27 significant CpG sites differentially methylated from a total of 401 (6.73%, min. P < 10 −300 , Fig. 1b), with 9 significant CpG sites hypermethylated and 18 hypomethylated at the inverted allele. For this inversion, we observed 11 genes with at least one significant CpG site. IL27 was the gene with the greatest number of CpG sites (5) differentially methylated (all hypomethylated at the inverted allele). Finally, 58 CpG sites from 666 (8.71%, min. P < 10 −300 , Fig. 1c) had significant methylation differences for inv-17q21.31 (30 hypermethylated and 28 hypomethylated at the inverted allele). CRHR1, MAPT, and KANSL1 were the 17q21.31 genes with the highest number of differentially methylated CpG sites and a total of 14 genes had at least one CpG site differentially methylated. Therefore, each of these three inversions behaves as an extended methylation quantitative trait loci (mQTL) covering hundreds of kilobases, an observation that had not been previously reported.
To establish the degree to which the association between the effect of inversion status on CpG methylation is associated with changes in gene expression of surrounding genes, we searched for the methylation changes that locate in differentially expressed genes ( Supplementary Fig. 4). We observed that four genes (BLK, FDFT1, XKR6, and FAM167A) overlapped for the inv-8p23.1 with differentially methylated CpG sites. We analyzed whether the observed expression changes were in the expected directions based on the methylation of these regions, that is, hypermethylation of the promoters for downregulated genes, hypomethylation of the promoters for upregulated genes, and hypermethylation of the bodies for upregulated genes. XKR6 was a highly consistent case whose downregulation and methylation, across 11 CpG sites within its body, were associated with the inverted allele. For inv-16p11.2, we observed four genes that were differentially expressed and methylated by the inversion allele (TUFM, SBK1, SPNS1, and SULT1A1). In this case, most of the CpG sites were in the promoter region (TSS1500) and the relation between the expression and methylation levels was consistent. We further observed that SULT1A1 and TUFM had CpG sites in their promoters (cg01378222 and cg00348858) that highly associated with the effect of inversion in gene expression. We found that cg01378222 mediated the 95% of the association between inv-16p11.2 and the expression of SULT1A1 (P < 2 × 10 −16 ), and that cg00348858 mediated the 5% of the association between the inversion and TUFM expression (P = 0.002).
These findings provided evidence of regulatory pathways where inversion, methylation, and gene expression are all involved. In addition, our observation that inv-17q21.31 did not show eQTL effects in blood indicates that the three-way association of the variables is tissue specific, as we observed a clear methylation pattern for the inversion.
Inversion-state-specific methylation patterns. In order to define whether the methylation patterns were specific to each inversion allele, we performed principal component (PC) analysis of the methylation levels of CpG sites within and around each inversion. We thus quantified individual differences in methylation profiles across the inverted regions. We included the region ±1 Mb to account for the effect of the inversions beyond the breakpoints. Remarkably, the first component strongly correlated with the inversion genotype of the individuals in all three inversions (inv-8p23.1 PC 1: R 2 = 0.68, P < 2 × 10 −16 , inv-16p11.2 PC 1: R 2 = 0.05, P = 1.34 × 10 −12 , and inv-17q21.31 PC 1: R 2 = 0.70, P < 2 × 10 −16 ), see Fig. 1d-f. We observed that the first PC clearly separated the genotypes of inversions at 8p23.1 and 17q21.31, possibly sustained by the haplotypic differences between inversion status. While the first PC of inv-16p11.2 was significantly associated with inversion genotypes, the second PC was also needed to distinctly separate the genotypes (R 2 = 0.33, P < 2 × 10 −16 ). This is in line with the univariate differential analysis, where inv-16p23.1 showed the smallest proportion of CpG sites differentially methylated according to the inversion status. This is possibly explained by the multiple haplotypes supported by this inversion 11 . These analyses showed that hyperand hypomethylation patterns of CpG sites across the inverted regions are specific to the inversion status.
Inversions as mQTLs in fetal heart DNA. We asked whether the effect of the inversion on DNA methylation could be also seen prenatally and in another tissue. Using methylation data of heart DNA from 39 fetuses from interrupted pregnancies at 21-22 weeks of gestational age due to congenital heart defects 38 , we performed the same differential analysis adjusting by sex. We observed that all the inversions act as mQTLs during early development from conception, although few CpG sites per inversion passed Bonferroni's threshold ( Fig. 1g-i and Supplementary Data 3). This can be explained by the small sample size. Nonetheless, we observed that the distribution of the significant associations was very similar to the one observed in HELIX data, having greater differences in methylation in the CpG sites between the breakpoints. In addition, we saw that 38 CpG significant sites overlapped between heart (nominal P-value) and blood (adjusted P-value) tissues, 32 of which were in the same direction, suggesting that the effect of inversions on CpG methylation may be sustained between tissues and stages of life.
Effect of inversion-exposure interactions on DNA methylation. As these common human inversions at 8p23.1, 16p11.2, and 17q21.31 offered a solid genetic context where allele-specific methylation patterns were found, we then asked whether these patterns were modulated by environmental exposures. Thus, we assessed which of 64 exposures at early life differentially modified the methylation levels of the CpG sites within the inversion regions according to the inversion status.
We performed differential methylation analyses for the interactions of the 3 inversions with 64 exposures (7 during pregnancy and 57 at 6-11 years of age) grouped by 12 exposure families, including build environment, air pollution, persistent and nonpersistent chemicals, diet, and exposure to tobacco smoke, among others ( Fig. 2a and Supplementary Data 4). We observed 36 exposures and 58 CpG sites implicated in at least one significant inversion-exposure interaction after Bonferroni's correction for multiple comparisons (see Table 1 and Supplementary Data 5). All exposure families had at least one exposure that interacted with one of the three inversions, except natural spaces and polybrominated diphenyl ether compounds (PBDE). Remarkably, the exposure families with the greatest number of significant interactions were metals (13 interactions), diet (11), phenols (11), and organochlorines (OCs) (10) (Supplementary Data 6).
Genes with the strongest and most numerous inversionexposure interactions. Within the significant interactions (Table 1), we looked in detail at the genes that showed both the highest significant levels and multiple interactions across different CpG sites for the same gene. We identified three relevant genes within inv-8p23.1, namely TDH, GATA4, and TRMT9B. Within TDH, we found two CpG sites significantly associated with the interaction between the inversion and meat intake: cg01489256 (β = 0.0156, P = 3.8 × 10 −22 ) and cg02601489 (β = 0.0092, P = 1.8 × 10 −8 ). More specifically, we observed that individuals homozygous for the noninverted allele (N/N) had a negative association, while heterozygous individuals did not present any association, and homozygous for the inverted allele (I/I) had a positive association (Fig. 3a). We also observed that the association was consistent across all the cohorts, with no significant heterogeneity (cg01489256: P = 0.39; cg02601489: P = 0.45), see Fig. 3b. We further observed that the increase of meat intake reduced the expression of TDH (P = 0.00398), while the associated methylation effect on the expression depended on the genetic context given by the inversion, adjusting by sex, age, and cohort (CpG-inversion interaction, P = 0.00193) ( Supplementary  Fig. 5). Remarkably, the gene, the inversion, and the exposure have been independently associated with obesity in adults 5,39-41 .
Another interesting result of our analysis relates to the methylation of the TRMT9B gene, also known as C8orf79 or KIAA1456, a tRNA methyltransferase. The gene has been seen to associate with laryngotracheitis, an upper respiratory tract disease in chicken 43,44 . We observed that parental smoking during childhood significantly modulated the inversion-associated methylation of cg08196601 (β = −0.010, P = 1.3 × 10 −10 ) (Fig. 3e). The interaction of the inversion with maternal smoking during pregnancy was also associated with the methylation of cg08196601 (β = −0.020, P = 5.9 × 10 −8 ). In addition, the methylation of cg26339990 was associated with the interaction of the inversion with outdoor PM2.5 (an air pollution exposure) during pregnancy (β = −0.003, P = 5.5 × 10 −8 ). In the three cases, the noninverted allele was associated with increased levels of methylation with the exposures. We observed that the heterogeneity across cohorts was not significant (P = 0.63) (Fig. 3f). In line with these observations, the noninverted allele for inv-8p23.1 has been found to associate with asthma 5 while parental smoking and exposure to high levels of PM2.5 during pregnancy or childhood increase the risk of respiratory diseases in children [45][46][47] .

Discussion
Here, we show that the common human chromosomal inversions at 8p23.1, 16p11.2, and 17q21.31 have distinctive methylation patterns in blood across the inverted regions and that the earlylife exposome modulates these patterns. We observed that during childhood, approximately 10% of the CpG sites within the inverted regions ±1 Mb were significantly differentially methylated according to the inversion genotype. The amount of the differentially methylated CpG sites was high within the region and sharply decreased after the breakpoints, indicating the targeted effect of genomic inversions on DNA methylation. We could also identify the effects of the inversions at prenatal stages in heart tissue, suggesting their relevant role during development even in utero. As such, inversions are early methylation quantitative loci for the genes they enclose. Our findings, therefore, add to other effects that inversions have on gene expression 8,13,14,48 , derived from their genetic variability or from the displacement of regulatory elements near the breakpoints 10 . While individual CpG associations with the inversion may be due to the inversion or to local genetic variability in linkage with the inversion, our observations in the PC analysis reveal a spatial pattern given by the correlation of several CpG-site associations that fits the extension of the inversion. It is clear that the cause of such extended pattern along the affected sequence has been produced by the presence of the inversion, likely due to both the DNA reconfiguration and the accumulation of specific genetic variability along the segment that results from the suppression of recombination between inversion states.
We show that an important influence of inversions on phenotypes could be derived from the methylation patterns they support. Few previous studies have analyzed targeted methylation changes when studying a specific inversion or disease. We previously reported that the effect of inv-17q21.31 on colorectal disease-free survival is more likely mediated by DNA methylation than by gene expression 6 . Here, we document that the effect of inversions on methylation is strong along the inverted segment and already significant during early embryonic and fetal development in heart-tissue DNA. One of the main established mechanisms underlying the influence of inversions on phenotypic traits and their pleiotropy is the suppression of recombination within the inverted sequence in heterozygotes. Allele combinations can thus be protected, leading to the generation and possible selection of specific haplotypes for each inversion state 10 . In addition, inversion breakpoints can disrupt coding regions or regulatory elements, altering gene expression or generating novel transcripts with phenotypic consequences, including deleterious effects 15 . These effects likely play a role in the association of these three polymorphic inversions with complex diseases, like obesity 5,8 , autoimmune diseases 49 , or neurodegenerative disorders [50][51][52] . For these diseases with important environmental components, our results further suggest the additional role of inversion-associated methylation that is modifiable by environmental exposures.
Allele-specific methylation patterns in inversions can be caused or facilitated by their specific genetic variability and/or different chromatin structure. In our study, we removed probes with SNPs within 5-bp distance and overall population frequency higher than 1%, ruling out technical and genetic variation as main contributors to the methylation differences. We observed that inversions at 8p23.1 and 17q21.31 were strongly characterized by their methylation patterns in the region. However, the effect was less strong for inv-16p11.2, which can be due to the higher number of haplotype groups supported by the inversion, that is, two distinct haplotype groups in the standard allele and one in the inverted allele, and the fact that this inversion is smaller in size (0.45 Mb vs. 0.9 Mb for inv-17q21.31 and almost 4 Mb for inv-8p23.1) 8 . These specific effects on the methylation patterns could be mainly caused by differences in the three-dimensional (3D) DNA configuration for each allele 26 , rendering some haplotypes more accessible to the different factors that could facilitate DNA methylation. This mechanism would explain how a recurrent but nonpolymorphic inversion at Xq28 causing Hemophilia A has been associated with specific methylation changes 23 or how de novo inversions at 11p15.5 causing Beckwith-Wiedemann syndrome can be hypermethylated 24 . The possible correlation of inversion haplotypes with different 3D configurations and nuclear localization should be investigated in future studies.
We found that while the effects of the inversion on gene transcription and CpG methylation are widespread across the affected region with some overlap, the specific expression changes driven by inversion-association methylation need to be individually assessed. While the extended pattern of methylation across the inversion can be a consequence of the reconfiguration of the chromatin structure, gene expression may be more susceptible to the tissue and the local genetic variability in linkage with an inversion allele. In the case of 17q21 inversion, for instance, we found clear methylation patterns associated with inversion alleles, but no expression differences, which suggests that these methylation changes would have no relevant consequences in blood. By contrast, we also identified a relevant and specific mediator role by the methylation at promoters of TUFM and SULT1A1 on the associations of their expressions with inv-16p11.2. Remarkably, these are candidate genes in the association between inv-16p11.2 and the co-occurrence of asthma and obesity 8 .
Previous studies have reported transcriptomic effects of inv-17q21.31 in blood only in genes with multiple copies 53,54 . This is a complex region with high variability in the gene copies within the inversion alleles, high homology between the genes with multiple copies, and low expression of the genes in blood 14,55 . This could explain the lack of eQTL effects of inv-17q21 in blood that we observed.
We have found that several methylation effects of inversions are modifiable by numerous environmental exposures, suggesting additional inversion-methylation effects to those driven by genetic variability. We observed that inversions significantly interacted with a wide range of exposures affecting DNA methylation across the inverted segments. Therefore, inversions are common copy-neutral polymorphisms that seem to be important contributors to gene-environment interactions, whose detection remains elusive in genomic and high-dimensional exposure data [56][57][58] . We analyzed data from an exposome study, covering a wide range of exposure families believed to affect children's development. The exposome data included environmental exposures but also exposures from the diet, urban exposome, and chemical compounds 31 . In total, we assessed 64 exposures (7 during pregnancy and 57 at 6-11 years of age) grouped in 12 families. We observed inversion interactions in most of the exposure families, most prominently in metals, diet, phenols, and organochlorines. Validation of these results and their consequences remain to be evaluated. Our results support the notion that inversions can change the way exposures affect a child's development by changing the genetic context. Carriers of genomic variants, such as these inversions that may affect the function of a set of genes in a specific direction, can be more susceptible to (or naturally protected against) disease or developmental disorders if exposed to a relevant environmental risk factor 59 . Thus, allele-specific methylation in response to different environmental factors could also contribute to the positive selection that has been documented for all three inversions in some human populations 8,12,60 .
We found numerous significant inversion-exposure interactions on methylation levels in important genes that deserve further study. These include, among others, Alzheimer's MAPT and its associations with copper 61 , MSRA's role in repairing oxidative damage to proteins and its relation with diet and parental smoking, and the oncogene WNT3 and its relation to molybdenum and mercury exposure. Here, we highlight three interactions with potential clinical interest and substantial support from previous studies. First, we observed the interaction of inv-8p23.1 with meat intake associated with TDH methylation levels. Remarkably, the inversion, the exposure, and the gene are independently associated with obesity in adults 5,39-41 . Our data revealed that noninverted homozygous individuals, those with a higher risk of obesity, decreased methylation of two CpG sites within TDH as meat intake increases. While further studies are needed to describe the role that this pseudogene plays in obesity during development, it is clear that these need to incorporate the effects of the inversion and its methylation status. In addition, clinical interventions of obesity aiming at managing meat intake should consider the methylation of the gene and the inversion genotype of individuals. Second, we observed that cg26020513 within GATA4 was hypermethylated in blood when manganese exposure increased but only in noninverted homozygous individuals. It is notable that the hypermethylation of cg26020513 has been strongly associated with congenital heart defects in fetuses 42 , mutations in GATA4 have been associated with cardiac septal defects 62 , and manganese toxicity in heart tissue is well documented 63 . The inversion also interacted with other relevant exposures on GATA4 methylation, including mercury, with reported effects in heart-rate variability in children 64 , diethylphosphate, Mediterranean diet, and PCB 138. Therefore, the extent to which the inversion status can protect against the positive association between these exposures and GATA4 methylation deserves further scrutiny. Third, we observed that the effects of tobacco smoke (during pregnancy or in childhood) and air pollution (outdoor PM2.5 exposure) on TRMT9B methylation changed, depending on the inv-8p23.1 genotype. Since these two exposures increase the risk of respiratory diseases [45][46][47] and TRMT9B is a gene associated with an upper respiratory tract disease 43,44 , our results suggest a likely role of the gene in the association between inv-8p23.1 and asthma 5 .
To the best of our knowledge, this is the first study to systematically assess the methylation landscape within three common human inversions and its interaction with the exposome. We have shown that genomic inversions are associated with the methylation of the CpG sites within the inversion region and that this association is modulated by a wide range of environmental exposures during childhood.

Methods
Study population. The Human Early Life Exposome (HELIX) project 36 comprises a total of 1301 mother-child pairs from six birth cohorts in Europe: BIB (Born in Bradford; the United Kingdom) 65 69 , and Rhea (Greece) 70 . These mother-child pairs participated in a common, completely harmonized, follow-up examination between December 2013 and February 2016, when children were between 6 and 11 years old 71 . The main goal of this project was to implement exposure assessment and biomarker methods to characterize early-life exposure to multiple environmental factors and associate these with omics biomarkers and child health outcomes. For these same children, multi-omics molecular phenotyping was performed, including measurement of blood DNA methylation (450 K, Illumina), blood gene expression (HTA v2.0, Affymetrix), blood miRNA expression (SurePrint Human miRNA rel 21, Agilent), plasma proteins (Luminex), serum metabolites (AbsoluteIDQ p180 kit, Biocrates), urinary metabolites ( 1 H NMR spectroscopy), and DNA microarray (Chemagen kit, Perkin Elmer). All studies received approval from the ethics committees of the centers involved and written informed consent was obtained from all participants.

Molecular phenotypes
Inversion genotype data. DNA was obtained from buffy coat collected in EDTA tubes at 6-11 years of age. Briefly, DNA was extracted using the Chemagen kit (Perkin Elmer) in batches of 12 samples. Samples were extracted by cohort and following their position in the original boxes. DNA concentration was determined in a NanoDrop 1000 UV-Vis Spectrophotometer (ThermoScientific) and with Quant-iT™ PicoGreen® dsDNA Assay Kit (Life Technologies). Genome-wide genotyping was performed using the Infinium Global Screening Array (GSA) MD version 1 (Illumina) at the Human Genomics Facility (HuGe-F), Erasmus MC (www.glimdna.org). Genotype calling was done using the GenTrain2.0 algorithm based on a custom clusterfile for 692,367 variants implemented in the GenomeStudio software. Annotation was done with the GSAMD-24v1-0_20011747_A4 manifest, SNP coordinates were reported on human reference GRCh37 and Source strand (Forward strand report in GenomeStudio). The initial dataset consisted of 1,397 samples and 692,367 variants. Samples with discordant sex, duplicated, contaminated (high heterozygosity), and relatives (IBD > 0.185) were filtered out. SNPs with variant call rate <95%, minimum allele frequency <1%, and HWE Pvalue (1 × 10 −6 ) were excluded. Major population ancestry groups were estimated using Peddy 37 and only individuals of European ancestry were kept in the analysis. The final dataset consisted of 1,009 samples and 509,344 SNP variants. From this dataset, we selected inversions that could be genotyped with scoreInvHap and had more than 10 CpG sites in the inversion region: inv-8p23.1, inv-16p11.2, and inv-17q21.31 (Table 2 and Supplementary Tables 2 and 3).
DNA methylation. The DNA was obtained using the same methodology as for genetics data. DNA methylation was assessed using the Infinium Human Methylation 450 beadchip (Illumina), following the manufacturer's protocol. Minfi R package 72 was used for the preprocessing of DNA methylation data. MethylAid package 73 was employed to perform the first quality control of the data. Probes with low call rates were filtered following the guidelines of Lehne et al. 74 . The functional normalization method was further applied, including Noob background subtraction and dye-bias correction 75 . Several quality-control checks were performed: sex consistency using the shinyMethyl package 76 , consistency of duplicates, and genetic consistency for the samples that had genome-wide genotypic data. Duplicated samples and control samples were removed, as well as probes that measure methylation levels at non-CpG sites 77 . Probes that cross-hybridize were excluded. Moreover, we used InfiniumAnnotation from https://zwdzwd.github.io/ InfiniumAnnotation to filter probes where 30-bp 3′-subsequence of the probe is nonunique, probes with INDELs, probes with extension base inconsistent with specified color channel (type I) or CpG (type II) based on mapping, probes with a SNP in the extension base that causes a color-channel switch from the official annotation, and probes where 5-bp 3′-subsequence overlap with any of the SNPs with global population frequency higher than 1%. Consequently, the number of CpG probes analyzed was 371,533, initially available for 1192 subjects. We then used Combat algorithm to remove the batch effects supported by the slide. Methylation levels were expressed as beta values (average methylation levels for an individual, between 0 for a never-methylated CpG site and 1 for an alwaysmethylated CpG site) and CpG sites were annotated to genes by Illumina HM450 manifest file (version 1.2). We discarded the subjects without inversion-status data and without European ancestry based on genomic data, resulting in 1009 individuals for the analysis. For each inversion, we selected the CpG sites contained in the inversion region ±1 Mb, resulting in 848 CpG sites for inv-8p23.1, 401 for inv-16p11.2, and 666 for inv-17q21.31 (Table 2 and Supplementary Table 2). Blood cell-type proportions were estimated from methylation data according to Houseman et al. algorithm 78 and Reinius reference panel 79 .
Gene expression. At the period of clinical examination that took place when children were between 6 and 11 years old, RNA was extracted from whole blood collected in Tempus tubes. Samples with RIN > 5 were considered. Gene expression was assessed using the GeneChip ® Human Transcriptome Array 2.0 (HTA 2.0) (Affymetrix, USA) at the University of Santiago de Compostela (USC, Spain), following the manufacturer's protocol. Samples were randomized and balanced by sex and cohort within each batch. Data were normalized at the gene level with the GCCN (SST-RMA) algorithm, and batch effects and blood cell-type composition were controlled with two surrogate variable analysis (SVA) methods, isva 80 and SmartSVA 81 , during the differential expression analyses. Gene expression values were log2 transformed, and annotation of transcript clusters (TCs) to genes was done with NetAffx annotation (version 36). Genes without Gene Symbol annotation or with call rate <20% were removed, restricting to 25,255 genes. From this number of genes, we selected those within the inversion regions ±1 Mb (inv-8p23.1: 83 genes; inv-16p11.2: 58 genes; inv-17q21.31: 61 genes). From a total of 1158 subjects that had transcriptomic data, we selected individuals with European ancestry (based on genomic data) who had available inversion-status data and celltype proportions assessed from methylation data, resulting in a total of 790 subjects ( Table 2 and Supplementary Table 1).
Exposome assessment. The assessment of the exposome has been previously published 82 . In our study, we included 7 exposures assessed during pregnancy and 57 exposures assessed during childhood at age 6-11 y (Supplementary Data 4). These 64 exposures were selected from the entire exposome dataset according to the number of missing values they had. We did not include exposures that had more than 10% of missings in the whole dataset or with more than 20% missing in one or more cohorts. We also excluded exposures whose levels were not present in all cohorts. Third, we selected the most representative exposures within each family.
The pregnancy exposome consists of 7 exposures, including outdoor PM2.5, normalized difference vegetation index (NDVI), 4 PFASs, and maternal smoking during pregnancy. The postnatal exposome was divided into 12 exposure families: outdoor air pollution (2), building environment (1), diet (6), metals (9), natural spaces (1), organochlorines-OCs (8), organophosphate pesticides-OP pesticides (5), polybrominated diphenyl ethers-PBDEs (2), perfluorinated alkylated substances-PFAS (5), phenols (7), phthalates (10), and second-hand exposure to tobacco smoke (1) (Fig. 2a). Metals, OCs, OP pesticides, PBDEs, PFASs, phenols, and phthalates were assessed by biomarkers in children at the time of the clinical examination, from a pool of two urine samples or one serum sample 83 . Air pollution, natural spaces, and building environment quantification were assessed during the year before child examination or during pregnancy by environmental geographic information systems (GIS). Tobacco smoke and diet were evaluated by questionnaires. Missing values for all exposures were imputed using the method of chained equations 84 , as described in detail elsewhere 82 . Most exposure variables were transformed as described in Supplementary Data 4.
Fetal heart-tissue samples. Human fetal samples from 40 fetuses of terminated pregnancies due to a major congenital heart defect (gestational age 21-22 weeks in all cases) were obtained from Biobanc Hospital Universitari Vall d'Hebron (HUVH) in a related project addressed to define the genetic and epigenetic basis of congenital heart defects 38 . Informed consent was obtained from parents and the study was approved by the institutional ethics committee. Heart-tissue DNA was obtained following necropsy using standard procedures, whole-genome sequencing was performed at Centogene, and DNA methylation was measured with Infinium MethylationEPIC 38 .

Statistics and reproducibility
Genome-wide analysis. Differential methylation analyses were performed using MEAL Bioconductor's package 85 . We performed a differential mean analysis The table shows the length in kb, the mapping coordinates hg19 ±1 Mb, the frequency of all the inversions obtained from scoreInvHap 11 , and the number of samples and features used in transcriptome and methylome analysis for each inversion.
(DMA) on inversion genotypes using the function runDiffMeanAnalysis that calls limma 86 . Based on a priori knowledge, we adjusted all the regression models by sex, age, population stratification (using the first 10 principal components of the GWAS that highly correlated with cohort), and cell type (Supplementary Tables 1 and 2). To correct for the variance between cohorts, we performed this analysis for each cohort separately, and we meta-analyzed the results using the function metagen from meta package 87 . For each inversion, in each cohort, we fitted models where E j is the methylation or expression-level vector across individuals at probe j, I k are the individuals' genotypes for inversion k (8p23.1, 16p11.2, and 17q21.31), C r is the r covariate and its effect γ r , and ε j is the noise that follows the distribution of methylation or expression levels with mean 0. β jk is the effect of interest measuring the effect of the inversion. The β jk were then meta-analyzed across cohorts. Pvalues derived from the meta-analyses were corrected for multiple comparisons for the number of probes using Bonferroni's correction. The inflation or deflation of Pvalues across the methylome or transcriptome was tested with Q-Q plots.
Exposome-wide interaction analysis. Based on the genome-wide analysis, the same functions were implemented for the exposome-wide interaction analysis. In this case, the effect of interest was the inversion-exposure interaction in the model where X i is the level of exposure i across individuals. β jik is the effect of interest given by the exposure-inversion interaction. In this case, the covariates also included exposure i, the inversion genotypes, maternal education level, and child body mass index (BMI). P-values were corrected for multiple comparisons across CpG sites and exposures using Bonferroni's correction. The inflation or deflation of P-values across the methylome was tested with Q-Q plots.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Source data underlying Figs. 2a, 3a and e are available in Supplementary Data 7. The HELIX data warehouse has been established as an accessible resource for collaborative research involving researchers external to the project. Access to HELIX data is based on approval by the HELIX Project Executive Committee and by the individual cohorts. Further details on the content of the data warehouse (data catalog) and procedures for external access are described on the project website (http://www.projecthelix.eu/index.php/es/data-inventory). The data used in this analysis are not available for replication because specific approvals from HELIX Project Executive Committee and the University of Southern California Institutional Review Board must be obtained to access them. Please contact the corresponding author for more information regarding access to HELIX data.