Epigenomic profiling of preterm infants reveals DNA methylation differences at sites associated with neural function

DNA methylation (DNAm) plays a determining role in neural cell fate and provides a molecular link between early-life stress and neuropsychiatric disease. Preterm birth is a profound environmental stressor that is closely associated with alterations in connectivity of neural systems and long-term neuropsychiatric impairment. The aims of this study were to examine the relationship between preterm birth and DNAm, and to investigate factors that contribute to variance in DNAm. DNA was collected from preterm infants (birth<33 weeks gestation) and healthy controls (birth>37 weeks), and a genome-wide analysis of DNAm was performed; diffusion magnetic resonance imaging (dMRI) data were acquired from the preterm group. The major fasciculi were segmented, and fractional anisotropy, mean diffusivity and tract shape were calculated. Principal components (PC) analysis was used to investigate the contribution of MRI features and clinical variables to variance in DNAm. Differential methylation was found within 25 gene bodies and 58 promoters of protein-coding genes in preterm infants compared with controls; 10 of these have neural functions. Differences detected in the array were validated with pyrosequencing. Ninety-five percent of the variance in DNAm in preterm infants was explained by 23 PCs; corticospinal tract shape associated with 6th PC, and gender and early nutritional exposure associated with the 7th PC. Preterm birth is associated with alterations in the methylome at sites that influence neural development and function. Differential methylation analysis has identified several promising candidate genes for understanding the genetic/epigenetic basis of preterm brain injury.


INTRODUCTION
Preterm birth affects 5-13% of newborns, 1 and is a profound early-life stressor that is closely associated with cerebral palsy, cognitive impairment, autism spectrum disorder and psychiatric disease. [2][3][4][5][6] The prevalence of impairment is related to gestational age at birth and to adverse exposures such as inflammation, ischaemia, respiratory morbidity and sub-optimal nutrition, 7 but the mechanisms underlying these associations are poorly understood.
Epigenetic modification has a fundamental role in regulating gene expression and determining neural cell fate, and DNA methylation (DNAm) is one such modification that is highly conserved across species. 8 DNAm is dynamic during development, including in the brain 9 and this could provide a mechanism by which environmental factors lead to disturbances of neural development that underpin later impairment. 10 DNAm mediates gene-environment interactions between early-life stress and several neuropsychiatric outcomes, [11][12][13][14] but little is known about DNAm in relation to brain development after preterm birth.
Although DNAm patterns are tissue specific, a number of recent observations suggest consistency between peripheral tissues and brain. First, DNAm profiles are altered consistently between prefrontal cortex and T cells in a rhesus macaque model of early-life stress. 15 Second, the top enriched biological processes from peripheral blood cells of adults with post-traumatic stress disorder and early-life trauma concern central nervous system development, 12 which suggests considerable overlap between tissues. Third, inter-individual variation tends to be consistent across tissue types. 16 Furthermore, sampling DNA from saliva rather than blood is informative in brain DNAm studies because: methylation profiles obtained from saliva show greater correspondence with brain tissue extracts than those obtained from blood; 17 inherent properties of DNAm from buccal cells (greater enrichment of DNaseI hypersensivity sites, histone modifications and disease-associated single nucleotide polymorphisms (SNPs)) may make them a more favourable proxy tissue than blood for epigenome-wide association studies of non-haematological disease. 18 Structural and diffusion magnetic resonance imaging (dMRI) reveal a cerebral signature of preterm birth that includes reduced connectivity of white matter tracts, focal tissue volume reduction in deep grey matter nuclei and reduced cortical complexity. [19][20][21][22][23][24] Specifically, fractional anisotropy (FA) and mean diffusivity (〈D〉) derived from dMRI provide measures of tract integrity in the newborn brain that have a predictable pattern of alteration in preterm infants at term equivalent age (TEA). [25][26][27] These biomarkers are sensitive to genetic and environmental risk modulators for injury, and can detect neuroprotective treatment effects. [28][29][30][31] Probabilistic neighbourhood tractography (PNT) is an automatic segmentation technique, based on single seed point tractography, that can identify the same fasciculus-of-interest across a group of subjects by modelling how individual tracts compare with a predefined reference tract in terms of length and shape. 32,33 This tract shape modelling is unique to PNT and allows not only measurement of tract integrity parameters, such as tract-averaged 〈D〉 and FA, but also provides a metric, the absolute goodness-offit of the segmented tract to the reference (R), which can be used to quantify differences in tract shape between individuals. The inclusion of anatomic information in dMRI models makes PNT ideally suited to studies of genetic and epigenetic effects because brain structure is heritable. 34,35 In this study, we tested the hypotheses that the stress of preterm birth leads to alterations in the methylome that are apparent early in the newborn period, and variance in DNAm is associated with dMRI parameters in major white matter tracts and clinical risk factors for adverse outcome.

MATERIALS AND METHODS Participants
The study was conducted according to the principles of the Declaration of Helsinki, and ethical approval was obtained from the UK National Research Ethics Service. Written parental informed consent was obtained.
The cohort consisted of two groups of neonates who received care at the Royal Infirmary of Edinburgh between January 2012 and September 2014: (1) preterm neonates (defined as postmenstrual age (PMA) at birth o32 completed weeks gestation); and control infants born at full term (437 weeks PMA). Infants were not eligible if they had dysmorphic features suggestive of a chromosomal abnormality that was confirmed by karyotype, a congenital malformation or a congenital infection.

DNA extraction
The DNA OG-575 kit was used for sampling of saliva at TEA, defined as 38-42 weeks PMA (DNAGenotek, Ottawa, ON, Canada). DNA was extracted using an alcohol precipitation technique as per manufacturer's instructions, and was rehydrated in TE 0.5. Gel electrophoresis was used to qualify DNA extraction and Qubit 2.0 Fluorometer was utilised for quantification of DNA concentration (Invitrogen Life Sciences, Carlsbad, CA, USA).

DNAm analysis
DNAm analysis was performed at the Genetics Core of the Edinburgh Clinical Research Facility (Edinburgh, UK). Bisulphite conversion of 500 ng input DNA was carried out using the EZ DNAm Kit (Zymo Research, Freiburg, Germany). Four microlitres of bisulphite-converted DNA was processed using the Infinium HD Assay for Methylation (Illumina Methylation 450k beadchip and Infinium chemistry (Illumina, San Diego, CA, USA)). Each sample was interrogated on the arrays against 485 000 methylation sites. The arrays were imaged on the Illumina HiScan platform and genotypes were called automatically using GenomeStudio Analysis software version 2011.1 (Illumina). The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE72120 (http://www.ncbi.nlm.nih. gov/geo/query/acc.cgi?acc = GSE72120).

MRI acquisition
The preterm infants underwent brain MRI at TEA. A Siemens MAGNETOM Verio 3T MRI clinical scanner (Siemens Healthcare, Erlangen, Germany) and 12-channel phased-array head coil were used to acquire: T1-weighted MPRAGE volume (~1 mm 3 resolution), T2-weighted STIR (~0.9 mm 3 resolution), T2-weighted FLAIR (~1 mm 3 resolution) and dMRI (11 T2and 64 diffusion encoding direction (b = 750 s mm −2 ) single-shot spin-echo echo planar imaging volumes with 2 mm isotropic voxels using a prototype sequence). All examinations were supervised by a pediatrician experienced in MRI procedures. Infants were examined in natural sleep and pulse oximetry, temperature and electrocardiography data were monitored throughout. Ear protection was used for each infant, comprising earplugs placed in the external ear and neonatal earmuffs (MiniMuffs, Natus Medical, San Carlos, CA, USA).

Methylation analysis
Data were processed with the RnBeads tool 36 v 0.99.17, for processes including data loading, pre-processing, normalisation and differential methylation. The software was modified slightly to allow filtering of problematic CpG loci identified by Chen et al. 37 During pre-processing and using the Chen annotation, probes were removed if their CpG loci overlapped with known SNPs from the 1000 Genomes Project (www.1000genomes.org), if a SNP occurred at the site of single-base extension, or if the probe had been shown to be non-specific. A further, smaller set of probes was removed where RnBeads' iterative 'GreedyCut' algorithm identified a large number of unreliable measurements across samples.
Normalisation was carried out using the beta mixture quantile dilation method of Teschendorff et al. 38 in which the differing distributions of Type I and Type II probes is taken into consideration. This method had been shown to perform well in comparison with other methods. 39 Following normalisation a batch correction was applied by use of ComBat 40 to account for the well-documented chip-wise batch effect of the Infinium platform. Before downstream analyses, probes were removed if their target sites occurred on sex chromosomes, or in non-CpG contexts.
Finally, differential methylation between term and preterm individuals was assessed in gene bodies and promoters. RnBeads includes gene-level annotations from Ensembl (www.ensembl.org; v77 for the version of RnBeads used), and assigns promoters as the regions from 1.5 kb upstream to 0.5 kb downstream of the transcription start site. Differentially methylated positions were assessed with Limma, 41 and aggregated for genes and promoters using a generalisation of Fisher's method. The false discovery rate (FDR)-corrected version of these aggregated region-level Pvalues was used to select genes with significantly differentially methylated regions (DMR) in bodies and/or promoters. Gene function annotation was determined from the National Center for Biotechnology Information Gene database (http://www.ncbi.nlm.nih.gov/gene/about-generif).

Validation by pyrosequencing
Pyrosequencing was used to validate DNAm at five selected genes that showed differential methylation (Po 0.05, FDR corrected) between preterm infants at TEA and term controls: SLC7A5, SLC1A2, NPBWR1 and QPRT. APOL1 was included in validation studies because of its functional relevance and the significance value from the array was marginal (P = 0.05). Bisulphite conversion was performed on 500 ng of genomic DNA with the EZ DNAm kit (Zymo Research, Freiburg, Germany). The converted DNA was amplified using the AmpliTaq Gold 360 kit (Applied Biosystems, Warrington, UK) with primers mapping to target regions containing CpGs assayed within the array. PCR primers were designed using PyroMark Assay Design Software 2.0 (Qiagen; https://www.qiagen.com). Pyrosequencing was performed using PyroMark Q24Gold reagents on a PyroMark Q24 Pyrosequencer (Qiagen) according to the manufacturer's instructions. Data were extracted and analysed using PyroMark Q24 1.0.10 software (Qiagen). Background non-conversion levels were~1-3%.

Diffusion MRI analysis
After conversion from DICOM to NIfTI-1 format, the dMRI data were preprocessed using FSL tools (http://www.fmrib.ox.ac.uk/fsl) to extract the brain and eliminate bulk patient motion and eddy current-induced artifacts by registering the diffusion-weighted to the first T2-weighted echo planar imaging volume of each subject. Using DTIFIT, 〈D〉 and FA volumes were generated for every subject. From the underlying white matter connectivity data, eight major white matter fasciculi (genu and splenium of corpus callosum, left and right cingulum cingulate gyrus, left and right corticospinal tracts (CST), and left and right inferior longitudinal fasciculi) were identified using PNT optimised for neonatal dMRI. As described in detail in the study by Anblagan et al., 33 this optimisation principally involved using reference tracts created from a group of 20 term controls.

Principal components analysis
Dimension reduction using principal components (PC) analysis was used to inspect the dataset for signal in the methylation values that is related to clinical variables and imaging features that are associated with neurodevelopmental outcome (implemented in RnBeads). The clinical variables tested were: gender, PMA at birth, PMA at scan, chorioamnionitis, exposure to antenatal steroids, exposure to antenatal magnesium sulphate, number  Of the preterm infants, 9 (25%) had intrauterine growth restriction, 35 (97%) had been exposed to antenatal steroids, 20 (56%) had been exposed to antenatal MgSO 4 and 11 (31%) had histological chorioamnionitis. The mean duration requiring parenteral nutrition after birth was 11 days (range 5-25).

Participants
Of the controls, none had intrauterine growth restriction, and none was exposed to MgSO 4 or steroids for threatened preterm labour at any stage in pregnancy. None received parenteral nutrition.
Association between DNAm and preterm birth About 112,818 probes were removed after: first, pre-filtering (probes on SNPs (n = 66,877); non-specific probes (n = 26,505); sites with excess high detection P-values (n = 8,852)); and second, post-filtering (non-CpG probes (n = 1130); and probes on sex chromosomes (n = 9,454)). The remaining probes were used to calculate aggregate P-values for DMRs in two categories: gene bodies and promoters. About 87 genes were assigned as differentially methylated by this approach (P o 0.05, FDR corrected, Supplementary Table 1), of which 25 were protein coding. About 138 promoter regions were differentially methylated (Supplementary Table 2), of which 58 related to protein-coding genes. About 34 genes were present in both sets (partly due to the overlapping gene and promoter definitions), of which 11 were protein coding. Genes that encode proteins with neural function and/or those with neuropsychiatric disease associations are listed in Table 1.

Array validation
To validate the array findings, pyrosequencing was performed at selected annotated CpG sites in five selected genes: SLC7A5, SLC1A2, NPBWR1, APOL1 and QPRT. CpG sites in all five genes which were identified on the array were confirmed as being differentially methylated (Figure 1, Table 2). Because some assays covered additional neighbouring CpGs, which were not also interrogated in the array, it was possible to assess methylation patterns in the nearby region. For SLC7A5, the assay covered three upstream CpGs that all showed similar differences in methylation; the SLC7A2 and APOL1 assays both captured a second neighbouring downstream CpG that was also differentially methylated; and the assay for NPBWR1 cg26205771 covered 1 upstream and 1 downstream CpG, and both showed similar methylation patterns.
Diffusion MRI analysis Figure 2 shows illustrations of segmented tracts for a representative subject, while Table 3 presents descriptive statistics for 〈D〉, FA and R in the eight major fasciculi identified from the dMRI data in the preterm group using PNT.
Principal components analysis Ninety-five percent of the variance in the preterm methylome was explained by 23 components, with most variance explained by the first two PCs (31.8% and 20.1%, respectively), (Supplementary Table 3). In exploratory analyses, gender was associated with the first PC (P = 0.0071); FA in the genu of the corpus callosum was associated with the 5th PC (3.3% variance; P = 0.0061); right CST R and chorioamnionitis were associated with the 6th PC (2.9% variance; P = 0.0011 and P = 0.0053, respectively); and both gender (P = 0.0016) duration of parenteral nutrition use (P = 0.0017) were associated with the 7th PC (2.6% variance). After correction for multiple tests, three associations remained: right CST R with the 6th PC (P = 0.036); and both gender (P = 0.028) and duration of parenteral nutrition (P = 0.028) with the 7th PC. No variable was significantly associated with any DMR when tested directly (adjusted P-value o0.05).

DISCUSSION
In a deeply phenotyped representative sample of newborns, preterm birth was associated with significant alterations in the methylome in 10 protein-coding genes whose products influence neural cell function and are associated with behavioural traits/ neuropsychiatric disease. We found that specific risk modulators of neurodevelopmental outcome after preterm birth (gender, chorioamnionitis and early nutritional factors) explained a modest, but significant proportion of the variance in DNAm. Furthermore, there was an association between DNAm and white matter tract tissue integrity and shape inferred from dMRI, suggesting that epigenetic variation may contribute to the cerebral phenotype of preterm birth.
Epigenome-wide association studies have provided new insights into genes whose regulation pattern varies in the context of child abuse, post-traumatic stress disorder, schizophrenia and autism spectrum disorder, 12,64,65 but to our knowledge this is the first epigenome-wide association studies in preterm infants and healthy controls to identify differential methylation at loci that influence neural development. The magnitude of the DNAm differences between preterm and term infants varied between and within genes; however, in support of a potential biological role for these changes, differential methylation was identified at multiple CpGs on the array for most loci. Pyrosequencing analysis for all genes selected for validation confirmed the differences at individual CpGs seen on the array and also identified additional differentially methylated neighbouring CpGs, suggesting that preterm birth associates with widespread effects on DNAm at these loci. For the majority of these genes, differential DNAm was identified in the gene promoter, although for LRG1 and SLC7A5 differential methylation also extended into the gene body. In general, DNAm at DMRs has a negative correlation with gene expression, with recent studies reporting that this correlation is stronger not only for CpGs close to the transcription start site but also for intragenic DMRs, which do not necessarily mark intragenic

LRG1
Leucine-rich alpha-2-glycoprotein 1 Involved in protein-protein interaction, signal transduction, cell adhesion and development, is expressed in astrocytes of cerebral cortex and is linked with ageing and neurodegeneration 54,55 Promoter and CpG islands or CGI shores and may instead represent functional elements. 66 A particularly large number of differentially methylated CpGs were identified in two members of the solute transporter family of membrane transport proteins, SLC7A5 and SLC1A2. In preterm infants, a significant reduction (~10%) in DNAm was seen at multiple CpGs in the SLC7A5 promoter and gene body on the array, and more were identified on pyrosequencing. SLC7A5 (also known as LAT1) is a member of the solute transporter family of membrane transport proteins and is involved in the transport of large amino acids, including methionine, across the blood-brain barrier. 67 Methionine is a key component of S-adenosylmethionine, the major methyl donor and very recent data suggests that SLC7A5 can act as an indirect regulator of the epigenome, at least in terms of histone modification through effects on the availability of methionine and the subsequent availability of Positions of additional cytosines covered by pyrosequencing assays but which are not present on the array are given relative to the annotated CpG from the array. Figure 1. Differential methylation between preterm infants at term equivalent age and healthy infants born at term at CpG sites in proteincoding genes with neural function identified in the array (Po 0.05, corrected).
S-adenosylmethionine. 68 SLC1A2 (also known as EAAT2, GLT-1) is predominantly expressed in astrocytes but is also expressed by oligodendroglia and macrophages, and on neurons during development. It has a role for clearing glutamate throughout the neuroaxis and can be both transcriptionally and posttranscriptionally regulated. 44 The expression of SLC1A2 is reported to be developmentally regulated, particularly at the window of peak vulnerability for the development of periventricular leukomalacia. 69,70 The prevailing form of preterm brain injury is diffuse and involves multiple cell lines (reviewed by Back and Miller 71 ). The pathogenesis includes death of pre-myelinating oligodendrocytes (pre-OLs) because of vulnerability to inflammatory mediators, reactive oxygen and nitrogen species, and glutamate excitotoxicity. This is followed by defective pre-OL regeneration and repair, leading to hypomyelination. In pre-clinical and human postmortem studies the diffuse form of white matter injury coincides with enrichment of reactive glia (activated microglia/macrophages and reactive astrocytes) that inhibit the maturation of pre-OLs to myelin-forming oligodendrocytes. The neuronal population is not thought to degenerate under conditions that generate pre-OL loss (outside the context of tissue necrosis and cystic periventricular leucomalacia), but rather there is a dysmaturation response characterized by aberrant dendritic arborisation, disturbances in synaptic activity and reduced spine density. The functional profiles of the 10 genes that we found to be differentially methylated in preterm infants include neuronal and glial signalling, neurotransmission, apoptosis and cellular energetics. Our findings focus attention on the role of these genes in mediating injury and regeneration/repair processes after preterm birth, and their candidacy is further strengthened by the neuropsychiatric disease associations in later life (Table 1).
PCs analysis was used to explore whether dMRI measures in the major white matter fasciclui or clinical risk factors contributed to structure in the methylation data of the preterm group. After correction for multiple tests and exclusion of probes on sex chromosomes, a small proportion of the variance was explained by the shape of CST, which was associated with the 6th principal component (3.3% of variance); and of the clinical factors tested, gender and number of days requiring parenteral nutrition, both associated with the 7th principal component (2.9% of variance). However, none of these three variables was significantly associated with any DMR when tested directly, which indicates that if an effect is present, it is subtle and distributed over many loci.
We sampled the methylome at a single time point chosen to reflect the allostatic load of preterm birth and neonatal intensive care among children who survive to hospital discharge, but this leaves uncertainty about the temporal cues for epigenetic modification in the perinatal period. A recent study of DNAm in umbilical cord blood of 11 preterm infants and 11 term controls demonstrated 20 DMRs between the groups, including loci in 3 genes that are involved with neuronal development: PPT2, GABBR1, PLEKHB1. 72 We did not identify DMRs in these genes,   Preterm brain injury and the methylome S Sparrow et al which may be explained by differences in study population, timing of sampling or tissue-type sampled. A multiple sampling design that includes parental samples, placental tissue, cord blood and extends across the life-course would be required to investigate the relative contributions of in utero and postnatal exposures to changes in DNAm, and the extent to which preterm birth leaves a legacy on the methylome. 73 In conclusion, these novel data show that the profound earlylife stress of preterm birth is associated with differential methylation at sites in several protein-coding genes. The analysis of differential methylation has identified provide promising candidate genes for understanding genetic influences on brain development after preterm birth.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

ACKNOWLEDGMENTS
The study was funded by Theirworld (http://www.theirworld.org), and was sponsored by the University of Edinburgh. The study was supported by NHS Research Scotland and NHS Lothian Research and Development. We are grateful to the families who consented to take part in the study and to the nursing and radiography staff at the Clinical Research Imaging Centre, the University of Edinburgh (http://www.cric.ed.ac. uk) who scanned the infants. We thank Thorsten Feiweier at Siemens Healthcare for collaborating with dMRI acquisitions (Works-in-Progress Package for Advanced EPI Diffusion Imaging). AJD receives funding from the Scottish Funding Council (SCD/09).