Introduction

Preterm birth is a major public health concern, accounting for nearly one out of every ten births in the United States.1 Even though survival of infants born before 30 weeks postmenstrual age (PMA) has increased due to improved medical technologies and interventions,2 preterm birth still places infants at risk of developing chronic diseases, brain injuries, and adverse long-term behavioral and cognitive outcomes such as learning disabilities, autism spectrum disorders, cerebral palsy, epilepsy, and various other psychopathologies.3,4,5 These potential outcomes further lead to an increased need (and government spending) for healthcare, education, and supportive resources in addition to the emotional and financial challenges for the children’s family members.

Preterm birth is also associated with a range of medical conditions that could affect the developmental integrity of these infants. Acoustic characteristics of the infant’s cry have been related to prematurity and to a broad spectrum of medical conditions, including asphyxia, hyperbilirubinemia, trisomy anomalies, sudden infant death syndrome, prenatal drug exposure, longer-term developmental outcome in preterm infants, and, more recently, risk for autism.6 The acoustic characteristics related to these medical conditions include measures related to vibration of the vocal folds, which is responsible for the fundamental frequency or “pitch” of the cry. Some of these characteristics are audible to humans, such as the high-pitched cry in cri du chat, named for its distinctive sound. These acoustic characteristics are largely due to brainstem (cranial nerves 9–12) innervation of the larynx and pharynx, suggesting that acoustic cry analysis could provide a biobehavioral barometer of central nervous system (CNS) integrity.6,7

Biological markers, such as epigenetic modification, can provide information about the internalization of environmental and experience-based stimuli. The inclusion of biological markers that reflect the molecular underpinnings of biobehavioral measures, such as acoustic cry characteristics, could help identify infants at risk for long-term developmental delays. Epigenetics refers to changes in gene expression potential that are not explained by DNA sequence changes and that are heritable across cellular divisions. DNA methylation is one of the most studied epigenetic modifications and involves the addition of a methyl group to the cytosine of cytosine-phosphate-guanine (CpG) dinucleotides. DNA methylation can be affected by environment and life experiences, and is thought to be one mechanism through which environmental experience can confer long-term disease risk. In preterm infants, DNA methylation variability has been associated with conditions such as pain-related stress, sepsis, and medical and neurobehavioral risk.8,9,10 Differential methylation has also been related to cry acoustics in term, healthy neonates.11 Combining biobehavioral measures (such as acoustic cry characteristics) with molecular biomarkers could lead to a new understanding of the biological mechanisms through which medical problems such as prematurity influence neurobehavioral outcomes. These insights could improve our ability to detect the risk of long-term developmental delays as early as possible. In previous work, we related variability in genome-wide DNA methylation to performance on the neonatal intensive care unit (NICU) Network Neurobehavioral Scale (NNNS) in a US multisite cohort of infants born <30 weeks PMA.12 While the field of biobehavioral epigenetics continues to grow, the only prior epigenetic studies of cry acoustics have focused specifically on the glucocorticoid receptor.
Thus, we aimed to perform the first untargeted epigenome-wide study of cry acoustics to examine whether novel genes or pathways may be related to infant cry characteristics. Here, we profiled genome-wide DNA methylation to explore whether differences in DNA methylation are related to acoustic cry characteristics in these infants.

Methods

Subjects

The Neonatal Neurobehavior and Outcomes in Very Preterm Infants (NOVI) study was conducted at nine university-affiliated NICUs in Providence, RI; Honolulu, HI; Grand Rapids, MI; Torrance and Long Beach, CA; Kansas City, MO; and Winston-Salem, NC from April 2014 to June 2016. These NICUs also participated in the Vermont Oxford Network. All participating mothers provided written informed consent. The enrollment and consent procedures of this study were approved by the institutional review boards of Women and Infants Hospital, Spectrum Health, Children’s Mercy Office of Research Integrity, Wake Forest University Health Sciences, John F. Wolf, M.D., Human Subjects Committee at Los Angeles BioMed, Emory University, and Western Institutional Review Board; all methods used in the study were performed in accordance with the relevant regulations and guidelines. Eligibility was based on the following inclusion criteria: (1) birth at <30 weeks gestation; (2) parental ability to read and speak English or Spanish; and (3) residence within 3 h of the NICU and follow-up clinic. Exclusion criteria included maternal age <18 years, maternal cognitive impairment, maternal death, infants with major congenital anomalies, including CNS, cardiovascular, gastrointestinal, genitourinary, chromosomal, and nonspecific anomalies,13 and NICU death. Parents of eligible infants were invited to participate in the study when survival to discharge was determined to be likely by the attending neonatologist. Overall, 704 infants were enrolled and buccal samples were collected from 624 of these infants for epigenomic screening. Demographic variables, including infant gender, race, ethnicity, maternal education, and partner status, were collected from the maternal interview.
The Hollingshead Index was used to assess socioeconomic status (SES), with Hollingshead level V indicating low SES.14 Neonatal medical variables, including birthweight, gestational age, length of NICU stay, weight at discharge, and gestational age at discharge, were abstracted from medical records. Buccal swabs were collected during the week of discharge from the NICU. Thus, PMA at buccal swab collection represents gestational age at birth plus the length of NICU stay. Because DNA methylation has been associated with aging metrics, PMA was included as a covariate in our statistical analyses.
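The relationship between PMA at swab collection, gestational age at birth, and length of stay can be written as a one-line calculation. The following is a minimal sketch; the function name `pma_at_collection_weeks` and the example values are illustrative assumptions, and it treats the swab as collected on the day of discharge, whereas the study collected swabs during the week of discharge.

```python
def pma_at_collection_weeks(gestational_age_weeks, nicu_stay_days):
    """Approximate PMA (weeks) at buccal swab collection.

    Assumes the swab is taken at discharge, so PMA is gestational age
    at birth plus the NICU length of stay converted from days to weeks.
    """
    if gestational_age_weeks <= 0 or nicu_stay_days < 0:
        raise ValueError("gestational age must be positive and stay non-negative")
    return gestational_age_weeks + nicu_stay_days / 7.0


# Hypothetical infant: born at 27 weeks, 91-day NICU stay.
print(pma_at_collection_weeks(27.0, 91))
```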

Measurements

DNA collection and methylation (DNAm) analysis

Genomic DNA was extracted from buccal swab samples obtained near term-equivalent age, using the Isohelix Buccal Swab system (Boca Scientific). DNA was quantified using the Qubit Fluorometer (Thermo Fisher, Waltham, MA, USA), then aliquoted into a standardized concentration for subsequent analyses, and samples were randomly plated across 96-well plates. The Emory University Integrated Genomics Core performed bisulfite modification using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA). Subsequent assessment of genome-wide DNAm was performed using the Illumina MethylationEPIC Beadarray (Illumina, San Diego, CA) following standardized methods that were based on the manufacturer’s protocol. Samples were excluded from the analysis if >5% of their probes yielded detection p values >1.0E−5, if reported and predicted sex did not match, or if covariate data were incomplete. The array data were normalized by functional normalization and standardized across probe designs Type-I and Type-II with beta-mixture quantile normalization.15 Probes that measured methylation on the X and Y chromosomes, probes with single-nucleotide polymorphisms (SNPs) within the binding region, probes that could cross-hybridize to other regions of the genome, and probes that had low variability (range of beta values <0.05) were removed.16 After exclusions, 706,278 probes were available for analysis from 341 samples that also had acoustic cry data. The methylation data are publicly available through the Gene Expression Omnibus via accession series GSE128821.
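Two of the quality-control rules above (the per-sample detection p value rule and the per-probe low-variability rule) are simple threshold checks. The sketch below illustrates them in plain Python; the function names and input layout are hypothetical and do not represent the pipeline actually used, which was run in R.

```python
def sample_fails_detection(detection_p, p_cutoff=1.0e-5, max_fraction=0.05):
    """Return True if more than max_fraction of a sample's probes
    have a detection p value above p_cutoff."""
    if not detection_p:
        raise ValueError("detection_p must contain at least one probe")
    failed = sum(1 for p in detection_p if p > p_cutoff)
    return failed / len(detection_p) > max_fraction


def probe_is_low_variability(betas, min_range=0.05):
    """Return True if a probe's beta values span less than min_range
    across samples (max minus min)."""
    if not betas:
        raise ValueError("betas must contain at least one sample")
    return max(betas) - min(betas) < min_range


# Hypothetical sample with 10% of probes failing detection: excluded.
print(sample_fails_detection([1e-10] * 90 + [0.01] * 10))
# Hypothetical probe whose beta values span only 0.03: removed.
print(probe_is_low_variability([0.50, 0.52, 0.53]))
```

The sex-mismatch, SNP, and cross-hybridization filters depend on annotation resources and are not sketched here.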

Tissue heterogeneity estimates

Cellular heterogeneity is a well-recognized source of confounding in epigenome-wide association studies.17 As a result, the proportions of epithelial, fibroblast, and immune cells (including B cells, natural killer cells, CD4+ T cells, CD8+ T cells, monocytes, neutrophils, and eosinophils) were estimated in cheek swab samples using reference methylomes.18 For 95% of our buccal swab samples, 95.7% of the cells were epithelial cells; immune cells made up the majority of the remaining cell types, and their proportions were inversely correlated with epithelial cell proportions.12 Thus, cellular heterogeneity was adjusted for by including the proportion of epithelial cells as a covariate in statistical models.

Recording and extraction of cry episodes

Cries of the preterm infants were elicited and recorded during the administration of the NNNS during the week of NICU discharge (±3 days). Audio recordings were made using an Olympus direct-to-PCM digital voice recorder and saved in an uncompressed .wav audio format (recording parameters: 16 bit, 48 kHz). Recorders were attached to the side of the infant’s bassinette/pram at a standardized location and oriented towards the infant’s mouth. This allowed for a standard distance between the infant’s mouth and the recording device during the infant examination. Episodes of cry vocalizations were identified from these audio recordings. Cry episodes suitable for acoustic analysis were identified based on the absence of background noises that would interfere with the analysis (e.g., adult talk, medical equipment noises, and other environmental noises). The first suitable cry episode from each exam was excerpted into a .wav file for subsequent acoustic analysis. The extracted recordings contained only infant cry and were stripped of any identifiable information. Of the infants with DNA methylation array data, 335 had extractable cries (infant cried during the audio recording, good sound quality, no background noise, extractable and analyzable files), which were then analyzed.

Cry acoustic analysis

The extracted cry episodes were then analyzed using a computer system that has been specifically designed and well-validated to perform cry analyses in infants.19 Analysis proceeded in two phases. The first phase applied a cepstral-based acoustic analysis to extract acoustic parameters with a 12.5 ms frame advance. The second phase organized and summarized this information into cry utterances. A cry utterance was defined as a cry during the expiratory phase of respiration lasting at least 0.5 s. The software then calculated summary variables within the cry utterances.
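The second phase can be pictured as grouping consecutive analysis frames into utterances and discarding runs shorter than 0.5 s. With a 12.5 ms frame advance, 0.5 s corresponds to 40 frames. The sketch below is only an illustration under stated assumptions: it assumes a per-frame boolean flag marking expiratory cry is already available, and the function name `find_utterances` is hypothetical. It is not the validated cry analysis system cited above.

```python
import math


def find_utterances(is_cry_frame, frame_ms=12.5, min_ms=500.0):
    """Group consecutive cry frames into utterances.

    is_cry_frame: sequence of booleans, one per analysis frame
                  (hypothetical input; True marks expiratory cry).
    Returns a list of (start_frame, end_frame) pairs, end exclusive,
    keeping only runs lasting at least min_ms.
    """
    min_frames = math.ceil(min_ms / frame_ms)
    utterances = []
    start = None
    # A trailing False closes any run that reaches the last frame.
    for i, flag in enumerate(list(is_cry_frame) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                utterances.append((start, i))
            start = None
    return utterances


# Hypothetical episode: one 40-frame run (kept) and one 39-frame run (dropped).
frames = [False] * 5 + [True] * 40 + [False] * 3 + [True] * 39 + [False]
for start, end in find_utterances(frames):
    print(start, end, (end - start) * 12.5, "ms")
```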

Data analysis

To analyze the cry acoustic data, principal component analysis (PCA) was performed with rotation (oblimin with Kaiser normalization). This PCA allowed for dimension reduction, as there were too many cry variables to examine individually. The factors that explained the greatest proportion of variation were used for subsequent analyses. These factors each had an eigenvalue >1 and explained >10% of the variation.20 The statistical analyses for epigenomic data were performed in R version 3.5. The epigenome-wide association study (EWAS) analysis (n = 335 infants with complete cry data and epigenomic data) was performed with robust linear regression using the MASS package, and robust standard errors were estimated using the sandwich package for each CpG site that passed QC. While the majority of infants in this study were from independent families (275 of 335), we estimated cluster-robust standard errors, specifying family ID as a clustering variable, to account for potential correlations in the errors between siblings. The models were adjusted for infant sex, PMA, recruitment site, and proportion of epithelial cells. Separate models were run for each factor. Factor 2 was found to have a bimodal distribution and was therefore dichotomized.
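The factor retention rule described above (eigenvalue >1 and >10% of variation explained) can be expressed as a small filter over the eigenvalues. The sketch below assumes the eigenvalues come from an unrotated PCA of the correlation matrix, so that each eigenvalue divided by their sum gives the proportion of variation explained; the function name and the example eigenvalues are hypothetical, not values from this study.

```python
def select_factors(eigenvalues, min_eigenvalue=1.0, min_proportion=0.10):
    """Return indices of components with eigenvalue > min_eigenvalue
    that also explain more than min_proportion of total variation."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [
        i
        for i, ev in enumerate(eigenvalues)
        if ev > min_eigenvalue and ev / total > min_proportion
    ]


# Hypothetical eigenvalues for 24 variables (they sum to 24 only by construction
# of a correlation-matrix PCA; here the sum is 20 for simplicity).
eigenvalues = [5.0, 3.0, 1.2, 0.8] + [0.5] * 20
print(select_factors(eigenvalues))
```

In this example the third component passes the eigenvalue rule but explains only 6% of the variation, so two components are retained.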

Results

Maternal infant characteristics

Mothers of infants included in the current study of cry acoustics (n = 289) did not differ significantly in demographic characteristics from mothers whose infants were not included because buccal and cry data were unavailable (n = 312). Included infants (those with buccal and cry data, n = 341) were discharged ~1 week earlier, stayed 8.5 days less in the NICU, and were less likely to have chronic lung disease or sepsis than study infants not included in the current analyses (n = 363) (Table 1).

Table 1 Maternal and infant characteristics for included vs. not included subjects.

At enrollment, 57.4% of mothers self-identified as minority, 26.6% did not have a partner, 9.3% had low SES, and 12.8% did not complete high school (Table 1). The average PMA at birth was 27.1 weeks and average birthweight was 966 g. Infants were discharged at 40 weeks and weighed on average 2989 g. The average length of stay in the NICU was 90 days. Infant medical complications included necrotizing enterocolitis (6.5%), sepsis (10.6%), chronic lung disease (47.2%), and brain injuries, including parenchymal echodensity (8.3%), parenchymal echolucency (6.5%), ventricular dilation (9.2%), and intraventricular hemorrhage (17.5%) (Table 1).

Cry acoustics

Two cry factors explained the greatest proportion of variation. They were frequency/energy-related (factor 1) and pitch/hyperpitch-related (factor 2) (Table 2). Factor 1 included variables that described the overall spectral energy contained in cry utterances, including the amplitude of the cry signal at the fundamental frequency, overall loudness of the utterances, and signal amplitude within specific frequency bands. Factor 2 included measures of fundamental frequency (F0, or “pitch”), as well as the portions of utterances where F0 was >1 kHz (“hyperpitch”). Factor scores were then calculated for each individual for the two significant cry factors such that an increase in the factor score meant a higher degree of frequency/energy (factor 1) or higher pitch/hyperpitch (factor 2). These two factors were used for the EWAS, for those infants that had both infant cry data and epigenomic data (n = 335).
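As an illustration of the kind of summary variable that loads on factor 2, the proportion of an utterance in hyperpitch can be computed from frame-level F0 estimates by counting frames above 1 kHz. This is a minimal sketch under the assumption that one F0 value in Hz is available per voiced frame; the function name and example values are hypothetical and are not taken from the cry analysis system.

```python
def hyperpitch_fraction(f0_hz, threshold_hz=1000.0):
    """Fraction of frames in an utterance whose F0 exceeds threshold_hz."""
    if not f0_hz:
        raise ValueError("f0_hz must contain at least one frame")
    above = sum(1 for f0 in f0_hz if f0 > threshold_hz)
    return above / len(f0_hz)


# Hypothetical utterance: two frames in the basic cry range, two in hyperpitch.
print(hyperpitch_fraction([450.0, 520.0, 1200.0, 1500.0]))
```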

Table 2 Factor loadings for cry acoustics.

EWAS of cry acoustics

An EWAS between DNA methylation and the two cry factors was performed to explore whether differences in DNA methylation in buccal cells were related to acoustic cry characteristics. The genomic distributions of these results are presented in Manhattan plots (Fig. 1), and the observed vs. expected distributions of p values are presented via quantile–quantile (QQ) plots, which do reveal some inflation and thus a potential for residual confounding (Supplementary Fig. 1). The results of these models can be found in Supplementary Material (Supplementary Tables S1 and S2), and all associations reported had a false discovery rate (FDR) <10% (q < 0.1). Two CpG sites (cg08643005 and cg06728103) were associated with factor 1 at genome-wide significance (α = 7.08E−09), and we include 651 CpGs with FDR q < 0.1 in Supplementary Table S1. The CpG site cg08643005 demonstrated reduced methylation with an increase in factor 1 scores (frequency/energy) and is linked to the gene TCF3. Site cg06728103 showed significantly increased methylation with an increase in the frequency/energy factor scores; it is located on chromosome 12 but is intergenic. The results for these two significant CpG sites, as well as the identified gene’s roles, can be found in Table 3.
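The FDR q values referred to above can be illustrated with a short calculation. The text does not name the FDR procedure, so the sketch below assumes the widely used Benjamini-Hochberg method; the function name `bh_qvalues` and the example p values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q values, returned in the input order.

    Each q value is the smallest p * m / rank over all p values ranked
    at or above the current one, which keeps q monotone in p.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p values for four CpG sites.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
print(qvals)
print(sum(1 for q in qvals if q < 0.1), "sites with q < 0.1")
```

In an EWAS the same calculation would be applied to the full vector of per-CpG p values from one cry factor at a time.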

Fig. 1: Manhattan plots demonstrating the genomic distributions between DNA methylation levels and cry acoustics.

The top panel shows associations for cry acoustic factor 1 and the bottom panel shows associations for cry acoustic factor 2, across 22 chromosomes (x-axis); alpha and the horizontal dotted lines represent the Bonferroni threshold for statistical significance.

Table 3 Summary of the results for cry acoustics factor 1 (FDR < 10%) as well as biological roles of identified genes.

For factor 2 (pitch/hyperpitch), 16 CpGs attained genome-wide significance (α = 7.08E−09), and we include 220 CpGs with FDR q < 0.1 in Supplementary Table S2. Of the genome-wide significant sites, five exhibited lower DNA methylation in the group with higher pitch/hyperpitch factor scores than in the group with lower scores, while the other 11 exhibited higher methylation. The CpG site cg16459276 (intergenic) exhibited the greatest increase in methylation related to this pitch/hyperpitch factor, while cg05429445, in the PPP1CA gene, showed the greatest decrease. The results for these 16 significantly differentially methylated CpG sites, as well as the identified genes’ roles, can be found in Table 4.

Table 4 Summary of the results for cry acoustics factor 2 (FDR < 10%) as well as biological roles of identified genes.

CpG annotation

We identified phenotypes or traits that have been associated with the genes that were annotated to CpGs associated with cry acoustics at genome-wide significance (Tables 5 and 6).

Table 5 Genes annotated to cry acoustics (factor 1)-associated CpGs that have been linked to traits from GWAS.
Table 6 Genes annotated to cry acoustics (factor 2)-associated CpGs that have been linked to traits from GWAS.

Discussion

We studied the association between genome-wide DNA methylation variation and variation in cry acoustic characteristics, particularly frequency/energy and pitch/hyperpitch. Our EWAS highlighted 18 CpGs that were significantly associated with the acoustic cry factors. Two CpG sites, one that was intergenic and one that was linked to the gene TCF3 (important for B and T lymphocyte development), were associated with frequency/energy-related cry variables. Increased methylation of TCF3 was associated with less energy in the cry across multiple frequency bands. We also found significant differences in DNA methylation at 16 CpG sites that were associated with pitch/hyperpitch in the cry, four of which were related to the regulation of the cell cycle and DNA repair. Of these, the PPP1CA gene (involved in cellular processes such as cell division) showed the greatest decrease in methylation, which was associated with increased hyperpitch in the cry.

Sixteen epigenetic loci were annotated in GWAS studies. Although these results likely include some false positives, a number of genes annotated to these CpGs have been linked to cellular growth, proliferation, and differentiation (TCF3, CDC14A, ERCC2, and UCK2), cognition (RXRA and PALM), and neurobehavioral or neurological disorders (TCF3, RXRA, CARHSP1, PALM, and CPLX2) in GWAS.21 We also found that the CpGs from this analysis were enriched for genes within pathways involving DNA binding and repair, cell cycle control, mRNA expression, immune response, and synaptic activity. Of particular note, four genes annotated to the significant CpGs were within the cell division Gene Ontology terms and pathways: protein phosphatase 1 catalytic subunit alpha (PPP1CA), ribosomal protein S6 kinase B2 (RPS6KB2), cell division cycle 14A (CDC14A), and uridine–cytidine kinase 2 (UCK2). The CpG site with the largest magnitude of inverse association among our statistically significant epigenetic loci, cg05429445, was within the body of PPP1CA. This gene codes for one of the subunits of protein phosphatase 1, which is known to be involved in the regulation of cell processes, such as cell division, glycogen metabolism, and protein synthesis. Protein phosphatase 1 is also involved in the regulation of ionic conductance and long-term memory.22 Experimental mouse models have suggested that protein phosphatase 1 functions as a suppressor of learning and memory.23 Therefore, differential regulation of this gene may be implicated in cognitive impairments. A number of these genes, such as RPS6KB2, CDC14A, UCK2, TCF3, and RXRA, have been associated with cellular growth, proliferation, and differentiation.24,25,26,27,28,29,30,31 These processes are primarily related to growth regulation and are highly active and dynamic during this early developmental period; thus, cry-associated differential methylation of these genes may be an indicator of the developmental state of these infants.

Of the genes associated with hyperpitch, two were related to neuronal dynamics: paralemmin-1 (PALM) and complexin 2 (CPLX2). The protein encoded by PALM is involved in neuronal plasma membrane dynamics, particularly D3 dopamine binding.32 The protein encoded by CPLX2 is involved in synaptic vesicle exocytosis, and it is associated with Huntington disease and schizophrenia. Overall, our findings suggest that cry acoustic-associated epigenetic variation occurred at various genomic regions with recognized roles in cognitive impairments, growth regulation, and neurodevelopmental and neurodegenerative disorders. In previous work with this sample, we found that epigenetic variation was associated with neurobehavioral profiles on the NNNS at key genes involved in neurodevelopment, including PLA2G4E, TRIM9, GRIK3, and MACROD2. Thus, acoustic cry characteristics and NNNS profiles were related to epigenetic variation in different sets of genes related to neurodevelopment, suggesting that separate molecular pathways could be involved.

The neonatal cry provides information about CNS function, as it is produced by coordination among several brain regions, including the brainstem, midbrain, and limbic system. The lower brainstem controls the muscles of the larynx, pharynx, chest, and upper neck through the vagal complex and the phrenic and thoracic nerves, which are responsible for acoustic cry characteristics.33,34 These associations have also been found in work relating acoustic cry characteristics to developmental outcome in preterm infants.35 Thus, the study of acoustic cry analysis and epigenetic variation could lead to a better understanding of the molecular underpinnings of acoustic cry characteristics in preterm and other at-risk populations. The combination of acoustic cry analysis and DNA methylation measures could help in the identification of children who may be at greater risk for adverse developmental outcomes, which is the focus of some of our ongoing work in the NOVI cohort.

There were limitations to this investigation. First, both infant cry and adequate buccal swabs were available for only about half the sample. Second, QQ plots revealed that we identified a larger number of small p values than would be expected; thus, we focused our results and discussion only on those CpGs that were significant after Bonferroni adjustment. However, it is likely that some false positives are mixed within these results. Additional studies are required to examine genome-wide DNA methylation and cry characteristics to determine whether similar relationships are observed in independent populations or in larger sample sizes. Third, residual confounding could still exist, although we adjusted for several potential confounding variables (infant sex, PMA, recruitment site, and proportion of epithelial cells). Fourth, this investigation is associational and not experimental, and the associations were not hypothesized in advance; it is therefore important to remain cautious in interpreting the findings in terms of cellular and molecular mechanisms. Finally, there were differences in neonatal characteristics between those included and those not included. Interestingly, these differences were more favorable to the included group.

Despite these limitations, this is a novel study relating acoustic cry characteristics to variation in DNA methylation in very preterm infants. Many of these differentially methylated CpGs were within genes that are involved in multiple neurodevelopmental processes. Thus, we hypothesize that the observed differences in methylation may be correlated with neuro- and/or behavioral–developmental processes. This is the first large-scale genome-wide study of DNAm and infant cry acoustics, and we encourage future studies to examine the reproducibility of these findings in independent datasets, and/or to examine whether DNAm at these same CpGs is related to other biobehavioral characteristics.