Main

Necrotizing enterocolitis (NEC) is one of the most severe life-threatening complications of premature birth. The incidence of NEC in extremely preterm neonates (birth weight 401–1,000 g or gestational age 22–28 weeks) is 9% (1). The mortality of the disease is 30% in very-low-birth-weight and 50% in extremely-low-birth-weight neonates (2). In addition to death in the acute phase, the most devastating long-term sequelae of NEC are short bowel syndrome and neurodevelopmental impairment (2, 3).

The potential contribution of genetic predisposition to NEC has been considered in prior studies. A twin study indicated that genetic factors may account for 50% of the variance in liability for NEC, although adjusting for covariates negated statistical significance in this relatively small cohort (4). Certain single-nucleotide polymorphisms (SNPs), such as those in carbamyl phosphate synthetase (5), interleukin-12 (p40 promoter CTCTAA/GC) (6), vascular endothelial growth factor (C−2578A) (7), and nuclear factor kappa B subunit 1 (8), have been found to be associated with NEC. Other SNPs, in Toll-like receptor 4 (A+896G, C+1196T), CD14, CARD15 (9), platelet-activating factor acetylhydrolase (10), macrophage migration inhibitory factor (MIF) (11), mannose-binding lectin (12), angiotensin-converting enzyme and ATR1166A/C (13), or other cytokines (14), have not been found to be associated with NEC. However, no genome-wide association studies (GWAS) of NEC have been reported to date. Furthermore, with the exception of the angiotensin-converting enzyme study, the aforementioned negative results were obtained from small cohorts with low statistical power to detect differences.

There are >10 million SNPs in the human genome, of which 70% are in intergenic regions (15). Therefore, attempting to identify disease-causing genetic variations by hypothesis-driven, targeted analysis of SNPs in specific genes is akin to searching for a needle in a haystack. With the increasing availability of information on variation in the human genome, GWAS have become the most effective method for identifying relationships between genetic variation and disease (16).

Our objective was to identify genes and pathways associated with surgical NEC (Bell stage III) by comparing affected infants with infants surviving without medical or surgical NEC. We found that the genetic variations most significantly associated with increased risk of surgical NEC were a cluster of minor alleles in an intergenic region of chromosome 8 at 8q23.3. Because nothing was known about the potential significance of this intergenic region, a further objective was to perform in silico analysis to identify potential novel coding sequences or other functional domains that might explain why genetic variation at this location would have physiological or pathological consequences.

Methods

Cohort

Patients included were a subset of infants enrolled in the Eunice Kennedy Shriver NICHD Neonatal Research Network’s Cytokines study, which enrolled infants weighing 401–1,000 g at birth who were <72 h of age and free of major congenital anomalies (17). The study was approved by the institutional review boards at participating centers, and written informed consent was obtained from the parent(s). Additional institutional review board review was required to allow the federally funded GWA genotyping results, with a limited set of phenotype data, to be included in the NHGRI Database of Genotypes and Phenotypes.

Isolation of DNA

DNA was extracted from the earliest age blood spot collected on filter paper. Whole-genome amplification was used for samples that did not provide adequate genomic DNA. Genotyping was carried out on the Illumina HumanOmni1-Quad_v1-0_B BeadChip.

Definitions

NEC was defined as proven NEC (≥Bell stage II (18, 19)). Surgical NEC was defined as placement of a drain or performance of a laparotomy (Bell stage III). It was important to exclude infants with spontaneous intestinal perforation (SIP) from the analysis, as well as neonates who did not live long enough to have a chance to develop NEC. Therefore, we excluded all infants who died or developed perforation (surgical NEC) before postnatal day 7 (to avoid including infants who died from respiratory causes or severe intraventricular hemorrhage, or who developed SIP), and evaluated only infants who survived beyond day 7 and were therefore at risk of developing NEC.

Ethnicity

Ancestry was classified as African American, non-Hispanic Caucasian, Hispanic Caucasian, or other (including Asian and multiancestral), using GWASTools (20) to generate eigenvalues for the entire data set.
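To make the ancestry covariates more concrete, the sketch below shows one common way such components can be derived from a genotype matrix: standardize each SNP's 0/1/2 minor-allele counts, form the sample-by-sample relationship matrix, and extract its leading eigenpairs by power iteration with deflation. It is written in plain Python for illustration, assumes a tiny, complete genotype matrix with no missing calls, and is not the GWASTools implementation; the function names are hypothetical. In a genotype PCA, each infant receives a coordinate along each leading component, and these per-infant coordinates are what enter a regression as covariates.

```python
import math
import random


def standardize(genotypes):
    """Center and scale 0/1/2 minor-allele counts per SNP (EIGENSTRAT-style)."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)                  # sample allele frequency
        sd = math.sqrt(2 * p * (1 - p)) or 1.0  # avoid dividing by 0 for monomorphic SNPs
        for i in range(n):
            z[i][j] = (col[i] - 2 * p) / sd
    return z


def ancestry_components(genotypes, k=2, iters=500, seed=0):
    """Top-k eigenpairs of the relationship matrix Z Z^T / m via power iteration."""
    z = standardize(genotypes)
    n, m = len(z), len(z[0])
    grm = [[sum(z[a][j] * z[b][j] for j in range(m)) / m for b in range(n)]
           for a in range(n)]
    rng = random.Random(seed)
    found = []  # list of (eigenvalue, unit-length eigenvector)
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(grm[a][b] * v[b] for b in range(n)) for a in range(n)]
            for _val, u in found:  # deflation: project out components already found
                dot = sum(x * y for x, y in zip(w, u))
                w = [x - dot * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        value = sum(v[a] * grm[a][b] * v[b] for a in range(n) for b in range(n))
        found.append((value, v))
    return found
```

As a usage check with two clearly separated hypothetical groups of three samples, the signs of the first component split the groups, and its eigenvalue is well separated from the second.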

Imputation

Imputation was performed with Beagle 3.3.1: 769,757 SNPs were used as input, and 7,500,443 SNPs were imputed (21).

GWA Analysis

SNPs were analyzed in PLINK (22) by logistic regression under an additive model. Three models were run:

  1. Proven NEC (stage II or greater) or death vs. survival without proven NEC.
  2. Surgical NEC or death vs. survival without either medical or surgical NEC.
  3. Surgical NEC in survivors vs. survivors without surgical NEC.

The regression models included covariates for gestational age (GA), small for GA, gender, 5-min Apgar score <5, antenatal steroids, and genomic ancestry eigenvalues 1–4. The top 10 SNPs (those with the lowest P values) in each of the three models were mapped to genes.
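The additive model can be sketched as follows, assuming each SNP is coded as the number of minor alleles (0, 1, or 2) and covariates are supplied as numeric columns (for example, GA in weeks or a 0/1 indicator for antenatal steroids; these encodings are assumptions). The per-allele odds ratio is exp(β₁) for the dosage term, and a Wald test gives its P value. This is a minimal Newton-Raphson fit in plain Python for illustration only; it is not PLINK's code, and it does not handle missing genotypes or perfect separation. The function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_information(X, y, beta):
    """Log-likelihood gradient and Fisher information for logistic regression."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def additive_snp_test(dosage, covariates, y, iters=25):
    """logit P(case) = b0 + b1*dosage + covariates; returns (per-allele OR, SE of b1, Wald P)."""
    X = [[1.0, float(d)] + list(c) for d, c in zip(dosage, covariates)]
    beta = [0.0] * len(X[0])
    for _ in range(iters):  # Newton-Raphson updates
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    se = math.sqrt(solve(info, unit)[1])  # sqrt of the [1][1] entry of info^-1
    z = beta[1] / se
    return math.exp(beta[1]), se, math.erfc(abs(z) / math.sqrt(2))
```

With a single binary predictor and no covariates, the fitted odds ratio reduces to the cross-product ratio of the 2 × 2 table, and its standard error to the Woolf standard error, which makes a convenient sanity check.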

Validation cohort details: Replication was attempted in the Gene Targets for Intraventricular Hemorrhage Study (23). This study prospectively enrolled inborn infants with birth weights of 401–1,000 g and either intraventricular hemorrhage or normal cranial ultrasounds at 24 universities; additional samples were provided by the ELGAN, Iowa Prematurity, and Oulu University cohorts. The infants were also evaluated for other birth-related end points, such as NEC.

Pathways

We assigned genes to pathways using the Molecular Signatures Database (MSigDB) (http://www.broadinstitute.org/gsea/msigdb/collections.jsp). A SNP was assigned to a gene if it was exonic, intronic, or in an untranslated region of that gene, or within 20 kb of either end of the gene model. Pathways were analyzed using gene set enrichment analysis (24). Reactome pathways significant at a false discovery rate (FDR) <0.15 and P<0.01 were considered relevant.
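The SNP-to-gene rule can be expressed as an interval test: a SNP maps to every gene whose model, padded by 20 kb on each side, contains it, so one SNP may map to more than one gene. The sketch assumes 1-based inclusive coordinates on a single genome build; the class, function, gene names, and coordinates are hypothetical. Because exonic, intronic, and untranslated-region positions all lie within the gene model, they share the same containment test. The downstream gene set enrichment step is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    chrom: str
    start: int  # first base of the gene model (inclusive)
    end: int    # last base of the gene model (inclusive)


def assign_snps_to_genes(snps, genes, flank=20_000):
    """Map each (snp_id, chrom, pos) to the genes whose padded model contains pos.

    Positions inside [start, end] are exonic, intronic, or UTR; positions in the
    padding are within `flank` bp of either end of the gene model.
    """
    mapping = {}
    for snp_id, chrom, pos in snps:
        mapping[snp_id] = [g.name for g in genes
                           if g.chrom == chrom and g.start - flank <= pos <= g.end + flank]
    return mapping
```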

RNA Sequencing

RNA was isolated from mucosal strips of freshly obtained surgical specimens collected during bowel resection under institutional review board-approved protocols, using the RNeasy Mini Kit on the QIAcube platform (Qiagen, Germantown, MD). One sample came from a specimen resected because of NEC; to ensure coverage, a second sample came from a specimen resected to correct ileal atresia. RNA sequencing (RNAseq) was performed at Hudson Alpha (Huntsville, AL). FASTQ reads were processed with FASTQ Groomer and aligned to the human genome (hg19) with Bowtie and TopHat (25, 26). The resulting BAM data sets were restricted to a ±500-kb region around the NEC risk (NECRISK) region using the BAM slicer. Clusters of aligned sequences were identified in the Galaxy visualizer, and their genomic coordinates were used to generate further slices of the BAM file, one for each cluster of alignments. These cluster BAM files were converted into lists of FASTA files using the convert, merge, randomize tool in Galaxy. The resulting FASTA files were assembled into contigs using the CAP3 program at the Pôle Rhône-Alpes de Bioinformatique Site Doua (27). The contigs were aligned to the genome with the BLAT tool of the UCSC Genome Browser.
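The slicing and clustering steps, which were performed interactively in Galaxy, amount to an interval filter followed by merging of overlapping alignments. The following plain-Python sketch illustrates that logic on (start, end) alignment coordinates; reading BAM files is out of scope, and the function names, the inclusive-coordinate convention, and the `max_gap` parameter are assumptions rather than features of the Galaxy tools used.

```python
def reads_in_window(reads, region_start, region_end, flank=500_000):
    """Keep alignments (start, end) that overlap region_start - flank .. region_end + flank."""
    lo, hi = region_start - flank, region_end + flank
    return [(s, e) for s, e in reads if e >= lo and s <= hi]


def cluster_alignments(reads, max_gap=0):
    """Merge overlapping (or nearly adjacent) alignments into (start, end, n_reads) clusters."""
    clusters = []
    for s, e in sorted(reads):
        if clusters and s <= clusters[-1][1] + max_gap:
            clusters[-1][1] = max(clusters[-1][1], e)  # extend the current cluster
            clusters[-1][2] += 1
        else:
            clusters.append([s, e, 1])                 # start a new cluster
    return [tuple(c) for c in clusters]
```

Each resulting cluster's coordinates would then define one slice, mirroring the per-cluster BAM files described above.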

Results

The patient population consisted of all extremely-low-birth-weight infants (birth weight ≥401 g and ≤1,000 g) for whom blood spots were available in the repository of the Eunice Kennedy Shriver NICHD Neonatal Research Network. It was important to exclude any overlap between SIP and NEC, and to exclude neonates who died during the period when NEC is very rare but SIP is common, that is, the first postnatal week. Therefore, infants who died or developed intestinal perforation before postnatal day 7 were excluded, based on the observations that the vast majority of NEC cases occur after day 7 (28) and the majority of SIPs occur before day 7 (29). A total of 751 infants were included in the analysis, of whom 30 were diagnosed with surgical NEC after day 7 (of the 40 infants with genomic data who had intestinal perforation or surgical NEC, 10 had spontaneous gastrointestinal perforation before day 7, and 30 had surgical NEC diagnosed after day 7).
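The eligibility and labeling rules for the primary comparison (surgical NEC vs. survival without medical or surgical NEC) can be written as a small decision function. The record fields below (`death_day`, `surgery_day`, `max_bell_stage`) are hypothetical stand-ins for the registry variables, and "before postnatal day 7" is interpreted as a day value less than 7; this is a sketch of the stated criteria, not the study's analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Infant:
    infant_id: str
    death_day: Optional[int]    # postnatal day of death; None if the infant survived
    surgery_day: Optional[int]  # postnatal day of drain/laparotomy; None if no surgery
    max_bell_stage: int         # highest Bell stage of NEC recorded (0 = no NEC)


def analysis_label(infant, cutoff_day=7):
    """Return 1 for a surgical NEC case, 0 for a control, or None if excluded."""
    if infant.death_day is not None and infant.death_day < cutoff_day:
        return None  # early death: never at risk of NEC
    if infant.surgery_day is not None and infant.surgery_day < cutoff_day:
        return None  # early perforation: likely SIP
    if infant.max_bell_stage >= 3 and infant.surgery_day is not None:
        return 1     # surgical NEC after the cutoff
    if infant.max_bell_stage == 0 and infant.death_day is None:
        return 0     # survived without medical or surgical NEC
    return None      # medical NEC, or later death without NEC: not in this comparison
```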

The major demographic variables of the cohort analyzed in this study are summarized in Table 1A,B: Table 1A contains variables at birth, and Table 1B contains variables describing the clinical course. There were no significant differences between groups in patient characteristics at birth, including ancestry, birth weight, gestational age, small for gestational age, gender, Apgar score, and cesarean-section delivery. Age at full enteral feeds (P=0.007), age at the first enteral feed (P=0.02), and days of assisted ventilation (P=0.001) differed significantly between the surgical NEC group and the controls. There was no difference in the need for bag-and-mask ventilation at birth, patent ductus arteriosus (PDA), indomethacin (Indocin) for PDA, or surgery for PDA.

Table 1A Characteristics of enrolled infants: at birth
Table 1B Characteristics of enrolled infants: clinical course

There were 261 SNPs whose allele frequencies differed between patients with surgical NEC and patients without any (medical or surgical) NEC at P<0.05 (Supplementary Table 1 online), of which 35 were significant at P<10⁻⁷ (Table 2). A particularly strong association was found between surgical NEC and a cluster of SNPs spanning 43 kb on chromosome 8 at 8q23.3 (Figure 1). Having minor allele(s) in this region conferred an odds ratio (OR) of 4.72 (2.51–8.88) for surgical NEC. We termed this region the NECRISK cluster. Tables 3A–D show the allele frequencies in controls vs. surgical NEC patients for rs7820058, the SNP within the NECRISK cluster with the most significant association with NEC. The increased risk was similar in all three genetic ancestries represented in this population (Table 3A–D). Notably, the minor allele frequencies observed in controls, both overall and within each of the three ancestries analyzed, corresponded to the minor allele frequency of this SNP in the 1000 Genomes Project. When the diagnostic criteria were loosened, that is, when surgical NEC or death, or NEC stage II or greater or death, was compared with all survivors without NEC, the degree of association with NECRISK SNPs decreased (Supplementary Table S2 online). Notably, there was no association between the minor allele frequency of rs7820058 and stage II NEC (Supplementary Table S3A online), or stage II NEC or death (Supplementary Table S3B online).
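For readers who want to reproduce the kind of allele-frequency comparison shown in Table 3, the sketch below computes minor allele frequencies and a crude allelic odds ratio with a Woolf (log-scale) 95% confidence interval from genotype counts. It is unadjusted and is therefore not the covariate-adjusted estimate reported above; the function names and the genotype counts in the usage check are hypothetical.

```python
import math


def allele_counts(n_major_hom, n_het, n_minor_hom):
    """Return (minor, major) allele counts from genotype counts."""
    return n_het + 2 * n_minor_hom, 2 * n_major_hom + n_het


def allelic_or(case_counts, control_counts, z=1.959964):
    """Crude allelic OR with a Woolf 95% CI, plus minor allele frequencies."""
    a, b = allele_counts(*case_counts)     # case minor, case major alleles
    c, d = allele_counts(*control_counts)  # control minor, control major alleles
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, (lo, hi), a / (a + b), c / (c + d)
```

Usage: with hypothetical genotype counts of (10, 15, 5) in cases and (400, 250, 40) in controls, ordered as major homozygote, heterozygote, minor homozygote, the function returns the allelic OR, its interval, and the case and control minor allele frequencies.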

Table 2 SNPs significantly associated with surgical NEC by GWAS (P<10⁻⁶)
Figure 1

Manhattan plot of SNPs that exhibited association with surgical necrotizing enterocolitis (NEC) vs. controls. Data shown are –log P values on the Y axis along with chromosome locations on the X axis. The arrow points to the NEC risk (NECRISK) region.

Table 3A Allelic frequencies of rs7820058 in surgical NEC in infants surviving >7 days vs. survival without medical or surgical NEC across the entire NRN cohort
Table 3B Allelic frequencies of rs7820058 in surgical NEC in infants surviving >7 days vs. survival without medical or surgical NEC among non-Hispanic Caucasians
Table 3C Allelic frequencies of rs7820058 in surgical NEC in infants surviving >7 days vs. survival without medical or surgical NEC among African Americans
Table 3D Allelic frequencies of rs7820058 in surgical NEC in infants surviving >7 days vs. survival without medical or surgical NEC among Hispanic Caucasians

A validation cohort consisted of 1,018 multiancestral extremely preterm neonates from the Gene Targets for Intraventricular Hemorrhage Study (30). This cohort included 52 NEC cases, 26 of which were surgical NEC, and 966 controls. Controls were defined as survival >7 days, but no data were available on spontaneous intestinal perforation or on whether NEC was diagnosed before or after postnatal day 7. The validation cohort was 50% African American, 42.7% non-Hispanic Caucasian, and 8.8% Hispanic Caucasian, with the balance being other ancestries. Most SNPs within the NECRISK region did not differ between infants with and without NEC (Supplementary Table S4 online). However, the only NECRISK SNP that reached P<0.05 (rs13252246; P=0.02) had its minor allele enriched in NEC cases, as in our discovery cohort. The SNP with the next lowest P value (rs10755911; P=0.06) was also enriched in NEC cases.

In the Neonatal Research Network (NRN) cohort, in addition to the NECRISK region, which showed the strongest association with NEC, a cluster of four SNPs on chromosome 14 showed the second strongest association with surgical NEC, with unadjusted P values of 10⁻⁷. This chromosome 14 cluster corresponds to the adenylate cyclase 4 (ADCY4) and leukotriene B4 receptor (LTB4R) genes. The next most significant cluster, also of four SNPs, was on chromosome 11 and corresponds to the neurogranin gene (P=4 × 10⁻⁷).

In silico analysis of the NECRISK region: As shown in Figure 2a, the two genes nearest to the NECRISK region are CSMD3 (−1.43 Mb) and TRPS1 (+542 kb). The same two genes are located on a syntenic region of murine chromosome 15 with a similar intergenic distance (Figure 2a), suggesting a high degree of evolutionary conservation of this region. The distances between the NECRISK cluster and CSMD3 or TRPS1 exceed those typical of direct transcriptional regulation: core promoters, proximal promoters, enhancers, silencers, insulators, and locus control regions are typically within 100 kb of the genes they regulate (31). For this reason, we thoroughly analyzed the 200-kb region surrounding the NECRISK SNP cluster in search of known or potential novel transcripts close enough to be regulated by genetic variation in the cluster.

Figure 2

The NEC risk (NECRISK) region exhibits a high degree of evolutionary conservation. The image shows the region between the flanking known genes TRPS1 and CSMD3 on human chromosome 8 and mouse chromosome 15. The preserved orientation and spacing of the genes indicate a high degree of evolutionary conservation.

According to the predicted transcripts annotated in the UCSC Genome Browser (https://genome.ucsc.edu/), there are three predicted transcripts within the 200-kb domain surrounding the NECRISK region (Figure 3). By contrast, the Ensembl genome browser (http://www.ensembl.org/index.html) shows a single predicted gene in the same region (data not shown). Using another gene prediction program (Softberry; http://www.softberry.com/), we identified five additional potential transcripts in the ±100-kb domain around the NECRISK region.

Figure 3

The NEC risk (NECRISK) region and contiguous novel transcripts between the known genes of CSMD3 and TRPS1. The NECRISK region is shown in dark blue, whereas potential novel transcripts identified by alignment of RNAseq results with the human genome are shown in yellow.

To identify predicted and novel transcripts, we performed reverse transcription-polymerase chain reaction (RT-PCR) and RNAseq on RNA extracted from small intestine samples obtained at bowel resection from two premature neonates: one specimen from an infant who had bowel resection because of NEC, and another (control) from an infant with ileal atresia. Exon-spanning RT-PCR failed to verify the existence of any of the predicted transcripts. RNAseq data were aligned to the genome as described in Methods. The assembled contiguous RNA sequences (contigs) were aligned to the human genome together with human expressed sequence tags (ESTs), which are sequences derived from cDNA libraries that match the human genome. All identified RNA contigs matched the human genome and corresponded to ESTs aligning at the same position. RNAseq revealed no alignments matching the predicted transcripts in this region. The RNA sequence that aligned closest to the NECRISK region was located on the negative strand 200 kb from the NECRISK region. This contig contains an open reading frame encoding 419 amino acids with 96% identity to the long interspersed element-1 (LINE-1) retrotransposable element. All other identified RNA contigs also contained potential open reading frames, including the contig at +5 kb corresponding to the first exon of TRPS1. This contig can be viewed as a “positive control,” indicating that our strategy of transcript identification can detect known genes.
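Scanning a contig for open reading frames, as was done for the LINE-1-like contig, can be illustrated with a six-frame search using the standard genetic code. This sketch reports the longest ATG-to-stop ORF; ORFs that run off the end of the contig without a stop codon are ignored, and codons containing ambiguous bases translate to "X". It is an illustration only and not necessarily the procedure used in this study; the function names and the test sequence are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def longest_orf(seq):
    """Longest ATG...stop ORF over all six frames; returns (protein, strand, frame)."""
    seq = seq.upper()
    best = ("", None, None)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = "".join(CODON_TABLE.get(s[i:i + 3], "X")
                              for i in range(frame, len(s) - 2, 3))
            start = None
            for idx, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = idx  # first ATG opens a candidate ORF
                elif aa == "*" and start is not None:
                    if idx - start > len(best[0]):
                        best = (protein[start:idx], strand, frame)
                    start = None  # the stop codon closes it
    return best
```

The ORF length in amino acids is simply `len(protein)`; for the contig described above, that length would be 419.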

Pathway analysis of the initial GWAS results identified 52 Reactome pathways (www.reactome.org) significant at a false discovery rate <0.15 and P<0.01 (Table 4). These include many pathways involved in growth factor receptor signaling (type I insulin-like growth factor receptor, fibroblast growth factor receptor, and epidermal growth factor receptor), eicosanoid signaling, and T-cell regulation (cluster of differentiation 28, cytotoxic T-lymphocyte associated protein 4), as well as signaling mediated via calcium (cytosolic Ca²⁺, calmodulin, Ca-permeable kainate receptor, etc.) and G proteins. Interestingly, the phototransduction cascade is also represented, perhaps because of the high representation of G proteins in this pathway.
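The dual threshold (FDR <0.15 and nominal P<0.01) can be illustrated as follows. Note that gene set enrichment analysis estimates its FDR by permutation; the Benjamini-Hochberg adjustment used here is a simpler, swapped-in procedure chosen only to show how the two cutoffs combine, and the function names, pathway names, and P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def relevant_pathways(pathway_p, fdr_max=0.15, p_max=0.01):
    """Names of pathways passing both the FDR cutoff and the nominal P cutoff."""
    names = list(pathway_p)
    q = bh_adjust([pathway_p[n] for n in names])
    return [n for n, qi in zip(names, q) if qi < fdr_max and pathway_p[n] < p_max]
```

With hypothetical P values of 0.001, 0.008, 0.02, and 0.5 for PATHWAY_A to PATHWAY_D, PATHWAY_C passes the FDR cutoff but fails the nominal P cutoff, so only PATHWAY_A and PATHWAY_B are retained.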

Table 4 Pathway analysis listing reactome pathways significant at FDR <0.15 and P<0.01

Discussion

In this study, we applied GWAS to a relatively large cohort of extremely-low-birth-weight neonates to identify genetic susceptibility to NEC, using very stringent inclusion and exclusion criteria. We chose extremely-low-birth-weight infants, the population at highest risk for NEC, to achieve a relatively high case–control ratio. This was necessary both to improve statistical power and to minimize the cost of genotyping. The stringent inclusion criterion of surgically verified NEC was chosen because it reduces the possibility of diagnostic error. Finally, we excluded infants who died before postnatal day 7, because NEC is rare during this period, whereas the potentially confounding spontaneous intestinal perforation is relatively frequent. As shown in Supplementary Table 3 online, broadening the inclusion criteria to NEC or death, or including stage II NEC, notably reduced the statistical significance of the association. Furthermore, when stage II NEC was analyzed alone, there was no statistically significant association between minor allele frequency in the NECRISK region and stage II NEC (Supplementary Table 4 online). We attribute the stronger genetic association with surgical NEC to more precise diagnosis, that is, direct visual verification of the necrotic bowel, whereas the diagnosis of stage II NEC is indirect and may be less precise. We identified a very strong genetic association between surgical NEC and a 43-kb intergenic region of chromosome 8, delineated by 25 SNPs on the Illumina BeadChip array used. This association was consistent across the three main ancestral groups in the cohort and conferred an odds ratio of 4.3 for surgical NEC.

The same cohort of patients and the same genotyping data were used previously in GWAS of sepsis (32), intraventricular hemorrhage (23, 30), and bronchopulmonary dysplasia (33). Although all these GWAS identified significant associations between SNPs or SNP clusters and the respective neonatal morbidities, none of them were in the same region of chromosome 8 as the NECRISK region, or in the regions on chromosomes 11 and 14 that we found to be associated with NEC. These data suggest that specific genetic variations may underlie specific neonatal pathologies in different organs and that the association of the NECRISK region is specific to NEC.

NEC is considered a multifactorial disease in which prematurity, variations in clinical practice, and altered microbial colonization are recognized as the main contributors to pathogenesis. Genetic susceptibility has been suspected, and several studies have addressed the role of genetics by targeted analysis of specific SNPs in genes potentially involved in NEC pathogenesis. Although most of these efforts were based on reasonable hypotheses, the chance of identifying the most significant contributors to genetic predisposition by such targeted approaches is practically negligible. The human genome has at least 38 million SNPs (15), making GWAS the only viable strategy to identify the most significant associations with disease susceptibility. Even with the 1-million-SNP coverage of the Illumina chip used in the present study and imputation of 7 million other SNPs, there is a high chance of missing potentially highly significant SNPs, and this coverage is best suited to identifying regions of the genome associated with disease severity rather than single SNPs. Indeed, the three most significant findings in our study pointed to clusters of SNPs rather than single SNPs.

Surprisingly, the NECRISK region, which showed the most significant correlation with the incidence of surgical NEC in our patient population, is located in an intergenic region where the two nearest known genes are ~0.5 and ~1.5 Mb away. These distances exceed the 100-kb limit generally considered to allow direct regulation of genes by common mechanisms. To analyze alternative mechanisms distinct from regulation of known distant genes, we performed in silico analysis as well as RNAseq to identify potential novel transcripts closer to the NECRISK region. RNAseq identified a number of potential transcripts between the CSMD3 and TRPS1 genes. The potential novel transcript nearest to the NECRISK region is 96% identical to the LINE-1 retrotransposon. LINE-1 sequences make up 17% of the human genome, generate genomic diversity, and can alter gene function (34). There are at least three major mechanisms by which LINE-1 affects the genome. The main mechanism, known to occur in all eukaryotes, is that LINE-1 activation results in a coupled reverse transcription and integration event referred to as target-primed reverse transcription, which multiplies LINE-1 loci and may interrupt genes or affect the expression of genes near the integration site. A less common, but well-recognized, role of LINE-1 is the transposition of nonautonomous mobile sequences. The most common elements transposed this way are Alu (named after the AluI restriction endonuclease site that it contains) and SINE-VNTR-Alu elements (SVAs). In addition, LINE-1 retrotransposons may mediate nonallelic homologous recombination, a key mechanism generating structural variants or copy number variants in the human genome. An abundance of structural variants has been reported in the immediate vicinity of the NECRISK region.
Therefore, we speculate that one potential mechanism mediating the effect of the NECRISK region on the incidence of NEC is a de novo structural variation of chromosome 8 involving LINE-1. This hypothesis is consistent with the fact that the NECRISK region is a large segment of the genome rather than a single SNP or a small group of SNPs. Testing this hypothesis is beyond the scope of the present study. Murine studies have demonstrated that retrotransposition of LINE-1 is induced by inflammation in colonic mesenchymal cells and is associated with the severity of colitis (35), although it is not known whether similar results would be obtained in murine NEC models.

In addition to the NECRISK region, several other SNPs showed association with NEC. Intriguingly, the next two most significant associations also involve SNP clusters, albeit much smaller ones than the NECRISK region. Whereas the NECRISK region does not appear to involve known genes, the next two most significant clusters do. The second most significant cluster involves the ADCY4 and LTB4R genes. Although neither gene has been directly implicated in the pathogenesis of NEC, both are involved in regulating signaling in epithelial cells and in regulating inflammation. Intestinal epithelial cells express various adenylyl cyclase isoforms, including ADCY4, and their expression levels are regulated during cellular differentiation (36). In turn, cyclic AMP (cAMP), the product of adenylyl cyclase, is a key regulator of epithelial functions such as ion transport, proliferation, migration, apoptosis, membrane recycling, and macromolecule secretion (37, 38, 39, 40). Moreover, LTB4 plays a significant role in a Toll-like receptor 4- and cyclooxygenase-2-mediated mechanism of intestinal ischemia–reperfusion injury (41). This finding may point to a role for LTB4 in NEC, as Toll-like receptor 4 has been well documented to play a role in animal models of NEC (42, 43). Notably, the LTB4R gene encodes an eicosanoid receptor, and pathway analysis revealed eicosanoid receptor signaling as the second most prominent pathway affected by NEC-associated SNPs. Variation at these two loci may be associated with altered cAMP or eicosanoid signaling and potentially with vascular dysfunction that predisposes to NEC. The third most significant cluster of SNPs corresponds to the genomic location of the neurogranin gene.
Although neurogranin was once considered exclusive to the brain, it has been shown to play a role in interleukin-2-dependent survival of T cells, establishing the feasibility of a role in immune/inflammatory signaling (44).

NEC is primarily a disorder of the immature, developing intestine, and our pathway analysis identified many pathways involved in growth factor receptor signaling (type I insulin-like growth factor receptor, fibroblast growth factor receptor, and epidermal growth factor receptor) that may participate in normal gut development. NEC is also characterized by inflammation and necrosis, and pathways related to eicosanoid signaling, T-cell regulation (cluster of differentiation 28, cytotoxic T-lymphocyte associated protein 4), signaling mediated via calcium (cytosolic Ca²⁺, calmodulin, Ca-permeable kainate receptor, etc.), and G proteins may contribute to predisposition to intestinal injury and apoptosis, or may affect repair pathways.

The strengths of our study are the relatively large cohort (for studies in premature neonates), the stringent diagnostic criteria, and the strong association between genotype and phenotype. A weakness of the study is that we could not identify a large validation cohort with sufficient information on the clinical course to permit the same stringent criteria used in the discovery cohort. However, it is encouraging that despite the low power of the validation cohort, one SNP within the NECRISK cluster reached statistical significance and showed an enrichment in NEC cases vs. controls similar to that in our discovery cohort. Furthermore, because of the intergenic location of the NECRISK cluster and the consequent multiplicity of potential mechanisms, we were unable to elucidate the mechanism by which genetic variation confers disease susceptibility.

In summary, we identified a very strong genetic association between surgical NEC and an intergenic region of chromosome 8, which we labeled the “NECRISK” region, although we were unable to fully validate this finding in a different GWAS data set. RNA sequencing identified an RNA sequence similar to the LINE-1 retrotransposable element that aligned to the negative strand 200 kb from the NECRISK region. Pathway analysis identified pathways related to growth factor, calcium, and G-protein signaling, as well as other pathways associated with inflammation and injury that may contribute to NEC.