Introduction

Growth and maturation of the gastrointestinal tract follow a set developmental ontogeny1,2. Although some aspects of intestinal development appear to be hardwired and do not emerge until a specific gestational age3, others are influenced by dietary intake1 and microbial colonization4. Accordingly, the early neonatal period is a critical phase for both intestinal digestive development and establishment (colonization) of the intestinal microbiota5. Functional immaturity of the gut compromises nutrient absorption and utilization in the preterm infant3. This digestive immaturity, coupled with immature mucosal barrier function and immune response and with inappropriate bacterial colonization, makes premature neonates particularly susceptible to intestinal inflammation and injury and to the development of necrotizing enterocolitis (NEC), a potentially lethal bowel disorder3,6,7,8.

The intestine is lined by epithelial cells that process nutrients and provide the first line of defense against pathogens. Because colonization of the intestine with non-pathogenic, or commensal, bacteria is vital for infant health, it is important to understand how epithelial cells and the microbial ecosystem are modulated by diet and disease9. Therefore, our ongoing efforts are directed at understanding the regulation of neonatal development by components present in human milk, as it is the gold standard for infant nutrition10,11,12. Advancing knowledge in this area has been limited by an inability to directly assess epithelial cell biology in the healthy newborn intestine. To that end, we previously developed non-invasive high-throughput microarray techniques to examine intestinal gene expression in stool samples11 and to probe the cross-talk between host gene expression and the microbiota12. This methodology uses exfoliated cell mRNA isolated directly from feces, which contains sloughed small intestinal and colonic cells, and therefore requires no invasive procedures and causes no discomfort to the subject.

In this study, we applied our non-invasive methodology, which has been optimized in the term infant, to the extremely preterm infant (24–30 weeks gestational age). With respect to neonatal health, it is well known that extremely preterm infants are a vulnerable population whose intestinal and immune development is modulated early in the postnatal period by multiple environmental factors, such as medications, indigenous microorganisms in the intensive care unit and limited enteral diet13. Using global transcriptome RNA Sequencing (RNA-Seq) profiling, we compared host transcript abundance and alternative splicing in healthy term and extremely preterm infants. This novel application of RNA-Seq to measure host gene expression has, for the first time, provided insight into host responses to dietary and environmental influences in the early neonatal period.

Results

Fecal samples were obtained from three term and three preterm infants and enriched for host mRNA transcripts through poly A+ RNA selection. We also considered the effects of pooling by sequencing a pooled sample from multiple term infants. Approximately 50 M reads were sequenced for each individual and 30 M reads for the pooled sample. On average, 5075 human genes were detected above an FPKM threshold of one, despite 55–90% of the reads mapping to a representative set of microbial genomes (Supplementary Table 1). Complementary analyses were conducted using ERCC spike-in RNA controls, sample correlations and qPCR in order to validate the host mRNA transcript data.

In general, transcripts consisted of hundreds of reads that accumulated in a narrow region in or close to the 3′ UTR. A typical example is RefSeq id NM_000482, representing apolipoprotein A-IV (Supplementary Figure 1). Since bacterial RNA in stool is much more abundant than human RNA14, the first objective was to ascertain the efficacy of host poly A+ RNA enrichment, library preparation and sequencing procedures. We note that poly A+ RNA enrichment precludes the use of these data to probe microbiome diversity or transcript levels because of the bias this procedure might introduce into the microbial transcripts. ERCC spike-in controls added at step A in Figure 1 were used to verify the integrity of sample handling from steps A to B. The ERCC controls reflect a diverse set of sequence content and length, have low homology with eukaryotic transcripts and span a large range of concentrations. The number of observed reads from the six individual samples that mapped to known ERCC transcripts is plotted in Figure 2. The observed ERCC reads were highly correlated between samples (all Pearson correlation coefficients > 0.998, all Spearman correlation coefficients > 0.992). Additionally, each sample correlated strongly with the known concentrations (all Pearson correlation coefficients > 0.88, all Spearman correlation coefficients > 0.9, see Methods for details). Furthermore, using known quantities and concentrations of exogenous ERCC transcripts, we observed reads from mRNA species present at amounts as low as 800 molecules in our samples.

Figure 1

Fecal samples were processed to enrich eukaryotic mRNA, develop libraries and assess bioinformatic sequencing content.

The steps in cyan were applied to mRNA processing and the magenta steps were applied to sequenced read data. Step A, ERCC controls were spiked-in to determine processing efficacy and reproducibility up to and including step B.

Figure 2

Observed reads were mapped against known ERCC reference sequences and read counts were compared against known amounts added in the spike-in control.

High correlations between samples (all Pearson correlation coefficients > 0.998, all Spearman correlation coefficients > 0.992) and with known concentrations (all Pearson correlation coefficients > 0.88, all Spearman correlation coefficients > 0.9, see Methods for details) indicate that the sequencing and mapping procedures are effective and reproducible across a variety of transcript lengths.

We subsequently validated RNA-Seq against qPCR as in15. For this purpose, eleven differentially expressed genes, as detected by RNA-Seq, were selected because, in the RNA-Seq data set, expression of each gene was either higher in every term infant than in every preterm infant or lower in every term infant than in every preterm infant. In addition, we selected genes with FPKM greater than 10, since genes expressed at a lower level are typically difficult to detect by qPCR. Nine of the eleven genes measured (Figure 3) demonstrated fold changes of similar magnitude and direction by qPCR and RNA-Seq (Supplementary Table 2), while SLC2A1 and RPS16 differed in the direction of differential expression. By comparison, a similar RNA-Seq validation using qPCR in the literature found that four out of five DE genes replicated16. In addition, the average Spearman correlation coefficient between qPCR and RNA-Seq fold changes was 0.586 and the average Pearson correlation coefficient was 0.570, with an average line of best fit having a slope near unity at 0.869 (Supplementary Figure 2).
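
The concordance measures described above can be illustrated with a minimal sketch. The function names and the fold-change values below are hypothetical and are not taken from the study; placing RNA-Seq on the x axis and qPCR on the y axis for the best-fit slope is likewise an assumption, as is the use of log2 fold changes.

```python
import math

def log2_fold_change(term_mean, preterm_mean):
    """log2(term / preterm); positive values mean higher expression in term infants."""
    return math.log2(term_mean / preterm_mean)

def direction_concordance(rnaseq_lfc, qpcr_lfc):
    """Count genes whose fold changes have the same sign on both platforms."""
    return sum(1 for r, q in zip(rnaseq_lfc, qpcr_lfc) if r * q > 0)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical log2 fold changes for four genes (illustrative values only).
rnaseq = [2.1, -1.4, 0.9, -0.6]
qpcr = [1.8, -1.0, -0.3, -0.5]

agreeing = direction_concordance(rnaseq, qpcr)   # genes with matching direction
r = pearson(rnaseq, qpcr)                        # linear agreement
slope = best_fit_slope(rnaseq, qpcr)             # a slope near 1 suggests similar magnitude
```

A slope near unity indicates that the two platforms report fold changes of similar magnitude, whereas the sign comparison only addresses direction.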

Figure 3

Eleven differentially expressed genes were selected for validation using qPCR.

Nine of the eleven genes exhibited average fold changes by RNA-Seq in the same direction as by qPCR, with SLC2A1 and RPS16 being the only exceptions. By comparison, a similar RNA-Seq validation using qPCR in the literature found that four out of five DE genes replicated16. Additionally, correlation analysis (method described in Figure 2 from16) (Supplementary Table 1 and Supplementary Figure 2) generated an average Pearson correlation coefficient of 0.57 and an average Spearman correlation coefficient of 0.59, with an average best-fit line slope of 0.87.

To obtain a benchmark for comparison, the average quantified gene expression for three preterm and three term infants was compared. Overall, global transcriptional profiles compared favorably with high correlation for highly expressed genes (Figure 4). In addition, expression correlation between individual samples verified the presence and detection of human epithelial mRNA transcripts in the fecal samples (Supplementary Figure 3, Figure 5a). Examples of genes associated with specific intestinal cell types included absorptive enterocytes (lactase, 7.4-fold higher expression in preterm than term and sucrase-isomaltase, 1.7-fold higher expression in term than preterm), goblet cells (muc-2, 1.3-fold higher expression in preterm than term), enteroendocrine cells (chromogranin A, 12-fold higher expression in term than preterm) and Paneth cells (lysozyme, 1.6-fold higher expression in preterm than term). In some cases where only small volumes of stool are available, pooling of samples may be necessary. Therefore, the effect of sequencing pooled samples versus individual samples from term infants was assessed. Individual samples and their pooled counterparts exhibited homogeneity as determined by Spearman correlation coefficients plotted in a heatmap (Figure 5a). Interestingly, the preterm individuals appeared more heterogeneous amongst themselves and as compared to the term samples. This is potentially a result of the larger clinical variation of the preterm samples in terms of nutrition and gut health. In addition, when considering genes expressed at an arbitrary cutoff threshold (>10 Fragments per Kilobase per Million [FPKM]), the number of genes expressed (Figure 5b) in common among the four term infant samples (three individual and one pooled) was larger than the number of genes expressed in common by any subset of these samples. Therefore, the term individual samples and the pooled sample were more similar than dissimilar.
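
The shared-gene comparison at the >10 FPKM cutoff can be sketched as a count of genes per exclusive Venn region. This is an illustrative sketch only: the sample names, gene names and FPKM values are hypothetical, and `venn_regions` is a helper introduced here for illustration rather than part of the analysis code used in the study.

```python
from collections import Counter

FPKM_CUTOFF = 10.0  # threshold stated in the text

def venn_regions(samples, threshold=FPKM_CUTOFF):
    """Return a Counter mapping each exclusive Venn region to its gene count.

    A region is identified by the frozenset of sample names in which a gene
    exceeds the threshold. Genes below the threshold everywhere are ignored.
    """
    all_genes = set().union(*(fpkm.keys() for fpkm in samples.values()))
    regions = Counter()
    for gene in all_genes:
        members = frozenset(
            name for name, fpkm in samples.items()
            if fpkm.get(gene, 0.0) > threshold
        )
        if members:
            regions[members] += 1
    return regions

# Hypothetical FPKM values for three individual term samples and one pooled sample.
samples = {
    "term_1": {"GENE_A": 55.0, "GENE_B": 14.0, "GENE_C": 2.0},
    "term_2": {"GENE_A": 41.0, "GENE_B": 11.5, "GENE_C": 0.4},
    "term_3": {"GENE_A": 63.0, "GENE_B": 9.0, "GENE_C": 1.1},
    "pooled": {"GENE_A": 48.0, "GENE_B": 12.2, "GENE_C": 0.0},
}

regions = venn_regions(samples)
center = regions[frozenset(samples)]  # genes above the cutoff in all four samples
```

Comparing the central count with the counts of the remaining regions mirrors the comparison drawn from the four-way Venn diagram.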

Figure 4

Expression profiles of reads mapped to human genes show good between-group correlation on average.

This suggests that the similar expression profiles likely arise from the similar tissue types present in both sets of samples.

Figure 5

(a) Pearson correlation coefficients among normalized mapped read counts (see Methods for details) for three preterm, three term individuals and a pooled term sample. Term samples are visibly correlated amongst each other, whereas higher heterogeneity amongst the preterm population is expected given their differing treatment regimens and developmental stages. A similar correlation heatmap using Spearman correlation coefficients is shown in Supplementary Figure 6 for completeness. (b) A four-way Venn diagram shows the number of expressed genes (>10 FPKM) among three individual term samples and a sequenced pooled sample. In the center of the diagram, 2132 genes are expressed in the pooled sample and all three individuals at greater than 10 FPKM. This large number of shared genes indicates that the sequencing procedure is consistent across sets of samples.

Functional gene set enrichment analysis was performed using Ingenuity Pathway Analysis (IPA) pathway profiling17 to probe the biological relevance of the differentially expressed genes between preterm and term infants. For this purpose, differentially expressed genes with a p value cutoff <0.05 between groups were associated with canonical gene networks using the Ingenuity Knowledge base (Figure 6). Broad differences in genes associated with lipid metabolism, molecular transport, organismal injury, infectious disease and cellular development were observed. This is further highlighted in Table 1 (preterm > term expression) and Table 2 (term > preterm gene expression), where differentially expressed genes associated with these networks are documented. Preterm infants expressed numerous genes associated with immune cell function (e.g., CASP1, IL-1beta, IL-33, NFkB1A, S100-A9, SOCS3 and TREM-1) and lipid metabolism (e.g., ApoA1, ApoA4, ASAH1, MTM1 and PLIN2) (Table 1). In contrast, term infants exhibited highly up-regulated expression of genes associated with regulation of cell growth/cell cycle (e.g., CDKN2B, ESRRA, INSR, KREMEN2, MTRNR2L6, PDPK1 and TRIM36) (Table 2).

Table 1 Representative differentially expressed genes that were significantly higher in preterm infants versus term infants
Table 2 Representative differentially expressed genes that were significantly higher in term versus preterm infants
Figure 6

Network enrichment analysis (Ingenuity IPA software) was performed using differentially expressed genes.

Three networks of interest were generated: (a) Lipid metabolism, molecular transport, small molecule biochemistry; (b) Neurological disease, organismal injury and abnormalities and infectious disease; and (c) Cellular development, tissue development, lipid metabolism. Red indicates higher expression in preterm infants and green indicates higher expression in term infants.

Discussion

Nutritional regulation of intestinal development begins in utero with exposure to protein-rich amniotic fluid and continues after birth with human milk and/or infant formula1,10. These developmental processes are essential for continued cellular differentiation of the gut and development of mucosal immunity18. In the healthy term infant, enteral stimulation continues without interruption after birth, whereas the preterm infant is typically supported on parenteral nutrition with limited enteral stimulation in the first few weeks of life3. In addition, postnatal exposure to environmental organisms in the neonatal intensive care unit and the routine use of antibiotics can lead to aberrant intestinal development and microbial colonization and to an increased risk of intestinal disease in the preterm infant3,6. Hence, it is imperative to understand the transcriptional responses of the preterm gut so that specific nutritional practices can be employed to optimize intestinal development.

Sensitive noninvasive tests will become critical tools in tailoring nutritional interventions, including pre- and probiotics, in order to promote intestinal development and maturation in the growing infant. As part of this effort, our laboratory has developed a molecular methodology that utilizes stool samples containing intact sloughed epithelial cells in order to noninvasively quantify intestinal gene expression profiles in both the human infant11,12 and adult19. Systems biology approaches, such as computational linear discriminant analysis (LDA), were used to identify the best single genes and two- to three-gene combinations for distinguishing term breast-fed versus formula-fed groups11. In addition, putative “Master” regulatory genes were identified using coefficient of determination (CoD) analysis11. Collectively, these approaches can be used to identify mechanistic pathways of intestinal development in the first few months of life and to assess the impact of nutrition and other environmental exposures on the microbiome in the developing gut12. In this study, we have extended our initial observations by unraveling previously inaccessible complexities in the term vs. preterm infant intestinal transcriptome, non-invasively interrogating the infant intestine using RNA-Seq rather than gene microarray11.

In order to develop a more comprehensive understanding of the complexity of transcriptome profiles in the intestine, we utilized neonatal stool samples containing intact sloughed epithelial cells and generated large-scale RNA-Seq profiles. For this purpose, poly A+ mRNAs were first copied into DNA sequences, randomly sheared, attached to linkers and directly sequenced. Sequences were compared with the reference human genome and the density of corresponding reads was determined. Furthermore, using this form of global digital transcriptome profiling, we documented the host transcript abundance and alternative splicing in healthy term infants at 12 weeks postnatal age and extremely preterm infants (24–30 weeks gestational age) at 2–5 weeks postpartum. Although the precise origin of exfoliated cells is not known, results from our previous study11 and those reported herein indicate that genes associated with discrete epithelial cell types (absorptive enterocytes, goblet cells, enteroendocrine cells and Paneth cells) are detectable. Thus, it is likely that transcriptome signatures of both the small and large intestine can be monitored over time.

The examination of global alterations in gene expression offers insight into the effects of premature birth and the resultant influence of environmental exposures uniquely experienced by the preterm infant (e.g. antibiotics, other medications, prolonged period of parenteral nutrition) on intestinal mRNA profiles. RNA-Seq (validated by qPCR) revealed that, following an enrichment process, reads of human origin can be recovered from stool-derived RNA. Unlike RNA extracted from human cell cultures or surgical specimens, where the quality and quantity of RNA are usually high and RNA degradation can be controlled with tissue handling20, infant stool represents a unique biological sample. Typically, expressed host transcripts consist of a narrow stretch of RNA that is rarely longer than several hundred bp. While exon-exon junctions in principle can be detected, we noted that less than 5% of transcripts exhibited their splice variants. In this respect, non-invasive gene expression analysis using infant stool appears to be more challenging than analysis of formalin-fixed, paraffin-embedded (FFPE) tissue21,22,23. Unlike FFPE tissue blocks, which can yield large quantities of fragmented DNA and RNA sufficient to explore complete topologies of expressed genes and local properties of DNA, infant stool requires careful enrichment of human RNA. Our experience with gene expression in the infant gut indicates that next generation sequencing provides a robust non-invasive glimpse into the host transcriptome. We expect that the development of novel methodologies for library preparation will allow us to further elucidate the physiology of the developing infant intestine.

For obvious reasons, directly examining the host epithelium in the human preterm infant is rarely feasible, as intestinal biopsies are not routinely performed unless medically indicated. Therefore, noninvasive methodologies currently provide the best snapshot of infant gene expression11 and host-microbe dynamic interactions12 in an in vivo setting. Although future well-controlled studies are needed to evaluate environmental/dietary exposures, this study highlights the potential of using the described noninvasive technology. We evaluated genes that were over-expressed in preterm vs. term (Table 1) or term vs. preterm intestine (Table 2). Although none of the infants were clinically ill at the time the stool samples were collected, one of the major categories of genes overexpressed in preterm vs. term samples was immune function. Because of the vast number of exfoliated epithelial cells shed from the lining of the intestine on a daily basis, it is unlikely that changes in cell composition, e.g., contribution of inflammatory cells from the submucosa, directly contributed to alterations in gene expression. Several cytokines, including IL1α and IL-33, were up-regulated in preterm vs. term. In addition, several genes that regulate the expression of cytokines and other immune genes were expressed at 3- (NFKB1α) to 6-fold (CASP1) higher levels in preterm vs. term infant exfoliated cells. Previous studies have shown that immortalized cells isolated from fetuses (H4 cells) or tissue explants from fetuses mount a more robust proinflammatory cytokine response (IL-8) after inflammatory stimulation with lipopolysaccharide or IL1β than cells from adult tissue (Caco-2) or explants from older children24. The excessive inflammatory response of the immature intestine is in part due to a developmental under-expression of IkB25 as well as overexpression of the NFkB/MyD88 innate inflammatory genes (TLR2, TLR4, MyD88, TRAF-6, NFkB1 and IL-8) and under-expression of negative regulator genes (SIGIRR, IRAK-M, A-20 and TOLLIP) in fetal intestine relative to older children8. Thus, it appears that activation status of the intestinal innate immune response may contribute to excessive inflammation in the immature intestine in response to colonizing bacteria, which is a hallmark of NEC8.

In exfoliated cells of term infants, up-regulated immune genes were associated with balancing the immune system, e.g. promoting T-cell development (LCP2; 3.6-fold greater in term than preterm), while inhibiting macrophage activation (LENG9; 16-fold greater in term than preterm). The majority of genes were involved in cell turnover, by regulating proliferation and apoptosis. One of the most highly differentially-expressed genes was an anti-apoptotic factor (MTRNR2L6), which was 5-fold higher in term than preterm. Another notable gene is SP3 (~2-fold higher in term than preterm), which is a transcription factor that can be regulated through short-chain fatty acid - acetylation, potentially supporting the role of these products of microbial metabolism in regulating normal gut growth in term infants26.

In summary, we provide evidence that RNA sequencing of stool-derived RNA can be used to generate a global transcriptome gene expression signature in neonates. We have also compared, for the first time, the intestinal global transcriptome in individual term and preterm infants and in pooled term infants. Our findings provide insight into the global patterns of gene expression that vary in exfoliated epithelial cells of term and preterm infants. We anticipate that the described noninvasive RNA sequencing-based approach will enable elucidation of how the bedside clinical management of an extremely preterm infant population influences intestinal gene expression. With this understanding, dietary and medical practices that optimally promote intestinal development can be evaluated and, ideally, those clinical practices that approximate as closely as possible the development of the healthy, term breast-fed infant can be identified. The possible uses of non-invasive high-throughput RNA sequencing data are vast and include early detection (screening), monitoring of disease progression, risk assessment and study of the diet-dependent interaction between gut microbiota and host epithelium. We propose that stool samples containing exfoliated cells have the potential for generating comprehensive, diagnostic gene sets for the noninvasive identification/prediction of different intestinal phenotypes in infants.

Methods

Subject recruitment

The experimental human protocol for term infants was approved by the University of Illinois and Texas A&M Institutional Review Boards and for preterm infants by the Beth Israel Deaconess Medical Center and Texas A&M Institutional Review Boards. Informed consent was obtained from parents prior to participation in the study and all experiments were performed in accordance with relevant guidelines and regulations. Details of the study admission criteria and protocols for term infants have been previously described11. Briefly, healthy term infants who were exclusively breast-fed (BF) or formula-fed (FF) were eligible for enrollment into the study. All infants were considered term and gestational age was similar for BF (39.7 ± 0.08 weeks; range: 38 5/7 to 41 3/7 weeks) and FF (39.7 ± 0.4 weeks; range: 38 5/7 to 40 4/7 weeks) infants. For each term infant, a stool sample was collected at three months of age. In addition, stool samples were collected from six extremely preterm (24–30 weeks gestation) infants admitted to the Beth Israel Deaconess Medical Center neonatal intensive care unit (NICU). The six preterm infants had a gestational age ranging from 24 1/7 weeks of gestation to 30 0/7 weeks. Four infants under 30 weeks gestational age had a stool sample collected at 4–5 weeks of life, while two preterm infants at 30 weeks gestational age had a stool sample collected at 2 weeks of life. All six preterm infants were on full enteral feedings (1 formula; 2 breast milk; 3 mixed formula and breast milk). For full metadata for term and preterm infants, please consult Supplementary Tables 3 and 4.

Sample preparation and sequencing

PolyA+ RNA was isolated from stool samples from term and preterm infants as previously described27. ERCC was diluted 1:10 and 0.5 mL was added to each sample. Pooled polyA+ samples from 18 term infants or 6 preterm infants were processed with the NuGEN Ovation 3′-DGE kit (San Carlos, CA) to convert RNA into cDNA followed by NuGEN Encore NGS Library System I kit to create Illumina libraries, as per manufacturer's instructions. Sequencing on Illumina GAIIx and HiSeq 2000 platforms (San Diego, CA) was carried out using standard Illumina protocols on the Texas A&M University campus. Briefly, 70 ng of each sample was used to synthesize first and second strand cDNA, which was purified using Agencourt RNAClean XP beads (Brea, CA) included in the kit. The cDNA was linearly amplified using the NuGEN SPIA primer and cDNA quality and quantity were determined using an Agilent 2100 Bioanalyzer and Nanodrop spectrophotometer. Three micrograms of cDNA was fragmented using a Covaris S2 sonicator with the following settings: duty cycle 10%, intensity 5, cycles/burst 100, time 5 min. Fragmented samples were concentrated using the QIAquick PCR purification kit (Qiagen, Venlo, Netherlands) as per manufacturer's instructions. Samples were quantified using the Quant-iT kit (Invitrogen, Carlsbad, CA) and evaluated for proper fragmentation of 150 to 200 bp on an Agilent Bioanalyzer DNA 1000 chip. Following cDNA fragment repair and purification, Illumina adaptors were ligated onto fragment ends followed by amplification to produce the final library. Libraries were quantified using Quant-iT (Invitrogen) and run on an Agilent DNA Chip 1000 to confirm appropriate sizing and the exclusion of adapter dimers. Approximately 32 million 36 bp reads were sequenced in each lane for the pooled samples on an Illumina GAIIx [SRA:SRR626229] and approximately 300 million 50 bp single-end reads on an Illumina HiSeq for the six individual samples [SRA:PRJNA182262]. Pooled and individual sample collection and processing are depicted visually in Supplementary Figure 4. Raw, de-multiplexed FASTQ files were examined using FastQC and determined to be of sufficient read quality and sequence, nucleotide and k-mer diversity. The number of ERCC molecules present in the samples was calculated using known concentrations of ERCC transcripts as published by Ambion and the quantity of RNA solution that was used for library preparation.
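
The conversion from a spike-in concentration to a number of molecules can be sketched as follows. This is a minimal illustration that assumes concentrations are expressed in attomoles per microliter; the function name, its parameters and the example values are hypothetical and do not reproduce the quantities used in this study.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
MOLES_PER_ATTOMOLE = 1e-18

def ercc_molecules(conc_attomol_per_ul, volume_ul, dilution=1.0):
    """Estimate the number of molecules of one ERCC transcript in a sample.

    conc_attomol_per_ul: stock concentration of the transcript (assumed unit)
    volume_ul:           volume of spike-in solution carried into the library
    dilution:            fraction of stock concentration after dilution
                         (for example, 0.1 for a 1:10 dilution)
    """
    moles = conc_attomol_per_ul * MOLES_PER_ATTOMOLE * volume_ul * dilution
    return moles * AVOGADRO

# Hypothetical example: 1 attomole/uL stock, 1:10 dilution, 1 uL used.
n_molecules = ercc_molecules(1.0, 1.0, dilution=0.1)
```

Applying the same conversion to each transcript in the spike-in mix yields the expected molecule counts against which observed read counts can be compared.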

Next generation sequencing read alignment

A wide array of reference genomes was assembled to identify the rough composition of detected RNA reads (see Supplementary Methods). The UCSC hg19 human reference genome was used with annotation from the Illumina iGenomes resource28. Microbial, mitochondrial, fungal, viral and protozoan nucleotide databases were obtained from RefSeq version 59 which resulted in 18.5 Gb, 100 Mb, 2.3 Gb and 1.7 Gb genomes, respectively. 580 Mb of ribosomal sequences were obtained from SILVA release 11129, using both small and large subunit repositories. BWA30 was used to align reads against the bacterial reference genome due to its size. SNAP31 was used to align to mitochondrial, ribosomal and viral references and STAR32 was used for protozoan, fungal, human and ERCC reference genomes due to the possible presence of splice junctions. All alignments were performed using default tolerance parameters and considered acceptable based on the agreement between known and observed ERCC controls. We also compared our alignments against the slower TopHat aligner33 and obtained similar results. Read mapping results were visualized in Integrative Genomics Viewer, IGV34. Additional details regarding reference genomes and automated source code for analysis are available in the Supplementary Materials.
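
The assignment of aligners to reference sets described above can be summarized as a simple lookup. The sketch below only restates the pairing given in the text; the dictionary keys and the helper function are hypothetical names chosen for illustration, and no command-line options of the aligners are implied.

```python
# Aligner used for each reference set, as stated in the text.
ALIGNER_BY_REFERENCE = {
    "bacterial": "BWA",        # chosen because of the size of the reference
    "mitochondrial": "SNAP",
    "ribosomal": "SNAP",
    "viral": "SNAP",
    "protozoan": "STAR",       # splice-aware, splice junctions possible
    "fungal": "STAR",
    "human": "STAR",
    "ERCC": "STAR",
}

def references_for(aligner):
    """Return the sorted list of reference sets handled by a given aligner."""
    return sorted(
        ref for ref, name in ALIGNER_BY_REFERENCE.items() if name == aligner
    )

star_references = references_for("STAR")
```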

Gene expression quantification

As a genome-guided assembly method, Cufflinks was used to estimate gene expression for the human reference and HTSeq was used for all other references35,36. Due to the 3′ selection-based enrichment and the subsequently observed 3′ bias in read alignment, normalized read counts were also used for the expression correlation results. Read counts were obtained using HTSeq and normalized by the mean expression of 3804 “house-keeping” mRNA species as determined by37. Correlation of ERCC measurements between samples and with known concentrations was performed following log transformation and addition of 0.001 to the counts to avoid unbounded values. This was utilized for the Spearman correlation coefficient because of the logarithmic nature of the concentrations provided in ERCC spike-in kits. Differential expression testing was performed using Cuffdiff as available from the Cufflinks package. Genes with P-values < 0.05 were input into the Ingenuity IPA analysis tools (Ingenuity Systems Inc., Redwood City, CA) to assess pathway, biological function and upstream activity14 and are visualized in a volcano plot in Supplementary Figure 5 as produced by the cummeRbund package.
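
The house-keeping normalization and the log-transformed ERCC correlations can be illustrated with a short sketch. The function names, gene names and values are hypothetical; the base-10 logarithm is an assumption, since the base is not stated; and the correlation routines are plain reference implementations rather than the code used for the analysis.

```python
import math

PSEUDOCOUNT = 0.001  # offset stated in the text; keeps log(0) bounded

def normalize_by_housekeeping(counts, housekeeping_genes):
    """Divide every count by the mean count of the house-keeping genes present."""
    present = [counts[g] for g in housekeeping_genes if g in counts]
    scale = sum(present) / len(present)
    return {gene: c / scale for gene, c in counts.items()}

def log_transform(values, pseudocount=PSEUDOCOUNT):
    """log10(value + pseudocount); base 10 is an assumption."""
    return [math.log10(v + pseudocount) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical spike-in data: known concentrations and observed read counts.
known_conc = [0.01, 1.0, 10.0, 100.0]
observed = [0, 12, 95, 1100]

r = pearson(log_transform(observed), log_transform(known_conc))
rho = spearman(log_transform(observed), log_transform(known_conc))
```

Because ranks are unchanged by any monotonic transformation, the log transformation affects the Pearson coefficient but leaves the Spearman coefficient as it would be on the untransformed values.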

Real time PCR was used for gene expression confirmation. For this purpose, cDNA was prepared from fecal RNA from three term and three preterm infants using SuperScript II (LifeTechnologies, Carlsbad, CA) followed by PCR on an ABI 7900HT Real Time PCR System. TaqMan assays were purchased for ABCC5, APOA4, CASP1, DYNLL1, NFKBIA, PLIN2, PPAP2A, RPS16, SCNN1A, SLC2A1 and TMSB4X (LifeTechnologies). These genes were selected from a larger set of differentially expressed genes in term vs. preterm infants as identified in the IPA analysis.

Ingenuity pathway analyses

“Functional enrichment” analysis was performed using Ingenuity Pathway Analysis (IPA) version 2.0 software. To perform IPA analysis, all differentially expressed genes (P < 0.05) in the preterm or term infants were uploaded into three columns for the purpose of generating Illumina probe ID, t-value (fold change) and P-value data. P-values were uncorrected for multiple testing owing to the number of human fecal samples. By default, during IPA analysis, only molecules from the data set associated with the Ingenuity Knowledge Base repository (Ingenuity Systems Inc.) were considered. Functional Analysis identified the biological functions and/or diseases that were most significant to the data set. The significance of the association between the data set and the specific pathways of interest was determined in three ways: (a) as a ratio of the number of molecules from the data set that mapped to the pathway divided by the total number of molecules that mapped to the Ingenuity Knowledge Base pathway, (b) Fisher's exact test was used to calculate a P value determining the probability that the association between the genes in the data set and the pathway of interest could be explained by chance alone and (c) activation state (“increased” or “decreased”) was inferred by the activation z-score. The derivations of the z-scores are based on relationships in the molecular network that represent experimentally observed causal associations between genes and those functions.

“Canonical pathway” analysis was used to identify networks from the IPA library that were most significantly modulated across subject groups. Significance of the association between each data set and the canonical pathway was measured in two ways: (1) as a ratio of the number of molecules from the data set that mapped to the pathway divided by the total number of molecules that mapped to the canonical pathway and (2) by Fisher's exact test, which was used to calculate p-values determining the probability that the association between genes in the data set and each canonical pathway could be explained by chance alone.
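
The two significance measures can be sketched in a few lines. This is an illustrative re-implementation and not the IPA software itself: the function names and example counts are hypothetical, and the use of a one-sided (right-tailed) test computed from the hypergeometric distribution is an assumption about how the enrichment P value is obtained.

```python
from math import comb

def pathway_ratio(hits, pathway_size):
    """Fraction of the pathway's molecules that appear in the data set."""
    return hits / pathway_size

def fisher_right_tail(hits, pathway_size, n_de_genes, background):
    """One-sided Fisher's exact test P value for pathway enrichment.

    Probability of observing at least `hits` pathway members among
    `n_de_genes` genes drawn without replacement from `background` genes,
    of which `pathway_size` belong to the pathway.
    """
    total = comb(background, n_de_genes)
    upper = min(pathway_size, n_de_genes)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, n_de_genes - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 8 of 40 pathway genes found among 200 differentially
# expressed genes, against a background of 15000 annotated genes.
ratio = pathway_ratio(8, 40)
p_value = fisher_right_tail(8, 40, 200, 15000)
```

The ratio describes how much of a pathway is covered by the data set, whereas the P value indicates how likely that coverage would be by chance, so the two measures are complementary.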