Cerebral palsy (CP) is a neurological syndrome of onset in early childhood characterized by impaired control of motor function (1), with a prevalence of ~2–3 per 1,000 live births in Western countries (2). The etiology of conditions, such as CP, that originate with exposures occurring in pregnancy and the perinatal period is difficult to study because our capacity to interrogate such exposures once disease is manifest years later is limited. We here report on the assessment of biological processes occurring around birth, via examination of the transcriptome in residual filter paper blood spots archived after newborn genetic screening in 53 singleton children with CP and 53 age-, gender-, and gestational age–matched controls.

Examination of the transcriptome has found little application thus far in etiologic epidemiology. mRNA is hard to retrieve from serum samples because ribonucleases and micro-RNAs degrade mRNA quickly once transcriptional activity is complete. Although blood is ordinarily collected in glass or plastic tubes, permitting such enzymatic activity to continue, newborn blood is directly spotted from a heel stick incision onto filter paper, completely drying at room temperature within 3 h (3,4). RNA degradation is an enzymatic process tightly controlled by ribonucleases, water-soluble enzymes that require one water molecule for each reaction (5). Drying of the sample interferes with this reaction and also limits access of ribonucleases to RNA.

White blood cells share >80% of the transcriptome with at least nine organs, including the brain (6); 82% of the 13,961 genes expressed in the brain, according to the UniGene database, are expressed in blood (7). We hypothesized that gene sets reflecting four pathophysiological pathways to CP (inflammatory, hypoxic, coagulative, and thyroidal) would be dysregulated shortly after birth in children later diagnosed with CP and that this dysregulation would be detectable in newborn blood spots. We also performed exploratory gene set analyses using a database of clinically relevant gene sets (8). The hypothesized gene sets were obtained by searching the literature, aiming for one canonical gene set (i.e., based on expert opinion (9)) and one experimental gene set (i.e., based on experimental findings) for each pathway. For coagulation, we could only find a canonical gene set.


Comparability of Cases and Controls

Cases and controls were similar in gestational age, a matching criterion, but fetal growth was somewhat reduced in cases. More cases were admitted to newborn intensive care and had lower Apgar scores. Mean day of blood spot collection did not differ between cases and controls. Cases and controls were broadly comparable in socioeconomic and demographic characteristics ( Table 1 ).

Table 1 Characteristics of cases and controls at birth of child

Analysis of Seven Preselected Gene Sets

Each case–control pair received a score (expressed as a generally applicable gene set enrichment (GAGE) t-statistic) summarizing the case–control difference in the expression of all genes (from 31 to 200 genes, depending on the gene set). This score can be interpreted as the difference between cases and controls expressed as fractions of an SD. These score differences were then summarized across pairs and assessed for the statistical significance of the summarized difference, expressed as the global P value for the contrast of cases and controls for each gene set.
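The per-pair scoring described above can be illustrated with a minimal sketch. It is a simplified stand-in and not the GAGE implementation: it assumes each pair is summarized by per-gene log2 fold changes (case vs. control), compares the genes in a set with all genes using a Welch-style t-statistic, and then averages the per-pair scores. The function names and the choice of a simple mean as the cross-pair summary are assumptions made for illustration only.

```python
import math
import statistics


def pair_gene_set_t(log2_fc, gene_set):
    """Welch-style t-statistic for one matched pair (illustrative only).

    log2_fc:  dict mapping gene name -> log2(case / control) for this pair.
    gene_set: iterable of gene names; genes absent from log2_fc are skipped.
    """
    in_set = [log2_fc[g] for g in gene_set if g in log2_fc]
    background = list(log2_fc.values())
    if len(in_set) < 2 or len(background) < 2:
        raise ValueError("need at least two genes in the set and in the background")
    mean_diff = statistics.mean(in_set) - statistics.mean(background)
    std_err = math.sqrt(
        statistics.variance(in_set) / len(in_set)
        + statistics.variance(background) / len(background)
    )
    return mean_diff / std_err


def summarize_pairs(pairs, gene_set):
    """Return the per-pair scores and their mean across all pairs."""
    scores = [pair_gene_set_t(fc, gene_set) for fc in pairs]
    return scores, statistics.mean(scores)
```

In this sketch, a pair in which the set is upregulated and a pair in which it is equally downregulated cancel in the mean, which is one reason the per-pair distribution is worth inspecting alongside any summary value.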

Three gene sets, all empirical, showed a significantly different gene regulation across the population of case–control pairs ( Table 2 ). The empirical inflammatory and asphyxial gene sets were both significantly downregulated in CP cases, as compared with controls, with effect sizes of −0.19 SD units and −0.16 SD units, respectively. The thyroidal gene set was significantly upregulated by +0.13 SD units. These three differences were highly significant either with (q value) or without (P value) correction for multiple testing.

Table 2 γ-GAGE analysis for seven gene sets representing four prehypothesized pathways

The distribution of GAGE t-statistics for each of the gene sets, which illustrates the gene set differences across case–control pairs, is shown in Figure 1 . The three significant gene sets showed many pairs with large interpair differences in gene expression, whereas the canonical hypoxic and thyroidal gene sets showed modest interpair differences. The coagulation gene set and the canonical inflammatory gene set showed virtually no differences. Heterogeneity across pairs in the direction of differential expression (either up or down) is notable for all three statistically significant gene sets ( Table 2 , Figure 1 ).

Figure 1

GAGE t-statistics for the seven prehypothesized gene sets. (a) Canonical coagulation, (b) canonical inflammation, (c) empirical inflammation, (d) canonical asphyxia, (e) empirical asphyxia, (f) canonical thyroid, and (g) empirical thyroid gene sets. For each graph: x-axis: matched pair (total 53 pairs); y-axis: scale of GAGE t-statistic; each bar within each graph: the GAGE t-statistic of the gene set for each pair. GAGE, generally applicable gene set enrichment.

Individual Genes in the Fetal Inflammatory Response Syndrome Gene Set

The largest case–control differences in gene expression (0.19 SD) were seen for the fetal inflammatory response syndrome (FIRS) gene set. Differences in gene expression are often described as “fold changes,” i.e., the ratio of the degree of gene expression in cases as compared with controls, most conveniently expressed as the binary logarithm (log2n = logarithm to the base 2) of the fold change. Figure 2 shows the heat map of the log2 fold change of all genes in the FIRS gene set for all case–control pairs ordered by the magnitude of the GAGE t-statistics. The largest pair differences were seen in the following genes: S100A9 (S100 calcium-binding protein A9), S100A12 (S100 calcium-binding protein A12), ALOX5AP (arachidonate 5-lipoxygenase-activating protein), PGLYRP1 (peptidoglycan recognition protein 1), HP (haptoglobin), FLOT1 (flotillin 1), and FGR (Gardner-Rasheed feline sarcoma viral oncogene homolog) ( Figure 2 ).

Figure 2

Heat map of FIRS gene set with pairs ordered by magnitude of GAGE t-statistics. (a) FIRS gene set in which the matched pairs are ordered by their GAGE t-statistics, from most positive to most negative. (b) Heat map: x-axis: matched pairs in the same order as in the upper graph; y-axis: gene names. Each small square represents the log2 fold change of one gene in the FIRS gene set for one pair. The color gradient runs from bluest (most negative log2 fold change; expression lowest in case vs. control) through white (log2 fold change of zero; equal expression in case and control) to reddest (most positive log2 fold change; expression highest in case vs. control), on a scale of −4 to 0 to +4. Gray: absence of data (missing values) due to unmet filtering criteria. FIRS, fetal inflammatory response syndrome; GAGE, generally applicable gene set enrichment.
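As a small worked illustration of the log2 fold change scale used in the heat map, the sketch below converts a pair of expression intensities into a log2 fold change. The function name and the intensity values are hypothetical. On this scale, the limits of −4 and +4 correspond to 16-fold lower and 16-fold higher expression in the case than in the control.

```python
import math


def log2_fold_change(case_intensity, control_intensity):
    """log2 of the case/control expression ratio for one gene in one pair."""
    if case_intensity <= 0 or control_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(case_intensity / control_intensity)


# Hypothetical intensities: a four-fold higher signal in the case gives +2,
# a four-fold lower signal gives -2, and equal signals give 0.
up = log2_fold_change(400.0, 100.0)
down = log2_fold_change(100.0, 400.0)
equal = log2_fold_change(250.0, 250.0)
```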

Assessing Significant Gene Set Findings in CP Subsets

The heterogeneity of GAGE t-statistics across pairs suggests the need to stratify on covariates such as gestational age and motor type. Table 3 shows the findings for each gene set for children born at 37 wk or later (n pairs = 33) and before 37 wk (n pairs = 20) and for those with quadriplegia (n pairs = 24), diplegia (n pairs = 15), and hemiplegia (n pairs = 13).

Table 3 Gene expression findings for three gene sets stratified by GA and CP type

FIRS showed a clear interaction with gestational age: among 20 preterm pairs, FIRS was significantly upregulated in CP cases, whereas among 33 term-born pairs, FIRS was significantly downregulated among cases. The FIRS upregulation seen in preterm cases was paralleled by the upregulation in diplegic cases, who are dominantly preterm. In parallel, the strongest contribution to downregulation of inflammation came from hemiplegic cases (n = 13), who were nearly all born at term. Quadriplegia also showed downregulation of FIRS but not as strongly as hemiplegia.

The empirical hypoxic gene set also showed significant upregulation in preterm cases and the opposite pattern in term cases. Downregulation of the hypoxic gene set was seen in hemiplegia and quadriplegia. The thyroidal upregulation signal was derived entirely from preterms and from children with quadriplegia. The sample size precluded consideration of interactions between gestational age and motor typology ( Table 3 ).

Kyoto Encyclopedia of Genes and Genomes Gene Sets

We explored case–control differences for the 205 gene sets archived by the Kyoto Encyclopedia of Genes and Genomes (2009 version) (8). The five most upregulated gene sets in CP cases as compared with controls were ribosome, systemic lupus erythematosus, olfactory transduction, cell cycle, and oxidative phosphorylation. Downregulation was seen most strongly for three of the above-mentioned five gene sets, ribosome, cell cycle, and systemic lupus erythematosus, as well as for leukocyte transendothelial migration and regulation of actin cytoskeleton gene sets.

Using the approach of Storey (10,11) to control for the false-discovery rate, ribosome, systemic lupus erythematosus, and olfactory transduction were significantly upregulated; ribosome, leukocyte transendothelial migration, and regulation of actin cytoskeleton were significantly downregulated; and the ribosome gene set was significantly bidirectionally regulated.
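For readers unfamiliar with q-values, the following is a minimal sketch of a Storey-type calculation, not the code used in this study. It assumes a single fixed tuning parameter (here called lam, set to 0.5) to estimate the proportion of true null hypotheses, whereas published implementations typically smooth this estimate over a range of values. The function name and the example P values are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey-type q-values using one fixed lambda (illustrative only)."""
    m = len(pvalues)
    if m == 0:
        return []
    # Estimated proportion of true nulls, from the P values above lam.
    # Note: if no P value exceeds lam, pi0 is 0 and every q-value is 0.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so q-values never decrease with P.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


example_p = [0.001, 0.008, 0.039, 0.041, 0.6, 0.9]  # hypothetical P values
example_q = storey_qvalues(example_p)
```

If the estimated null proportion is fixed at 1, this calculation reduces to the Benjamini–Hochberg adjusted P values, so the Storey approach is never more conservative than that procedure.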

The analysis for all individual genes available in the arrays reveals that no individual gene was significantly differentially expressed between cases and controls after adjusting for multiple testing. The lack of single gene expression differences confirms the value of gene set analysis for aggregating coordinated expression signals from related genes in gene sets in exploring pathophysiological pathways to disease ( Table 4 ).

Table 4 Most upregulated and most downregulated Kyoto Encyclopedia of Genes and Genomes gene sets in cases as compared with controls

Quantitative PCR Validation of mRNA Data

To validate our microarray findings, we used quantitative PCR (qPCR) techniques to examine the housekeeping genes ACTB (β-actin) and PPIA (peptidylprolyl isomerase A), both commonly used in the literature to validate microarray findings. To validate genes differentially expressed, we selected FCGR2A (Fc fragment of IgG, low-affinity IIa receptor), a representative gene of the lupus pathway from the Kyoto Encyclopedia of Genes and Genomes database.

For ACTB and PPIA genes, the correlation coefficient between the log2 intensity of microarray data and mean CT (cycle threshold) value of qPCR was −0.52 (P < 0.0001). For FCGR2A, the correlation coefficient between the log2 intensity of microarray data and mean CT of qPCR was −0.43 (P < 0.0001), and the correlation coefficient between log2 fold change of microarray data and log2 fold change of qPCR data was 0.38 (P = 0.0197).
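The negative sign of the intensity–CT correlations is expected, because a lower CT indicates more starting template and therefore corresponds to a higher microarray intensity. The sketch below computes a Pearson correlation coefficient for a handful of invented values to show the calculation; the function name and the data are hypothetical and are not taken from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented values: higher log2 array intensity tends to pair with lower CT.
log2_intensity = [8.1, 9.4, 10.2, 11.0, 12.3]
mean_ct = [31.0, 29.5, 29.9, 27.2, 26.0]
r = pearson_r(log2_intensity, mean_ct)
```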


Principal Findings and Their Direction

We found three gene sets, each reflecting a separate pathophysiological pathway, to be significantly dysregulated in newborns who later developed CP. The direction of the signals was not entirely as expected. Although the upregulation of inflammation noted among prematures later developing CP is consistent with several previous studies (12,13,14), the significant downregulation of inflammation in term CP and upregulation of thyroidal function in premature CP seem opposite to what might be expected on the basis of the literature (15,16). However, dysregulation of a molecular pathway, as measured by changes in gene expression, can result from the influences of several types of cellular control processes, not all in the same direction. For example, mitochondrial defects leading to decreased energy output activate a transcriptional response to increase energy output, registered as increased expression of mitochondrial genes (17). It is thus more useful to consider dysregulation in either direction as indicating some perturbation of cellular control systems involved in the pathophysiological pathway of interest.

Validity Studies of Archived Newborn Blood Spots

In Michigan, spots were normally collected on day 1 or 2 of life, allowed to dry in ambient conditions for 4 h, and sent to the Michigan Department of Community Health, where, after testing, they were stored in boxes at room temperature. Since 2009, with the establishment of the Michigan Biotrust for Health (18), all new specimens are stored at −20 °C. None of the spots used in this study were frozen.

Four sources of evidence support the validity of the mRNA data obtained from our samples:

  1. Published research by others: Several studies have shown that even after 20 y of unfrozen storage, mRNA can be recovered from unfrozen blood spots (19,20,21,22).

  2. Our own published work: We have published two papers demonstrating the reliability and validity of the archived unfrozen newborn blood spots we use in this study. Haak et al. (23) have demonstrated the replicability (R > 0.90) of microarray gene expression, and Resau et al. (24) have shown that sex-specific genes such as XIST and KDM5D are distinguishable in male and female archived unfrozen neonatal blood spots. A third paper by Khoo et al. (25) shows the high sensitivity of gene detection in blood spotted on filter paper with the Agilent 4 × 44K microarray system (Agilent Technologies, Santa Clara, CA) used in this study.

  3. qPCR confirmation of microarray findings: In our earlier study (23), we showed that using qPCR, we could identify two housekeeping genes (HKI and MRPLII) and two inflammatory genes (ITGAX and NFKB-1) in archived unfrozen blood spots. As shown above, in the current study, we move beyond qPCR identification and show the strong correlation between the amount of mRNA detected by microarray and qPCR for another three genes.

  4. Unpublished pilot data comparing frozen and unfrozen newborn blood spots: Microarray was performed in our laboratory on paired spots from the same newborns studied 10 y ago. Frozen (−80 °C) blood spots obtained from five Michigan newborns as part of a research study were compared with the blood spots archived by the state after genetic screening obtained at approximately the same age. The average Pearson correlation coefficient in total gene expression between frozen and unfrozen spots was 0.76.

Exploratory Gene Set Analyses

Our exploratory analysis of the Kyoto Encyclopedia of Genes and Genomes gene set reinforced the importance of inflammatory processes in CP, given that several gene sets related to such processes were significantly dysregulated. The lupus gene set, for example, contains genes such as those related to T-cell and B-cell receptor signaling pathways and cytokine–cytokine receptor interaction. Several Kyoto Encyclopedia of Genes and Genomes inflammatory gene sets that differed in case–control expression showed the same interaction with gestational age as FIRS, with upregulation in preterm-born CP cases and downregulation in term-born CP cases.

Sensitivity Analyses

To determine whether missing values resulting from the filtering procedures affected the results, we repeated the analyses with imputation for missing values and found nearly identical gene set findings. We also performed permutations on randomly selected gene sets of different sizes to evaluate the false-positive rate of the GAGE method on our actual microarray data. The probability of false-positive findings of the GAGE method is <0.05 when the significance level for the test is set at <0.05. (A detailed description is in the Supplementary Methods online.)
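The idea behind the random gene set check can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the procedure in the Supplementary Methods: each gene is assumed to carry a single summary score, a simple two-sided z-test with a normal approximation stands in for the GAGE test, and the function names, number of draws, and simulated scores are all hypothetical.

```python
import math
import random
import statistics


def two_sided_z_p(set_values, background):
    """Normal-approximation P value for the set mean vs. the background mean."""
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    z = (statistics.mean(set_values) - mu) / (sigma / math.sqrt(len(set_values)))
    return math.erfc(abs(z) / math.sqrt(2))


def random_set_false_positive_rate(scores, set_size, n_draws=500, alpha=0.05, seed=0):
    """Fraction of randomly drawn gene sets called significant at level alpha."""
    rng = random.Random(seed)
    genes = list(scores)
    background = [scores[g] for g in genes]
    hits = 0
    for _ in range(n_draws):
        members = rng.sample(genes, set_size)  # random set, no biological meaning
        if two_sided_z_p([scores[g] for g in members], background) < alpha:
            hits += 1
    return hits / n_draws


# Simulated per-gene scores with no true signal (an assumption of this sketch).
_rng = random.Random(1)
simulated_scores = {f"gene{i}": _rng.gauss(0.0, 1.0) for i in range(2000)}
rate = random_set_false_positive_rate(simulated_scores, set_size=50)
```

Because random sets carry no coordinated signal, the fraction of sets declared significant should stay near or below the nominal level if the test is well calibrated on the data at hand.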

In conclusion, we found that archived unfrozen filter paper blood spots, even after many years of storage, retain enough mRNA to describe a distinct gene expression profile in neonates who later develop CP. The pathophysiological pathways we identify get us closer to understanding the changes that take place in the interval between prenatal or perinatal brain insult and the clinical manifestation of the CP syndrome. Our results also underline the scientific promise of using the large collections of state-archived newborn blood spots to develop compendia of expression profiles in disease states characterized by perturbations in gene expression that can be captured shortly after birth.


Study Subjects

Children with clinically verified CP aged 2–15 y (n = 53) were enrolled in specialty clinics in Lansing, Ann Arbor, and Grand Rapids, MI, if they had Gross Motor Function Classification System scores >1 and did not have a recognized malformation syndrome accounting for the CP. Control subjects were enrolled in primary-care settings in Michigan and were individually matched on year of birth, sex, and gestational age in four categories (<28, 28–32, 33–37, and >37 wk) and were free of major brain disorders. Birth certificates, maternal and infant hospital discharge abstracts, and residual newborn blood specimens were retrieved from the Michigan Department of Community Health, after obtaining written informed consent from the mother or guardian of the study subject. Case and control mothers were interviewed by telephone about reproductive exposures. Approval for this study was obtained from the institutional review boards of all participating institutions.

RNA Isolation and cDNA Preparation

RNA isolation, cDNA synthesis, and microarray analysis of blood spot samples were performed at the Laboratory of Microarray Technology at VARI (Van Andel Research Institute) per our published protocol (25). Three 3-mm blood spot punches were homogenized using a TissueLyser (Qiagen, Valencia, CA) before isolation using the illustra RNAspin Mini kit (GE Healthcare, Buckinghamshire, United Kingdom). Deoxyribonuclease digestion was carried out during RNA isolation to remove any contaminating DNA from the RNA. Total RNA was then concentrated with an RNA Clean & Concentrator-5 Kit (Zymo Research, Orange, CA). RNA quality and quantity were evaluated using an RNA Pico Lab Chip on the Agilent BioAnalyzer. The average RNA integrity number was 2.3 ± 0.71. The WT-Ovation Pico RNA Amplification System (NuGEN Technologies, San Carlos, CA), based on the Ribo-SPIA technology, was used to amplify total RNA and synthesize first- and second-strand cDNA. Because this amplification system is initiated both at the 3′ end and randomly throughout the whole transcriptome, it has the advantage of amplifying non–poly A transcripts and compromised RNA.

Gene Expression Microarray Assays

For gene expression microarray, cDNA was labeled with Alexa Fluor 3 fluorescent dye from the BioPrime Total Genomic Labeling System (Invitrogen Life Technologies, Carlsbad, CA) before purification using the PureLink PCR Purification System (Invitrogen). Purified labeled product was then applied to an 8 × 60,000 whole human genome gene expression microarray (Agilent). Each array contains 60,000 oligonucleotide probes (60 bp probe) covering 27,958 Entrez gene RNAs and 7,419 long intergenic noncoding RNAs. The arrays were hybridized for 17 h at 65 °C and 10 rpm rotation speed, then washed for 2 min each with washing buffers 1 and 2 and scanned with an Agilent G3 high-resolution scanner. Probe features were extracted from the microarray scan data using Feature Extraction software v. (Agilent).

Microarray Confirmation by qPCR

Masked cDNA samples synthesized from neonatal blood spot RNA were used for qPCR analysis. Specific optimized Taqman probes and primers were obtained from Applied Biosystems by Life Technologies (Carlsbad, CA), and qPCR was performed using Applied Biosystems 7500 Fast Real-Time PCR System. We examined the correlation between log2 intensity of microarray data and mean CT of qPCR data for all three genes, and for the FCGR2A gene, we also assessed the correlation between log2 fold change of microarray data and −log2 fold change of qPCR.

Selection of Seven Gene Sets Representing Four Hypothesized Pathways to CP

The empirical hypoxic gene set is based on the responses of cells in tissue culture exposed to hypoxemia as compared with normoxemia (26), whereas the canonical hypoxic gene set is based on the view that hypoxia-inducible transcription factor binds a consensus DNA sequence termed the hypoxia-responsive element (27). A similar approach was used to construct the canonical thyroid-responsive element gene set (28). The experimentally derived thyroid gene set was isolated following human exposure to thyroid hormone (29). The canonical inflammation pathway is GO:005072 (inflammatory response) (30), whereas the empirical gene set is from an experiment in prematures with and without evidence of fetal inflammation and infection (31). The coagulation cascade was represented by GO:0007596 (blood clotting biological processes) (30).

Statistical Methods

Data processing and analysis were performed using the open-source statistical software R (version 2.13.2). Unqualified spots were filtered using the method of Patterson et al. (32), where the expression data were removed whenever the gProcessed signal was less than twice the gProcessed signal error. Gene expression data were normalized using a between-array quantile normalization method (33) and further aggregated to the gene level using the mean of the expression signal of all probes of each gene. Differential expression of individual genes was examined with the moderated paired t test (appropriate for matched pairs) of the linear model and an empirical Bayes method implemented in R package limma (34). The significance of gene expression was corrected for multiple testing, setting the false-discovery rate at 0.05 (10). We chose the GAGE method (35), the only published method applicable to a matched case–control study among several methods for gene set analysis (36,37). A fuller description of the GAGE method is provided in the Supplementary Methods online. For case–control comparisons of clinical data, exact conditional logistic regression and paired t-tests were used for categorical variables and continuous variables, respectively.
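The three preprocessing steps named above (spot filtering, between-array quantile normalization, and aggregation of probes to genes) can be sketched as follows. The study itself used R, so this Python version is only an illustration: the function names are hypothetical, the normalization assumes complete arrays of equal length with no missing values, and ties are broken by position rather than averaged.

```python
from collections import defaultdict
from statistics import mean


def keep_spot(signal, signal_error):
    """Keep a spot unless its signal is less than twice its signal error."""
    return signal >= 2 * signal_error


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted values."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of values")
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def aggregate_to_genes(probe_values, probe_to_gene):
    """Average the signals of all probes that map to the same gene."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        by_gene[probe_to_gene[probe]].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}
```

After normalization, every array contains the same set of values and differs only in which probe holds which value, so between-array differences in overall intensity no longer drive case–control contrasts.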

Statement of Financial Support

This work was funded by National Institutes of Health grant R01 NS055101 and the Van Andel Research Institute. N.T.H. is partially funded by the Vietnam Education Foundation.


The authors declared no conflict of interest.