ArticlePlus

Click on the links below to access all the ArticlePlus for this article.

Please note that ArticlePlus files may launch a viewer application outside of your web browser.

Disturbed perinatal adaptation may affect almost all the major organ systems and may lead to a number of perinatal complications such as infant respiratory distress syndrome (IRDS), patent ductus arteriosus (PDA), intraventricular hemorrhage (IVH), acute renal failure (ARF), necrotizing enterocolitis (NEC), and cardiac failure (CF), respectively. Underdeveloped organs, impaired vasoregulation, infection, and inflammation contribute to higher susceptibility. Clinical status of the infants (particularly birth weight and gestational age), therapeutic interventions, and drugs are major determinants. However, not every infant from the same gestational age cohort and under the same conditions develops perinatal complications, and this fact stresses the need for identification of additional factors that have a possible impact on risk. Genetic polymorphisms are possible candidates (for more data, see refs. 16). Accumulating literature suggests that the carrier state of specific genetic variants may protect against, whereas others may increase, the risk of some perinatal complications. On the one hand, these findings may help to identify specific elements in disease pathogenesis and therapeutic targets in the future. Moreover, they may be of practical importance as they could help to identify patients at risk. However, what additional information may genotypes provide for the treating physician at the bedside? Can they help identify infants at risk? Available reports do not address these questions, probably as the risk of perinatal complications is influenced by so many interacting factors that classic statistical methods (e.g. logistic regression) become unstable and cannot handle simultaneously many genetic variants and other risk factors without requiring prohibitively large sample sizes. Recognizing this drawback, newly developed statistical approaches such as high-dimensional nonparametric methods seem more suitable for the analysis of the impact of genotypes on phenotype. One of these techniques is the random forest technique (RFT) (7). Quite recently, RFT was used to detect interactions between genetic polymorphisms and asthma (8).

Presumably, if genetic polymorphisms do contribute to the later development of perinatal complications, knowledge of a newborn infant's genotype at birth should increase the accuracy of prediction of perinatal complications compared with that obtained solely using clinical data. We tested this hypothesis with RFT on our database containing basic clinical data at birth and genotype data of preterm infants born with a birth weight ≤1500 g.

PATIENTS AND METHODS

In recent years, we have established a database of preterm neonates with a birth weight of ≤1500 g to investigate the impact of genotypes on disturbed perinatal adaptation. We analyzed the records of those 135 preterm infants who had been genotyped for at least 24 genetic variants (for a brief overview, see Table 1). The records also contain the following parameters: gender (65 girls, 70 boys), birth weight [median (range)]: 1280 g (700–1500) and gestational age 30 wk (24–32), presence of CF (n = 44), IRDS (n = 67), PDA (n = 52), IVH (n = 44), NEC (n = 53), sepsis (n = 37), and ARF (n = 42). The diagnosis of these morbidities was set up according to internationally accepted criteria and is described in detail in the references in Table 1. Our findings published earlier along with representative allele prevalence in this perinatal population are presented in Table 1. In this work, infants' records were analyzed anonymously, and we were blinded to their personal data. Our study was approved by an institutional review board (TUKEB 16/2003).

Table 1 Published associations between genotype and disease based on the database used in our analysis

Statistical analysis.

Classification tree–based methods can be very efficient for selecting from large numbers of predictor variables those that best explain a phenotype. The ease of interpretation of classification trees, along with their flexibility in accommodating large numbers of predictors and ability to handle heterogeneity, has resulted in increasing interest in their application to genetic association and linkage studies.

Classification trees use all variables and relevant cases when creating nodes in a single tree that represents the outcome of the learning process. To the contrary, RFT creates a large number of trees developed on random samples of input cases. The input data for each tree is based on iterative sampling from the original data. The variables used for constructing splits are a random subsample of the complete set of variables. The necessary calculations are carried out tree by tree as the random forest is constructed. RFT estimates the importance of a variable by looking at how much classification error increases when data for that variable are permuted and all other data are left unchanged and calculates ISs. ISs can be used to prioritize the variables in this model. (If the values of this score from tree to tree are independent, then the SE can be computed. The raw score divided by its SE yields a z score, and, thus, a significance level can be assigned to each importance scores (ISs)).

RFT with the aid of variables classifies patients as to whether they are affected by the complication tested, and the results of classification are compared with actual clinical data. As the result of this process, 2 × 2 confusion matrices (true-negative, false-positive, true-positive, and false-negative groups) are created showing the number of patients classified properly or not. Accuracy values are calculated as the number of true negative + true positive classifications/the number of analyzed patients.

As immaturity is the major determinant of perinatal adaptation, the additional protective/worsening role of individual polymorphisms may be masked to different extent at different developmental status and the large heterogeneity of population may lead to unstable results.

The first step in our analysis was the identification of the gray zone of those infants in whom genotype data may improve the prediction of complications.

We added up the number of complications (i.e. CF, NEC, ARF, sepsis, PDA, IVH, IRDS) for each infant and gave them a score (range, 0–7; median value, 3). According to this score, infants were assigned a dummy variable (0 for infants with score ≤3 and 1 for infants with score >3).

Then we ranked the infants according to their birth weight. We created a frame of 45 consecutive infants that was shifted along the data set of all infants ranked according to their birth weights by adding of the next and leaving out the smallest infant of the analyzed cohort. At each position of the frame, we calculated the accuracy of classification when it was based solely on patients' birth data and when it was based on birth data plus all tested genotypes. We created a graph representing the alteration of accuracy values of the prediction of morbidity throughout the whole range of infants (Fig. 1). Obviously, the accuracy values with and without genotype data diverged between 680 and 1430 g; for further analyses, this birth weight range was used.

Figure 1
figure 1

Selection process of that birth weight range where genotypes might add information to patient classification. , accuracy values based on birth data; ▪, accuracy values based on birth data plus genotype data.

In this group of infants (n = 100), we calculated the IS values of all clinical and genotype data for the classification of each perinatal complication. The clinical and genotype data with the highest IS values (with p < 0.10) were used to classify the patients according to the occurrence of severe morbidity, NEC, ARF, IVH, IRDS, CF, PDA, and sepsis; the accuracy of classification was compared with that performed solely based on birth data.

The analyses were performed using Fortran 77 codes (9) and in the R statistical environment (10) using the “party” package (11). Program codes are available upon request.

RESULTS

First, as described in detail in the previous section, we searched for that range of birth weight for which genotype data may help classify patients according to the presence or absence of a complication; this range has been established as 680–1430 g and contained 100 infants. The IS values of all clinical and genotype data for the classification of each perinatal complication were calculated; IS values for the classification of patients according to severe morbidity dummy variables are visualized on a heat map figure (Figure S1, supplemental material online). The clinical and genotype data with the highest IS values (with p < 0.10) are listed in Table 2.

Table 2 List of variables with highest ISs in the classification of patients according to perinatal complications

For each complication (except ARF), the most significant predictors included birth weight and gestational age. Interestingly, some genetic polymorphisms are also major predictors, and in some complications, their IS approached or even surpassed that of birth data (such as IL-12 p40 GC for NEC). For ARF, the ISs were low in general and did not include birth weight and gestational age, just some genetic polymorphisms.

Based according to calculated ISs, we created optimized classification patterns that included each IS with a p value <0.10. At this step, we calculated the accuracy of classification and related to that based solely on birth data (Table 3).

Table 3 Accuracy of classification of patients with the use of solely birth data or birth data plus variables with the highest ISs

The accuracy of classification according to overall morbidity, NEC, ARF, IRDS, CF, and PDA improved by 0.06–0.10, when the genotype data with the highest IS values were also incorporated to the analysis.

DISCUSSION

During past years, several associations between genetic variants and perinatal complications [including CF (12), sepsis (4), PDA (5), ARF (3), NEC (1), and IVH (6)] have been reported. These studies are generally of case-control design, and, therefore, efforts have been done to enroll populations as homogeneous as possible. However, it is likely that preterm infants' investigated populations are still heterogeneous in terms of gestational age, applied therapies, etc. Therefore, logistic regression analysis is generally used to adjust the genotype-phenotype association to risk factors. The downside of this approach is that, as a result of this process, several subgroups with very low patient numbers are created and this leads to the instability of this model. This is a probable explanation for contradictory results of genetic studies performed in preterm infants with sepsis (4,13,14), disturbances of perinatal cardiorespiratory adaptation (12,15), or chronic lung disorder (1619). Recognizing this limitation, we used RFT, a new statistical approach, to assess whether genotypes may help to identify infants at risk. RFT is a type of high-dimensional nonparametric predictive model that consists of a collection of classification or regression trees. Thus, RFT appears to be particularly well suited to address a primary problem posed by large-scale association studies (20).

The results of our analyses are in accordance with everyday practice: the major determinant of perinatal complications is prematurity reflected by low birth weight and gestational age. Indeed, this is in line with the general clinical experience as proficient neonatologists are usually able to classify the preterm infant in the delivery room whether he or she is at a higher risk of perinatal complications. However, can the accuracy of classification be improved when some information about genetic variants is also known?

Before addressing this question, we assumed that the impact of genotypes on perinatal complications is influenced by developmental status. Infants with low gestational age are probably more susceptible to disturbances of perinatal adaptation independently of their genetic inheritance; therefore, genetic variants play fewer roles in the complications in a very immature population. In contrast, the importance of inherited susceptibility is higher in more mature infants, but the incidence of perinatal morbidities among near-term infants is low; therefore, a large number of patients is required to investigate the role of genotypes in perinatal complications in these populations. To identify that gray zone preterm population in whom prematurity itself does not fully mask the effect of genetic variants, but the incidence of morbidity is still high enough to be investigated, we screened the importance of 24 genetic variants in 135 preterm infants.

In agreement with our hypothesis, the generated curves representing the accuracy of classifications of infants according to the presence of several morbidities with and without genotype data diverged somewhat in a subset of neonates, namely in those whose birth weight ranged between 680 and 1430 g. We analyzed this population further.

We demonstrated that knowing the carrier state of some genetic polymorphisms in this subgroup of infants might help to predict the risk of some perinatal complications. The identified genes are partly identical to those reported previously in other experimental settings and case-control studies (such as IL-12 p40 GC for NEC; ER 2 PvuII p and ACE D for CF; for references, see Table 1).

It is worth mentioning that RFT revealed other polymorphisms not detected by the standard statistical approaches, which may indicate other directions for research and therapy. Although our results might give rise to several hypotheses, we should stress that our analysis was of descriptive nature and focused on finding possible predictive markers of perinatal complications.

Moreover, there are some limitations of our analysis. Although RFT is a robust method, and especially suitable for studies with low patient numbers (21), the number of patients analyzed is still small. Our population is possibly biased by the over- and underrepresentation of some complications. Furthermore, we considered just 24 genetic variants, and it is reasonable to postulate that the expansion of the analysis to other genotypes may reveal the importance of additional genetic variants as well. Finally, we omitted important clinical conditions such as intrauterine growth retardation, postnatal therapeutic interventions, and drugs from our analysis and focused exclusively on the additional predictive value of genotypes to that of birth data. Considering these factors in a larger number of patients would specify, however, further subgroups with special genetic susceptibility to perinatal complications. We should also stress that the accuracy we obtained with or without genetic patterns is still far less than 1.00, indicating that there are still major unrecorded perinatal, postnatal, and, possibly, genetic factors that contribute to the risk of perinatal morbidities.

To sum up our results, what are the implications of our analysis for the interested clinicians? First, we think that our method offers a new statistical way to test the possible role of genetic polymorphisms in disease prediction of preterm infants. Second, we identified a subgroup in that the association between genotype and complications is worthy of investigation. Third, our holistic approach designates those genetic variants that may contribute to the most important perinatal complications in preterm infants.