The diagnosis of severe GH deficiency (GHD), on the one hand, or of complete GH insensitivity (GHI), on the other, is usually obvious in the short child in whom appropriate studies have excluded other causes of growth failure. Diagnosis is more challenging among the children who form the continuum between these two extremes, that is, children with partial GHD or with presumed partial GHI, who may be classified as having idiopathic short stature (ISS). Despite investigations and discussions aimed at reaching consensus on how to discriminate diagnostically between GHD and ISS (1, 2), none of the clinical measures used to date provides a reliable means of categorizing these patients or of predicting the value of GH therapy (3). The effect of the GH axis on statural growth in an individual child depends on the interaction between GH secretion and GH responsiveness. With better understanding of conditions causing GH resistance (4–7), it has become more apparent that responsiveness to GH, as well as GH secretion, must be considered when interpreting a child's growth.

Traditionally, the diagnosis of GHD relies on the interpretation of the serum GH concentrations attained during at least two GH provocation tests. However, the lack of agreement between the GH response to provocation tests and the growth response to GH therapy has led many clinicians to disregard the results of GH stimulation tests and instead to use the growth response during the first y of GH treatment to determine the need for treatment (8). Diagnosis, however, should not be based on auxological criteria alone (9). Similarly, relying solely on GH provocation tests can be misleading, as the results of conventional diagnostic procedures explain only 33% of the variance in growth response to GH treatment (10). It is now widely accepted that the diagnosis of GHD must be based on an integrated judgment of many variables (11). However, even when biochemical markers such as IGF-I, IGFBP-3, and leptin are added to the diagnostic evaluation, no more than 41% to 58% of the variance in growth response can be explained (12, 13). When the spontaneous 24-h GH secretion profile was added in a group of 60 children, the explained variance increased to about 70% (unpublished data).

Prediction models are used widely in medicine with the aim of predicting the outcome of therapy in individual patients. The usefulness of such a model is defined by how well it performs in practice, rather than by the level of statistical significance it achieves. Consequently, it is crucial that the model be tested using data from patients who fulfill its inclusion criteria but whose data were not used to derive it (14).

The aim of this study was to develop models for predicting the 1- and 2-y growth responses to a standardized GH treatment in individual slowly growing or short prepubertal children considered to have isolated GHD or ISS. The predicted individual growth response can be used as an estimate of GH secretion in relation to GH responsiveness and can serve as a guide for better evidence-based decision making about growth-promoting treatment (15).

METHODS

Study Design

The reliability and stability of prediction models depend on the inclusion of a wide range of values for the variables used. In the present study we therefore included short children covering the entire continuum of GH secretory capacity, from very low (GHD) to high (ISS). The outcome variable, the growth response to GH treatment, also appears as a continuum when related to GH secretory capacity (Fig. 1). As no distinct cut-off limit could be discerned, and in keeping with the study aim, children with and without GHD by the classic definition were analyzed together.

Figure 1

Growth response to the 1st y of GH treatment (0.1 U/kg/d, 0.033 mg/kg/d) vs maximum serum GH concentration during an AITT (GHmaxAITT) in the model group (Δ) and in the validation group (•) of short prepubertal children diagnosed as having isolated GHD or ISS. In clinical practice 32 mU/L (WHO IRP 80/505) is used as the cut-off point for diagnosis of GHD. This corresponds to 20 mU/L with the previously used standard (WHO IRP 66/217; 24). A continuum in the GHmax response for the GHD and ISS children is observed, as is a continuum in the growth response to GH treatment. Note the wide range in growth responses, indicating a wide range in GH responsiveness when the patients are treated with the same weight-based GH dose.

Data derived from two separate groups of children were used in the study. The groups differed only in the calendar time at which they started GH treatment. Data from the first (“model”) group were used to calculate the prediction functions, whereas data from the second (“validation”) group, who started GH therapy after recruitment to the model group had closed, were used only to test the validity of the prediction models (14).

Patients

Prediction model group.

The prediction model group comprised 269 prepubertal short or slowly growing children (45 girls, 224 boys) of Caucasian origin, most of them Swedish. All had been treated with GH, 0.1 U/kg/d (0.033 mg/kg/d), for at least 1 y, and 213 (33 girls, 180 boys) had been treated for 2 y. They were either children with isolated idiopathic GH insufficiency included in the Swedish National Registry for GH treatment (n = 173) or short children without GH insufficiency included in clinical trials of GH treatment (n = 96).

Children with dysmorphic syndromes or chronic diseases other than those related to impaired GH secretion or action were excluded. The children were prepubertal, defined as breast stage 1 (16) or testicular volume below 4 mL (17). However, boys with a testicular volume between 4 and 6 mL have not yet started their pubertal growth spurt (10) and fit the childhood component of the infancy, childhood and puberty (ICP) growth model (18). Two of the 224 boys in the model group and 6 of the 129 boys in the validation group had a testicular volume between 4 and 6 mL at the start of the study; they were included in the analysis because their testicular volume was still < 6 mL after 1 and 2 y of treatment. The characteristics of the patients in the prediction model group are given in Table 1, and those of the subgroup for whom spontaneous 24-h GH profiles were available are given in Table 2.

Table 1 Characteristics of the children included in the prediction model group and the validation group
Table 2 Characteristics of the children included in the 24-h profile prediction model group and the validation group

Validation group.

Children in the validation group were selected using the same inclusion criteria and clinical evaluation procedures as the prediction model group. This group consisted of 149 children (20 girls, 129 boys): 129 from the Swedish National Registry for GH treatment and 20 from ongoing clinical studies of GH treatment in short children without GH insufficiency. All 149 children were followed for 1 y (±3 mo), and 109 (13 girls, 96 boys) were followed for 2 y (±3 mo). The characteristics of the patients in the validation group are given in Tables 1 and 2.

Study Protocol

Pretreatment investigations.

The pretreatment investigations, performed during the year before treatment, included a GH stimulation test (arginine-insulin tolerance test, AITT) as described previously (10). A spontaneous 24-h GH profile, with samples taken every 20 or 30 min (19), was also obtained from 107 children (18 girls, 89 boys) in the prediction model group and 32 children (2 girls, 30 boys) in the validation group. Blood samples for IGF-I and IGFBP-3 were drawn at the start of treatment from 106 children in the prediction model group and from 95 children in the validation group. The clinical characteristics of the children for whom only auxological information was used did not differ from those of the children for whom serum IGF-I and IGFBP-3 or 24-h GH profiles were obtained.

Treatment.

All children underwent the same regimen of daily s.c. injections of GH 0.1 U/kg (0.033 mg/kg).

Auxological Methods

Information on gestational age, birth weight, and birth length was collected from the Swedish Medical Birth Registry. The growth of the children was recorded at health care units from birth until inclusion in the study, i.e. 1 y (±3 mo) before the start of GH treatment. Thereafter, height was measured for most of the children with a Harpenden stadiometer at pediatric units. Height data were transformed into SDscores for age and sex using the childhood component of the ICP growth model of Karlberg et al. (18), and weight data according to Karlberg et al. (20). Weight-for-height data were also transformed into SDscores (WHSDscore; 21). Parental heights were expressed as SDscores (20). The intra-familial height deficit (diffSDscore) was calculated as the difference between the child's height SDscore at the start of GH treatment and the mid-parental height SDscore.
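
The SDscore transformation and the intra-familial height deficit described above can be illustrated with a minimal sketch. It assumes that the reference mean and SD for the child's age and sex are already known (in the study they come from the ICP reference of Karlberg et al.; the numbers below are invented placeholders), and it assumes that the mid-parental height SDscore is the plain average of the two parental SDscores, which this section does not specify.

```python
def sd_score(value: float, ref_mean: float, ref_sd: float) -> float:
    """SDscore: distance from the reference mean, in units of the reference SD."""
    return (value - ref_mean) / ref_sd


def mid_parental_sds(mother_sds: float, father_sds: float) -> float:
    # ASSUMPTION: plain average of the two parental SDscores; the study's
    # exact mid-parental formula is not given in this section.
    return (mother_sds + father_sds) / 2.0


def diff_sd_score(child_sds: float, mother_sds: float, father_sds: float) -> float:
    """Intra-familial height deficit: child SDscore minus mid-parental SDscore."""
    return child_sds - mid_parental_sds(mother_sds, father_sds)


if __name__ == "__main__":
    # Hypothetical boy measuring 120.0 cm, compared with an INVENTED reference
    # of mean 133.0 cm and SD 5.3 cm (placeholders, not ICP reference values).
    child = sd_score(120.0, 133.0, 5.3)
    print(round(child, 2))                              # -2.45
    print(round(diff_sd_score(child, -0.5, -0.3), 2))   # -2.05
```

A negative diffSDscore means the child is shorter than expected from the parents' heights; the larger its magnitude, the larger the intra-familial deficit.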

Hormone Analysis

GH.

Serum concentrations of GH from the AITT were generally analyzed using an immunoradiometric assay with polyclonal antibodies and the WHO IRP 80/505 Standard (Pharmacia Diagnostics AB, Uppsala, Sweden). If another method or an earlier standard was used, the GH concentrations were transformed into comparable levels using transformation factors derived in our laboratory (22–24). The detection limit of the assay is 0.4 mU/L and the intra-assay coefficients of variation are 7.1%, 1.9% and 2.3% at concentrations of 1, 5–20 and 30 mU/L, respectively. The interassay coefficients of variation are 14%, 4%, 4%, and 7% at concentrations of 1, 21, 30, and 46 mU/L, respectively.

Analyses of the 24-h GH profiles were conducted in the same laboratory using an immunoradiometric assay and the WHO IRP 80/505 standard (Pharmacia Diagnostics AB, Uppsala, Sweden). The GH profiles were analyzed with the Pulsar program (25), which gives the calculated baseline, number of peaks, peak amplitudes, and area under the curve above the zero line (AUC0) or above the calculated baseline (AUCb). The AUC was used to calculate the GH secretion rate (26), which was used in the analysis together with the maximal GH peak during the 24-h GH profile (GHmax24h), AUC0 and AUCb (27).
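
The profile summaries named above (GHmax24h, AUC0, AUCb) can be approximated with a simple trapezoidal sketch. This is not the Pulsar algorithm: Pulsar estimates the baseline itself and applies its own peak criteria, whereas here the baseline is assumed to be known and samples below it are simply clipped to zero when computing AUCb. The toy profile is invented.

```python
from typing import Dict, Sequence


def trapezoid_auc(values: Sequence[float], interval_min: float) -> float:
    """Trapezoidal area under equally spaced samples (concentration x minutes)."""
    return sum((a + b) / 2.0 * interval_min for a, b in zip(values, values[1:]))


def profile_summary(gh: Sequence[float], interval_min: float,
                    baseline: float) -> Dict[str, float]:
    """Peak, AUC above zero (AUC0) and AUC above a given baseline (AUCb)."""
    auc0 = trapezoid_auc(gh, interval_min)
    # Samples below the baseline contribute zero to AUCb. This clipping is a
    # simplification, not the Pulsar algorithm.
    above = [max(0.0, v - baseline) for v in gh]
    aucb = trapezoid_auc(above, interval_min)
    return {"GHmax24h": max(gh), "AUC0": auc0, "AUCb": aucb}


if __name__ == "__main__":
    # Toy profile sampled every 20 min with a single pulse (mU/L); baseline assumed known.
    print(profile_summary([1.0, 1.0, 9.0, 1.0, 1.0], 20.0, baseline=1.0))
    # {'GHmax24h': 9.0, 'AUC0': 240.0, 'AUCb': 160.0}
```

For a full profile sampled every 20 min, the list would hold 73 samples covering 1,440 min.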

IGF-I.

An IGFBP-blocked RIA was used without extraction of the sample, and in the presence of an approximately 250-fold excess of IGF-II (Mediagnost GmbH, Tübingen, Germany) (28).

IGFBP-3.

An RIA was used as previously reported (28). Serum concentrations of IGF-I and IGFBP-3 were converted into SDscore (29).

STATISTICS

General Considerations about Prediction Models

How well a model fits the data can be evaluated at the group level; R² estimates how closely the observed values are correlated with the fitted values. Ultimately, however, the usefulness of a model is determined by how well it works in practice, not by its level of statistical significance. It is therefore crucial that the model be tested (validated) using data from patients who fulfill the inclusion criteria for the model but whose data were not used to derive it. The model is considered statistically valid if the SDres for the validation group is in the same range as that observed for the group from whose data the model was derived. This procedure is referred to as “validation” or “generalization” (14).

For the validation group, R² is too sensitive to extreme values and insensitive to a shift in the mean predicted value. We therefore present only analyses of the residuals, i.e. the differences between the observed outcomes and those fitted by the models (model group) or predicted by the models (validation group).

The residuals were calculated, and their SD (SDres) was evaluated. SDres is a measure of the root mean squared difference between observed and predicted outcomes.
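
The residual-based evaluation described in the preceding paragraphs can be illustrated with a short sketch on invented numbers. It computes SDres as the root mean squared difference between observed and predicted outcomes, as the text describes, and contrasts it with R² taken as the squared correlation between observed and predicted values. Shifting every prediction by the same amount leaves R² unchanged but inflates SDres, which is the insensitivity referred to above. For a validity check, the same `sd_res` function would be applied to the model group (fitted values) and the validation group (predicted values), and the two values compared.

```python
import math
from typing import Sequence


def sd_res(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared difference between observed and predicted outcomes."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted outcomes."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    observed = [0.4, 0.6, 0.8, 1.0]          # toy changes in height SDscore
    predicted = [0.5, 0.5, 0.9, 0.9]
    shifted = [p + 0.3 for p in predicted]   # same ordering, biased upward
    print(round(r_squared(observed, predicted), 3), round(sd_res(observed, predicted), 3))
    print(round(r_squared(observed, shifted), 3), round(sd_res(observed, shifted), 3))
    # 0.8 0.1
    # 0.8 0.316
```

Because the root mean squared difference is not centered on the mean residual, a systematic bias in the predictions shows up directly in SDres.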

Modeling

No standard statistical method was available for the approach used in the present study. The technique is best described as nonlinear data fitting (empirical curve fitting) combined with empirical testing. The nonlinear approach was chosen because a nonlinear relation was found between growth response and other variables. The following modeling guidelines were applied.

For each predictor variable, the form of the transformation curve that gave the best overall prediction result was determined (30), judged by the highest correlation attainable. The slopes of the tails of the optimal function were restricted to stabilize the nonlinear models. Optimally transformed variables were selected, and interactions between them were included in the nonlinear multiple regression if they contributed to the prediction. Transformed intermediate factors were not selected, because of the danger of overfitting. In the present approach, overfitting was prevented by selecting stepwise subsets of nonlinearly transformed original variables that gave the best overall prediction result, and by evaluating the predictive power by cross-validation: in a first modeling step, different subgroups of model patients, each accounting for at least 20% of the patients, were successively omitted, and the responses of the omitted subgroups were then predicted. Because the growth response curve is nonlinear, a nonlinear correction for differences in measurement time (1 y ± 3 mo) was developed. The modeler (AN) had no access to the data from the validation group. Testing of the final prediction models with the validation group was performed by others (S.R., B.S.) on another computer. A computer program for calculating the prediction was constructed using the algorithms presented in the Appendix.
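
The authors' transformation curves and selection steps are not reproduced in this section, so the sketch below is hypothetical and far simpler than their procedure: a single predictor, one assumed log-shaped transformation with flattened tails, and a straight-line fit on the transformed scale. It illustrates the cross-validation idea described above, with k = 5 so that each omitted subgroup is 20% of the patients, and shows how a well-chosen transformation lowers the cross-validated SDres compared with the untransformed predictor. The function names and toy data are invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def clamped_log(x: float, lo: float = 1.0, hi: float = 100.0) -> float:
    # HYPOTHETICAL transformation curve: log(x), held flat outside [lo, hi] so the
    # tails cannot extrapolate wildly (a crude stand-in for restricted tail slopes).
    return math.log(min(max(x, lo), hi))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_sdres(x: Sequence[float], y: Sequence[float],
                          transform: Callable[[float], float], k: int = 5) -> float:
    """Omit each of k subgroups in turn (20% of patients when k = 5), fit on the
    remaining patients, predict the omitted ones, and pool the squared errors."""
    n = len(x)
    squared_errors: List[float] = []
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        kept = [i for i in range(n) if i % k != fold]
        a, b = fit_line([transform(x[i]) for i in kept], [y[i] for i in kept])
        squared_errors += [(y[i] - (a + b * transform(x[i]))) ** 2 for i in held_out]
    return math.sqrt(sum(squared_errors) / n)


if __name__ == "__main__":
    # Toy data in which the response rises with log(x).
    xs = [float(v) for v in range(2, 52)]
    ys = [0.2 + 0.1 * math.log(v) for v in xs]
    print(cross_validated_sdres(xs, ys, clamped_log)
          < cross_validated_sdres(xs, ys, lambda v: v))  # True
```

Because every prediction is made for patients left out of the fit, the pooled error estimates how the model would perform on new children rather than how well it fits its own data.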

For comparison, a traditional stepwise multiple linear regression analysis was applied, using the model group, to those variables associated (at p < 0.10) with the growth response to GH treatment. Correlations were tested using Pitman's nonparametric permutation test (31). Algorithms were constructed for each variant of the models and used to predict the growth response of the children in the validation group.
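
Pitman's permutation test for a correlation can be sketched as follows. This is an illustrative exact version, not the implementation used in the study. Under the null hypothesis of no association every reordering of one variable is equally likely, so the p-value is the share of reorderings whose |r| is at least the observed |r|. Enumerating all n! orderings is practical only for very small samples; for groups the size of those in this study one would instead draw a large random sample of permutations (a Monte Carlo approximation).

```python
import itertools
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pitman_permutation_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided permutation p-value for the correlation of x and y."""
    r_obs = abs(pearson_r(x, y))
    tol = 1e-12  # guards against floating-point ties with the observed value
    hits = total = 0
    for perm in itertools.permutations(y):
        total += 1
        if abs(pearson_r(x, perm)) >= r_obs - tol:
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Perfect association in 6 pairs: only the original and reversed orders reach |r| = 1.
    print(round(pitman_permutation_p([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]), 4))  # 0.0028
```

Unlike a t-test on r, the permutation p-value makes no normality assumption, which is why it is described as nonparametric.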

Ethics.

The studies were approved by the Ethics Committee of the Medical Faculties of the Universities of Göteborg, Lund, Linköping, Uppsala and Umeå and of the Karolinska Institute. Informed consent was obtained from the parents of each child and, where appropriate, from the child.

RESULTS

Prediction Models

Five sets of clinical information were used: 1) auxology alone from close to the start of treatment, or auxology plus 2) data on growth before 2 y of age, 3) IGF-I and IGFBP-3, 4) the maximal serum GH concentration in response to the AITT (GHmaxAITT), or 5) GH estimates from the spontaneous 24-h GH profile. The auxological variables made available for modeling, and those selected in the models, are shown in Table 3. The growth response was expressed as the change in height SDscore over 1 or 2 y of GH treatment. The equations for the models are given in the Appendix.

Table 3 Auxological variables available and selected for the statistical analyses in the basic model (top) and the early growth model (bottom) * Ref 18.

Basic model.

The Basic model was created using only auxological information from 1 y before and at the start of GH treatment, together with parental heights, in 269 children (Table 3). The results are given in Table 4. The results from the validation group (n = 149; SDres = 0.24) were consistent with those from the model group (SDres = 0.28), indicating that the model is statistically valid. The studentized residuals for the validation group are plotted against the predicted growth response in Fig. 2.

Table 4 Model reliability, comparison of the model group with the validation group (non-linear multiple regression) SDres = root mean square error of the residuals.
Figure 2

Individual studentized residuals (residuals/SDres) in relation to predicted 1st y growth responses using the Basic model (top) and the GH24h early growth model (bottom), based on the data of children in the validation group (n = 149).

Early growth model.

The variables of length, weight, and weight/length at birth and at 1 and 2 y of age were added to the Basic model. In Table 3 the available variables are listed and the variables found informative and included in the models are marked. Results are given in Table 4.

GH24h model.

Different variables from the 24-h GH profile were added to the Basic model to produce the GH24h model (Table 4). The maximum GH peak over 24-h (GHmax24h) was the most informative variable. The second most informative variable from the profile was the area under the curve above baseline (AUCb) (data not shown).

GH24h combined with early growth model.

Both the early growth variables and the GHmax24h variable were added to the Basic model. This resulted in the model with the lowest SDres (0.19) (Table 4).

GHmaxAITT, IGF-I SDscore and IGF-binding protein (IGFBP)-3 SDscore added to the Basic model.

The GHmaxAITT, the IGF-I SDscore and the IGFBP-3 SDscore were evaluated by adding their values to the Basic model. GHmaxAITT and the IGF-I SDscore were both predictive, but only to an intermediate degree, with SDres values of 0.27 and 0.24, respectively (Table 4). The IGFBP-3 SDscore was less informative than the IGF-I SDscore or the GH values from the provocation test and did not improve the prediction beyond that achieved with the Basic model.

Comparison of the Results of Nonlinear Models

The best 1st y model (the GH24h early growth model, which uses the most extensive pretreatment information) had an SDres of 0.19 SDscore and, in the validation group, predicted the growth response to within 0.25 SDscore for 84% of the children and to within 0.25 to 0.50 SDscore for the remaining 16% (Fig. 3). All models showed good extrapolation properties. As shown in Table 1, the maximum change in height SDscore during 2 y of treatment was 3.20 in the prediction model group but 5.54 in the validation group. Although this is far outside the modeling range, the predictions outside the modeling range were excellent overall, adding credibility to the prediction models. The prediction intervals, expressed as ± 1.96 × SDres, ranged with the different models from ± 0.37 to ± 0.54 SDscore for the 1st y and from ± 0.55 to ± 0.76 SDscore for the 2nd y. Figure 4 shows the prediction intervals from the different models for a boy from the validation group.
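
The ± 1.96 × SDres intervals and the error bands summarized in Fig. 3 can be computed as in the sketch below. It assumes approximately normally distributed residuals, which is what makes ± 1.96 × SDres an approximate 95% interval. The predicted value and the observations used in the example are invented.

```python
from typing import Dict, Sequence, Tuple


def prediction_interval(predicted: float, sd_res: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% prediction interval: predicted +/- z * SDres (normal residuals assumed)."""
    half = z * sd_res
    return predicted - half, predicted + half


def error_bands(observed: Sequence[float], predicted: Sequence[float],
                width: float = 0.25) -> Dict[str, float]:
    """Share of children whose absolute prediction error falls in each band of the given width."""
    counts: Dict[int, int] = {}
    for o, p in zip(observed, predicted):
        band = int(abs(o - p) // width)
        counts[band] = counts.get(band, 0) + 1
    n = len(observed)
    return {f"{b * width:.2f}-{(b + 1) * width:.2f}": c / n for b, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical child: predicted 1st y gain of 0.60 SDscore, model SDres = 0.19.
    low, high = prediction_interval(0.60, 0.19)
    print(f"{low:.2f} to {high:.2f}")  # 0.23 to 0.97
```

With SDres = 0.19, the half-width is 1.96 × 0.19 ≈ 0.37 SDscore, matching the narrowest 1st y interval quoted above.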

Figure 3

Absolute difference (irrespective of sign) between observed and predicted change in height SDscore after 1 y (top) and 2 y (bottom) of GH treatment, calculated with the different models for the children in the validation group. With the GH24h early growth model, the 1st y growth response was predicted to within 0.25 SDscore for 84% of the children and to within 0.25 to 0.50 SDscore for the remaining 16%. With the Basic model, the 1st y growth response was predicted to within 0.25 SDscore for 73% of the children, within 0.25 to 0.50 SDscore for 23%, and within 0.50 to 0.75 SDscore for 5%.

Figure 4

The 95% prediction intervals for three children from the validation group, calculated at the start of treatment with the Basic model (left symbol of each set), the early growth model (middle left), the GH24h model (middle right), and the GH24h early growth model (right symbol of each set). The bar represents ±1 SD and the whiskers ±1.96 SD. The actual growth before treatment (dotted line) and during treatment (solid line) is presented in a growth chart, with growth curve lines indicating the mean (solid line) and ±3 SD (dotted lines).

Comparison with Linear Multivariate Statistical Method

A stepwise multiple linear regression analysis was applied to the same variables as in the models described above. An algorithm was constructed for each model from the variables and regression coefficients found, and these algorithms were used to predict the growth responses of the children in the validation group for comparison with the observed responses. The validation SDres values were substantially lower with the nonlinear method than with the linear method, i.e. the nonlinear predictions were more accurate. For the Basic model, the 1st y validation SDres was 0.327 with the linear approach and 0.241 with the nonlinear approach; the corresponding 2-y values were 0.535 and 0.363, respectively.
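
The size of the gain from the nonlinear approach can be expressed as a relative reduction in validation SDres. The short calculation below uses only the Basic model values quoted above; the function name is illustrative.

```python
def relative_reduction(sd_linear: float, sd_nonlinear: float) -> float:
    """Fraction by which the nonlinear model lowers the validation SDres."""
    return (sd_linear - sd_nonlinear) / sd_linear


if __name__ == "__main__":
    print(f"1 y: {relative_reduction(0.327, 0.241):.0%}")  # 1 y: 26%
    print(f"2 y: {relative_reduction(0.535, 0.363):.0%}")  # 2 y: 32%
```

On these figures, the nonlinear Basic model reduces the validation SDres by roughly a quarter for the 1st y and a third for 2 y.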

DISCUSSION

This study derives and presents validated models that can be used in individual children to predict the initial 1- and 2-y growth response to a given GH treatment regimen in short children with a broad range of GH secretory capacity. Judged by the narrow individual prediction intervals, the precision of these models is the best to date. Thus, the models will serve as a valuable tool for selecting children for successful GH treatment and allow better evidence-based decision making about growth-promoting treatment. The value of the study is enhanced by the use of one group of children to develop the models and a strictly separate group, fulfilling the same inclusion criteria and coming from different endocrine centers in Sweden, to validate them. This is “temporal validation,” the second of the three steps described by Altman and Royston (14). The third step, external validation using data from children in other countries, is in progress.

Five sets of clinical information were used. The Basic model was based solely on auxological data from the year before treatment started and on parental heights. The other models, which added data on growth in early life (before 2 y of age) and/or biochemical measures to the Basic model, yield three levels of prediction accuracy. Although auxological data alone allow growth responses to be predicted, additional pretreatment investigations can be chosen to improve prediction accuracy. In our hands, the growth response to GH was predicted with similar accuracy by the models that included the AITT, IGF-I, or early growth data, each giving only a minor improvement over the Basic model. This may indicate that these variables were already substantially reflected in the auxological variables of the Basic model. The predictive value was improved, however, when the GHmax24h data were added to the Basic model.

The model with the narrowest prediction interval was the one that added the 24-h GH profile data to the Basic + early growth model. With this model, the gain in height after 1 y of GH treatment could be predicted with an accuracy of ± 0.37 SDscore (± 1.96 SDres). For a 3-y-old boy, 1 SDscore is 3.2 cm, which corresponds to a prediction interval of ± 1.2 cm. For a boy aged 9 y, the mean age at the start of GH treatment in the present study, 1 SDscore is 5.3 cm, so a prediction interval of ± 2.0 cm would be expected (32). With the Basic model, which relies solely on auxology, the 1st y prediction interval was ± 0.54 SDscore, corresponding to ± 1.7 cm for a 3-y-old child and ± 2.9 cm for a child aged 9 y. Thus the work-up procedure can be chosen according to the accuracy required. The improvement in the prediction interval may not seem impressive in centimeters, but in relative terms it is substantial. Even more important is the low group SDres, which reflects a low proportion of extreme residuals, i.e. high accuracy for individual children. The less accurate the model, the less reliable the prediction and the higher the risk of a wrong clinical decision for the individual child about a treatment that entails considerable cost to the community and effort for the child. The extra time and expense of the work-up are low compared with the cost and effort of 1 y of treatment.
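
The conversion from SDscore to centimeters in this paragraph is a single multiplication by the reference height SD at the child's age. The sketch below reproduces the figures quoted in the text, using the 3.2 cm (3 y) and 5.3 cm (9 y) values given there; any other age would need the corresponding reference SD, which is not listed here. The function name is illustrative.

```python
def interval_cm(half_width_sds: float, cm_per_sds: float) -> float:
    """Convert a +/- prediction half-width from SDscore units to centimeters.

    cm_per_sds is the reference height SD at the child's age
    (3.2 cm at 3 y and 5.3 cm at 9 y for boys, per the text).
    """
    return half_width_sds * cm_per_sds


if __name__ == "__main__":
    for model, hw in [("GH24h early growth", 0.37), ("Basic", 0.54)]:
        for age, sd_cm in [(3, 3.2), (9, 5.3)]:
            print(f"{model}, {age} y: +/-{interval_cm(hw, sd_cm):.1f} cm")
    # GH24h early growth, 3 y: +/-1.2 cm
    # GH24h early growth, 9 y: +/-2.0 cm
    # Basic, 3 y: +/-1.7 cm
    # Basic, 9 y: +/-2.9 cm
```

Because the reference SD grows with age, the same SDscore interval corresponds to a wider interval in centimeters for older children.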

The finding that the GH peak from the 24-h profile was more informative than the peak from the provocation test may reflect the higher reproducibility of the former (33–35). Provocation tests show low reproducibility and wide intra-individual variability, partly because of total or partial refractoriness of the somatotrophs at the time of testing (36). Moreover, the GH values obtained during the spontaneous profile better reflect physiologic secretion and the overall secretory pattern. Growth in children has been shown to be modulated by GH pulse amplitude (23, 37, 38), which is supported by our finding that the GH peak is more informative than other estimates from the GH profile. In a recent study in which leptin was measured as a metabolic marker of the lipolytic effect of GH, the change in serum leptin level correlated with the growth response, indicating that metabolic effects may parallel effects on bone growth (13). Unfortunately, too few children had serum leptin measurements to include leptin as a variable in the present models.

The multivariate approach reflects clinical considerations and has been used in earlier studies with prediction models (39–41). The approaches used in the present study are similar to those used for children in the KIGS world database by Ranke et al. (39, 42, 43), although some differences exist. In those previous studies, the children were separated into those with GHD and those with ISS according to the maximum GH level during a provocation test; the statistical method was linear multivariate analysis; and the GH dose was not uniform, because of the different treatment modalities used worldwide. The latter, of course, has the benefit of providing insight into the effects of individual GH treatment modalities. The question that now arises is what minimal growth response justifies treatment, at what maximal GH dose, in relation to benefit, safety, and cost.

Although the models in this study predict only the 1- and 2-y growth responses of individual children to GH treatment, they also provide insight into the final outcome, adult height. The short time period was selected because it is the period of catch-up growth, as reported in children with celiac disease or late-onset primary hypothyroidism (44, 45), and we believe it is a suitable model for catch-up growth in GHD children treated with GH. The 1st y growth response is an indicator of growth to come (46), and it markedly influences the height achieved by the onset of puberty, an important determinant of adult height (47, 48). During analysis of our data, we developed a model for predicting the 2nd y growth response that included the 1st y growth response as a variable. Including the 1st y data made all other variables unnecessary, as the prediction was then quite reliable (data not shown). Thus, the bulk of the information about GH responsiveness is obtained from the growth response during the 1st y.

In keeping with criteria established for other therapies in medicine, the short or slowly growing child should not be denied GH therapy solely on the basis of tests of GH secretion. Rather, decisions about GH treatment should be based on the likelihood of benefit, which depends not only on the sufficiency of GH secretion but also on the responsiveness to GH. In a declining economy, there is a temptation to return to older diagnostic criteria that use lower cut-off GH values in provocation tests, despite their well-known shortcomings. The models presented here permit estimation of GH responsiveness with an accuracy that substantially improves on the clinical diagnostic procedures used hitherto, and they facilitate selection of the children who will respond to GH treatment. Because the models are validated for children judged as having ISS or GHD by classical criteria, neither the wide variability in responses to GH provocation tests nor arbitrarily chosen cut-off limits preclude their use.