Introduction

Recombinant human growth hormone (r-hGH) is used as a therapeutic agent for a range of disorders of growth impairment including growth hormone deficiency (GHD) and Turner syndrome (TS). Treatment is costly between £6000 and £24000 per centimetre (cm) gained in final height [1]. Therapy is not always successful in patients and there are currently no genomic markers for predicting positive or negative responses. Prediction models up to four years of therapy have been defined using clinical measurements [2] but have been difficult to implement in practise. Whilst an understanding of the pharmacogenetic background has been established [3, 4], such approaches are of limited predictive value due to the influence of covariates related to the child’s developmental stage, disease severity and geographical location [5, 6]. The pre-treatment blood transcriptome has been previously shown to relate to first year response to r-hGH therapy [7]; however, little is known about the predictive value of this association and its relationship to longer term response to therapy. The transcriptome represents a level of ‘omic’ data that reflects genetic information, developmental stage in relation to age [8] along with the impact of the local environment [6] and, therefore, has potential to classify response to r-hGH.

Response to r-hGH in the first year of therapy is considered to be a primary marker of growth response. Prediction of first year growth has been shown to be dependent on GHD severity, age, distance to target height, body weight, dose of r-hGH, birth weight and, as defined by regression models, can account for 61% in GHD [9] and 46% in TS [10, 11] of the variation within the data. Clinical markers such as distance to target height are surrogate genetic variables and this implies that an effective level of genomic prediction is hypothesised to be possible if developmental [8, 12] and environmental covariates [13] of growth response can be taken into account.

Transcriptomic data have been used extensively in cancer tissues both to sub-type the tumour [14,15,16] and to predict response to therapies [17, 18]. In contrast, in this study, we have used peripheral blood gene expression profiling as the source for gene expression profiles, and show that these patterns can be used to predict response to r-hGH in each year of treatment up to 5 years in two different growth disorders that account for ~60% of GH prescriptions in the USA, Europe and the UK [19, 20].

Methods

Patients

The PREDICT Long-term Follow-up study (multicentre, open-label, prospective, phase IV) and the pharmacogenetics of the first year of r-hGH treatment have been described extensively previously [7, 21]. Briefly, pre-pubertal children with GHD and TS were enroled. A diagnosis of GHD was reached following two pharmacological stimulation tests with a peak GH concentration of <10 µg/L. Prior to enrolment in the study, none of the children had received GH therapy. Children with GHD due to central nervous system tumours or radiotherapy were excluded but children born small for gestational age were not (15% of the cohort [22]). The majority of GHD patients had isolated GHD but five were treated with both thyroxine and hydrocortisone replacement, three with just thyroxine and one with just hydrocortisone [22]. The diagnosis of TS was based on karyotype.

This PREDICT study was conducted in compliance with ethical principles based on the Declaration of Helsinki, the International Conference on Harmonization Tripartite Guideline for Good Clinical Practice and all applicable regulatory requirements. The PREDICT (NCT00256126) and PREDICT Long-term Follow-up (NCT00699855) studies were approved by the Scotland Medical Research and Ethics Committee (reference 05/MRE10/61) and the North West Research Ethics Committee (reference 08/H1010/77), respectively. Informed consent was obtained from parents for all study participants.

Genetic analysis

A total of 1219 genetic markers were used in the analysis, 1217 Illumina-genotyped single-nucleotide polymorphisms (SNPs) corresponding to a candidate list of 103 genes and 2 TaqMan-genotyped SNPs in the IGFBP3 promoter. All genes selected are known to be involved in growth regulation and GH action as previously described [5, 7].

A Kruskal–Wallis rank sum test was applied on the following three genetic models (a) genotypic (AA, AB, BB); (b) dominant (AA/AB + BB) and (c) recessive (AA + AB/BB). For non-pseudoautosomal X chromosome markers, GHD boys and TS girls were analysed as having only two homozygote categories (AA/BB). Adjustment for multiple testing was performed using Bonferroni correction with two different parameters as the number of independent tests, the number of linkage disequilibrium (LD) blocks in the gene in which the SNP is contained and the total number of LD blocks present in all genes (768 in GHD and 563 in TS). Filtering criterion for prediction was defined as a false discovery rate (FDR) modified p value < 0.05 unmodified for LD blocks.

Transcriptome analysis

Transcriptomic profiling was carried out on whole blood RNA as described previously [7] using Affymetrix GeneChip Human Genome U133 plus 2.0 Arrays. For background correction, the Robust Multichip Average was applied with quantile normalisation and a mean probe set summarisation using Qlucore Omics Explorer 2.3 ([QOE] Qlucore, Lund, Sweden). The data set generated was subject to quality control to investigate the presence of outliers and further confounding effects.

Baseline gene expression associations with height velocity in each year of growth response were determined using rank regression with microarray batch, age, body mass index (BMI) at baseline as covariates (eliminated factor function in QOE) for both GHD and TS patients along with gender and peak GH test response (average of two provocative tests) for the GHD patients. Over the study, a number of children either entered puberty spontaneously or received exogenous sex steroids for pubertal induction. We therefore introduced a further normalisation for Tanner stage to the analysis to account for the proportion of children entering puberty in each year of the study.

Generation of network models

Network analysis allows the identification and prioritisation of key functional elements within interactome models. To derive an interactome model differentially expressed genes were used as ‘seeds’ and all known protein:protein interactions between the seeds and their inferred immediate neighbours were calculated to generate a biological network using the output of the Biogrid model of the human Interactome (3.3.122) [23]. Network generation and processing were performed using Cytoscape 2.8.3 [24].

Analysis of gene network models

Clustering and ‘community structure’ of modules within biological networks arise from variation in connectivity within the network and are known to be associated with function [25, 26]. To rank these functional components within interactome models, we used the ModuLand plugin for Cytoscape 2.8.3 to determine overlapping modules and to identify hierarchical structure using the centrality property thus enabling the identification of key network elements [27]. The central core unit of each network module (metanode) was defined as the ten most central genes. A list of the unique genes in each metanode was generated and used as a model of the functional core of the associated network for further comparison. Network topology was analysed using the CytoHubba plugin for Cytoscape [28]. The String database was used to assess the integrity and connectivity of gene modules [29].

Analysis of epigenomic data

Previously published methylation profiles of six GHD patients were used to assess the relationship of changes in DNA methylation in relation to response to r-hGH [30]. The data from GSE57107 were re-analysed in Qlucore Omics Explorer 3.3 and a median-based gene level summary of methylation was determined (n = 20,618). The relationship between gene level DNA methylation and response to r-hGH was determined using rank regression.

Classification of growth response

All analysis was performed using the statistical software R 3.3.2 [31]. The relationship of baseline gene expression to potential predictive value (classification of low and high quartiles of response) was performed using Discriminant Analysis of Principal Components (DAPC) [32], Partial Least Squares Discriminant Analysis (PLS-DA) (mixOmics 6.1.1R package [33]) and random forest (RF) with 1000 trees [34]. Class size imbalance was corrected for using Synthetic Minority Oversampling Technique (SMOTE) [35]. Feature selection from RF data was performed using the BORUTA algorithm [36]. The area under the curve of the receiver operating characteristic (AUC) was used to present the probability of a randomly selected sample being classified correctly.

In RFs, about one third of the cases are left out of each iteration and can be used as a test set to perform cross-validation and to get an estimate of the test set error, the out of bag (oob) error estimate. The oob error estimate is recognised as being unbiased [34].

We used RF to investigate whether blood transcriptomic data from GHD and TS patients provided additional value for prediction of response to r-hGH based on baseline patient auxology (age, weight SDS, birthweight SDS and distance to target height SDS in both TS and GHD with the addition of peak GH value for GH provocation test in GHD). These analyses were performed by defining the predictive value of baseline clinical phenotype alone and these data were then compared to baseline clinical phenotype in addition to blood transcriptomic markers.

Statistics

Analyses were performed to determine genetic associations with response to r-hGH using the Kruskal–Wallis rank-sum test with Bonferroni corrections for FDR. This study was powered to detect an effect size of >6 cm/year as previously described [7].

Transcriptomic data were subjected to dimensional scaling using Principal Components Analysis and Iso-map multidimensional scaling [37] and used to demonstrate data homogeneity (Qlucore Omics Explorer 3.3) along with outliers using cross-validation. Unsupervised analysis of transcriptome data was performed using a projection score to select optimal variable subsets by variance filtering [38].

Transcriptomic associations with response to r-hGH were performed using rank regression (p < 0.01) and modified for the listed covariates. This was done by fitting a linear model with the factors to be eliminated as predictors, and retaining only the residuals (i.e. subtracting the part explained by the predictors). When a nominal factor was used as covariate (such as gender), this is equivalent to mean-centring each variable over each subgroup defined by the factor. To use covariates to adjust analysis with the downstream statistical test to relate gene expression and response to r-hGH a General Linear Model was set up, where the null hypothesis is that the data can be modelled by the covariates, and the alternative hypothesis is that the data can be modelled by covariates and the response to r-hGH.

The significance of gene set overlaps derived from the network analysis was determined using the hypergeometric test. Analyses were performed in the stated software or using R [31].

Study approval

The PREDICT (NCT00256126) and PREDICT Long-term Follow-up (NCT00699855) studies were approved by the Scotland Medical Research and Ethics Committee (reference 05/MRE10/61) and the North West Research Ethics Committee (reference 08/H1010/77), respectively. Informed consent was obtained from parents for all study participants.

Results

Growth response of patients over 5 years of r-hGH treatment

The auxology of the PREDICT study has been previously described at baseline and after 1 year [7] and after 3 years [5] of therapy with r-hGH. Height velocities as a measure of response to r-hGH at each year in GHD and TS are shown in Table 1A. As expected, first year growth response is the largest with a decline in subsequent years to a maintenance growth rate [39].

Table 1 Patient characteristics. (A) Growth response endpoints used over the duration of the study and (B) baseline auxology for patients with growth hormone deficiency (GHD) and Turner Syndrome treated with recombinant human growth hormone (r-hGH).

Genetic associations were not robust enough to be used to predict changes in growth rate over the 5 years of the study

The association between SNP carriage and growth response was assessed for 1096 and 792 growth-related candidate genes, in GHD and TS, respectively, which passed the filtering criterion. Whilst 113 SNPs were associated with growth response endpoints with an FDR p value < 0.05 modified by the number of blocks of LD, none of these were deemed to pass the stringency criteria required for predictive value (Table S1A–E).

Unsupervised and supervised analysis demonstrates that GHD and TS blood transcriptome at baseline can be used to classify response to r-hGH therapy over 5 years of treatment

We first demonstrated that a fundamental relationship existed between the baseline blood transcriptome and response to r-hGH over the 5 years of the study (GHD n = 50, TS n = 22) using DAPC on the unsupervised baseline transcriptome (GHD = 8875, TS = 8455 gene probe sets). These analyses showed clear segregation of the low response and the high response quartiles of response to r-hGH thus demonstrating the utility of blood transcriptome to differentiate response groups (Fig. 1). PLS-DA of the unsupervised baseline transcriptome demonstrated similar findings (Fig. 2).

Fig. 1: The association of whole blood gene expression at baseline with response to recombinant human growth hormone (r-hGH) over all 5 years of therapy in patients with growth hormone deficiency (GHD) and Turner syndrome (TS).
figure 1

Comparison of patient response to r-hGH using Discriminant Analysis of Principal Components (DAPC). Low quartile (green, LoQ) and high quartile (red, HiQ) of growth response over 5 years of therapy (cms grown) compared to the remaining patients (orange) in GHD (N = 50) and TS (N = 22). Unsupervised transcriptomic data with no normalisation for phenotype are shown, GHD = 8875 and TS = 8455 gene probesets. DAPC generates a discriminant function, a synthetic variable that optimises the variation between the groups whilst minimising the variation within a group. The frequency of the discriminant function of DAPC is plotted (colour figure online).

Fig. 2: Whole blood gene expression is associated with response to recombinant human growth hormone (r-hGH) over 5 years of therapy in patients with growth hormone deficiency (GHD) and Turner syndrome (TS).
figure 2

Partial least squares discriminant analysis (PLS-DA) of unsupervised transcriptome using three components. The low and high quartiles of growth response are shown for response to r-hGH (cm) over 5 years in A GHD and B TS. Star plot shows sample distance from the centroid, the arithmetic mean position of all the points in each group.

GHD and TS blood transcriptome at baseline can be used to classify response to r-hGH therapy year-on-year over 5 years of treatment

Baseline gene expression associated with height velocity at each year of the 5 years after the start of treatment with r-hGH was defined using rank regression (p < 0.01) (Table S2) with a range of covariates—microarray batch, age, BMI at baseline for both GHD and TS patients along with gender and peak GH test response in GHD. Tanner stage was added as a further covariate to account for the pubertal status of the patients (Fig. S1). There was no difference in auxology at baseline between each group of patients at each year of the study (Tables 1B and S2). First classification of low and high responding quartile groups of patients was assessed by PLS-DA using unmodified class sizes (Table S3A, B): clear separation of the quartiles was observed (example of first year GHD response, Fig. 3). We also examined classification of growth response using RF with oversampling by SMOTE to correct for uneven class size (GHD, Table S3A and TS, Table S3B). These data show clear classification of good and poor responders: at each year of the study all PLS-DA AUCs were between 73 and 98% and all RF AUCs were between 78 and 98% in both conditions.

Fig. 3: Predictive value of whole blood gene expression associated with response to recombinant human growth hormone (r-hGH) in patients with growth hormone deficiency (GHD).
figure 3

Classification of low quartile (LoQ) and high quartile (HiQ) of growth response (height velocity, cm/year) over each of 5 years of therapy with r-hGH (Y1-Y5) was performed in GHD patients and TS patients. Gene expression associated with growth response was determined using rank regression (p < 0.01) and Partial Least Squares Discriminant Analysis (PLS-DA) with two components (X-variate 1 and 2) was used to visualise response groups; PLS-DA is an analytical approach that determines the similarity between individual patients whilst maximising the difference between patient groups. Low quartile (green) and high quartile (red) compared to the rest of the data (orange) are shown for first year growth response to r-hGH in GHD (N = 71, 330 gene probesets with rank regression p < 0.01). Similarity between samples is represented by their proximity. The star plot shows sample distance from the centroid, the arithmetic mean position of all the points in each group (colour figure online).

Interactome network models of response to r-hGH

There was a limited overlap between GHD and TS whole blood transcriptomic markers related to growth response at each year of the study (Table S2). We therefore generated interactome network models including inferred interactions to assess whether GHD and TS growth response-associated gene expression was related by affecting the function of similar network modules, albeit in different ways.

Interactome network models of gene expression associated with height velocity at each year of the study were generated. The hierarchy of overlapping modules of genes was identified in each network using the network topology parameter of ‘centrality’ (Table S4). Network centrality is a measurement that is known to be related to gene function within networks; the more central a gene is, the more capable it is of influencing other genes within the network [40].

The gene level summary of SNP associations with change in height and height velocity measurements with FDR < 0.05 (Table S1C, D) were mapped onto the network models (Table S4). Most of the genetic associations with change in height and height velocity were also present within the network models—15/25 SNPs in GHD and 9/12 in TS (Table S5), implying that these genes have a functional role in network action.

Network models associated with height velocity in each year in both GHD and TS demonstrated significant overlaps (hypergeometric test, p < 0.01) (Fig. 4). These observations imply that whilst associated gene expression may be different between GHD and TS, common network elements are being affected in the two conditions.

Fig. 4: Overlap of the core interactome models of height velocity-related gene expression in GHD and TS.
figure 4

Interactome models were generated from the gene expression associated (p < 0.01) with the height velocity at each year of the study. The functional hierarchy of gene interaction modules within the interactome models was determined using the Moduland algorithm, and the core of the interactome model was defined as the unique sum of the top ten elements of the modules as ranked by network centrality. The overlap of the core of the interactome models between GHD and TS was then determined and visualised as a Venn diagram.

The overlap between network models formed a discrete interactome element shared between GHD and TS (Fig. 5). When this network was partitioned into genes related to each year of response to r-hGH (coloured Fig. 5A), it was determined that the genes associated with year 3 formed a less distinct cluster within the network (Fig. 5B). This observation is in alignment with a partition between early (years 1 and 2) and later (years 4 and 5) response to r-hGH as would be expected clinically.

Fig. 5: Network structure of the common core network module shared in patients with growth hormone deficiency (GHD) and Turner syndrome (TS) related to response to recombinant human growth hormone (r-hGH).
figure 5

A Similarities in the interactome models of the response of GHD and TS to r-hGH were identified by overlap at each year of therapy. Genes were selected that were significantly related to growth response in either or both GHD and TS. The genes related to each year of therapy were combined into a set of 58 uniquely identified genes and this set was used to generate an interactome module (reactome plugin for Cytoscape 3.6.0). Genes with a dark border also have a genetic association with growth response in either GHD or TS. Connecting lines represent known protein:protein interactions, size of the node is proportional to the number of connections made. B The clustering coefficient of the group of genes in the network module associated with each year of therapy was determined and presented as a histogram (average ± standard error of the mean). The clustering coefficient measures the tendency of nodes to cluster together within a network. C The correlation coefficient linking gene expression with growth response at each year of therapy was mapped to the network model, red = positive correlation, green = negative correlation. Genes with a thick border also have a genetic association with growth response in either GHD or TS (colour figure online).

The facts that (i) genetic associations with growth response map to the network models derived from transcriptomic data and that (ii) the network connectivity of the central modules changes over the duration of the study imply that the network models are robust and account for the effect of development on related phenotype (Table S5).

The identification of core sets of genes that can classify response to r-hGH in both GHD and TS

The overlap between network models was used to select a common set of genes at each year of therapy present in both GHD and TS. Genes within this common list were selected for growth response classification if they had previously been identified as significantly associated with height velocity by rank regression in either GHD or TS (p < 0.05) (Fig. 5C and Table S2).

Classification of both high and low r-hGH response quartiles against the remaining patients was shown using PLS-DA (no oversampling) and RF (using SMOTE oversampling). All AUCs for classification were between 74 and 96% (Table S6).

Further confidence in the findings was provided by assessing the predictive quality of the gene probe sets using BORUTA to define the limits of the noise in the analysis using a 100-fold permutation of the data (e.g. first year growth response Fig. 6 and Table S7).

Fig. 6: Predictive value of an identical set of blood gene expression markers identified by network analysis in the classification of response to recombinant human growth hormone (r-hGH) in patients with growth hormone deficiency (GHD) and Turner syndrome (TS).
figure 6

First year growth response is used as an example. Similarities in the interactome models of the response of GHD and TS to r-hGH were identified by overlap at each year of therapy. Genes were selected that were significantly related to growth response in either or both GHD and TS, generating an identical set of gene probesets used for prediction of both high and low response in both GHD and TS. BORUTA, an all relevant feature selection wrapper random forest-based algorithm, was used to confirm the importance of gene expression probe-sets used for classification of response to r-hGH. The BORUTA algorithm uses a 100-fold permutation to define the noise present in the data; the noise is modelled as shadow variables and used as a basis to assess confidence in the data. Green = confirmed gene probeset, yellow = tentative gene probeset, red = rejected gene probeset, blue = shadow variables (high, medium and low shadow variables are derived to define the noise within the dataset). Low quartile (left column—LoQ) and high quartile (right column—HiQ) are shown for first year growth response to r-hGH in GHD and TS. The same group of gene probesets are used in each case (colour figure online).

The core sets of genes with expression in whole blood that can classify response to r-hGH in both GHD and TS are associated with differential genomic methylation

Changes in genomic methylation in response to short-term treatment with r-hGH (4 days) have been demonstrated in children with range of conditions that manifest short stature [30]. Using the data provided by this previously published study, we examined the epigenome at baseline (prior to r-hGH treatment) in relation to growth response (measured by knemometry) in GHD patients (n = 6) and found that using a gene level summary of DNA methylation (20,618 genes) 497 had methylation associated with growth response to r-hGH (rank regression, p < 0.01) (Fig. 7). The majority of associated genes (425/497) were hypermethylated at lower rates of growth response.

Fig. 7: Gene level summary of DNA methylation in GHD patients is related to growth response as measured by Knemometry.
figure 7

Whole epigenome measurements of six GHD patients with growth response after 4 days of r-hGH therapy measured by knemometry were available from previously published data (GSE57107). A gene level summary of DNA methylation was conducted using median values in Qlucore Omics Explorer (version 3.3) (n = 20,618). A Rank regression of whole-genome DNA methylation against growth response after 4 days of r-hGH as measured by knemometry (p < 0.01) found 497 genes with differential methylation the majority of which showed increased methylation at low rates of growth (negative correlation). B Whole-genome methylation in the six GHD patients ordered by growth response in the sets of genes identified as predicting response to r-hGH in the first year of therapy in both GHD and TS.

We took the core sets of genes previously identified as classifiers of response to r-hGH in both GHD and TS and mapped gene level methylation present in the six GHD patients with knemometry measurements. The majority of genes (57/71) were correlated with growth response (|r| > 0.3) these were evenly distributed between positive (n = 27/71) and negative (n = 30/71) correlations (year 1 data shown in Fig. 7B).

Transcriptomic markers combined with phenotype lead to better growth response prediction

It is known that the baseline phenotype of GHD and TS patients can be used to predict response to r-hGH [2, 41,42,43]. We can use clinical parameters alone to achieve good classification of response to r-hGH in our current data sets (GHD range: 0.86–0.94, TS range: 0.84–0.91) (Table 2). However, we also found that including the blood transcriptome markers increased predictive value at each year (GHD range: 0.95–0.97, TS range: 0.92–0.95). These data reveal an average increase of 7% (p = 0.0031) and 4% (p = 0.0365) (prediction of low quartile) along with 4% (p = 0.0179) and 4% (p = 0.0097) (prediction of high quartile) in GHD and TS, respectively (Table 2).

Table 2 Comparison of the prediction of growth response using clinical phenotype with and without transcriptomic data.

Importantly, we also noted a significant decrease of error rate in the prediction of growth response at each year when blood transcriptome markers were combined with clinical phenotype markers. Error rates decreased by an average of 5% (p = 0.0084) and 5% (p = 0.0400) (prediction of low quartile) along with 5% (p = 0.0252) and 5% (p = 0.0067) (prediction of high quartile) in GHD and TS, respectively (Table 2). The reduction observed amounted to an average halving of the error rate seen when predicting response to r-hGH using clinical phenotype markers alone.

Discussion

This study aimed to identify for the first time the genomic associations that classify response to r-hGH therapy from 1 year up to 5 years of treatment with r-hGH in children with TS and GHD.

Our previous analysis has shown limited utility of genetic associations derived from a candidate set of growth-related genes in the prediction of response to r-hGh in GHD and TS after 1 year of therapy [5, 7, 44]. Hence, genetic data do not appear to be powerful enough on their own to be used in prediction and clinical management. Recently a genome-wide association study (GWAS) on r-hGH response over the first year of treatment has been published [45]. Inevitably this work uses small numbers and did not find any evidence of genome-wide significance in primary analysis. After secondary analysis including replication there was no evidence of association with either previously published first year response genes [7] or with polygenic predicted height score; however, several loci were identified as possibly significant. The transcriptomic analysis presented in our manuscript covers the whole genome and the associated network modelling links genes with differential expression to the entire genomic background. In a rare disease, such as GHD, a major problem with genetic investigation is that patient numbers will never be sufficient for a fully powered analysis with GWAS as the size of the set of genetic variants will be several orders of magnitude higher than the number of the expressed genes. Analysis of the transcriptome in this situation suffers from issues in the interpretation of tissue-specific action but is much more likely to achieve robust findings. In our study, we have used the blood transcriptome as a marker of the action of r-hGH and, therefore, present the basis of a future test using routine clinical sampling.

The whole-blood transcriptomic profile of GHD and TS patients has been shown to be associated with first year growth response to r-hGH [7] and to correlate with the interaction between GHRd3 and GHD severity [44]. We therefore reasoned that there may be value in using transcriptomic data to classify growth response, as it reflects both a child’s genetic profile and the complex clinical phenotypes arising from changes in physical development during childhood, as well as variation in the severity of the underlying condition. By normalising gene expression for phenotype, including pubertal stage, we were able to show that whole blood transcriptomic data, associated with height velocity at each year of the study, could be used to classify both the low and high quartiles of growth response, with ‘area under the curve’ up to 97%, providing the basis for a predictive test.

Little overlap between GHD and TS was observed between the gene expression data that was associated with each year of growth response. We therefore investigated whether GHD and TS were interacting with similar functional units of genes using network models [46]. We generated network models of growth response (as determined by height velocity) at each of the 5 years of treatment using baseline gene expression. Functional modules of genes within these models were ranked according to their network centrality. The measure of network centrality is known to be associated with mechanism [26, 47] and we used this measure to define the functional hierarchy of the modules of genes whose expression was linked to r-hGH response at each year of therapy.

We demonstrated robustness of the network modules identified by mapping the genetic associations identified in this study to the network models. This process highlighted genes previously identified as associated with growth response after 1 year of therapy (GRB10, SOS1 and INPPL1 in GHD) [7] along with 1 month change in serum IGF-I associated with r-hGH therapy (CDK4) [21]. It was also noted that three genes were present (INPPL1 and SOS1 in GHD and PTPN1 in TS) out of the four genes identified within the PREDICT validation study as having replicated an association with first year growth response when controlled for co-variates [5].

A significant overlap between the core network gene modules between GHD and TS was identified. We then used gene expression changes associated with growth response within these network elements to identify genes common to both conditions and show that their expression could be used to classify growth response.

The major strength of this study is to have identified predictive markers and common genomic mechanisms related to early and later growth in two different growth disorders. Our findings are also supported by the demonstration of differential methylation in these shared genes, associated with response to r-hGH in another study [48]. Importantly, we have defined sets of gene expression with predictive value in two conditions where the number of genes [17,18,19,20,21,22,23,24,25,26] is smaller than the number of patients in the group (33–70) (Tables S2 and S6); this indicates that the findings are not a consequence of overfitting [49].

In this study we have compared the use of baseline patient auxology to blood transcriptome in predicting response to r-hGH. Linear models based on baseline patient auxology can account for ~40–60% of the variance observed [9, 10]. Using RF, we found no significant difference in the AUC of baseline auxology alone compared to using blood transcriptome alone in either GHD or TS (all ~90%). It should be noted that this comparison was with the transcriptome shared between GHD and TS and if the full blood transcriptome is used then the average AUC is significantly higher than that derived from baseline auxology (average AUC ~90% compared to ~95%). We recognise that further work would need to be done to refine a smaller number of genes and therefore minimise the risk of overfitting when using the full blood transcriptome. However, we did identify a significant boost to prediction between 4 and 7% when the transcriptomic signature shared between GHD and TS was combined with the baseline patient auxology. Importantly, the gain in prediction was combined with an average halving of the error rate, a feature that represents a major clinical advance in the prediction of response to r-hGH.

We propose that the work presented in this manuscript represents a step towards individualised prediction of response to r-hGH. To proceed towards a clinical test, further validation would be required using the effect sizes we have defined and we would need to compare the group findings established in our study with the absolute individual gene expression provided by RNAseq. It would also be possible to link the transcriptomic findings to the complex genetic background of response to r-hGH using the definition of expressed quantitative trait loci potentially providing a route to a genetic risk score refined by the transcriptomic response.

This work has led to three novel findings relevant to growth studies, and potentially to other therapeutic areas in paediatrics. First, this study has demonstrated the utility of whole blood transcriptome in the classification of growth response in GHD and TS, derived from a baseline blood sample, which is straightforward to obtain in any child. This technique may be of particular use in conditions with marked variability in response to r-hGH such as the short child born small for gestational age. Second, network analysis provides a novel approach that can be used to identify genomic features that are likely to have high predictive value. Finally, a set of common genes in GHD and TS identified by a network approach can be used to classify growth response in both conditions, providing the opportunity to develop a test to inform clinical management.