Evolutionary conserved networks of human height identify multiple Mendelian causes of short stature

Height is a heritable and highly heterogeneous trait. Short stature affects 3% of the population and in most cases is genetic in origin. After excluding known causes, 67% of affected individuals remain without diagnosis. To identify novel candidate genes for short stature, we performed exome sequencing in 254 unrelated families with short stature of unknown cause and identified variants in 63 candidate genes in 92 (36%) independent families. Based on systematic characterization of variants and functional analysis including expression in chondrocytes, we classified 13 genes as strong candidates. Whereas variants in at least two families were detected for all 13 candidates, two genes had variants in 6 (UBR4) and 8 (LAMA5) families, respectively. To facilitate their characterization, we established a clustered network of 1025 known growth and short stature genes, which yielded 29 significantly enriched clusters, including skeletal system development, appendage development, metabolic processes, and ciliopathy. Eleven of the candidate genes mapped to 21 of these clusters, including CPZ, EDEM3, FBRS, IFT81, KCND1, PLXNA3, RASA3, SLC7A8, UBR4, USP45, and ZFHX3. Fifty additional growth-related candidates we identified await confirmation in other affected families. Our study identifies Mendelian forms of growth retardation as an important component of idiopathic short stature.


Introduction
Human height is a heritable and highly heterogeneous trait [1]. Efforts to understand the genetic basis of growth have employed genome-wide association studies (GWAS) to systematically assess the effect on human height variation of common variants with a minor allele frequency > 5% [2]. 697 variants, mainly located in 423 noncoding loci, have been implicated in height variance in the population [2,3]. Subsequent studies on rare variants, both at the nucleotide and genomic levels, further expanded the number of associated loci [3,4]. So far, rare and common height-associated variants together explain about 27.4% of height heritability [3]. In addition, it is known from Mendelian forms of growth retardation that rare, large effect-size variants can have extremely large effects on growth development [3].
Short stature, defined auxologically as a height two standard deviations below the mean height in the population, affects about 3% of individuals and is a common medical concern. In a recent study combining systematic phenotyping and exome-based sequencing, we were able to identify a genetic cause in up to 33% of individuals with idiopathic short stature (ISS) [5]. Consequently, 67% of the affected individuals remained undiagnosed. Most forms of short stature have been attributed to Mendelian causes, highlighting defects in a diverse range of functional pathways [6,7]. The most common monogenic causes include defects of the SHOX gene (2.4%) [8], heterozygous variants in ACAN (1.4%) [9] and many genes for rare syndromic forms as well as skeletal dysplasias [8,[10][11][12]. At least 477 genes have been found to affect human growth [13], but as yet there are no reliable estimates of the number of growthassociated genes. For most of these genes, though, no association with short stature has been found in humans. Affected individuals and their families would thus benefit from the identification of further genes associated with growth retardation. In this study in 254 unrelated individuals with ISS and their families, we used exome sequencing to identify and characterize novel candidate genes based on evolutionarily conserved networks.

Individuals
The study was approved by the ethics committee of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). 565 individuals and their families were referred by medical specialists for evaluation of growth retardation after endocrine defects of the growth hormone pathway and other organic causes of growth retardation were excluded. After previous targeted testing including array analysis in some and exome sequencing failed to identify a known cause, we further assessed the exomes of 254 well-characterized families with at least 1 offspring whose growth standard deviation score (SDS) was ≥2 below the mean population height and/or the estimated family (est. height in Table 1) for potential candidate genes (Table 1 and Supplementary  Table 2). Participants' mean age (±standard deviation) was 9.2 ± 0.43 years, and 155/254 (61%) participants were female. Most index individuals (53%) presented with a height between 3 and 2 SDS below the mean, and 32% were born small for gestational age (Supplementary Figure 1). In 68/254 (27%) participants, additional features such as Affected individuals with a height above −2 SDS, but below the est. height microcephaly, syndactyly, nail dysplasia or any nonspecific facial gestalt resulted in a diagnosis of syndromic short stature.

Exome sequencing and variant assessment
We performed whole-exome sequencing in 185 affected individuals and both of their respective parents (trio analysis) and in 69 affected individuals (affected-only analysis) after enrichment by SureSelect targeted capturing on HiSeq 2500 (94.3%) or SOLiD 5500xl (5.7%). Exomes were analyzed by semiautomatic selection and data quality inspection of variants, followed by the interpretation in relation to the reported phenotypic spectrum (Supplementary Figures 2 and 3). Variants and familial segregation were confirmed by Sanger sequencing. All 185 trios were analyzed for variants with de novo, compound heterozygous, homozygous, and X-linked recessive inheritance. We classified variants according to their population frequency and potential effect on gene function (Supplementary methods, Fig. 1c and Supplementary Tables 1-3). Population data from gnomAD was considered as most likely appropriate controls. After excluding benign and likely benign variants, we evaluated affected genes for their relevance to growth phenotypes. For this gene-level assessment, we included information from association studies, copy number variants, gene ontology (GO) terms, protein-protein interaction data, suitable mouse and zebrafish models, and a previous exome study [14] ( Fig. 1d and Supplementary Table 4). Additional data on expression in chondrocytes was obtained by RNA sequencing (RNASeq) of cartilage tissue for all genes studied (Fig. 1d, Supplementary Methods and Figure 4). The combined results of the variant-level and gene-level assessments were then used to identify genes, which were further investigated with respect to the observed mode of inheritance in the 69 affected-only exomes (  Table 5). Genes were finally classified as high-confidence and medium-confidence candidates based on the number of affected individuals and the combined variant-level and gene-level scores (Supplementary Table 6 and Supplementary Methods). Variants were uploaded to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/, Submission ID SUB5032330).

Functional clustering analysis
Functional enrichment analysis was performed using the Database for Annotation, Visualization, and Integrated Discovery, which comprises 1025 Genes using the keywords "growth delay" from the Human Phenotype Ontology (HPO) database and "short stature" from Online Mendelian Inheritance in Man (OMIM) and MedGen (Supplementary methods). Proteins were submitted to DAVID using human gene Ensembl identifiers. Significantly overrepresented annotation terms were retrieved with the options GOTERM_MF_ALL, GOTERM_ CC_ALL, GOTERM_BP_ALL, KEGG_PATHWAY, UP_KEYWORDS, and OMIM_DISEASE. A false discovery rate (FDR) of 0.05 by the Benjamini and Hochberg approach was used to determine significant enrichment using all human gene Ensembl identifiers in the BioMart database as the background (Supplementary methods). Functional Annotation Clustering was performed using the DAVID functional annotation clustering tool with the following parameters: overlap = 5, initialSeed = 5, finalSeed = 5, linkage = 0.5, kappa = 20. This tool implements a fuzzy clustering algorithm to cluster functional annotation terms based on the degree of the overlap between associated genes. Raw p-values were used to compute the initial clusters. Annotation terms with an FDR > 0.05 were subsequently pruned. Given a list of candidate genes and an annotation term, we calculated (i) the odds of a human gene (Ensembl gene identifier) in the list being associated with the annotation term; and (ii) the odds of a human gene (Ensembl gene identifier) not in the list being associated with the term. The odds ratio was calculated by dividing the odds from (i) by the odds from (ii) using the fisher.test () function implemented in the R environment for statistical computing. Results were presented as a treemap based on the functional annotation clustering using R's treemap package.

Protein structure analysis
For all variants of unknown significance in high-confidence candidate genes and for missense variants in RASA3, we performed in silico structural analyses and reclassified variants based on predicted functional consequences ( Fig. 1f and Supplementary Figure 8). Models for all wildtype protein domains were either obtained from Modbase [15] or modelled using HHpred [16] and Modeller [17]. The variant amino acid was exchanged using Swiss-Model [18] and visualized with RasMol [19]. Detailed information on individual modelling is provided in the Methods section in the supplementary information.

Identification of novel candidate genes for ISS
Characterization of involved variants and genes and consideration of independent affected individuals carrying these variants revealed 63 candidate genes in 92 (36%) of the 254 affected individuals included in the analysis. We classified 13 genes as high-confidence genes and the remaining 50 genes as medium-confidence candidates (Tables 2 and 4, Supplementary Tables 7 and 11, and Supplementary Figures 6-18). The mode of inheritance was mainly autosomal dominant (71%), followed by X-linked recessive (17%), and autosomal recessive (11%). The most  Tables 1,  2). d Categories of gene level assessment. Numbers represent the genes to which each category applies (see Supplementary Table 4). Numbers in brackets represent the genes among all selected known growth and short-stature genes. e The results of the variant and gene level evaluation were merged to a combined score (Shown is the highest score for each gene, Supplementary Table 5). f Based on structure analysis of variants of unknown significance (VUS), 5 variants in 4 of the high-confidence candidate genes were reclassified to likely pathogenic. Model of the RASA3 C2-domain showing the site of the Asp82Glu and Val85Ala variants. Both residues are located in a pocket of the C2 domains that contains two Ca 2+ ions (Ca). Asp82 forms interactions with both Ca 2+ ions (green dotted lines), whereas the longer glutamate side chain of the Asp82Glu variant can only interact with one of the Ca 2+ ions, probably leading to a loss of the second Ca 2+ ion from the binding pocket. Val85 (blue) is located on the lateral wall of this pocket, and the shorter alanine side chain in the Val85Ala variant affects the width of the pocket. g Distribution of the 63 high-and medium-confidence candidate genes in the growthassociated clusters  abundant variants were missense variants (78%), followed by nonsense (15%), splice site (6%), and non-frameshift insertion / deletion variants (1%). Nonsense variants were identified in the high-confidence candidate genes RASA3 and USP45. Based on in silico structural analysis, we reclassified 5 variants in 4 of the high-confidence candidate genes from "variants of unknown significance" (VUS) to "likely pathogenic" (Fig. 1f and Supplementary Table 7).

Previously reported candidate genes
Of the 63 candidate genes we identified, 6 (9.5%) were previously found to be associated with short stature or syndromes featuring short stature (Table 3 and Supplementary Table 12). A de novo loss of the start codon in ZBED4 was found in another, smaller exome study in individuals with ISS [14]. A missense variant in BRD4 was reported to segregate in one family with short stature [20]. Variants in FZD2 and LZTR1 were described in individuals with Robinow syndrome-like phenotype (FZD2) [21,22] and Noonan syndrome (LZTR1), respectively [23,24]. Recently, variants in AMMECR1 were observed in individuals with midface hypoplasia, hearing impairment, elliptocytosis, and nephrocalcinosis (OMIM 300990) [25,26]. Here, short stature is a constant feature. Biallelic variants in IFT81 underlie a severe form of short stature (short-rib thoracic dysplasia 19; OMIM 617895). Interestingly, a heterozygous missense variant segregates with the growth deficit in one family, suggesting autosomal dominant inheritance.
Enrichment analysis and clustering of genes known to be related to growth retardation 1025 genes potentially involved in growth delay or growth regulation were selected using the keywords "growth delay" from the Human Phenotype Ontology (HPO) database and "short stature" from OMIM and MedGen (Supplementary methods). Based on their significantly enriched functions, pathways and other biological features, we identified 29 clusters, including skeletal system development (GO:0001501), appendage development (GO:0048736), and ciliopathy and metabolic processes (GO:0044710) (Fig. 1b). Moreover, clusters were often related to central processes like chromatin binding (GO:0003682) or extracellular matrix (GO:0310122).
Enrichment analysis revealed that 95% of the candidate genes were mapped to 26 of the 29 clusters previously implicated in growth or short stature ( Fig. 1g and Table 4). LAMA5 and MED24 were not mapped to any of these 29 clusters (Table 4 and Supplementary Tables 7, 10, and 11). The main involved KEGG pathways for the 13 highconfidence candidate genes were: metabolic pathways (hsa:01100), PI3K-Akt signaling pathway (hsa:04151),

Discussion
Growth-related disorders constitute a very heterogeneous group of disorders. Based on the results from large GWAS and gene-expression studies, we have estimated that at least 1000 genes are involved [5,28]. These studies highlighted the observation that rare, large effect-size variants of growth retardation are inherited according to the three classical patterns of Mendelian inheritance [28]. Nevertheless, 67% of individuals with ISS remain without a diagnosis [5].
Using exome sequencing, we therefore aimed to identify novel genes associated with short stature in 254 affected families in whom known causes of short stature had previously been excluded ( Table 2, Supplementary Tables 7  and 11, and Supplementary Figure 1) [5]. We identified variants in 63 candidate genes in a total of 92 independent families. Based on a classification scheme using variant and gene information as well as the number of independent affected individuals, we classified 13 genes as highconfidence genes and 50 other genes as medium-confidence candidates (Fig. 1c-e and Supplementary Figure 2). To facilitate their characterization, we compiled a list of 1025 genes from HPO, OMIM, and MedGen based on the keywords "growth delay" and "short stature" and clustered their significantly enriched functions, pathways and other biological features. This resulted in 29 clusters, the largest of which were skeletal system development, appendage development, metabolic processes, and ciliopathy. These clusters reflect not only processes of skeletal development but also genes involved in basic processes like cellular growth (Fig. 1a, b, Table 4, and Supplementary Figures 5-7). Interestingly, the clusters with the highest mean enrichment were bone morphogenesis (6.3-fold), peroxisome biogenesis disorder (6.0-fold), and ossification (5.6fold). These results confirmed a broad functional range of potentially growth-related genes. In addition, we generated a functional map of growth-associated genes that provides general information on functions, pathways and other biological features overrepresented in this gene set. We were able to map 95% of the 63 candidate genes onto at least one of the 29 clusters, supporting their relevance to growth ( Fig. 1g and Table 4). These include the aforementioned clusters with a high mean enrichment, but also more specific clusters such as extracellular matrix, ossification, bone morphogenesis, and several catalytic processes.
The sensitivity of our approach to identify candidate genes for short stature was further supported by the identification of six genes (IFT81, AMMECR1, BRD4, FZD2, LZTR1, ZBED4) recently found to be associated with this phenotype (Table 3, Supplementary Table 12). Both the previously reported mode of inheritance and the type of variant were observed for AMMECR1 (X-linked recessive loss of function variants) [25,26], BRD4 (autosomal dominant missense variants) [20] and ZBED4 (de novo nonsense variants) [14]. In addition, the clinical phenotype of the affected individuals was part of the phenotypic spectrum reported, providing additional evidence for their implication (Table 3 and Supplementary Table 12). Interestingly, while nonsense variants in FZD2 cause severe skeletal dysplasia phenotypes [21,22], we identified missense variants in this gene in individuals with ISS, suggesting that missense variants are associated with a milder phenotype. IFT81 has previously been shown to be associated with an autosomal recessive syndromic form of short stature [24,29,30]. We propose that heterozygous variants affecting only one allele cause only short stature, a mechanism similar to that recently demonstrated for variants in ACAN [9]. For LZTR1, which is involved in the RAS-Map kinase pathway, the situation is more complex, since variant type-dependent recessive and dominant inheritance modes were observed to cause a Noonanspectrum disorder with short stature as a consistent phenotype [23,24]. We hypothesize that specific missense variants may be associated with isolated short stature.
RASA3 and FGF18 are two novel candidates implicated in known signaling pathways. In RASA3, we identified a de novo frameshift variant leading to a pre-terminal stop codon in one individual, and a de novo missense variants located in the C2 domain in a second individual, suggesting a lossof-function effect. Through collaboration, we identified one additional individual with ISS and a de novo missense variant in this domain. Molecular modeling revealed that these missense variants potentially interfere with proper function of the C2 domain, which is reported to be relevant in targeting the protein to the cell membrane (Fig. 1f) [31]. Both the disruption of GTPase, mediated by a pre-terminal stop codon, and mislocation in the cell may potentially interfere with its proper function and thus lead to reduced inactivation of Ras-signaling and further hyperactivation of the pathway [32]. Likewise, FGF18 plays an important role in chondrogenesis and osteogenesis by binding to FGFR2 and FGFR3 [33]. Gain-of-function variants in both receptors were reported to cause mainly syndromic forms of short stature [34][35][36]. We propose that, by analogy with other members of this group of growth factors, a disease causing variant in the interacting protein leads to constitutional activation of the receptors in the complex [37].
Our classification, in addition to variant and gene information, is based on the number of independent affected individuals. All 13 high-confidence candidate genes were identified in at least 2 families, and 5 candidates even exhibited variants in up to 8 families. The most promising candidates were LAMA5 with 4 variants likely affecting the function and 4 variants of unknown significance in 8 families, and UBR4 with 5 variants likely affecting the function and 1 variant of unknown significance in 6 families ( Table 2). All affected individuals carrying variants in either of these genes presented with proportionate short stature and a mean height that was 2.9 SDS below average. LAMA5 encodes the laminin subunit alpha 5, which plays a crucial role in development by regulating Wnt and PI3K signaling during osteoblast differentiation [38][39][40]. Recent reports demonstrated an involvement in syndromes including kidney disease, osteoarthritis and hypothyroidism, and an association with reduced height in elderly individuals from Southern Italy [41][42][43][44]. Studies in Lama5 mice with a hypomorphic allele indicated an essential role for LAMA5 in growth development [45]. Correspondingly, UBR4, the ubiquitin protein ligase E3 component n-recognin 4, is involved in cancer cell growth [46]. In Ubr4 knockout mice, growth retardation was reported in fetuses [47]. A 3.46 Mb deletion encompassing UBR4 was reported in an affected individual with short stature [48]. Intolerance to missense variants as indicated by a z-score of 5.98 (ExAC) suggested that the variants identified in the 6 affected individuals are highly likely to be associated with the phenotype [48,49]. For another 50 genes, we found strong functional evidence for their involvement, albeit some were identified only in a single family. Thus, we cannot exclude that some genes may be unrelated to short stature (Supplementary Table 8-10).
If we were to consider only the 13 high-confidence candidates, this would yield an exome sequencing detection rate of 15% in highly selected families in whom known causes were previously excluded. This probably constitutes an underestimation as we assume that some of the other 50 candidate genes may be confirmed in subsequent studies. We may have missed smaller structural variants, variants in noncoding or insufficiently covered regions, or epigenetic changes, which have also been reported in connection with short stature [4,50]. Furthermore, polygenic inheritance, mating selection and familial height variability may have hampered the clinical characterization of, and hence the identification of, the underlying variant.
In conclusion, we identified and characterized 13 highconfidence candidate genes with variants in two or more independent families and another 50 medium-confidence candidate genes. Of these candidate genes, 95% are annotated with functions, pathways and other biological features that are significantly enriched among 1,025 growthassociated genes. These results illustrate that in entities with extremely high heterogeneity and complexity, such as growth defects, clinical characterization and variant-level and gene-level information need to be combined to identify candidate genes. Our study also suggests that single gene defects are an important contributor to the extreme lower end of the growth distribution.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. EK, AMJ, TRR, DW, ARa, and CTT contributed data from affected individuals. All affected individuals were clinically evaluated by HGD and CTT. Statistical analysis was done by LT, BP, SU, FF, CB, ABE, and HS. PK and FF contributed the chondrocyte expression data. CK and UT performed clinical diagnostic testing. Splice site validation was performed by NNH and ES. Data was analyzed by NNH, BP, SS, FF, SU, CB, HS, AR, and CTT. NNH and CTT interpreted the results. NNH, AR and CTT wrote the draft manuscript. All co-authors provided feedback on the estimates and contributed to the subsequent versions of the manuscript. All authors read and approved the final version of the manuscript.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.