Abstract
Height is a highly heritable, classic polygenic trait with approximately 700 common associated variants identified through genome-wide association studies so far. Here, we report 83 height-associated coding variants with lower minor-allele frequencies (in the range of 0.1–4.8%) and effects of up to 2 centimetres per allele (such as those in IHH, STC2, AR and CRISPLD2), greater than ten times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (giving an increase of 1–2 centimetres per allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes that are mutated in monogenic growth disorders and highlight new biological candidates (such as ADAMTS3, IL11RA and NOX4) and pathways (such as proteoglycan and glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate-to-large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways.
References
Fisher, R. A. The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. 52, 399–433 (1918)
Silventoinen, K. et al. Heritability of adult body height: a comparative study of twin cohorts in eight countries. Twin Res. 6, 399–408 (2003)
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014)
Flannick, J. et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat. Genet. 46, 357–363 (2014)
Steinthorsdottir, V. et al. Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nat. Genet. 46, 294–298 (2014)
Gudmundsson, J. et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat. Genet. 44, 1326–1329 (2012)
Sidore, C. et al. Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat. Genet. 47, 1272–1281 (2015)
Danjou, F. et al. Genome-wide association analyses based on whole-genome sequencing in Sardinia provide insights into regulation of hemoglobin levels. Nat. Genet. 47, 1264–1271 (2015)
Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2014)
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015)
Grove, M. L. et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 8, e68095 (2013)
Kryukov, G. V., Pennacchio, L. A. & Sunyaev, S. R. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am. J. Hum. Genet. 80, 727–739 (2007)
Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012)
Lanktree, M. B. et al. Meta-analysis of dense genecentric association studies reveals common and uncommon variants associated with height. Am. J. Hum. Genet. 88, 6–18 (2011)
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015)
Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLOS Comput. Biol. 12, e1004714 (2016)
Schwartz, N. B. & Domowicz, M. Chondrodysplasias due to proteoglycan defects. Glycobiology 12, 57R–68R (2002)
Wei, H. S., Wei, H. L., Zhao, F., Zhong, L. P. & Zhan, Y. T. Glycosyltransferase GLT8D2 positively regulates ApoB100 protein expression in hepatocytes. Int. J. Mol. Sci. 14, 21435–21446 (2013)
Ito, H. et al. Molecular cloning and biological activity of a novel lysyl oxidase-related gene expressed in cartilage. J. Biol. Chem. 276, 24023–24029 (2001)
Wakahara, T. et al. Fibin, a novel secreted lateral plate mesoderm signal, is essential for pectoral fin bud initiation in zebrafish. Dev. Biol. 303, 527–535 (2007)
Kawano, Y. & Kypta, R. Secreted antagonists of the Wnt signalling pathway. J. Cell Sci. 116, 2627–2634 (2003)
Mastaitis, J. et al. Loss of SFRP4 alters body size, food intake, and energy expenditure in diet-induced obese male mice. Endocrinology 156, 4502–4510 (2015)
Jepsen, M. R. et al. Stanniocalcin-2 inhibits mammalian growth by proteolytic inhibition of the insulin-like growth factor axis. J. Biol. Chem. 290, 3430–3439 (2015)
Dauber, A. et al. Mutations in pregnancy-associated plasma protein A2 cause short stature due to low IGF-I availability. EMBO Mol. Med. 8, 363–374 (2016)
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010)
Karaplis, A. C. et al. Inactivating mutation in the human parathyroid hormone receptor type 1 gene in Blomstrand chondrodysplasia. Endocrinology 139, 5255–5258 (1998)
Sims, N. A. et al. Interleukin-11 receptor signaling is required for normal bone remodeling. J. Bone Miner. Res. 20, 1093–1102 (2005)
Takeuchi, Y. et al. Interleukin-11 as a stimulatory factor for bone formation prevents bone loss with advancing age in mice. J. Biol. Chem. 277, 49011–49018 (2002)
Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016)
Goldstein, J. I. et al. zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics 28, 2543–2545 (2012)
Liu, D. J. et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 46, 200–204 (2014)
Winkler, T. W. & Day, F. R. Quality control and conduct of genome-wide association meta-analyses. Nat. Protocols 9, 1192–1212 (2014)
Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011)
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015)
Feng, S., Liu, D., Zhan, X., Wing, M. K. & Abecasis, G. R. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 30, 2828–2829 (2014)
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012)
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015)
Pasaniuc, B. et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914 (2014)
Moayyeri, A., Hammond, C. J., Valdes, A. M. & Spector, T. D. Cohort Profile: TwinsUK and healthy ageing twin study. Int. J. Epidemiol. 42, 76–85 (2013)
Boyd, A. et al. Cohort Profile: the ‘children of the 90s’—the index offspring of the Avon Longitudinal Study of Parents and Children. Int. J. Epidemiol. 42, 111–127 (2013)
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010)
Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014)
Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011)
Price, A. L. et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010)
Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015)
Fehrmann, R. S. et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet. 47, 115–125 (2015)
Frey, B. J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007)
Overgaard, M. T. et al. Expression of recombinant human pregnancy-associated plasma protein-A and identification of the proform of eosinophil major basic protein as its physiological inhibitor. J. Biol. Chem. 275, 31128–31133 (2000)
Gyrup, C. & Oxvig, C. Quantitative analysis of insulin-like growth factor-modulated proteolysis of insulin-like growth factor binding protein-4 and -5 by pregnancy-associated plasma protein-A. Biochemistry 46, 1972–1980 (2007)
Oxvig, C., Sand, O., Kristensen, T., Kristensen, L. & Sottrup-Jensen, L. Isolation and characterization of circulating complex between human pregnancy-associated plasma protein-A and proform of eosinophil major basic protein. Biochim. Biophys. Acta 1201, 415–423 (1994)
Acknowledgements
A full list of acknowledgements appears in the Supplementary Information. Part of this work was conducted using the UK Biobank resource.
Author information
Contributions
Writing group (wrote and edited manuscript): P.D., T.M.F., M.Gr., J.N.H., G.L., K.S.L., Y.Lu., E.M., C.M.-G., F.Ri.
All authors contributed and discussed the results, and commented on the manuscript.
Data preparation group (checked and prepared data from contributing cohorts for meta-analyses and replication): T.Es., M.Gr., H.M.H., A.E.J., T.Ka., K.S.L., A.E.L., Y.Lu., E.M., N.G.D.M., C.M.-G., P.Mu., M.C.Y.N., M.A.R., C.S., K.St., V.T., S.V., T.W.W., K.L.Y.
This work was done under the auspices of the GIANT, CHARGE, BBMRI, UK ExomeChip, and GOT2D consortia.
Height meta-analyses (discovery and replication, single-variant and gene-based): P.D., T.M.F., M.Gr., J.N.H., G.L., D.J.L., K.S.L., Y.Lu, E.M., C.M.-G., F.Ri., A.R.W.
UK Biobank-based integration of height association signals group and heritability analyses: P.D., T.M.F., G.L., Z.K., K.S.L., E.M., S.R., A.R.W.
Pleiotropy working group: G.A., M.Bo., J.P.C., P.D., F.D., J.C.F., H.M.H., S.Kat., C.M.L., D.J.L., R.J.F.L., A.Ma., E.M., M.I.M., P.B.M., G.M.P., J.R.B.P., K.S.R., C.J.W.
Biological and clinical enrichment and pathway analyses: R.S.F., J.N.H., Z.K., D.L., G.L., K.S.L., T.H.P.
Functional characterization of STC2: T.R.K., C.O.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Additional information
Reviewer Information Nature thanks J. Barrett, D. Hinds and D. Hunter for their contribution to the peer review of this work.
A list of members and their affiliations appears in the Supplementary Information.
Extended data figures and tables
Extended Data Figure 2 Height ExomeChip association results.
a, Quantile–quantile plot of ExomeChip variants and their association with adult height under an additive genetic model in individuals of European ancestry. We stratified results on the basis of allele frequency. b, Manhattan plot of all ExomeChip variants and their association with adult height under an additive genetic model in individuals of European ancestry, with a focus on the 553 independent SNPs, of which 469 have a MAF > 5% (grey), 55 have a MAF of 1–5% (green), and 29 have a MAF < 1% (blue). c, Linkage disequilibrium (LD) score regression analysis for the height association results in European-ancestry studies. In the plot, each point represents a linkage disequilibrium score quantile, where the x axis of the point is the mean linkage disequilibrium score of variants in that quantile and the y axis is the mean χ² statistic of variants in that quantile. The linkage disequilibrium score regression slope of the black line is calculated using equation 1 in ref. 34, and is overestimated owing to the small number of common variants (n = 15,848) and the design of the ExomeChip. The linkage disequilibrium score regression intercept is 1.4, the λGC is 2.7, the mean χ² is 7.0, and the ratio statistic of (intercept − 1)/(mean χ² − 1) is 0.067 (s.e.m. = 0.012). d, Scatter plot comparison of the effect sizes for all variants that reached significance in the European-ancestry-discovery results (n = 381,625) and results including only studies with sample sizes of more than 5,000 individuals (n = 241,453).
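The ratio statistic quoted in panel c follows directly from the reported intercept and mean χ². As a minimal sketch of that arithmetic (the function name `ldsc_ratio` is a hypothetical helper for illustration and is not taken from the paper or from any LD score regression software), the value can be checked as follows:

```python
def ldsc_ratio(intercept: float, mean_chi2: float) -> float:
    """Return the ratio statistic (intercept - 1) / (mean chi2 - 1).

    Hypothetical helper: it only reproduces the arithmetic stated in the
    caption and does not run LD score regression itself.
    """
    if mean_chi2 <= 1.0:
        # The ratio is undefined when there is no inflation in mean chi2.
        raise ValueError("mean chi2 must be greater than 1")
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Values reported in the caption: intercept = 1.4, mean chi2 = 7.0.
ratio = ldsc_ratio(intercept=1.4, mean_chi2=7.0)
print(f"{ratio:.3f}")  # prints 0.067
```

Under the usual interpretation of this statistic, a ratio of about 0.067 suggests that only a small share of the inflation in the mean χ² is attributed to the intercept rather than to polygenic signal.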
Extended Data Figure 3 Height ExomeChip association results in African-ancestry populations.
Among the all-ancestry results, we found eight variants for which the genetic association with height is mostly driven by individuals of African ancestry. The MAF of these variants is <1% (or monomorphic) in all ancestries except African ancestry. In individuals of African ancestry, the variants had allele frequencies between 9 and 40%.
Extended Data Figure 4 Concordance between direct conditional effect sizes using UK Biobank (x axis) and conditional analysis performed using a combination of imputation-based methodology and approximate conditional analysis (SSimp, y axis).
The Pearson correlation coefficient is r = 0.85. The dashed line indicates the identity line. The 95% confidence interval is indicated in both directions. Red, SNPs with Pcond > 0.05 in the UK Biobank; green, SNPs with Pcond ≤ 0.05 in the UK Biobank.
Extended Data Figure 5 Heritability estimated for all known height variants in the first release of the UK Biobank dataset.
a, We observed a weak but significant positive trend between MAF and heritability (P = 0.012). b, Average heritability explained per variant when stratifying the analyses by allele frequency or genomic annotation. For heritability estimations in the UK Biobank, variants were pruned to r² < 0.2 in the 1000 Genomes Project dataset, and the heritability figures are based on h² = 80% for height.
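As a rough illustration of how allele frequency and effect size jointly determine the heritability explained by a single variant, the sketch below uses the standard additive-model approximation, variance explained ≈ 2p(1 − p)β², with β in standard-deviation units, and rescales by h² = 80%. This is one reading of how the figures could be derived, not the paper's analysis pipeline; the function names and the example frequencies and effect sizes are hypothetical.

```python
def variance_explained(maf: float, beta_sd: float) -> float:
    """Phenotypic variance explained by one variant under an additive model.

    Uses the textbook approximation 2 * p * (1 - p) * beta^2, where p is the
    minor-allele frequency and beta is the per-allele effect in s.d. units.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor-allele frequency must lie in [0, 0.5]")
    return 2.0 * maf * (1.0 - maf) * beta_sd ** 2


def share_of_heritability(maf: float, beta_sd: float, h2: float = 0.8) -> float:
    """Fraction of total heritability explained, assuming h2 = 0.8 for height."""
    return variance_explained(maf, beta_sd) / h2


# Hypothetical examples: a rare variant of large effect and a common variant
# of small effect (values chosen for illustration only).
rare = variance_explained(maf=0.01, beta_sd=0.30)
common = variance_explained(maf=0.30, beta_sd=0.03)
print(rare > common)  # prints True for these example values
```

With these illustrative numbers, the rare variant explains more variance than the common one despite its low frequency, because the effect size enters the formula squared.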
Extended Data Figure 6 Comparison of DEPICT gene set enrichment results based on coding variation from ExomeChip or non-coding variation from GWAS data.
The x axis indicates the P value for enrichment of a given gene set using DEPICT adapted for ExomeChip (EC) data, where the input to DEPICT is the genes implicated by coding ExomeChip variants that are independent of known GWAS signals. The y axis indicates the P value for gene set enrichment using DEPICT, using as input the GWAS loci that do not overlap the coding signals. Each point represents a meta-gene set and the best P value for any gene set within the meta-gene set is shown. Only significant (false discovery rate < 0.01) gene set enrichment results are plotted. Colours correspond to whether the meta-gene set was significant for ExomeChip only (blue), GWAS only (green), both but more significant for ExomeChip (purple), or both but more significant for GWAS (orange), and the most significant gene sets within each category are labelled. A line is drawn at x = y for ease of comparison.
Extended Data Figure 7 Heat map showing entire DEPICT gene set enrichment results.
This figure is analogous to Fig. 2. For any given square, the colour indicates how strongly the corresponding gene (shown on the x axis) is predicted to belong to the reconstituted gene set (y axis). This value is based on the Z score of the gene for gene set inclusion in DEPICT’s reconstituted gene sets, where red indicates a higher Z score and blue indicates a lower one. The proteoglycan-binding pathway was uniquely implicated by coding variants (as opposed to common variants) by both DEPICT and the Pascal method. To visually reduce redundancy and increase clarity, we chose one representative ‘meta-gene set’ for each group of highly correlated gene sets based on affinity propagation clustering (see Methods and Supplementary Information). Heat map intensity and DEPICT P values correspond to the most significantly enriched gene set within the meta-gene set; meta-gene sets are listed with their database source. Annotations for the genes indicate whether the gene has OMIM annotation as underlying a disorder of skeletal growth (black and grey) and the MAF of the significant ExomeChip variant (shades of blue; if multiple variants, the lowest-frequency variant was kept). Annotations for the gene sets indicate if the gene set was also found significant for ExomeChip by the Pascal method (yellow and grey) and if the gene set was found significant by DEPICT for ExomeChip only or for both ExomeChip and GWAS (purple and green). GO, Gene Ontology; KEGG, Kyoto encyclopaedia of genes and genomes; MP, mouse phenotype in the Mouse Genetics Initiative; PPI, protein–protein interaction in the InWeb database.
Extended Data Figure 8 Coding height variants are pleiotropic.
a, b, Heat maps showing associations of the height variants with other complex traits. The −log10(P values) are oriented by the direction of the beta effect for the alternate allele. White indicates missing values and yellow indicates non-significant associations (P > 0.05). Green-to-blue shading marks hits with a positive beta in the other trait, and orange-to-red shading marks hits with a negative beta in the other trait, in both cases for P values ranging from 0.05 to <2 × 10⁻⁷. Short and tall labels are given for the minor alleles. Loci are clustered by the complete linkage method with a Euclidean distance measure. Clusters highlight SNPs that are more significantly associated with the same set of traits. a shows variants for which the minor allele is the height-decreasing allele. b shows variants for which the minor allele is the height-increasing allele.
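The heat-map values described above combine significance and effect direction in a single number. A minimal sketch of one way to compute such a signed score is given below; the function name `signed_neg_log10_p` is hypothetical, and the exact encoding used to draw the figure is an assumption here.

```python
import math


def signed_neg_log10_p(p_value: float, beta: float) -> float:
    """Return -log10(P) carrying the sign of the effect estimate.

    Positive values correspond to a positive beta in the other trait and
    negative values to a negative beta; a beta of exactly zero maps to 0.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must lie in (0, 1]")
    magnitude = -math.log10(p_value)
    if beta > 0:
        return magnitude
    if beta < 0:
        return -magnitude
    return 0.0


# Illustrative inputs only: a strong negative association and a
# borderline positive one.
strong_negative = signed_neg_log10_p(p_value=1e-7, beta=-0.2)
borderline_positive = signed_neg_log10_p(p_value=0.05, beta=0.1)
```

Encoding the direction in the sign lets a single diverging colour scale cover both positive and negative associations, which is consistent with the green-to-blue and orange-to-red shading in the caption.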
Supplementary information
Supplementary Information
This file contains Supplementary Text and Data and additional references. (PDF 1045 kb)
Supplementary Tables
This file contains Supplementary Tables 1-24. (XLSX 2141 kb)
Cite this article
Marouli, E., Graff, M., Medina-Gomez, C. et al. Rare and low-frequency coding variants alter human adult height. Nature 542, 186–190 (2017). https://doi.org/10.1038/nature21039
DOI: https://doi.org/10.1038/nature21039