A positively selected FBN1 missense variant reduces height in Peruvian individuals

Abstract

On average, Peruvian individuals are among the shortest in the world1. Here we show that Native American ancestry is associated with reduced height in an ethnically diverse group of Peruvian individuals, and identify a population-specific, missense variant in the FBN1 gene (E1297G) that is significantly associated with lower height. Each copy of the minor allele (frequency of 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). To our knowledge, this is the largest effect size known for a common height-associated variant. FBN1 encodes the extracellular matrix protein fibrillin 1, which is a major structural component of microfibrils. We observed less densely packed fibrillin-1-rich microfibrils with irregular edges in the skin of individuals who were homozygous for G1297 compared with individuals who were homozygous for E1297. Moreover, we show that the E1297G locus is under positive selection in non-African populations, and that the E1297 variant shows subtle evidence of positive selection specifically within the Peruvian population. This variant is also significantly more frequent in coastal Peruvian populations than in populations from the Andes or the Amazon, which suggests that short stature might be the result of adaptation to factors that are associated with the coastal environment in Peru.

Fig. 1: Genetic architecture of height in the Peruvian population.
Fig. 2: rs200342067 is positively selected in the Peruvian population.
Fig. 3: Electron microscopy of fibrillin 1 in the skin.

Data availability

Genotyping data are available through dbGAP, under accession number phs002025.v1.p1.

Code availability

No custom code was used to draw the central conclusions of this work. All the software and packages used in this work are included and referenced in the manuscript.

References

  1. NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife 5, e13410 (2016).

  2. Homburger, J. R. et al. Genomic insights into the ancestry and demographic history of South America. PLoS Genet. 11, e1005602 (2015).

  3. Harris, D. N. et al. Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. Proc. Natl Acad. Sci. USA 115, E6526–E6535 (2018).

  4. Ruiz-Linares, A. et al. Admixture in Latin America: geographic structure, phenotypic diversity and self-perception of ancestry based on 7,342 individuals. PLoS Genet. 10, e1004572 (2014).

  5. Browning, B. L. & Browning, S. R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459–471 (2013).

  6. Marouli, E. et al. Rare and low-frequency coding variants alter human adult height. Nature 542, 186–190 (2017).

  7. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

  8. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

  9. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

  10. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

  11. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).

  12. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  13. Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).

  14. Domingue, B. W. et al. The social genome of friends and schoolmates in the National Longitudinal Study of Adolescent to Adult Health. Proc. Natl Acad. Sci. USA 115, 702–707 (2018).

  15. Rask-Andersen, M., Karlsson, T., Ek, W. E. & Johansson, Å. Gene–environment interaction study for BMI reveals interactions between genetic factors and physical activity, alcohol consumption and socioeconomic status. PLoS Genet. 13, e1006977 (2017).

  16. Pelova, N. Considerations on the so-called myelolipoma of the adrenals. Nauchni Tr. Vissh. Med. Inst. Sofiia 48, 31–35 (1969).

  17. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  18. Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).

  19. Johnson, K. E. & Voight, B. F. Patterns of shared signatures of recent positive selection across human populations. Nat. Ecol. Evol. 2, 713–720 (2018).

  20. Akbari, A. et al. Identifying the favored mutation in a positive selective sweep. Nat. Methods 15, 279–282 (2018).

  21. Sabeti, P. C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837 (2002).

  22. Nei, M. & Li, W. H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl Acad. Sci. USA 76, 5269–5273 (1979).

  23. Arbiza, L., Zhong, E. & Keinan, A. NRE: a tool for exploring neutral loci in the human genome. BMC Bioinformatics 13, 301 (2012).

  24. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  25. Albers, P. K. & McVean, G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol. 18, e3000586 (2020).

  26. Lamason, R. L. et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310, 1782–1786 (2005).

  27. Fan, S., Hansen, M. E. B., Lo, Y. & Tishkoff, S. A. Going global by adapting local: a review of recent human adaptation. Science 354, 54–59 (2016).

  28. Adhikari, K. et al. A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nat. Commun. 10, 358 (2019).

  29. Sturm, R. A. & Duffy, D. L. Human pigmentation genes under environmental selection. Genome Biol. 13, 248 (2012).

  30. Günther, T. & Coop, G. Robust identification of local adaptation from allele frequencies. Genetics 195, 205–220 (2013).

  31. Lasker, G. W. Differences in anthropometric measurements within and between three communities in Peru. Hum. Biol. 34, 63–70 (1962).

  32. Sengle, G. & Sakai, L. Y. The fibrillin microfibril scaffold: a niche for growth factors and mechanosensation? Matrix Biol. 47, 3–12 (2015).

  33. Schrenk, S., Cenzi, C., Bertalot, T., Conconi, M. T. & Di Liddo, R. Structural and functional failure of fibrillin-1 in human diseases (review). Int. J. Mol. Med. 41, 1213–1223 (2018).

  34. Collod-Béroud, G. et al. Update of the UMD-FBN1 mutation database and creation of an FBN1 polymorphism database. Hum. Mutat. 22, 199–208 (2003).

  35. Tiecke, F. et al. Classic, atypically severe and neonatal Marfan syndrome: twelve mutations and genotype-phenotype correlations in FBN1 exons 24–40. Eur. J. Hum. Genet. 9, 13–21 (2001).

  36. Smallridge, R. S. et al. Solution structure and dynamics of a calcium binding epidermal growth factor-like domain pair from the neonatal region of human fibrillin-1. J. Biol. Chem. 278, 12199–12206 (2003).

  37. Booms, P., Tiecke, F., Rosenberg, T., Hagemeier, C. & Robinson, P. N. Differential effect of FBN1 mutations on in vitro proteolysis of recombinant fibrillin-1 fragments. Hum. Genet. 107, 216–224 (2000).

  38. Jensen, S. A., Robertson, I. B. & Handford, P. A. Dissecting the fibrillin microfibril: structural insights into organization and function. Structure 20, 215–225 (2012).

  39. Jensen, S. A., Corbett, A. R., Knott, V., Redfield, C. & Handford, P. A. Ca2+-dependent interface formation in fibrillin-1. J. Biol. Chem. 280, 14076–14084 (2005).

  40. McGettrick, A. J., Knott, V., Willis, A. & Handford, P. A. Molecular effects of calcium binding mutations in Marfan syndrome depend on domain context. Hum. Mol. Genet. 9, 1987–1994 (2000).

  41. Zoledziewska, M. et al. Height-reducing variants and selection for short stature in Sardinia. Nat. Genet. 47, 1352–1356 (2015).

  42. Fumagalli, M. et al. Greenlandic Inuit show genetic signatures of diet and climate adaptation. Science 349, 1343–1347 (2015).

  43. Luo, Y. et al. Early progression to active tuberculosis is a highly heritable trait driven by 3q23 in Peruvians. Nat. Commun. 10, 3765 (2019).

  44. Zelner, J. L. et al. Identifying hotspots of multidrug-resistant tuberculosis transmission using spatial and molecular genetic data. J. Infect. Dis. 213, 287–294 (2016).

  45. Odone, A. et al. Acquired and transmitted multidrug resistant tuberculosis: the role of social determinants. PLoS ONE 11, e0146642 (2016).

  46. Zhou, X. & Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014).

  47. Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).

  48. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).

  49. Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).

  50. Gusev, A. et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 19, 318–326 (2009).

  51. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  52. Reich, D. et al. Reconstructing Native American population history. Nature 488, 370–374 (2012).

  53. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  54. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  55. Chen, C.-Y. et al. Improved ancestry inference using weights from external reference panels. Bioinformatics 29, 1399–1406 (2013).

  56. Ziyatdinov, A. et al. lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinformatics 19, 68 (2018).

  57. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  58. Schick, U. M. et al. Genome-wide association study of platelet count identifies ancestry-specific loci in Hispanic/Latino Americans. Am. J. Hum. Genet. 98, 229–242 (2016).

  59. Balduzzi, S., Rücker, G. & Schwarzer, G. How to perform a meta-analysis with R: a practical tutorial. Evid. Based Ment. Health 22, 153–160 (2019).

  60. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

  61. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

  62. Bakshi, A. et al. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Sci. Rep. 6, 32894 (2016).

  63. Szpiech, Z. A. & Hernandez, R. D. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014).

  64. Marcus, J. H. & Novembre, J. Visualizing the geography of genetic variants. Bioinformatics 33, 594–595 (2017).

  65. Kelleher, J., Etheridge, A. M. & McVean, G. Efficient Coalescent simulation and genealogical analysis for large sample sizes. PLOS Comput. Biol. 12, e1004842 (2016).

  66. International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).

  67. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).

  68. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  69. Lin, D. et al. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture. Nat. Genet. 50, 754–763 (2018).

  70. Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015).

Acknowledgements

We thank D. B. Moody for discussions, T. Horn for his feedback on optimizing skin immunohistochemistry and J. N. Katz for advising us on a structured clinical assessment of the musculoskeletal system. The study was supported by the National Institutes of Health (NIH) TB Research Unit Network, grants U19-AI111224-01 and U01-HG009088. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. S.A. was supported by the Swiss National Science Foundation (SNSF) postdoctoral mobility fellowships P2ELP3_172101 and P400PB_183823.

Author information

Contributions

S.R. and M.B.M. designed the study. S.A. analysed and interpreted the data. S.A. and S.R. drafted the manuscript. Y.L., G.M.B., E.E.K., J.N.H., E.B., K.S., H.G., T.D.O., A.A., D.N.H. and X.L. performed statistical analysis. M.B.M., L.L., R.C., J.M.C., C.C., R.Y., J.T.G., J.J., J.M.C. and C.F. recruited patients and obtained samples for this study. S.R., E.E.F., H.C.D., R.M.N. and M.S. conducted clinical assessment. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to Soumya Raychaudhuri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Guillaume Lettre, Ben Voight and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Peruvian population structure.

a, b, PCA of genotyping data from Peruvian individuals included in this study (n = 3,134 individuals) merged with the data from continental populations from phase 3 of the 1000 Genomes Project (n = 3,469 individuals) as well as the data from Siberian and Native American populations from the previously published study52 (n = 738 individuals), which were used as a reference panel (number of variants, 34,936). Dots, individuals; colour, populations (AFR, African; AMR, South American; EAS, east Asian; SAS, south Asian; EUR, European; SIB, Siberian; NAT, Native American). c, Global ancestry analysis using ADMIXTURE (K = 4). We observed varying levels of European, African and Asian admixture in our cohort (n = 3,134 individuals) with a median proportion of Native American, European, African and Asian ancestry per individual of 0.83 (IQR = 0.72–0.91), 0.14 (0.08–0.21), 0.01 (0.003–0.03) and 0.003 (10⁻⁵–0.01), respectively. Vertical lines, individuals; colours, genomic proportion of a given ancestry in the genome of each individual. ADMIXTURE analysis (K = 4) was done using all populations in phase 3 of the 1000 Genomes Project as well as the Siberian and Native American populations from the previously published study52, which were used as a reference. African (AFR) ancestry includes Yoruba in Ibadan, Nigeria, Luhya in Webuye, Kenya, Gambian in Western Divisions in the Gambia, Mende in Sierra Leone, Esan in Nigeria, Americans of African Ancestry in southwest United States. European (EUR) ancestry includes central European, Utah residents (CEPH) with northern and western European ancestry (USA), Toscani in Italy, Finnish in Finland, British in England and Scotland, Iberian population in Spain. East Asian (EAS) ancestry includes Han Chinese in Beijing, China, Japanese in Tokyo, Japan, Southern Han Chinese, Chinese Dai in Xishuangbanna, China, Kinh in Ho Chi Minh City, Vietnam.
South Asian (SAS) ancestry includes Gujarati Indian from Houston, Texas (USA), Punjabi from Lahore, Pakistan, Bengali from Bangladesh, Sri Lankan Tamil from the United Kingdom, Indian Telugu from the United Kingdom. Puerto Ricans (PUR) from Puerto Rico. Colombians (CLM) from Medellin, Colombia. Mexicans (MXL) from Los Angeles, California (USA). Peruvian individuals (PEL) from Lima, Peru. Altic, Altaic language family, which includes Yakut, Buryat, Evenki, Tuvinians, Altaian, Mongolian, Dolgan. North Amerind, northern Amerindian language family, which includes Maya, Mixe, Kaqchikel, Algonquin, Ojibwa and Cree. Central Amerind, central Amerindian language family, which includes Pima, Chorotega, Tepehuano, Zapotec, Mixtec and Yaqui. Andean, Andean language family, which includes Quechua, Aymara, Inga, Chilote, Diaguita, Chono, Hulliche and Yaghan. A full list of all populations in all language groups has been published previously52.

Extended Data Fig. 2 Association of rs200342067 and height.

a, Single-variant association analysis (n = 3,134 individuals and 7,756,401 variants). Dotted red line, genome-wide significance threshold of 5 × 10⁻⁸. Five SNPs that overlap the coding sequence of FBN1 passed the genome-wide significance threshold. We did not observe any inflation in test statistics (λ = 1.02). Association P values are from two-sided Wald tests. b, Each copy of the rs200342067 C allele reduces height by 2.2 cm (4.4 cm in homozygous individuals). The cohort includes 11 individuals with the C/C genotype, 275 with the C/T genotype and 2,848 with the T/T genotype (n = 3,134 individuals), and the variant could explain 0.9% of the phenotypic variance in height. The x axis shows the rs200342067 genotype; the y axis shows the height residuals after adjustments for age, sex and a GRM as random effect.
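
As a quick consistency check on the numbers in this legend, the genotype counts can be converted into an allele frequency and a count of C-carrying haplotypes. The Python sketch below is illustrative only: the function and variable names are hypothetical, and the additive per-allele effect is simply applied as quoted in the legend, not re-estimated from data.

```python
def minor_allele_summary(n_cc, n_ct, n_tt):
    """Return (individuals, haplotypes, C-allele count, C-allele frequency)."""
    n_individuals = n_cc + n_ct + n_tt
    n_haplotypes = 2 * n_individuals      # diploid: two haplotypes per person
    n_c_alleles = 2 * n_cc + n_ct         # homozygotes carry two copies
    return n_individuals, n_haplotypes, n_c_alleles, n_c_alleles / n_haplotypes


# Genotype counts quoted in the legend (C/C, C/T, T/T).
n_ind, n_hap, n_c, freq_c = minor_allele_summary(n_cc=11, n_ct=275, n_tt=2848)

# Additive model: every copy of the C allele shifts height by the same amount.
PER_ALLELE_EFFECT_CM = -2.2
expected_shift_cm = {copies: copies * PER_ALLELE_EFFECT_CM for copies in (0, 1, 2)}

print(n_ind, n_hap, n_c, round(freq_c, 3))
print(expected_shift_cm[1], expected_shift_cm[2])
```

The counts give 3,134 individuals, 6,268 haplotypes and 297 copies of the C allele, which matches the 4.7% minor allele frequency and the 297 C-carrying haplotypes reported elsewhere in the article.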

Extended Data Fig. 3 rs12441775 DAF (rs12441775*G) and extended haplotype structure in the 1000 Genomes Project.

a, The derived allele, rs12441775*G, has a high frequency in all non-African populations in the 1000 Genomes Project (average DAF in non-Africans = 58% (IQR = 51–64) and in Africans = 4% (IQR = 1–5)). The map was generated using the GGV browser64 (http://www.popgen.uchicago.edu/ggv). b–h, Haplotypes that carry the rs12441775*G (major/derived) allele are longer than haplotypes that carry the rs12441775*C (minor/ancestral) allele in non-African populations. Horizontal lines, haplotypes; the position of rs12441775 is marked below the haplotype. At any given position, adjacent haplotypes with the same colour carry identical genotypes between the core SNP (rs12441775) and that site; the dashed line separates the haplotypes that carry the derived (above the line) and ancestral (below the line) alleles.

Extended Data Fig. 4 Haplotypes that carry the rs200342067 allele are longer than what is expected under neutral selection.

a, Haplotype decay around rs200342067 in our cohort (n = 3,134 individuals and 6,268 haplotypes). The position of rs200342067 is marked below the haplotypes. Haplotypes above the dashed line carry the rs200342067*C allele (derived/minor, n = 297 haplotypes) and haplotypes below the dashed line carry the rs200342067*T allele (ancestral/major, n = 5,971 haplotypes). b, Integrated EHH of haplotypes carrying the rs200342067*C allele (n = 297 haplotypes) compared with the integrated EHH of haplotypes carrying 2,380 variants with similar DAF (4.7 ± 1%) that overlap the neutral regions of the genome in our cohort (n = 3,134 individuals). Haplotypes that carry the rs200342067*C allele are longer than 99.2% of the haplotypes carrying similar variants in neutral regions of the genome. Vertical red line, integrated EHH of haplotypes carrying the rs200342067*C allele (integrated EHH = 0.115). c, The same as a, but excluding the nine haplotypes that carry both rs200342067*C and rs12441775*G alleles. d, EHH decay curves for haplotypes carrying the rs200342067*C allele excluding the nine haplotypes that carry both rs200342067*C and rs12441775*G alleles (n = 288 haplotypes) compared with haplotypes carrying 2,309 variants that have a similar DAF to the updated frequency of rs200342067*C (4.6 ± 1%) and that overlap the neutral regions of the genome in our cohort (n = 3,134 individuals). Haplotypes with the rs200342067*C allele are longer than 99.7% of the haplotypes carrying similar variants in the neutral genomic regions. e, Integrated EHH for data shown in d. Vertical red line, integrated EHH for haplotypes carrying the rs200342067*C but not the rs12441775*G allele (integrated EHH = 0.124).
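
For readers unfamiliar with the statistic, extended haplotype homozygosity (EHH) at a given site is the probability that two randomly chosen haplotypes carrying the core allele are identical over the whole interval from the core to that site; integrated EHH is the area under that curve. The Python sketch below illustrates the textbook definition on toy data. It is not the pipeline used in this study: the function names and toy haplotypes are hypothetical, positions are in arbitrary units, and dedicated tools may apply additional conventions that are omitted here.

```python
from collections import Counter
from math import comb


def ehh_curve(haplotypes, core_index):
    """EHH at every site for haplotypes that all carry the same core allele.

    haplotypes: list of equal-length strings, one character per site.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two haplotypes")
    total_pairs = comb(n, 2)
    ehh = []
    for site in range(len(haplotypes[0])):
        lo, hi = min(core_index, site), max(core_index, site)
        # Haplotypes fall in the same group if they match from core to site.
        groups = Counter(h[lo:hi + 1] for h in haplotypes)
        identical_pairs = sum(comb(count, 2) for count in groups.values())
        ehh.append(identical_pairs / total_pairs)
    return ehh


def integrated_ehh(positions, ehh):
    """Area under the EHH curve, using the trapezoidal rule."""
    return sum(
        (positions[i + 1] - positions[i]) * (ehh[i] + ehh[i + 1]) / 2
        for i in range(len(positions) - 1)
    )


# Toy example: four haplotypes that share allele "1" at the core (index 2).
toy_haplotypes = ["0010", "0010", "0011", "1010"]
toy_ehh = ehh_curve(toy_haplotypes, core_index=2)
toy_ihh = integrated_ehh([0, 1, 2, 3], toy_ehh)
print(toy_ehh, toy_ihh)
```

In the toy data EHH equals 1 at the core and decays to 0.5 at both flanking ends; a haplotype class that decays more slowly has a larger integrated EHH, which is the sense in which haplotypes are described as "longer" in this legend.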

Extended Data Fig. 5 Simulation of haplotypes under the neutral demographic model.

a, PCA plot of principal component (PC)2 versus PC1 for simulated individuals (n = 1,000 simulated individuals and 2,000 simulated haplotypes). Individuals were simulated using a demographic model matching the population history of Peru and under neutral selection. Red dots, simulated individuals; other dots, reference populations from the 1000 Genomes Project. b, PCA plot of PC3 versus PC1 as described for a. c, We compared the integrated EHH of rs200342067*C with the integrated EHH of 1,000 variants that had a similar DAF to rs200342067 (DAF = 4.7 ± 1%) and that overlapped the same genomic region as rs200342067 on a simulated chromosome 15 (physical position, 48,773,926 ± 20 kb). The integrated EHH of rs200342067 is more extreme than the integrated EHH observed for any of the variants in the simulated data. The x axis shows the integrated EHH; the distribution is the integrated EHH of variants in simulated haplotypes (n = 2,000 haplotypes); the vertical red line shows the integrated EHH value of rs200342067 in our cohort (n = 6,268 haplotypes, integrated EHH = 0.115). d, e, Similar to c for two different neutral regions on chromosome 15. Vertical red lines, integrated EHH of rs17580697 (d, integrated EHH = 0.012, 76th percentile) and rs305008 (e; integrated EHH = 0.010, 74th percentile) in our cohort (n = 6,268 haplotypes).

Extended Data Fig. 6 Comparison of different selection statistics for rs200342067 and other variants with a similar DAF and recombination rate.

a, Distribution of iHS for 2,062 independent variants (that are at least 1 Mb apart) matched in DAF and local recombination rate to rs200342067. iHS values were calculated for Peruvian individuals in the 1000 Genomes Project (n = 85 individuals) and were obtained from a previously published study19. Red line, iHS of rs200342067 (iHS = −1.5; 4.7th percentile); green and blue lines, fifth and first percentile of the iHS distribution. b, EHH decay curves for rs200342067 (red line) as well as haplotypes that carry 2,062 independent variants (at least 1 Mb apart) matched in DAF and local recombination rate to rs200342067 in our cohort (grey lines; n = 6,268 haplotypes). c, Distribution of integrated EHH for haplotypes shown in b; haplotypes carrying the rs200342067*C allele are longer than 97.5% of haplotypes that carry similar variants. The x axis shows the integrated EHH; the red line indicates the integrated EHH of the rs200342067*C allele (integrated EHH = 0.115). d, Histogram of Fisher’s exact test results comparing the extent of allele frequency differences between coastal (n = 46 individuals) and non-coastal (n = 104 individuals) regions in Peru for 2,062 independent variants that were matched in DAF and local recombination rate to rs200342067. The x axis shows the −log₁₀-transformed P values from the two-sided Fisher’s exact test; the dashed blue and green vertical lines show the 99th and 95th percentiles, respectively; the solid red line indicates the −log₁₀-transformed P value of the two-sided Fisher’s exact test (P = 0.0005) for rs200342067 (1.1% percentile). e, Bayenv2 XTX statistics, a measure of deviation from neutral patterns of population structure, for 2,062 independent variants that were matched in DAF and local recombination rate to rs200342067.
The x axis shows the XTX statistics; the red line indicates the XTX value for rs200342067 (XTX = 2.13; 8.3th percentile); the green and blue lines show the fifth and first percentile of the XTX distribution, respectively.
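
Two calculations recur in this legend: ranking the statistic for rs200342067 against the distribution from matched variants, and a two-sided Fisher's exact test on allele counts. The Python sketch below shows one simple way to compute each, under stated assumptions. The function names are hypothetical, the numbers are toy values (the per-region allele counts are not given in this legend), and the two-sided P value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from fractions import Fraction
from math import comb


def empirical_percentile(observed, null_values):
    """Percentage of matched-variant values that are <= the observed value."""
    n_at_or_below = sum(1 for value in null_values if value <= observed)
    return 100.0 * n_at_or_below / len(null_values)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    p_observed = table_prob(a)
    probs = (table_prob(x) for x in range(x_min, x_max + 1))
    return float(sum(p for p in probs if p <= p_observed))


# Toy null distribution of a selection statistic (hypothetical values).
toy_null = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
toy_percentile = empirical_percentile(-1.5, toy_null)

# Toy 2x2 table: rows are two regions, columns are counts of two alleles.
toy_p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(toy_percentile, toy_p_value)
```

Exact fractions are used inside the test so that tables with the same probability as the observed one are compared without floating-point ties.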

Extended Data Fig. 7 Genomic context of rs200342067 FBN1(E1297G).

a, Schematic of FBN1; exons are shown as black bars. Exon 31 (ENSE00001753582) is shown in red. b, The FBN1 exon 31 sequence and PhyloP per-nucleotide conservation score based on multiple sequence alignment of 100 vertebrate species (obtained using the GRCh37 assembly conservation track of the UCSC genome browser). The T>C change due to rs200342067 occurs in a conserved nucleotide. c, Schematic of fibrillin 1 (ENST00000316623.5). Fibrillin 1 consists of the following domains: N- and C-terminal domains (black rectangles), EGF-like domains (striped rectangles), hybrid domains (black pentagons), TGFβ-binding domains (grey ovals), a proline-rich domain (white hexagon) and 43 calcium-binding cbEGF-like domains (white rectangles). cbEGF domain 17, which is affected by rs200342067 FBN1(E1297G), is shown in red; E1297G is located between a conserved cysteine FBN1(C1296) involved in forming a disulfide bond with FBN1(C1284) and a conserved asparagine FBN1(N1298) involved in calcium binding. d, The sequence of FBN1(cbEGF) domain 17 of fibrillin 1 and the three-dimensional structures of cbEGF domains 17 and 18 (the three-dimensional structure was obtained based on homology with the previously published36 cbEGF domains 12 and 13 of fibrillin 1 (PDB 1LMJ)). rs200342067 changes the glutamic acid, a large amino acid with a negatively charged side chain, to glycine, the smallest amino acid with no side chain (shown in red). The side chains are shown for rs200342067 (red spheres), as well as the calcium-interacting residues (beige sticks) and the cysteine residues involved in disulfide bonds (yellow sticks). A calcium ion is shown in green.

Extended Data Fig. 8 Immunohistochemical staining of fibrillin 1.

a, b, Fibrillin 1 staining of skin biopsies from two individuals with the rs200342067 C/C genotype. c, d, Fibrillin 1 staining of skin biopsies from two individuals with the T/T genotype matched for age, sex and ancestry proportions. Individuals with the C/C genotype have less fibrillin 1 deposition in the dermal extracellular matrix and shorter microfibrillar projections from the dermal–epidermal junction into the superficial (papillary) dermis (red arrows, 20×) as well as less fibrillin 1 deposition in the deeper dermis. Two magnifications are shown; the red rectangles in the first column (20× magnification) are magnified in the second column (60×).

Extended Data Fig. 9 Electron microscopy of fibrillin 1 in skin.

a, c, Electron microscopy images of the dermal–epidermal junction in samples from two individuals with the rs200342067 T/T genotype. b, d, Electron microscopy images of the dermal–epidermal junction in samples from two individuals with the rs200342067 C/C genotype who are matched for age, sex and ancestry proportions. Individuals with the C/C genotype have short, fragmented and less densely packed microfibrils with irregular edges (red arrows) and their microfibrils are embedded in less dense collagen bundles (yellow arrows) compared with individuals with the T/T genotype. Two magnifications are shown; the white rectangles in the first column (4,400× magnification; green scale bars, 2 μm) are magnified in the second column (11,000× magnification; yellow scale bars, 1 μm).

Extended Data Table 1 SNPs that overlap the 15q15–21.1 locus

Supplementary information

Supplementary Information

This file contains Supplementary sections 1-8, including Supplementary Figures and Tables, and additional references.

Reporting Summary

Source data

About this article

Cite this article

Asgari, S., Luo, Y., Akbari, A. et al. A positively selected FBN1 missense variant reduces height in Peruvian individuals. Nature 582, 234–239 (2020). https://doi.org/10.1038/s41586-020-2302-0

  • DOI: https://doi.org/10.1038/s41586-020-2302-0
