Abstract
The human face is complex and multipartite, and characterization of its genetic architecture remains challenging. Using a multivariate genome-wide association study meta-analysis of 8,246 European individuals, we identified 203 genome-wide-significant signals (120 also study-wide significant) associated with normal-range facial variation. Follow-up analyses indicate that the regions surrounding these signals are enriched for enhancer activity in cranial neural crest cells and craniofacial tissues, that several regions harbor multiple signals associated with different facial phenotypes, and that variants may act in a coordinated manner. In summary, our analyses provide insight into how complex morphological traits are shaped by both individual and coordinated genetic actions.
Data availability
All of the genotypic markers for the 3DFN dataset are available to the research community through the dbGaP controlled-access repository (http://www.ncbi.nlm.nih.gov/gap) at accession no. phs000949.v1.p1. The raw source data for the phenotypes, the 3D facial surface models in .obj format, are available through the FaceBase Consortium (https://www.facebase.org) at accession no. FB00000491.01. Access to these 3D facial surface models requires proper institutional ethics approval and approval from the FaceBase data access committee. Additional details can be requested from S.M.W.
The participants making up the PSU and IUPUI datasets were not collected with broad data sharing consent. Given the highly identifiable nature of both facial and genomic information and unresolved issues regarding risk to participants, we opted for a more conservative approach to participant recruitment. Broad data sharing of the raw data from these collections would thus be in legal and ethical violation of the informed consent obtained from the participants. This restriction is not because of any personal or commercial interests. Additional details can be requested from M.D.S. and S.W. for the PSU and IUPUI datasets, respectively.
The ALSPAC (UK) data will be made available to bona fide researchers on application to the ALSPAC Executive Committee (http://www.bris.ac.uk/alspac/researchers/data-access). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.
Publicly available data used were the 1000G Phase 3 data (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), the list of HapMap 3 SNPs excluding the MHC region (http://ldsc.broadinstitute.org/static/media/w_hm3.noMHC.snplist.zip), and ChIP–seq files from Prescott et al.39 (GSE70751), Najafova et al.85 (GSE82295), Baumgart et al.86 (GSE89179), Nott et al.87 (https://genome.ucsc.edu/s/nottalexi/glassLab_BrainCellTypes_hg19), Pattison et al.88 (GSE119997), Wilderman et al.40 (GSE97752) and the Roadmap Epigenomics Project89 (https://egg2.wustl.edu/roadmap/data/byFileType/alignments/consolidated/). Meta-analysis GWAS statistics are available from the GWAS Catalog (GCP000044). All data needed for future replication and meta-analysis efforts are provided in the FigShare repository for this work34, along with additional figures (https://doi.org/10.6084/m9.figshare.c.4667261). The FigShare repository contains:
(1) anthropometric mask: a Matlab MAT-file of the anthropometric mask used;
(2) association statistics and effects of the 203 lead SNPs: facial effects, LocusZoom plots and association statistics from each stage of the analysis for the 203 lead SNPs;
(3) calculation of study-wide-significance threshold: script and permutation outcomes needed to replicate the calculation of the study-wide-significance threshold;
(4) facial segment assignments: segment assignments for each quasi-landmark in the anthropometric mask;
(5) Fig. 2a labeled: a larger version of Fig. 2a, with all cell types and tissues labeled;
(6) GREAT Export: raw output of the GREAT analysis;
(7) PCA shape constructs: PCA shape spaces for all 63 facial segments;
(8) QQ plots: QQ plots for each segment in all stages of the analysis;
(9) script to explore facial segments and GWAS hits: Matlab script for selected data-exploration functions;
(10) SNPs reaching suggestive significance in either meta-analysis track: association statistics of all SNPs with P < 5 × 10⁻⁷ in the METAUS or METAUK tracks;
(11) source data for manuscript figures: source data in Excel format for all figures, where possible.
Code availability
KU Leuven provides the MeshMonk (v.0.0.6) spatially dense facial-mapping software, free to use for academic purposes (https://github.com/TheWebMonks/meshmonk). Matlab 2017b implementations of the hierarchical spectral clustering to obtain facial segmentations are available from a previous publication25 (https://doi.org/10.6084/m9.figshare.7649024).
The statistical analyses in this work were based on functions of the statistical toolbox in Matlab 2017b, SHAPEIT2 (v.2.r900), Sanger Imputation Server (v.0.0.6), PBWT pipeline (v.3.1), MeshMonk (v.0.0.6), LDSC (v.1.0.1), FUMA (v.1.3.3), GREAT (v.3.0.0), PLINK (v.1.9), lavaan (v.0.6-3), R (>v.3.4), agricolae (v.1.3-0), cowplot (v.1.0.0), ggplot2 (v.3.1.1), ggpubr (v.0.2), gridExtra (v.2.3), gtable (v.0.3.0), grid (v.3.6.2), Hmisc (v.4.2-0), psych (v.1.8.12), data.table (v.1.12.0), Genotype Harmonizer (v.1.4.20), KING (v.2.1.3), bowtie2 (v.2.3.4.2), bedtools (v.2.27.1) and Bioconductor (v.3.7), as described throughout the Methods.
References
Atchley, W. R. & Hall, B. K. A model for development and evolution of complex morphological structures. Biol. Rev. 66, 101–157 (1991).
Gratten, J., Wray, N. R., Keller, M. C. & Visscher, P. M. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nat. Neurosci. 17, 782–790 (2014).
Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat. Rev. Genet. 19, 110–124 (2018).
Weinberg, S. M. et al. Hunting for genes that shape human faces: initial successes and challenges for the future. Orthod. Craniofac. Res. 22, 207–212 (2019).
Weinberg, S. M., Cornell, R. & Leslie, E. J. Craniofacial genetics: where have we been and where are we going? PLoS Genet. 14, e1007438 (2018).
Dixon, M. J., Marazita, M. L., Beaty, T. H. & Murray, J. C. Cleft lip and palate: understanding genetic and environmental influences. Nat. Rev. Genet. 12, 167–178 (2011).
Paternoster, L. et al. Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am. J. Hum. Genet. 90, 478–485 (2012).
Liu, F. et al. A genome-wide association study identifies five loci influencing facial morphology in Europeans. PLoS Genet. 8, e1002932 (2012).
Jacobs, L. C. et al. Intrinsic and extrinsic risk factors for sagging eyelids. JAMA Dermatol. 150, 836–843 (2014).
Adhikari, K. et al. A genome-wide association scan implicates DCHS2, RUNX2, GLI3, PAX1 and EDAR in human facial variation. Nat. Commun. 7, 11616 (2016).
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
Shaffer, J. R. et al. Genome-wide association study reveals multiple loci influencing normal human facial morphology. PLoS Genet. 12, e1006149 (2016).
Cole, J. B. et al. Genome-wide association study of African children identifies association of SCHIP1 and PDE8A with facial size and shape. PLoS Genet. 12, e1006174 (2016).
Lee, M. K. et al. Genome-wide association study of facial morphology reveals novel associations with FREM1 and PARK2. PLoS One 12, e0176566 (2017).
Crouch, D. J. M. et al. Genetics of the human face: identification of large-effect single gene variants. Proc. Natl Acad. Sci. USA 115, E676–E685 (2018).
Claes, P. et al. Genome-wide mapping of global-to-local genetic effects on human facial shape. Nat. Genet. 50, 414–423 (2018).
Endo, C. et al. Genome-wide association study in Japanese females identifies fifteen novel skin-related trait associations. Sci. Rep. 8, 8974 (2018).
Cha, S. et al. Identification of five novel genetic loci related to facial morphology by genome-wide association studies. BMC Genomics 19, 481 (2018).
Howe, L. J. et al. Investigating the shared genetics of non-syndromic cleft lip/palate and facial morphology. PLoS Genet. 14, e1007501 (2018).
Qiao, L. et al. Genome-wide variants of Eurasian facial shape differentiation and a prospective model of DNA based face prediction. J. Genet. Genomics 45, 419–432 (2018).
Wu, W. et al. Whole-exome sequencing identified four loci influencing craniofacial morphology in northern Han Chinese. Hum. Genet. 138, 601–611 (2019).
Li, Y. et al. EDAR, LYPLAL1, PRDM16, PAX3, DKK1, TNFSF12, CACNA2D3 and SUPT3H gene variants influence facial morphology in a Eurasian population. Hum. Genet. 138, 681–689 (2019).
Xiong, Z. et al. Novel genetic loci affecting facial shape variation in humans. eLife 8, e49898 (2019).
White, J. D. et al. MeshMonk: open-source large-scale intensive 3D phenotyping. Sci. Rep. 9, 6085 (2019).
Sero, D. et al. Facial recognition from DNA using face-to-DNA classifiers. Nat. Commun. 10, 2557 (2019).
Hayton, J. C., Allen, D. G. & Scarpello, V. Factor retention decisions in exploratory factor analysis: a tutorial on parallel analysis. Organ. Res. Methods 7, 191–205 (2004).
Franklin, S. B., Gibson, D. J., Robertson, P. A., Pohlmann, J. T. & Fralish, J. S. Parallel analysis: a method for determining significant principal components. J. Veg. Sci. 6, 99–106 (1995).
Stouffer, S. A., Suchman, E. A., Devinney, L. C., Star, S. A. & Williams, R. M. Jr. The American Soldier: Adjustment During Army Life. Vol. 1 (Princeton Univ. Press, 1949).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Som, P. M., Streit, A. & Naidich, T. P. Illustrated review of the embryology and development of the facial region, part 3: an overview of the molecular interactions responsible for facial development. Am. J. Neuroradiol. 35, 223–229 (2014).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
White, J. & Indencleef, K. Insights into the genetic architecture of the human face. FigShare https://doi.org/10.6084/m9.figshare.c.4667261 (2020).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl Acad. Sci. USA 107, 21931–21936 (2010).
Prescott, S. L. et al. Enhancer divergence and cis-regulatory evolution in the human and chimp neural crest. Cell 163, 68–83 (2015).
Wilderman, A., VanOudenhove, J., Kron, J., Noonan, J. P. & Cotney, J. High-resolution epigenomic atlas of human embryonic craniofacial development. Cell Rep. 23, 1581–1597 (2018).
Kraus, P. & Lufkin, T. Dlx homeobox gene control of mammalian limb and craniofacial development. Am. J. Med. Genet. A 140, 1366–1374 (2006).
Hennekam, R. C. M., Krantz, I. D. & Allanson, J. E. Gorlin’s Syndromes of the Head and Neck (Oxford Univ. Press, 2010).
Attanasio, C. et al. Fine tuning of craniofacial morphology by distant-acting enhancers. Science 342, 1241006 (2013).
Beaty, T. H. et al. Testing candidate genes for non-syndromic oral clefts using a case-parent trio design. Genet. Epidemiol. 22, 1–11 (2002).
Alappat, S., Zhang, Z. Y. & Chen, Y. P. Msx homeobox gene family and craniofacial development. Cell Res. 13, 429–442 (2003).
Satokata, I. & Maas, R. Msx1 deficient mice exhibit cleft palate and abnormalities of craniofacial and tooth development. Nat. Genet. 6, 348–356 (1994).
Nakatomi, M. et al. Genetic interactions between Pax9 and Msx1 regulate lip development and several stages of tooth morphogenesis. Dev. Biol. 340, 438–449 (2010).
Wang, J.-L. et al. TGF-β signaling regulates DACT1 expression in intestinal epithelial cells. Biomed. Pharmacother. 97, 864–869 (2018).
Rabadán, M. A. et al. Delamination of neural crest cells requires transient and reversible Wnt inhibition mediated by Dact1/2. Development 143, 2194–2205 (2016).
Stegman, M. A. et al. Identification of a tetrameric hedgehog signaling complex. J. Biol. Chem. 275, 21809–21812 (2000).
Méthot, N. & Basler, K. Suppressor of fused opposes hedgehog signal transduction by impeding nuclear accumulation of the activator form of Cubitus interruptus. Development 127, 4001–4010 (2000).
Monnier, V., Dussillol, F., Alves, G., Lamour-Isnard, C. & Plessis, A. Suppressor of fused links fused and Cubitus interruptus on the hedgehog signalling pathway. Curr. Biol. 8, 583–586 (1998).
Krzywinski, M. I. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Brown, G. W. & Mood, A. M. On median tests for linear hypotheses. In Proc. 2nd Berkeley Symposium on Mathematical Statistics and Probability (ed. Neyman, J.) 159–166 (Univ. of California Press, 1951).
Weinberg, S. M. et al. The 3D facial norms database: part 1. A web-based craniofacial anthropometric and image repository for the clinical and research community. Cleft Palate Craniofac. J. 53, e185–e197 (2016).
Boyd, A. et al. Cohort profile: the ‘children of the 90s’—the index offspring of the Avon longitudinal study of parents and children. Int. J. Epidemiol. 42, 111–127 (2013).
Fraser, A. et al. Cohort profile: the Avon longitudinal study of parents and children: ALSPAC mothers cohort. Int. J. Epidemiol. 42, 97–110 (2013).
Verma, S. S. et al. Imputation and quality control steps for combining multiple genome-wide datasets. Front. Genet. 5, 370 (2014).
Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Durbin, R. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics 30, 1266–1272 (2014).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
The 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 Genes Genomics Genet. 1, 457–470 (2011).
Heike, C. L., Upson, K., Stuhaug, E. & Weinberg, S. M. 3D digital stereophotogrammetry: a practical guide to facial image acquisition. Head Face Med. 6, 18 (2010).
Robert, P. & Escoufier, Y. A unifying tool for linear multivariate statistical methods: the RV-coefficient. J. R. Stat. Soc. Ser. C Appl. Stat. 25, 257–265 (1976).
Klingenberg, C. P. Morphometric integration and modularity in configurations of landmarks: tools for evaluating a priori hypotheses. Evol. Dev. 11, 405–421 (2009).
Rohlf, F. J. & Slice, D. Extensions of the Procrustes method for the optimal superimposition of landmarks. Syst. Biol. 39, 40–59 (1990).
Olson, C. L. On choosing a test statistic in multivariate analysis of variance. Psychol. Bull. 83, 579–586 (1976).
Ferreira, M. A. R. & Purcell, S. M. A multivariate test of association. Bioinformatics 25, 132–133 (2009).
Galesloot, T. E., van Steen, K., Kiemeney, L. A. L. M., Janss, L. L. & Vermeulen, S. H. A comparison of multivariate genome-wide association methods. PLoS One 9, e95923 (2014).
Porter, H. F. & O’Reilly, P. F. Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci. Rep. 7, 38837 (2017).
O’Reilly, P. F. et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One 7, e34861 (2012).
Korte, A. et al. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat. Genet. 44, 1066–1071 (2012).
Stephens, M. A unified framework for association analysis with multiple related phenotypes. PLoS One 8, e65245 (2013).
Zhou, X. & Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014).
Devroye, L. Non-uniform Random Variate Generation (Springer, 1986).
Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018).
Karolchik, D. et al. The UCSC table browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Hooper, J. E. et al. Systems biology of facial development: contributions of ectoderm and mesenchyme. Dev. Biol. 426, 97–114 (2017).
Pers, T. H., Timshel, P. & Hirschhorn, J. N. SNPsnap: a Web-based tool for identification and annotation of matched SNPs. Bioinformatics 31, 418–420 (2015).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Rosseel, Y. lavaan: an R package for structural equation modeling. J. Stat. Softw. 48, 1–36 (2012).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Najafova, Z. et al. BRD4 localization to lineage-specific enhancers is associated with a distinct transcription factor repertoire. Nucleic Acids Res. 45, 127–141 (2017).
Baumgart, S. J. et al. CHD1 regulates cell fate determination by activation of differentiation-induced genes. Nucleic Acids Res. 45, 7722–7735 (2017).
Nott, A. et al. Brain cell type-specific enhancer-promoter interactome maps and disease risk association. Science 366, 1134–1139 (2019).
Pattison, J. M. et al. Retinoic acid and BMP4 cooperate with TP63 to alter chromatin dynamics during surface epithelial commitment. Nat. Genet. 50, 1658–1665 (2018).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Acknowledgements
We are extremely grateful to all the individuals and families who took part in this study, the midwives for their help in recruiting them and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We are also very grateful to all of the US participants for generously donating their time to our research, and to present and former laboratory members who worked tirelessly to make these analyses possible. Pittsburgh personnel, data collection and analyses were supported by the National Institute of Dental and Craniofacial Research (U01-DE020078, program director/principal investigators (PD/PIs): M.L.M./S.M.W.; R01-DE016148, PD/PIs: M.L.M./S.M.W.; and R01-DE027023, PD/PIs: S.M.W./J.R.S.). Genotyping was funded by the National Human Genome Research Institute (X01-HG007821 and X01-HG007485, PD/PI: M.L.M.), and initial genomic data cleaning by the University of Washington was funded by contract HHSN268201200008I from the National Institute of Dental and Craniofacial Research awarded to the Center for Inherited Disease Research (https://www.cidr.jhmi.edu/). Penn State personnel, data collection and analyses were supported by Procter & Gamble, Company (UCRI-2015-1117-HN-532, PD/PI: H.L.N.), the Center for Human Evolution and Development at Penn State, the Science Foundation of Ireland Walton Fellowship (04.W4/B643, PD/PI: M.D.S.), the US National Institute of Justice (2008-DN-BX-K125, PD/PI: M.D.S.; and 2018-DU-BX-0219, PD/PI: S.W.) and the US Department of Defense. IUPUI personnel, data collection and analyses were supported by the National Institute of Justice (2015-R2-CX-0023, 2014-DN-BX-K031 and 2018-DU-BX-0219, PD/PI: S.W.). University of Cincinnati personnel and data collection were supported by Procter & Gamble, Company (UCRI-2015-1117-HN-532, PD/PI: H.L.N.). The UK Medical Research Council and Wellcome (grant no. 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. The publication is the work of the authors, and K.I. and P.C. will serve as guarantors for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). ALSPAC GWAS data were generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The KU Leuven research team and analyses were supported by the National Institute of Dental and Craniofacial Research (R01-DE027023, PD/PIs: S.M.W./J.R.S.), the Research Fund KU Leuven (BOF-C1, C14/15/081 and C14/20/081, PD/PI: P.C.), the Research Program of the Research Foundation – Flanders (FWO, G078518N, PD/PI: P.C.) and a Senior Clinical Investigator Fellowship of the Research Foundation – Flanders (G078714N, PD/PI: G.H.). Stanford University personnel and analyses were supported by the National Institute of Dental and Craniofacial Research (R01-DE027023, PD/PIs: S.M.W./J.R.S.; and U01-DE024430, PD/PIs: J.W./L. Selleri), the Howard Hughes Medical Institute and the March of Dimes Foundation (1-FY15-312, PD/PI: J.W.).
Author information
Contributions
P.C., M.D.S., S.M.W., J.R.S., J.W. and S.W. conceptualized the study (ideas; formulation or evolution of overarching research goals and aims). J.D.W., K.I., R.J.E., M.K.L., J.L., S.W. and P.C. carried out the data curation (management activities to annotate (produce metadata), scrub data and maintain research data for initial use and later re-use). J.D.W., K.I., S.N., R.J.E., H.H., J.R., J.L. and P.C. carried out the formal analysis (application of statistical, mathematical, computational or other formal techniques to analyze or synthesize study data). S.R., H.L.N., E.F., T.S., M.L.M., J.R.S., J.W., S.W., S.M.W., M.D.S. and P.C. were responsible for funding acquisition (acquisition of the financial support for the project leading to this publication). J.D.W., K.I., S.N., R.J.E., H.H., J.R., M.K.L., J.L. and P.C. carried out the investigation (conducting a research and investigation process, specifically performing the experiments or data/evidence collection). J.D.W., S.N., R.J.E., J.M., S.R., E.E.Q., H.L.N., T.S., M.L.M., J.W., S.W., S.M.W. and M.D.S. provided the resources (provision of study materials, computing resources or other analysis tools). P.C., S.M.W., M.D.S., S.W., J.W., J.R.S., M.L.M., T.S., H.P. and G.H. carried out the supervision (oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team). J.D.W., K.I., S.N., R.J.E., H.H., J.R., M.K.L. and P.C. did the visualization (preparation, creation and/or presentation of the published work, specifically visualization/data presentation). J.D.W., K.I., S.N., R.J.E. and J.R. wrote the original draft. J.D.W., K.I., S.N., R.J.E., H.H., J.R., S.R., E.E.Q., M.L.M., H.P., J.R.S., J.W., S.W., S.M.W., M.D.S. and P.C. reviewed and edited the final manuscript.
Ethics declarations
Competing interests
H.L.N. has received $6,000 in consulting fees from Procter & Gamble, Company. Procter & Gamble, Company had no role in the conceptualization, design, data analysis, decision to publish or preparation of this manuscript. All other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Hierarchical spectral clustering of facial shape.
a, Global-to-local facial segmentation of all 3D images (n = 8,246), obtained using hierarchical spectral clustering. Segments are colored in teal and are identical to those in Fig. 1. Roman numerals denote 'quadrants' of facial segments. b, The number of principal components retained after parallel analysis for each facial segment.
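Parallel analysis decides how many principal components to keep by comparing each observed eigenvalue with eigenvalues from data in which the correlations between variables have been destroyed. The sketch below uses the permutation variant (each column shuffled independently) and a 95th-percentile cutoff; both choices, the function name and its parameters are assumptions for illustration, not the settings reported for this study. It requires NumPy.

```python
import numpy as np


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Number of principal components to retain (permutation-based parallel analysis).

    data: (n_samples, n_features) array, e.g. shape coordinates of one facial segment.
    A component is kept while its observed eigenvalue exceeds the chosen percentile
    of the eigenvalues obtained from column-permuted copies of the data.
    """
    rng = np.random.default_rng(seed)
    x = data - data.mean(axis=0)
    # Eigenvalues of the sample covariance matrix, largest first.
    observed = np.linalg.eigvalsh(np.cov(x, rowvar=False))[::-1]

    null = np.empty((n_iter, x.shape[1]))
    for i in range(n_iter):
        # Shuffling each column independently removes between-variable correlation
        # while preserving each variable's marginal distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null[i] = np.linalg.eigvalsh(np.cov(shuffled, rowvar=False))[::-1]
    threshold = np.percentile(null, percentile, axis=0)

    keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        keep += 1
    return keep
```

On data built from two latent factors plus small noise, the function should report two components.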
Extended Data Fig. 2 Study design.
Sample Wrangling: images and genotypes from each study were intersected, and unrelated participants of European ancestry with quality-controlled images, covariates and imputed genetic data were selected to obtain the analyzed data. Identification: for each facial segment, canonical correlation analysis (CCA) with Rao's F-test approximation was used to identify the multivariate combination of facial principal components most correlated with the genotypes, yielding a P value (P_CCA-US or P_CCA-UK) and the multivariate phenotypic trait most correlated with each SNP (Trait_US or Trait_UK). Verification: the principal components of the other dataset were then projected onto this trait to obtain a univariate variable representing the distribution of participants from the verification dataset along the trait identified in the identification dataset (UniVar_UK or UniVar_US). The genotypes of the verification dataset were then tested against this variable by linear regression, yielding an additional P value (P_UniVar-UK or P_UniVar-US). Meta-Analysis: the P values from identification and verification were meta-analyzed using Stouffer's method, giving the final set of P values for each meta-analysis track (P_META-US and P_META-UK).
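The final meta-analysis step combines the identification and verification P values with Stouffer's method: each one-sided P value is converted to a Z score, the Z scores are summed and rescaled by the square root of the summed squared weights, and the result is converted back to a P value. A minimal sketch follows; equal weights are the default, and the optional `weights` argument (for example, square roots of sample sizes) is an illustrative assumption rather than a detail stated in the caption. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def stouffer_combine(p_values, weights=None):
    """Combine independent one-sided (right-tailed) P values with Stouffer's Z method."""
    std_normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)
    # One-sided P -> Z: small P values map to large positive Z scores.
    z_scores = [-std_normal.inv_cdf(p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    # Convert the combined Z back to a right-tailed P value.
    return std_normal.cdf(-combined_z)
```

For example, `stouffer_combine([p_cca, p_univar])` would merge the two stages of one track; two P values of 0.05 combine to about 0.01.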
Extended Data Fig. 3 Genomic signal correlations.
LDSC correlations between segments. a, Correlations between segments from different quadrants, ranging from 0.8 to 0.88, which seem to reflect both physical proximity of segments on the face and shared embryological origins. b, Correlations ranging from 0.88 to 1, which are mostly between segments within the same facial quadrant.
Extended Data Fig. 4 Clustering of facial segments on the basis of shared genetic signals.
Correlations between facial segments based on SNP P values were calculated using LDSC, as described in Methods, and average-linkage hierarchical clustering was performed on the matrix of correlation values. Quadrant colors in the legend refer to the quadrant of the polar dendrogram in which each facial segment lies, which is also represented by the facial images at the top; the embryonic facial prominences assigned to each facial segment are also shown.
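Average-linkage clustering repeatedly merges the two clusters whose members are, on average, least distant from each other. The sketch below converts correlations to distances as 1 − r, which is an assumption (the caption does not state the transform), and returns the merge order; the function name is hypothetical. In practice, `scipy.cluster.hierarchy.linkage(..., method="average")` applied to a condensed distance matrix does the same job far more efficiently.

```python
def average_linkage(corr, labels):
    """Agglomerative clustering with average linkage on a correlation matrix.

    Distances are taken as 1 - correlation (an illustrative assumption).
    Returns merges as (cluster_a, cluster_b, distance), clusters being tuples of labels.
    """
    clusters = {i: (labels[i],) for i in range(len(labels))}
    members = {i: [i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for a_idx, a in enumerate(ids):
            for b in ids[a_idx + 1:]:
                # Average linkage: mean pairwise distance between members of a and b.
                pair_d = [1.0 - corr[i][j] for i in members[a] for j in members[b]]
                d = sum(pair_d) / len(pair_d)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges
```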
Extended Data Fig. 5 GREAT and FUMA analyses showing enrichment for craniofacial and limb development.
a, GREAT analysis. For the top ten GO terms in each category, the plot shows the Bonferroni-corrected binomial test P value (red; log10-transformed, hence negative values) and the binomial region fold enrichment (blue; positive values). After each GO term, the numbers in parentheses give the number of genes in the test set with the annotation (Observed) and the total number of genes in the genome with the annotation (Total), in the format (Observed/Total). The dashed line marks significance at log10(P) = log10(0.05) ≈ −1.3. b, FUMA analysis, showing the KEGG pathways that were significantly enriched in our results. Multiple pathways are relevant to craniofacial development. The right panel shows the genes involved in these pathways.
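GREAT's region-based binomial test asks whether more input regions fall into the regulatory domains of genes carrying an annotation than expected from the fraction of the genome those domains cover. The sketch below shows that calculation for a single GO term, together with the fold enrichment, a Bonferroni correction and the log10 transform used on the plot's axis. The function names are hypothetical, and the inputs (region counts and genome fraction) would come from GREAT's own gene-to-domain assignment, which is not reproduced here.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def great_style_binomial(k, n, p, n_tests):
    """Region-based binomial enrichment for one annotation, in the spirit of GREAT.

    k: input regions falling in the annotation's regulatory domains
    n: total input regions (e.g. one region per lead SNP)
    p: fraction of the genome covered by those regulatory domains
    n_tests: number of annotations tested, for a Bonferroni correction
    """
    fold = k / (n * p)  # observed hits divided by expected hits
    p_bonf = min(1.0, binomial_upper_tail(k, n, p) * n_tests)
    log_p = math.log10(p_bonf) if p_bonf > 0 else float("-inf")
    return fold, p_bonf, log_p
```

A term would then be called significant when `log_p` falls below `math.log10(0.05)`, roughly −1.3.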
Extended Data Fig. 6 H3K27ac signal is significantly different in 203 lead vs. 203 random SNPs for relevant facial tissues.
For each cell type and tissue, each represented by a point, the median difference in H3K27ac RPM signal between the 203 lead SNPs and 203 random SNPs was tested for significance using a two-sided Wilcoxon rank-sum test. The thin dashed line marks the P value of 0.0094 that corresponds to a 5% false discovery rate under the Benjamini–Hochberg method. Relative to the random, MAF-matched SNPs, the lead SNPs are significantly enriched for H3K27ac signal in many cell types, with the largest differences in CNCCs (blue) and embryonic craniofacial tissues (orange). Test statistics used to create this plot are available in Supplementary Table 4.
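A Benjamini–Hochberg cutoff such as 0.0094 is the largest raw P value that passes the step-up rule at the chosen false discovery rate. A minimal sketch of how that cutoff is obtained from a list of per-sample P values follows (those P values could come, for example, from `scipy.stats.mannwhitneyu` with `alternative="two-sided"`); the function name is hypothetical.

```python
def benjamini_hochberg_threshold(p_values, fdr=0.05):
    """Largest raw P value declared significant by the Benjamini–Hochberg step-up rule.

    With the m P values sorted ascending, find the largest rank i such that
    p_(i) <= i / m * fdr; every test with P <= p_(i) is significant.
    Returns None when no test passes.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * fdr:
            threshold = p
    return threshold
```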
Extended Data Fig. 7 Correlation of H3K27ac activity among SEM models.
a, For all segments (also called 'masks'), we compared the H3K27ac activity of significant SNPs from the refined SEM model for variation in that facial segment. Plotted is the Spearman's rho correlation between pairs of SNPs significant in the same SEM model ('Within Mask'), pairs of SNPs where one is from the SEM model and the other is not ('Within To Out'), and pairs where both SNPs are from a different SEM model ('Out To Out'). Segments where the distribution of correlations across all cell types differed significantly (Benjamini–Hochberg adjusted P < 0.05) based on a two-sided Kruskal–Wallis test are indicated in black. b, For all cell types, the median correlation across all segments is plotted for each of the three SNP groupings. Differences between the groupings were tested using a two-sided Kruskal–Wallis test. Boxes span the first to third quartiles, with a thick black line marking the median. Whiskers extend to the largest and smallest values no further than 1.5 × the interquartile range from the third and first quartiles, respectively.
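For three groupings such as 'Within Mask', 'Within To Out' and 'Out To Out', the Kruskal–Wallis test amounts to ranking all correlations jointly and comparing the groups' rank sums. The following pure-Python sketch is restricted to three groups so that the chi-square tail with 2 degrees of freedom has the closed form exp(−H/2); the function name is hypothetical, and `scipy.stats.kruskal` is the general-purpose library equivalent.

```python
import math


def kruskal_wallis_three_groups(*groups):
    """Kruskal–Wallis H test for exactly three groups; returns (H, P)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    # Pool all values, remembering each value's group, and sort by value.
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks they occupy.
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    if tie_term == n**3 - n:  # all values identical: the test is uninformative
        return 0.0, 1.0
    rank_sums = [0.0, 0.0, 0.0]
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12 / (n * (n + 1)) * sum(
        rs**2 / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # standard correction for ties
    # Chi-square with 2 degrees of freedom has survival function exp(-x / 2).
    return h, math.exp(-h / 2)
```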
Extended Data Fig. 8 Phenotypic and marginal distributions for diplotype combinations.
For a random SNP pairing (a) and each significant epistasis pair (b–d), boxplots visualize the epistatic effect on the phenotype. The marginal phenotypic medians of the single-SNP genotypes (unshaded boxplots) were used to calculate and visualize the diplotype phenotypic distribution expected if the two genotypes acted independently. The expected median phenotype of each diplotype was calculated as the average of the marginal medians of the two single-SNP genotypes (blue dashed lines on the colored plots). This expected median was compared with the observed diplotype median (solid black lines; colored boxplots) using Mood's median test with one degree of freedom. Boxplots were colored by −log10(P) when the observed diplotype phenotype differed significantly from the expected combined-genotype phenotype (P < 0.05; −log10(P) > 1.30). Boxes span the first to third quartiles, with a thick black line marking the median. Whiskers extend to the largest and smallest values no further than 1.5 × the interquartile range from the third and first quartiles, respectively.
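The expected diplotype median and a Mood's median test can be sketched as follows. The expected median is the mean of the two marginal medians, as the caption describes; the test shown is a generic two-sample Mood's median test (a 2 × 2 table of counts above and not above the pooled median, scored against a chi-square distribution with 1 degree of freedom). How the study constructed the predicted diplotype distribution from the marginals is not reproduced here, and the function names are hypothetical. This sketch applies no continuity correction, whereas `scipy.stats.median_test` applies Yates' correction to two-sample tables by default.

```python
import math
from statistics import median


def expected_diplotype_median(pheno_genotype_a, pheno_genotype_b):
    """Mean of the marginal medians for carriers of each single-SNP genotype."""
    return (median(pheno_genotype_a) + median(pheno_genotype_b)) / 2


def mood_median_test(x, y):
    """Two-sample Mood's median test; returns (chi-square statistic, P value), 1 df.

    Values equal to the pooled median are counted as 'not above' (one common convention).
    """
    grand = median(list(x) + list(y))
    table = [[sum(v > grand for v in s), sum(v <= grand for v in s)] for s in (x, y)]
    n = len(x) + len(y)
    row = [sum(r) for r in table]
    col = [table[0][c] + table[1][c] for c in range(2)]
    if 0 in row or 0 in col:  # degenerate table: no evidence either way
        return 0.0, 1.0
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```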
Extended Data Fig. 9 MSX1 and DACT1 loci.
LocusZoom plots for the two association signals near MSX1 (a), which has previously been implicated in orofacial clefting in humans and mice, and for the signal near DACT1 (f), which is a novel result. Points represent the one-sided −log10(P) of the METAUK meta-analysis track for the facial segment illustrated in the normal displacement figures (b, d, g) and are colored by linkage disequilibrium with the labeled SNP. Asterisks indicate genotyped SNPs and circles indicate imputed SNPs. Facial effects are shown for the two signals near MSX1, rs3910659 (b) and rs13117653 (d), and for the signal near DACT1, rs10047930 (g). Effects are the normal displacement (displacement in the direction locally normal to the facial surface) at each quasi-landmark of the lowest facial segment reaching genome-wide significance in METAUK, going from the minor to the major allele. Blue indicates inward depression; red indicates outward protrusion. Yellow rosette plots (c, rs3910659; e, rs13117653; h, rs10047930) depict the −log10 of the one-sided, right-tailed meta-analysis P value per facial segment in the METAUK track. Black-encircled facial segments reached genome-wide significance (P < 5 × 10⁻⁸).
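Normal displacement reduces the 3D shape change at each quasi-landmark to one signed number: the component of the displacement vector along the local surface normal. A minimal sketch, assuming outward-pointing normals and two corresponding landmark positions (for example, shapes associated with the minor and the major allele); the function name and inputs are illustrative.

```python
import math


def normal_displacement(vertex_before, vertex_after, normal):
    """Signed displacement of one quasi-landmark along its local surface normal.

    The 3D displacement vector is projected onto the unit normal. With outward-pointing
    normals, positive values mean outward protrusion and negative values inward depression.
    """
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]
    displacement = [a - b for a, b in zip(vertex_after, vertex_before)]
    return sum(d * u for d, u in zip(displacement, unit))
```

Applying this to every quasi-landmark of a segment gives the per-vertex values that a blue-to-red color map would display.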
Extended Data Fig. 10 Regions near previously published SNPs associated with risk for Crohn's disease are preferentially active in immune cells and tissues.
Each boxplot represents, for one sample, the distribution of H3K27ac signal in 20-kb regions around 619 Crohn's disease-associated SNPs from the NHGRI-EBI GWAS Catalog. See Methods for details on the calculation of H3K27ac signal. Samples corresponding to immune cells and tissues are highlighted in red. The thin dashed line at ~2.9 marks the median signal across all cell types and tissues. Boxes span the first to third quartiles, with a dark black line marking the median. Whiskers extend to the largest and smallest values no further than 1.5 × the interquartile range from the third and first quartiles, respectively.
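The boxplot convention used in these legends (Tukey-style whiskers) can be made concrete with a short sketch. Quartile definitions differ between plotting tools; this one assumes linear interpolation between order statistics (Python's `statistics.quantiles` with `method="inclusive"`), which may not match the software used for the figures.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Quartiles, whisker ends and outliers for a Tukey-style boxplot:
    whiskers reach the most extreme data points lying within
    1.5 x IQR of the first (lower) and third (upper) quartiles."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # The smallest value not below the lower fence always exists (q1 >= fence).
        "lower_whisker": min(x for x in data if x >= lower_fence),
        "upper_whisker": max(x for x in data if x <= upper_fence),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }
```

For example, `tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` gives quartiles 3.25 and 7.75, whiskers at 1 and 9, and flags 100 as an outlier.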
Supplementary information
Supplementary Information
Supplementary Notes 1–3, Methods, Figs. 1 and 2, and Data 1 and 2.
Supplementary Tables
Supplementary Tables 1–5.
Supplementary Data 1
For each of the 24 multi-peak loci (listed in Supplementary Table 5): (A) −log10 of the one-sided, right-tailed meta-analysis P value per facial segment in the METAUS and METAUK tracks. Black-encircled facial segments reached genome-wide significance (P < 5 × 10⁻⁸). (B) The normal displacement (displacement in the direction locally normal to the facial surface) at each quasi-landmark of the facial segment with the lowest P value in METAUS and METAUK, going from the minor to the major allele. Blue indicates inward depression; red indicates outward protrusion. (C) LocusZoom plots in METAUS (top) and METAUK (bottom) for the segment in which the SNP had its lowest (one-sided) P value. Points are colored by linkage disequilibrium (r²) in the 1000 Genomes Phase 3 EUR population. Asterisks represent genotyped SNPs and circles represent imputed SNPs.
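Panels (A) to (C) rely on three per-SNP summaries across facial segments: the −log10(P) score shown in the rosette, the set of segments passing the genome-wide threshold, and the segment with the lowest P value. A minimal sketch, assuming P values are supplied as a dictionary keyed by hypothetical segment identifiers and that significance means strictly below 5 × 10⁻⁸:

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # -log10(5e-8) is about 7.30


def summarize_segments(p_by_segment):
    """Return (-log10 P per segment, genome-wide significant segments,
    segment with the lowest P value) for one SNP in one meta-analysis track."""
    if not p_by_segment:
        raise ValueError("no segments supplied")
    scores = {seg: -math.log10(p) for seg, p in p_by_segment.items()}
    significant = sorted(seg for seg, p in p_by_segment.items()
                         if p < GENOME_WIDE_ALPHA)
    best_segment = min(p_by_segment, key=p_by_segment.get)
    return scores, significant, best_segment
```

Running this separately on the METAUS and METAUK tracks mirrors the paired rows of the panels; a segment exactly at 5 × 10⁻⁸ is treated as not significant under the strict-inequality assumption.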
Supplementary Data 2
For each of the 50 segments with well-fitting SEM models, this table provides the number of PCs included to represent shape variation in that segment, the number of SNPs that survived the model refinement process (see Methods), the P value cutoff used to perform the model refinement and to select the SNPs for epistasis testing, the number of SNPs used in the epistasis analysis for this segment, and the values of the χ², CFI, RMSEA and SRMR model fit indices, which were used to evaluate the models. The TLI and GFI model fit indices are also included for completeness. The table also contains internal links to separate tabs that list, for each surviving model, the parameters used with their estimate, standard error, z-score, two-sided P value and 95% CI. SNPs that were selected for epistasis testing are highlighted in green.
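Several of the fit indices listed above can be computed directly from the model and baseline (independence-model) χ² statistics. The sketch below uses common textbook formulations; it is not necessarily what the authors' SEM software reports, since packages differ in details such as using N or N − 1 in RMSEA. SRMR and GFI need the observed and model-implied covariance matrices and are omitted.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 denominator variant)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index: 1 minus the ratio of model to baseline misfit."""
    numerator = max(chi2_model - df_model, 0.0)
    denominator = max(chi2_model - df_model, chi2_baseline - df_baseline, 0.0)
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


def tli(chi2_model, df_model, chi2_baseline, df_baseline):
    """Tucker-Lewis index, based on chi-square/df ratios (not bounded to [0, 1])."""
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    return (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)
```

With illustrative numbers (model χ² = 120 on 100 df, baseline χ² = 1,100 on 120 df, N = 501), this gives RMSEA = 0.02, CFI ≈ 0.980 and TLI ≈ 0.976.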
About this article
Cite this article
White, J.D., Indencleef, K., Naqvi, S. et al. Insights into the genetic architecture of the human face. Nat Genet 53, 45–53 (2021). https://doi.org/10.1038/s41588-020-00741-7
DOI: https://doi.org/10.1038/s41588-020-00741-7