Genome-wide association scans of complex multipartite traits like the human face typically use preselected phenotypic measures. Here we report a data-driven approach to phenotyping facial shape at multiple levels of organization, allowing for an open-ended description of facial variation while preserving statistical power. In a sample of 2,329 persons of European ancestry, we identified 38 loci, 15 of which replicated in an independent European sample (n = 1,719). Four loci were completely new. For the others, additional support (n = 9) or pleiotropic effects (n = 2) were found in the literature, but the results reported here were further refined. All 15 replicated loci highlighted distinctive patterns of global-to-local genetic effects on facial shape and showed enrichment for active chromatin elements in human cranial neural crest cells, suggesting an early developmental origin of the facial variation captured. These results have implications for studies of facial genetics and other complex morphological traits.
The human face is a multipartite trait composed of distinct features (eyes, nose, chin and mouth), whose size, shape and composition are clearly heritable. However, knowledge of which genetic variants are responsible for human facial variation is still lacking1. Several genome-wide association studies (GWAS) have each identified a handful of loci associated with a small number of facial traits, with few of these loci having been replicated2,3,4,5,6,7. While a GWAS benefits from being an unbiased approach to gene mapping, the phenotypic descriptions in such studies are typically preselected and used to classify individuals in a ‘phenotype-first’ way of thinking. This approach may be appropriate in certain instances (for example, affected and unaffected disease status) but is less so for complex, multipartite traits like the human face. The result is that facial shape has been reduced to a limited series of measurements (for example, linear distances) that are analyzed individually, resulting in a loss of information.
In this study, we present a data-driven approach to facial phenotyping that exploits both the partable and integrated information contained in 3D facial images, allowing for the identification of genetic effects on facial shape at multiple levels of organization—from global to local. This approach generates a nested series of multivariate GWAS, with a low computational burden and, more importantly, controlled multiple-testing burden. We applied this new approach to a European-derived discovery cohort and then tested significantly associated variants for replication in an independent European-derived cohort. In an effort to provide additional validation, we integrated our work with previously published human facial GWAS. We show a number of newly associated genetic loci supported by strong statistical evidence, identifying unreported patterns in global-to-local genetic effects on facial shape. Furthermore, these loci are preferentially marked by active chromatin signatures in human cranial neural crest cells (CNCCs), an embryonic cell type that gives rise to most of the craniofacial structures. This suggests a developmental origin of much of the facial variation uncovered by our study. These results offer new insights on the genetic basis of human facial shape with potentially far-reaching implications. More generally, the results present an alternative to the prevailing phenotype-first mindset and our approach is widely applicable to any GWAS on complex, quantitative and multipartite traits, especially those captured thoroughly using images.
A study sample of 2,329 unrelated participants of European ancestry made up the discovery cohort for our analysis (the Pittsburgh sample, PITT). These participants had a median age of 23 years and were recruited from several US sites. An additional, independently collected and genotyped sample of 1,719 adult participants of European ancestry made up our replication cohort (the Penn State sample, PSU). These participants had a median age of 22 years and were recruited at several sites in the United States and Europe. For both cohorts, imputation of unobserved genetic variants and sporadic missing genotype calls for assayed SNPs was performed using the 1000 Genomes Project8 Phase 3 reference panel. Basic demographic descriptors and general physical characteristics (for example, sex, age, height and weight) were available from participants in both cohorts.
Global-to-local facial segmentations
We used digital stereophotogrammetry to obtain 3D facial images from all participants. After trimming and cleaning, 3D facial images were aligned in dense correspondence9, ensuring that homology was established among the roughly 10,000 3D points (considered quasi-landmarks10) making up an individual’s facial shape. Facial shape, in both cohorts separately, was adjusted for potential confounding variables (age, sex, height, weight, facial size and cohort-specific population structure) and then partitioned in an unsupervised manner into a series of global-to-local facial segments. To accomplish this, we applied hierarchical spectral clustering11 to the facial quasi-landmark configurations of the PITT sample. First, the quasi-landmarks were arranged in a square (~10,000 × ~10,000) similarity matrix using pairwise 3D correlations. Second, a Laplacian transformation was applied to enhance similarities before an eigendecomposition of the square matrix. Finally, within the eigen spectral map, k-means++ clustering was used to group highly correlated quasi-landmarks, which, when mapped back to the facial surface, result in a segmentation of the face. This was done in a bifurcating hierarchical manner using five levels, such that facial segments with closer relationships were located near one another and each facial segment was split in two toward the next level, resulting in a total of 63 segments, as depicted in Fig. 1 using a polar dendrogram layout. The hierarchical design gradually focused on more local shape aspects that were otherwise overlooked, without ignoring the integration of facial parts more globally at higher levels. The same pipeline was applied to the PSU cohort (Supplementary Fig. 1), and the resulting segmentation was compared to the PITT segmentation using the normalized mutual information (0 ≤ NMI ≤ 1; 0, no overlap; 1, perfect overlap) at each level of the hierarchy (level 0, NMI = 1; level 1, NMI = 0.90; level 2, NMI = 0.80; level 3, NMI = 0.72; level 4, NMI = 0.75; level 5, NMI = 0.78).
These high NMI values indicate substantial overlap and, therefore, good reproducibility of the global-to-local facial segmentations across similar but independent samples. The hierarchical spectral clustering (Fig. 1; using the PITT cohort as the reference for the remainder of the study) subdivided facial shape into meaningful and recognizable segments. Globally, the midface was first separated from the rest of the face and was further partitioned into the region of the mouth (quadrant 1, starting at segment 4) and nose (quadrant 2, starting at segment 5). The remainder of the face was further partitioned into the lower facial area (quadrant 3, starting at segment 6) and the upper facial area (quadrant 4, starting at segment 7). Each quadrant was repeatedly partitioned, increasingly focusing on smaller facial parts. This provided an efficient and objective means for subdividing facial shape into global-to-local parts with internally well-correlated shape variations.
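As an illustration of the segmentation steps described above (similarity matrix, Laplacian transformation, eigendecomposition, k-means++ and recursive bisection, followed by an NMI comparison), a minimal Python sketch with NumPy is given below. It is not the pipeline used in the study. The normalized-affinity form of the Laplacian, the row normalization of the spectral embedding, the square-root normalization of the NMI, all function names and the toy block-structured similarity matrix are assumptions introduced here for illustration.

```python
import numpy as np

def kmeans_pp(X, k, rng, n_iter=50):
    """Plain k-means with k-means++ seeding; returns one label per row of X."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest already-chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_bisect(S, rng):
    """Split the items of a symmetric, positive similarity matrix S in two."""
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))          # assumed form: D^-1/2 S D^-1/2
    _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
    emb = vecs[:, -2:]                       # two leading eigenvectors
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return kmeans_pp(emb, 2, rng)

def hierarchical_segments(S, levels, rng):
    """Return a list of levels; level i holds 2**i arrays of item indices."""
    segments = [np.arange(S.shape[0])]
    hierarchy = [segments]
    for _ in range(levels):
        nxt = []
        for idx in segments:
            lab = spectral_bisect(S[np.ix_(idx, idx)], rng)
            nxt += [idx[lab == 0], idx[lab == 1]]
        segments = nxt
        hierarchy.append(segments)
    return hierarchy

def labels_from_segments(segments, n):
    lab = np.empty(n, dtype=int)
    for j, idx in enumerate(segments):
        lab[idx] = j
    return lab

def nmi(a, b):
    """Normalized mutual information, MI / sqrt(H(a) * H(b)) (assumed variant)."""
    _, ia = np.unique(np.asarray(a), return_inverse=True)
    _, ib = np.unique(np.asarray(b), return_inverse=True)
    C = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(C, (ia, ib), 1)
    P = C / C.sum()
    pa, pb = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz])).sum()
    ha, hb = -(pa * np.log(pa)).sum(), -(pb * np.log(pb)).sum()
    if ha * hb == 0:                         # one or both partitions are a single cluster
        return 1.0 if ha == hb else 0.0
    return float(mi / np.sqrt(ha * hb))

# Toy similarity matrix: 20 items in 4 blocks of 5, with blocks paired two by two.
n, block = 20, 5
group = np.arange(n) // block
S = np.full((n, n), 0.05)
S[(group[:, None] // 2) == (group[None, :] // 2)] = 0.4
S[group[:, None] == group[None, :]] = 0.9
np.fill_diagonal(S, 1.0)

hier_a = hierarchical_segments(S, levels=2, rng=np.random.default_rng(0))
hier_b = hierarchical_segments(S, levels=2, rng=np.random.default_rng(1))
score = nmi(labels_from_segments(hier_a[2], n), labels_from_segments(hier_b[2], n))
print([len(level) for level in hier_a], round(score, 3))
```

With five levels instead of two, the same recursion yields 1 + 2 + 4 + 8 + 16 + 32 = 63 segments, matching the count reported above.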
To generate variables that captured biological shape, we applied generalized Procrustes analysis (GPA) separately to the quasi-landmarks making up each facial segment. As such, a shape space for each facial segment was constructed that was independent from the other segments and its relative positioning in lower-level (larger) segments. Following GPA, we applied principal-component analysis (PCA) to extract the major factors of shape variation characterizing each facial segment. We used parallel analysis12 to determine the number of principal components (PCs) needed to adequately summarize shape variation for the given segment. The number of PCs retained for each facial segment after parallel analysis and the percentage of total variation they explained in the PITT cohort are depicted in Supplementary Fig. 2. As expected, we found that lower-level facial segments required more PCs and that the retained PCs for all facial segments explained most of the total shape variance (median of 95%, min = 89%, max = 97%) present within the respective segments. The result was nearly complete coverage of all facial shape variation at five different levels of detail.
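The PCA and parallel-analysis step can be sketched as follows. This is a hedged illustration only: it assumes that each row holds the already Procrustes-aligned, flattened coordinates of one individual, and it uses a permutation-based variant of parallel analysis with a 95th-percentile cutoff. The cutoff, the number of permutations, the function names and the simulated data are assumptions, not details taken from the study.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / (X.shape[0] - 1)

def parallel_analysis(X, n_perm=100, q=95, seed=0):
    """Number of leading PCs whose eigenvalue exceeds the permutation null."""
    rng = np.random.default_rng(seed)
    obs = pca_eigenvalues(X)
    null = np.empty((n_perm, obs.size))
    for b in range(n_perm):
        # permuting each column independently destroys correlations between variables
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[b] = pca_eigenvalues(Xp)
    keep = obs > np.percentile(null, q, axis=0)
    return obs.size if keep.all() else int(np.argmin(keep))

# Simulated data: 3 latent factors, each loading on its own set of 10 variables.
rng = np.random.default_rng(42)
n_ind = 300
loadings = np.kron(np.eye(3), np.ones((1, 10)))        # shape (3, 30)
X = rng.normal(size=(n_ind, 3)) @ loadings + rng.normal(size=(n_ind, 30))

n_keep = parallel_analysis(X)
eig = pca_eigenvalues(X)
print(n_keep, round(eig[:n_keep].sum() / eig.sum(), 2))  # PCs kept, variance explained
```

Stopping at the first component that fails the comparison keeps the retained PCs contiguous, which is one common convention among several.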
Genetic mapping of global-to-local shape
We performed a series of GWAS to test the genetic association of a total of 9,478,608 SNPs with the shape information contained in each of the 63 facial segments using the PITT discovery cohort. As each of the facial segments was represented by multiple dimensions of variation (PCs), we used a multivariate canonical correlation analysis (CCA)13. In brief, CCA extracts the linear combination of PCs from a facial segment that has maximal correlation with the SNP being tested, under the additive genetic model, after correcting for confounding variables. Therefore, CCA avoids the preselection of individual PCs and reveals the associated facial effect as a linear direction in a multidimensional shape space.
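When one side of a CCA is a single variable (here, the additively coded SNP), the canonical weights for the other side are proportional to the least-squares coefficients from regressing that variable on the PCs, and the canonical correlation equals the multiple correlation. The sketch below uses this equivalence. It assumes that confounders have already been regressed out, and the function name, effect size and simulated data are hypothetical. The F statistic shown is the standard one for a single canonical correlation with p and n − p − 1 degrees of freedom.

```python
import numpy as np

def cca_single_snp(Y, g):
    """CCA between PC scores Y (n x p) and one SNP g (n,), coded 0/1/2.

    Returns the unit-length canonical weights on the PCs, the canonical
    correlation r and the F statistic on (p, n - p - 1) degrees of freedom.
    """
    Yc = Y - Y.mean(axis=0)
    gc = g - g.mean()
    w, *_ = np.linalg.lstsq(Yc, gc, rcond=None)   # proportional to Syy^-1 Syg
    r = np.corrcoef(Yc @ w, gc)[0, 1]
    n, p = Y.shape
    F = (r ** 2 / p) / ((1.0 - r ** 2) / (n - p - 1))
    return w / np.linalg.norm(w), float(r), float(F)

# Simulated example: the SNP shifts shape along one direction in a 5-PC space.
rng = np.random.default_rng(1)
n, p = 2000, 5
direction = np.array([1.0, -1.0, 0.0, 2.0, 0.0])
direction /= np.linalg.norm(direction)
g = rng.binomial(2, 0.3, size=n).astype(float)      # additive genotype coding
Y = rng.normal(size=(n, p)) + 0.5 * np.outer(g, direction)

w, r, F = cca_single_snp(Y, g)
print(np.round(w, 2), round(r, 3), round(F, 1))
```

The returned weight vector is the "linear direction in a multidimensional shape space" referred to above; no individual PC has to be selected beforehand.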
We found a total of 1,932 SNPs among 38 separate loci that reached nominal genome-wide significance (P ≤ 5 × 10−8). Quantile–quantile, regional association and detailed supplementary illustrations of all 38 loci are provided online (see URLs). To accommodate the multiple-testing burden present in performing separate genome scans for 63 facial segments, we determined the false discovery rate for any test dependency structure14 (FDRd) P-value threshold at 2.82 × 10−8 and a more conservative study-wide Bonferroni-corrected significance threshold at 1.28 × 10−9. Following imputation in the PSU cohort, 1,931 of these 1,932 SNPs were available for replication testing. Because it is possible for a SNP to affect different aspects of variation in the same facial segment across different datasets, replicating an observed genetic association necessitated both replicating a significant effect on a given facial segment and a determination that the same aspects of variation were affected. Therefore, we first used the discovery cohort as a phenotyping reference for the replication cohort (see the Methods for additional details). Then, the multidimensional PC scores of the PSU cohort were projected onto the CCA loadings of the discovered effect. In doing so, the associated facial trait, once identified using CCA in the PITT cohort, was kept fixed and thus consistent between the PITT discovery cohort and the PSU replication cohort. This resulted in a univariate facial measure that was modeled using linear regression, to test for replication. This was done separately for each SNP and facial segment combination that achieved genome-wide significance in the discovery phase. From all replication efforts combined, we computed the FDRd adjusted significance threshold at 4.20 × 10−3 and a Bonferroni-corrected significance threshold at 3.28 × 10−4. A total of 1,821 SNPs across 15 loci replicated with P values well below the FDRd threshold (Table 1).
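The replication step described above (projecting the replication-cohort PC scores onto the fixed discovery CCA loadings and regressing the resulting univariate trait on the SNP) can be sketched as follows. The loading vector, effect size and simulated cohort are hypothetical, and the sketch assumes that the replication PC scores are already expressed in the discovery cohort's PC space. It returns the slope and its t statistic; a P value would follow from a t distribution with n − 2 degrees of freedom.

```python
import numpy as np

def replicate(pc_rep, g_rep, w_disc):
    """Test a discovered effect in a replication cohort.

    pc_rep : (n, p) PC scores of the replication cohort in the discovery PC space
    g_rep  : (n,)   SNP genotypes coded 0/1/2
    w_disc : (p,)   CCA loadings estimated in the discovery cohort (kept fixed)
    """
    trait = pc_rep @ w_disc                          # univariate facial measure
    X = np.column_stack([np.ones(len(g_rep)), g_rep])
    beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    sigma2 = resid @ resid / (len(trait) - 2)
    se = np.sqrt(sigma2 / ((g_rep - g_rep.mean()) ** 2).sum())
    return float(beta[1]), float(beta[1] / se)

# Simulated replication cohort with an effect along the (hypothetical) loadings.
rng = np.random.default_rng(2)
w_disc = np.array([0.6, 0.0, -0.8])                  # hypothetical discovery loadings
n = 1500
g_rep = rng.binomial(2, 0.3, size=n).astype(float)
pc_rep = rng.normal(size=(n, 3)) + 0.5 * np.outer(g_rep, w_disc)

slope, t_stat = replicate(pc_rep, g_rep, w_disc)
print(round(slope, 3), round(t_stat, 1))
```

Because the loadings are not re-estimated in the replication cohort, the test is univariate and asks whether the same shape direction is affected, not merely the same segment.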
We focused on the 15 replicated loci to investigate patterns of genome to facial shape associations. These 15 loci involved a variety of facial segments, and several loci affected segments in more than one facial region, as illustrated in Figs. 2 and 3. Several interesting patterns were observed. First, the majority of affected segments were located in the nose and the lower facial quadrant, primarily the chin. Second, some associations were involved in high-level segments, often at the outer layer(s) of the polar dendrogram, suggesting a localized phenotypic effect (for example, 1p32.1). In contrast, we observed other associations involving numerous linked facial segments from the edge of the dendrogram (level 5) to the center (level 0). Interestingly, the evidence of association (obtained by tracking the –log10P values in Fig. 3) in these instances may increase from center to edge (global to local; for example, 7q21.3) or from edge to center (local to global; for example, 1p12), reflecting a localized versus global phenotypic effect, respectively. Still others, for example, 1q31.3, reached a maximum association partway up the dendrogram, reflecting an effect on an intermediate global-to-local level. Interestingly, both of the loci on chromosome 17 had the same pattern of association across nose segments, but 17q24.3 (rs11655006[T>C]) showed the stronger association at the more global levels and 17q24.3 (rs5821892[C>CG]) showed the strongest association at the most local level. Third, most associations were limited to linked segments within one facial quadrant of the polar dendrogram, but in a few cases genetic associations involved two distinct quadrants, for example, 2q31.1 and 3q21.3. Finally, in the online illustrations, we observed that facial effects propagated consistently in linked facial segments across different levels in the hierarchical design.
For completeness, we present an overview of all 38 loci identified in the discovery cohort with their peak SNP statistical details and imputation quality scores in Supplementary Tables 1 and 2, respectively. For 14 of the non-replicating loci, association was represented by a single SNP, and for an additional 5 loci the peak SNP had a minor allele frequency (MAF) less than 2%. Another locus that did not replicate showed a significant discrepancy in peak SNP MAF between the discovery and replication cohorts. Of the remaining 18 loci (also given in Table 1), 16 loci replicated at a nominal level of significance (P ≤ 0.05), 15 of which replicated below the FDRd threshold and 12 of which replicated below the Bonferroni threshold. Two loci with robust associations in the discovery cohort failed to replicate. Additional illustrations on the discovery and replication results of all 38 loci are presented online (see URLs).
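The two kinds of significance threshold used for the discovery and replication counts above can be illustrated with a short sketch. It assumes that the FDR procedure valid under any dependency structure is the Benjamini–Yekutieli step-up rule, and it reports the largest P value that the rule declares significant. The function names and the example P values are hypothetical.

```python
def by_threshold(pvals, alpha=0.05):
    """Benjamini-Yekutieli step-up rule (valid under arbitrary dependence).

    Returns the largest P value declared significant, or None if no test passes.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))      # harmonic correction factor
    threshold = None
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * alpha / (m * c_m):
            threshold = p                            # keep the largest qualifying P
    return threshold

def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni-corrected per-test significance level."""
    return alpha / n_tests

example = [0.001, 0.008, 0.039, 0.041, 0.6]          # hypothetical P values
print(by_threshold(example), bonferroni_threshold(len(example)))
```

The step-up rule scans all ordered P values and keeps the last one that satisfies its bound, so a later P value can pass even when an earlier one does not.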
Association between 15 genomic regions and signatures of active regulatory elements in cranial neural crest cells
To explore biological processes and phenotypes associated with the identified GWAS loci, we performed Genomic Regions Enrichment of Annotations (GREAT) analysis15. Remarkably, even though only 15 loci were used in this analysis, we detected significant associations with craniofacial development, including categories such as chondrocyte differentiation, first and second branchial arch mesenchyme, development of facial bones and cleft palate (Supplementary Fig. 3). Although craniofacial development is complex and involves interactions between multiple cell types, CNCCs, an embryonic cell group that arises at 3–6 weeks of gestation, have a central role in the formation of the facial plan and in determining its species-specific and individual variation16,17,18. During embryogenesis, CNCCs migrate away from the neural tube along stereotypical paths and form the majority of the cranial mesenchyme, which then differentiates into the bone, cartilage and connective tissue of the face and head19,20. We reasoned that, if divergence of the facial shape within the human population captured by our GWAS indeed originates early during embryogenesis, then the 15 loci should be preferentially active in CNCCs. To test this hypothesis, we took advantage of the epigenomic mapping datasets generated from human CNCCs, which were derived in vitro from human embryonic stem cells (ESCs) or induced pluripotent stem cells (iPSCs)21,22. Acetylation of histone H3 on lysine 27 (H3K27ac) has been associated with the promoters of transcriptionally active genes and with active distal enhancers, and thus is a useful mark to consider when exploring cell-type-specific activity of both coding and noncoding regions of the genome23,24. 
We quantified H3K27ac ChIP–seq signals in the vicinity (for example, within 10 kb) of the 15 peak SNPs in CNCCs of different genetic backgrounds and compared them to the H3K27ac signals over the same regions in over 30 other cell types, representing distinct adult, embryonic and in vitro–derived cell types. We observed higher H3K27ac signals in the vicinity of the peak SNPs in CNCCs than in any other cell type examined (Fig. 4). These observations are consistent with the preferential activity of the identified loci in CNCCs and with an embryonic origin of the human facial variation captured by our study.
A large majority of the genetic variants associated with human trait variation map to the noncoding parts of the genome. Many such variants are thought to reside within cis-regulatory elements25. However, cis-regulatory elements, especially distal enhancers, are often characterized by highly cell-type-specific activity patterns (reflected in their cell-type-specific chromatin marking patterns), underscoring the importance of analyzing relevant cell types for GWAS interpretation (reviewed in refs. 26,27). To explore the overlap between the 15 loci and regulatory regions active in CNCCs, we used our epigenomic datasets (which included p300, TFAP2A, NR2F1, H3K4me1, H3K4me3, H3K27me3 and H3K27ac ChIP–seq data and nucleosomal depletion analysis by ATAC–seq) to identify and classify cis-regulatory elements on the basis of the type of element and activity. Specifically, transcription factor binding and transposase hypersensitivity maps were used to identify the genomic positions of candidate regulatory regions, whereas relative enrichments of H3K4me1 to H3K4me3 were used to distinguish enhancers from promoters and H3K27ac versus H3K27me3 signals were used to infer active, premarked or poised/repressed activity states (depicted in different colors in Fig. 5a). We observed that the peak SNPs were more likely to be located within or in the immediate vicinity of (<1 kb away from) a detectable regulatory chromatin feature in CNCCs (P = 1.58 × 10−3). Furthermore, these SNPs were enriched in enhancer elements, especially strong active CNCC enhancers (P = 1.19 × 10−2, odds ratio = 7.7, Fisher’s exact test). An example of this is shown in Fig. 5b: peak SNP rs6740960[A>T] (affecting the lower facial quadrant) is located within an active CNCC enhancer marked by p300, H3K27ac and H3K4me1. Interestingly, the same variant has also recently been associated with an elevated risk of cleft lip and/or palate (CL/P) in a European cohort28. 
Furthermore, comparisons with our epigenomic datasets previously obtained in chimpanzee CNCCs21 showed that the overlapping enhancer is biased toward chimpanzees in activity (as evidenced by the elevated levels of H3K27ac in chimpanzee as compared to human CNCCs). This example and others of loci (for example, PAX3) that showed association with facial variation, disease and enrichment for regulatory elements divergent between humans and chimpanzees raise an intriguing possibility that the genetic variation within an overlapping set of loci/regulatory elements may influence both species-specific and individual facial shape variation in humans as well as determine predisposition to craniofacial malformations.
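The enrichment statistics quoted above come from Fisher's exact test on a 2 × 2 table. A minimal sketch of a one-sided version is given below. The counts are hypothetical and are not the study's data, and whether the reported P value was one- or two-sided is not stated in the text, so the one-sided form is an assumption.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the table [[a, b], [c, d]].

    Rows: peak SNPs / background SNPs; columns: inside / outside an element class.
    Returns the sample odds ratio and P(X >= a) under the hypergeometric null.
    """
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    odds = (a * d) / (b * c) if b * c else float("inf")
    return odds, tail / total

# Hypothetical counts: 5 of 15 peak SNPs versus 60 of 1,000 background SNPs
# fall inside active enhancers.
odds_ratio, p_value = fisher_one_sided(5, 10, 60, 940)
print(round(odds_ratio, 2), p_value)
```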
Candidate genes and integration with facial GWAS literature
A literature-based annotation of genes was performed, and their involvement in facial variation is reported in Supplementary Note 1. Additionally, in Supplementary Note 2, Supplementary Tables 3–6 and Supplementary Figs. 4 and 5, we first report statistical replication for 10 of 16 SNPs from six previous facial GWAS based on 2D or 3D images2,3,4,5,6,7 (Supplementary Table 3). Second, we consulted GWAS Central and Phenoscanner29 and report additional evidence of similar effects on facial shape for nine SNPs in Table 1 (Supplementary Table 4). Third, we cross-referenced ten of our loci with a recent study30 based on self-reported information on chin dimple and nose size (Supplementary Table 5). Table 1 indicates the involvement of each locus in any of these three integration efforts. In summary, for the 15 replicated loci in this work, 4 were completely new, 9 were previously discovered showing consistent facial effects and 2 were previously discovered with pleiotropic facial effects.
A deeper understanding of the genetic basis of human facial traits may provide insights into the mechanisms of craniofacial morphogenesis, improve knowledge of the complex relationship between genotype and phenotype in craniofacial syndromes and birth defects, and eventually provide a basis for predicting facial features in numerous applications, ranging from early diagnosis to personalized medicine, treatment planning in craniofacial surgery and orthodontics, and biometrics and forensics. In mice, whole-genome quantitative trait locus (QTL) studies have identified numerous genetic regions associated with craniofacial traits31,32,33. In humans, both candidate gene approaches10,34,35 and GWAS2,3,4,5,6,7 have implicated specific variants. While the highly complex nature of craniofacial morphogenesis suggests that many genes are likely to influence facial morphology, the number of associated loci uncovered so far using GWAS has been limited, with few independent replications.
Although traditional anthropometric facial measures remain a valuable resource for anthropologists and clinicians, because of their widespread use and simplicity, limited GWAS results have been obtained with such measurements so far. One reason for this is the difficulty inherent in defining proper facial types or relevant facial measurements a priori. As noted by Adhikari et al.4, quantitative measures, on the basis of which all previous facial GWAS efforts from 2D or 3D images have been conducted, are expected to yield higher statistical power than categorical ones, but they have not yet resulted in many robust findings. Using well-defined categorical scale ratings instead, they discovered and replicated several associated loci, which we also replicate here. Similarly, but using self-reported categorizations on chin dimple and nose size, Pickrell et al.30 identified a number of associated loci in an impressively sized sample (n > 70,000). However, the relatively simple categorical phenotypes used in these studies preclude a comprehensive and precise description of the effects of genetic variants on facial morphology. We argue that these approaches are perfectly in line with the traditional phenotype-first thinking common to GWAS and remain a good paradigm for case–control designs and simple quantitative or qualitative traits, but perhaps are less suitable for multipartite quantitative traits like the human face.
The craniofacial complex is initially modulated by precisely timed embryonic gene expression and molecular and cellular processes mediated through complex signaling pathways36. As humans grow, hormones, nutritional status and biomechanical factors affect the face37,38. A natural consequence of these forces and constraints during facial morphogenesis and growth is that the face exhibits a modular organization, with suites of facial features at different scales that show internal integration but remain relatively independent from other features39. Therefore, we expect the human face to be influenced by many genes that exhibit a range of effects, with some influencing only localized parts of the face and others influencing more global aspects of morphology. As a result, the proposed facial segmentation is related to the concept of modularity and integration in morphological studies40, with the main addition that the partitioning was done hierarchically, moving from globally integrated to locally focused modules. This allows for an investigation of facial shape effects propagating at different scales. On the basis of structural correlations between a vast number of individual 3D points on facial surfaces, we were able to hierarchically cluster the human face into segments. These were learned from the data and thus are unsupervised and data derived instead of candidate driven, and they could be duplicated across independent cohorts. This allowed for an efficient divide-and-conquer strategy enabling the screening of a huge set of facial variations without compromising statistical power.
Efficient use of phenotypic data from modest-sized cohorts is necessary for investigating 3D facial shape. In contrast to a recent GWAS on human body height41, which, owing to the widespread availability of data on the phenotype, was able to identify 423 loci in combined samples over two orders of magnitude larger than what was used here, the specialized phenotype data needed for investigating 3D facial shape are limited to a few, smaller cohorts. Using the PITT cohort from Shaffer et al.6, we managed to substantially increase the number of associated loci identified.
The global and local facial patterns as depicted in Fig. 3 for the discovered loci may help elucidate their roles during craniofacial development. It is clear that many of the genes at these loci are expressed in relevant tissues during embryonic development. Connecting these patterns of early expression and function to eventual morphology is a major challenge. The kind of highly refined facial effects presented here can help provide a roadmap to clarify the connections between molecules and morphology. Consider, for instance, two loci associated with different aspects of nasal shape, which are depicted in Supplementary Fig. 6. 19q13.11 (KCTD15) showed a highly focal effect limited to the nasal tip. In contrast, 6p21.1 (SUPT3H) affected primarily the nasal root and lateral parts of the nasal bridge, with its effects effectively sparing the nasal tip. A similar nasal phenotype was observed by Adhikari et al.4 at 6p21.1. These highly specific phenotypic effects might provide important insights about the role of these two genes during human facial morphogenesis and growth. KCTD15 has been shown to regulate TFAP2A, which has a critical role in neural crest formation42 and, when mutated, results in reduced snout length in mice, among other defects43. Perhaps KCTD15 affects nasal tip shape in humans by influencing chondrocyte proliferation in the nasal septum, whereas SUPT3H exerts its influence on nasal shape by affecting portions of the maxilla and nasal bones. The precision of the phenotypic effects identified here can form the basis of such testable hypotheses in the laboratory. As another example, previous research extensively described the role of SOX9 in the development of a broad range of tissues44. For facial development specifically, SOX9 is expressed in CNCCs populating the pharyngeal arches in the head region. Previous animal and developmental studies described that the first pharyngeal arch gives rise to the maxillary and mandibular prominences45. 
However, the results of our study show that the locus associated with SOX9 (17q24.3) and the one located 1 Mb away (CASC17, which is known to functionally interact with SOX9 through transcription factors) both influence nasal shape, although the nose is known to be formed from the frontonasal prominence. This raises a fascinating question about the involvement of SOX9 during the development of the nose and is an interesting subject for functional follow-up studies.
In our search through web-based GWAS repositories, we did find associations (P < 1 × 10−7) with non-facial traits for the loci in Table 1, including aspects of body size and composition (waist–hip ratio and height, TBX15–WARS2 and SUPT3H, respectively), developmental traits like age at menarche (RAB7A) and brain DNA methylation levels (RPS12–EYA4), and risk of biliary atresia (PAX3). That genes and, more specifically, even relatively small gene regions, such as those identified in GWAS, have multiple functions in the body, or pleiotropy, has been appreciated for some time46. However, the extent to which pleiotropy has a role in trait variation and disease risk is perhaps much greater than what has previously been appreciated47.
In conclusion, we proposed a data-driven global-to-local facial phenotyping approach, well suited for a genome-wide association scan. Using this approach, we have substantially advanced the literature on facial genetics on several fronts. First, we identified and replicated a number of new associated genetic loci using modest-sized cohorts. Second, we provided additional support for numerous previously identified loci and showed a strong integration of our results with the facial GWAS literature. Third, we demonstrated the preferential activity of the replicated loci in CNCCs, consistent with an embryonic origin of the human facial variation captured by our study. Lastly, we identified patterns of global-to-local genetic effects on facial shape, supporting the genetic organization of facial features at different scales.
URLs
National Institute for Dental and Craniofacial Research, http://www.nidcr.nih.gov/; National Human Genome Research Institute, https://www.genome.gov/; National Institute of Justice, https://www.nij.gov/funding/Pages/welcome.aspx; US Department of Defense, http://www.defense.gov/; MeshMonk facial mapping software, https://github.com/TheWebMonks/meshmonk; GREAT analysis, http://great.stanford.edu/; database of Genotypes and Phenotypes (dbGaP), http://www.ncbi.nlm.nih.gov/gap; facial images for the PITT cohort, https://www.facebase.org/data/record/#1/isa:dataset/RID=14283; GWAS Central, http://www.gwascentral.org/; Table Browser of the UCSC Genomes Browser, http://genome.ucsc.edu/cgi-bin/hgTables; OMIM database, https://www.omim.org/. All relevant study and results data to run future replications and meta-analysis efforts are provided at https://www.esat.kuleuven.be/psi/research/global-to-local-facial-phenotyping.
Sample and recruitment details
For the discovery cohort (the Pittsburgh (PITT) sample), data from 2,449 participants were obtained from the 3D Facial Norms repository48. The repository includes 3D facial surface images and self-reported demographic descriptors (for example, age and ancestry) as well as basic physical characteristics (for example, height and weight) from individuals recruited at four US sites: Pittsburgh, PA; Seattle, WA; Houston, TX; and Iowa City, IA. Recruitment was limited to individuals aged 3 to 40 years and of self-reported European ancestry. Individuals were excluded if they reported a personal or family history of any birth defect or syndrome affecting the head or face, a personal history of any significant facial trauma or facial surgery, or any medical condition that might alter the structure of the face. A total of 2,329 participants were retained for analysis after removing 120 participants with missing information on sex, age, height or weight or with 3D image mapping artifacts (n = 22).
For the replication cohort (the Penn State (PSU) sample), participants were recruited through several studies at the Pennsylvania State University and sampled in the following locations: State College, PA; New York, NY; Urbana-Champaign, IL; Twinsburg, OH; Dublin, Ireland; Rome, Italy; Warsaw, Poland; and Porto, Portugal. Participants self-reported information on age, ethnicity, ancestry and body characteristics, and data were gathered on height, weight and pigmentation of the hair and skin. Individuals were excluded from the analysis if they were below 18 years of age (range 18–88) or if they reported a personal history of significant facial trauma or facial surgery, or any medical condition that might alter the structure of the face. No restriction on ancestry or ethnicity was imposed during recruitment, but only individuals of European descent were used in this study (n = 2,059). Participants were removed because of missing sex, age, height or weight information (n = 81) or the presence of 3D image mapping artifacts (n = 33). A further reduction to n = 1,719 was done by excluding participants who were not included in the genotype imputation efforts.
Facial phenotyping, 3D imaging quality control and shape variables
3D facial surface imaging
Digital facial stereophotogrammetry was used to capture 3D facial surfaces in both samples. This well-established approach uses digital photography to generate a dense 3D point cloud representing the surface geometry of the face from multiple 2D images with overlapping fields of view. For the Pittsburgh sample, facial surfaces were acquired using 3dMDface camera systems. For the Penn State sample, 3D images were obtained with 3dMDface or Vectra H1 (Canfield Scientific) systems. Applying standard facial image acquisition protocols49, participants were asked to close their mouths and hold their faces with a neutral expression.
Spatially dense facial quasi-landmarking
3D images in wavefront.obj file format were imported into an in-house 3D image-cleaning program for cropping and trimming, removing hair, ears and any dissociated polygons. Five crude positioning landmarks were placed on the face to establish a rough facial orientation. An anthropometric mask (AM)50 was non-rigidly mapped9 onto all 3D images and their reflections, which were constructed by changing the sign of the x coordinate51. The AM is essentially a surface template covering the facial area of interest, and the mapping thereof onto the facial images is a process equivalent to the indication of traditional landmarks. This establishes homologous spatially dense (~10,000) quasi-landmark (QL) configurations for all 3D images and their reflections. In other words, image data from different individuals became standardized, enabling a spatially dense analysis.
Facial size was obtained as the centroid size of the quasi-landmark configurations. Facial shape was symmetrized using generalized Procrustes analysis (GPA)52 to eliminate differences in position, orientation and size between the original and reflected quasi-landmark configurations. The average of an original and its reflected quasi-landmark configuration constitutes the symmetric component, while the difference between the two configurations constitutes the asymmetric component. Although facial asymmetry is of interest, we do not analyze it in this work. Therefore, unless otherwise mentioned, facial shape always refers to the symmetric quasi-landmark configuration.
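The centroid size and the symmetric/asymmetric decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the two configurations are assumed to be homologous already (as produced by mapping the AM onto an image and its reflection), and a single ordinary Procrustes alignment of the reflected onto the original configuration stands in for the full GPA.

```python
import numpy as np

def centroid_size(config):
    """Root of the summed squared distances of all landmarks to their centroid."""
    centered = config - config.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum()))

def normalize(config):
    """Remove position and size: center on the origin, scale to unit centroid size."""
    centered = config - config.mean(axis=0)
    return centered / centroid_size(config)

def rotate_onto(config, reference):
    """Least-squares proper rotation (Kabsch) of a centered config onto a reference."""
    u, _, vt = np.linalg.svd(config.T @ reference)
    sign = np.sign(np.linalg.det(u @ vt))  # guard against an improper rotation
    return config @ (u @ np.diag([1.0, 1.0, sign]) @ vt)

def symmetrize(original, reflected):
    """Return the (symmetric, asymmetric) components of one face.

    Both inputs are (n_landmarks, 3) arrays with homologous landmark order.
    """
    a = normalize(original)
    b = rotate_onto(normalize(reflected), a)
    return (a + b) / 2.0, a - b
```

In this sketch the symmetric component is what would be carried forward as facial shape, while the asymmetric component is simply discarded.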
Facial quality control
Outlier faces, due to quasi-landmark mapping errors, were detected by measuring the Mahalanobis distance for each face to the overall average face in the symmetrized shape space spanned by an orthogonal basis of principal components that captures 98% of the total variation. From the distribution of Mahalanobis distances, a z score for each facial shape was established, and each face with a z score equal to or larger than 2 was manually inspected for quasi-landmark errors. Identified erroneous faces were removed, and the whole process starting from the generalized Procrustes superimposition of both original and reflected quasi-landmark configurations was repeated. The PSU and PITT datasets were processed separately with the same AM, resulting in error-free symmetric and compatible (homologous) quasi-landmark configurations across both datasets.
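The outlier screen can be illustrated with a short sketch. It assumes that each face has been flattened into one row of a matrix and that the z score is the standardized Mahalanobis distance; the function name and the exact standardization are assumptions for illustration, since the text does not spell them out.

```python
import numpy as np

def mahalanobis_z_scores(shapes, var_kept=0.98):
    """z scores of Mahalanobis distances to the mean face in PC space.

    shapes: (n_faces, n_coords) matrix of flattened symmetric configurations.
    """
    centered = shapes - shapes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (shapes.shape[0] - 1)            # variance along each PC
    explained = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(explained, var_kept) + 1)  # PCs reaching 98% of variance
    scores = u[:, :k] * s[:k]                        # PC scores of every face
    dist = np.sqrt((scores ** 2 / var[:k]).sum(axis=1))
    return (dist - dist.mean()) / dist.std(ddof=1)

# Faces flagged for manual inspection would then be: np.where(z >= 2)[0]
```

After removing confirmed errors, the superimposition and this screen would be rerun, mirroring the iterative procedure described above.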
Phenotyping the discovery panel
The PITT cohort served as the discovery panel. First, the superimposed and symmetrized facial shapes were corrected using a partial least-squares regression (PLSR; function plsregress from Matlab 2016b) for the confounders of sex, age, age², weight, height, facial size and the first four genetic PCA axes to correct for population stratification. Of note, while correcting for genetic background and other confounders is a required step in genetic mapping efforts, it additionally ensures that structural facial variations due to the confounders do not influence the global-to-local segmentation of faces. Second, a 3D correlation, using the RV coefficient53, between each pair of corrected quasi-landmarks was computed to construct the squared similarity matrix (~10,000 × ~10,000). Subsequently, a hierarchical spectral clustering with five levels was performed, resulting in a total of 63 hierarchically linked facial segments with 1, 2, 4, 8, 16 and 32 non-overlapping modules at levels 0, 1, 2, 3, 4 and 5 (Fig. 1). The hierarchical design provides a stepwise focused shape analysis at different scales, by gradually zooming in to more local shape variations without ignoring the integration of facial parts at previous levels. Furthermore, specific shape effects should propagate consistently in facial segments linked across different levels, lending additional support that the discovered signals are robust. At each level, for each segment, all quasi-landmarks in the segment are subjected to a new GPA. As such, a shape space for each facial module is constructed independently of the other modules and its relative positioning within the full face. This is particularly interesting for smaller shape variations (for example, nose tip) that when superimposed using the full facial surface are overlooked and become undetectable.
Finally, after GPA, each modular shape space is spanned by an orthogonal basis using PCA combined with parallel analysis12 to determine the number of significant components contributing to facial shape.
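The RV-based similarity matrix that drives the facial segmentation can be sketched as follows. The sketch uses the standard definition of the RV coefficient between two coordinate blocks; the function names and the array layout are assumptions, and a brute-force double loop like this one is only practical for a small number of quasi-landmarks rather than the ~10,000 used in the study.

```python
import numpy as np

def rv_coefficient(x, y):
    """RV coefficient between two (n_faces, 3) blocks of landmark coordinates."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    sxy = xc.T @ yc
    sxx = xc.T @ xc
    syy = yc.T @ yc
    return float(np.trace(sxy @ sxy.T) / np.sqrt(np.trace(sxx @ sxx) * np.trace(syy @ syy)))

def rv_similarity_matrix(landmarks):
    """landmarks: (n_faces, n_landmarks, 3) -> (n_landmarks, n_landmarks) similarity."""
    n_landmarks = landmarks.shape[1]
    sim = np.eye(n_landmarks)
    for i in range(n_landmarks):
        for j in range(i + 1, n_landmarks):
            sim[i, j] = sim[j, i] = rv_coefficient(landmarks[:, i], landmarks[:, j])
    return sim
```

The resulting matrix is symmetric with a unit diagonal and entries between 0 and 1, which makes it usable as an affinity matrix for spectral clustering.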
This application of PCA had several advantages. First, PCs form an orthogonal basis, such that shape variations could be described as linear combinations of PCs. Second, by selecting significant PCs, it was possible to eliminate (via parallel analysis12) noisy or meaningless shape variations that result from various sources of error, such as 3D image acquisition and/or quasi-landmark mapping. Third, PCA is an excellent tool to reduce the dimensions of high-dimensional data when strong correlations between the individual data elements are present. The construction of facial modules, as performed here, resulted in the grouping of highly correlated quasi-landmarks, such that a substantial reduction of dimensions was obtained. This in turn enabled the use of multivariate association techniques, such as canonical correlation analysis (CCA), that are constrained in the number of variables tested as a function of the sample size. For almost all segments and especially for the larger segments containing up to ~10,000 quasi-landmarks, this constraint would otherwise have been violated. Furthermore, the multivariate association was more computationally tractable given the lower number of variables tested. Finally, an effect identified in PCA space could be transformed back into the quasi-landmark shape space.
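To illustrate how parallel analysis selects the number of PCs, the sketch below compares the observed eigenvalues with a null distribution obtained by permuting each variable independently. The permutation scheme, the 95th-percentile cutoff and the function name are assumptions made for illustration; the variant used in the study may differ in these details.

```python
import numpy as np

def parallel_analysis(data, n_perm=100, quantile=0.95, seed=0):
    """Number of leading PCs whose eigenvalue exceeds its permutation null.

    data: (n_samples, n_variables) matrix, for example flattened segment shapes.
    """
    rng = np.random.default_rng(seed)
    centered = data - data.mean(axis=0)
    dof = len(data) - 1
    eig = np.linalg.svd(centered, compute_uv=False) ** 2 / dof
    null = np.empty((n_perm, eig.size))
    for b in range(n_perm):
        # Shuffling every column separately destroys correlations between variables
        shuffled = np.column_stack([rng.permutation(col) for col in centered.T])
        null[b] = np.linalg.svd(shuffled, compute_uv=False) ** 2 / dof
    exceeds = eig > np.quantile(null, quantile, axis=0)
    # Keep components only up to the first one that fails the comparison
    return int(exceeds.size if exceeds.all() else np.argmin(exceeds))
```

Components beyond the returned count are treated as noise and dropped before the association analysis.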
Note that some prior studies2,3,5 also analyzed facial shape data using PCs, but they did so by analyzing them separately. Basically, in these studies, full facial multivariate sparse landmark configurations were projected onto each PC and thus the phenotypic data were preselected along the PCs individually and one by one. Any preselection of measurements, either distances or PCs, that are subsequently analyzed individually results in a loss of information, as combinations of these measurements are not considered. In contrast, our work uses all PCs together in a single multivariate association effort based on CCA. Therefore, any possible linear combination of PCs is investigated simultaneously and information is not lost. In CCA, ‘canonical’ is the statistical term for analyzing latent variables (which are not directly observed) that represent multiple variables (which are directly observed). Such an optimization and construction of latent variables is absent in a linear regression on an individual PC.
Phenotyping the replication panel
The PSU cohort was used as the replication panel. First, independently of the PITT dataset, the superimposed and symmetrized facial shapes were corrected, using a PLSR, for the confounders of sex, age, age², weight, height, facial size and the first three PCA axes of genetic background. The reason for this separate and independent correction was twofold: (i) both cohorts have independent axes of genetic background and (ii) this workflow shows how future replication efforts can be done without the need to merge, and therefore have access to, all the genetic and covariate data from the discovery cohort. The establishment of spatially dense quasi-landmarks (AM mapping) allowed for consistent phenotyping. After AM mapping, the PSU faces were segmented using the quasi-landmark clustering from the discovery PITT cohort. Finally, for each facial segment, all PSU shape instances were superimposed onto the segmental average shape and PCs of the discovery panel. In this way, the replication effort only required the discovery quasi-landmark cluster labels, average shapes and PCs for each of the facial segments.
Genotyping, quality checks, imputation, population structure and annotation
The Pittsburgh sample
Genotyping was performed at the Center for Inherited Disease Research (CIDR) at Johns Hopkins University. Participants, including 70 duplicate samples and 72 HapMap control samples, were genotyped on the Illumina OmniExpress + Exome v1.2 array plus 4,322 investigator-chosen SNPs included to capture variation in specific regions of interest based on previous studies of the genetics of facial variation. Standard data cleaning and quality assurance procedures54 were performed in collaboration with the University of Washington Genetics Coordinating Center. Specifically, samples were evaluated for concordance of genetic and reported sex, evidence of chromosomal aberrations, biological relatedness across study participants, ancestry, genotype call rate and batch effects. SNPs were evaluated for call rate, discordant genotype calls between duplicate samples, Mendelian errors in HapMap control parent–offspring trios, deviation from Hardy–Weinberg genotype proportions and sex differences in allele frequency and heterozygosity.
Imputation of unobserved genetic variants and sporadic missing genotype calls for assayed SNPs was performed using the 1000 Genomes Project8 Phase 3 reference panel. SHAPEIT255 was used for prephasing of haplotypes, and IMPUTE256,57 was used to impute nearly 35 million variants. SNP-level (INFO score > 0.5) and genotype-per-participant-level (genotype probability > 0.9) filters were used to omit poorly imputed variants. Masked variant analyses, in which genotyped SNPs were imputed as though they had not been assayed, indicated high concordance between imputed and observed genotypes: 0.982 for SNPs with MAF ≥0.05 and 0.998 for SNPs with MAF <0.05.
Population structure was assessed using PCA of approximately 97,000 autosomal genotyped SNPs chosen for call rate (>95%), MAF (>0.05) and LD (pairwise r² <0.1 across variants in a sliding window of 10 Mb). Tests of genetic association between the first 20 PCs and all SNPs confirmed that PCs did not represent local variation at specific genetic loci. On the basis of the scree plot and joint distributions, four PCs were judged sufficient to capture population structure within the PITT sample.
The Penn State sample
Participants sampled from 2006–2012 (IRB 32341) were genotyped on the Illumina Human Hp200c1 BeadChip. Participants sampled from 2013–2016 (IRB 44929, 13103, 2503 and 4320) were genotyped on the 23andMe v3 and v4 arrays. Samples were evaluated for concordance of genetic and reported sex, evidence of chromosomal aberrations, biological relatedness across study participants, ancestry, genotype call rate and batch effects. SNPs were evaluated for call rate, discordant genotype calls between duplicate samples, Mendelian errors in HapMap control parent–offspring trios, deviation from Hardy–Weinberg genotype proportions and sex differences in allele frequency and heterozygosity.
Using the 1000 Genomes Project8 Phase 3 reference panel, samples with >500,000 variants were imputed, as fewer variants result in uncertain imputation probabilities. SHAPEIT255 was used for prephasing of haplotypes, and IMPUTE256,57 was used to impute nearly 35 million variants. SNP-level (INFO score > 0.5) and genotype-per-participant-level (genotype probability > 0.9) filters were used to omit poorly imputed variants.
To select the Europeans used in this study, the HapMap 3 dataset (n = 998) was merged with the non-imputed genotypes, and genetic ancestry was estimated using the ADMIXTURE program58 assuming k = 3 to 9 ancestral populations. On the basis of the cross-validation (CV) error for each k value, we selected a value of k = 6 as appropriate for the dataset. These results were then used to provide a matrix of genetic ancestry axes, which were used to select samples most closely related to the CEU and TSI population references.
Additional population structure was assessed using PCA of approximately 100,000 autosomal genotyped SNPs chosen for call rate (>95%), MAF (>0.05) and LD (pairwise r² <0.1 across variants in a sliding window of 10 Mb). Tests of genetic association between the first 20 PCs and all SNPs confirmed that PCs did not represent local variation at specific genetic loci. On the basis of the scree plot and joint distributions, three PCs were judged sufficient to capture population structure within the European-only PSU sample.
Top SNPs and annotation
We observed 1,932 genome-wide significant SNPs across 38 loci using a 500-kb window. For each locus, a top SNP was defined as the SNP generating the highest association (lowest P value) in any of the 63 facial segments. Genes 500 kb up- and downstream of the top SNPs were identified using the Table Browser of the UCSC Genome Browser (see URLs). Each gene was annotated on the basis of the literature found in PubMed. As a search term, only the gene name was used, and relevant articles were selected based on their title and abstract. The OMIM database (see URLs) was searched for syndromes associated with the annotated genes.
Statistics and epigenomic analysis
Our global-to-local facial phenotyping partitioned facial shape into 63 facial segments, each of which was represented by multiple dimensions of variation (PCs). CCA13, as implemented in the function canoncorr from Matlab 2016b, was used as a straightforward multivariate testing framework (note that CCA is also implemented in PLINK 1.959). CCA extracts the linear combination of PCs from a facial segment that has maximal correlation with the SNP being tested. The correlation is tested for significance based on Rao’s F-test approximation60 (right-tail, one-sided test). Using CCA, we tested each SNP (n = 9,478,608) individually under the additive genetic model in the PITT cohort (n = 2,329). Note that CCA does not accommodate adjustments for covariates, but effects of important variables such as sex, age, height, weight, facial size and genetic ancestry were corrected for (using PLSR) at the phenotyping stage. Additionally, we applied a similar correction for the covariates on each SNP, again using PLSR. Therefore, the CCA analysis was performed under the reduced model, which was obtained after removing the effects of covariates on both the independent (SNP) as well as the dependent (facial shape) variables. The minimum MAF cutoff for SNPs to test was 1%. Quantile–quantile plots, provided online, indicate that the population stratification under this reduced model was dealt with properly.
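When one side of the CCA is a single SNP, the analysis reduces to a multiple regression of the SNP on the PCs, which makes the test easy to sketch. The code below is an illustrative Python analogue rather than the Matlab canoncorr call used in the study; the function name is hypothetical, inputs are assumed to be already corrected for covariates, and the degrees of freedom ignore the covariates removed beforehand.

```python
import numpy as np

def cca_single_snp(snp, pcs):
    """Canonical correlation between one SNP and the PCs of a facial segment.

    snp: (n,) additive genotype coding; pcs: (n, p) PC scores.
    Returns (r, F, (df1, df2), weights); weights define the associated
    linear combination of PCs up to scale.
    """
    x = snp - snp.mean()
    y = pcs - pcs.mean(axis=0)
    weights, *_ = np.linalg.lstsq(y, x, rcond=None)
    fitted = y @ weights
    r2 = float(fitted @ fitted) / float(x @ x)   # squared canonical correlation
    n, p = y.shape
    # With a single variable on one side, Wilks' lambda is 1 - r^2 and
    # Rao's F approximation becomes an exact F statistic on (p, n - p - 1) df.
    f_stat = (r2 / (1.0 - r2)) * (n - p - 1) / p
    return np.sqrt(r2), f_stat, (p, n - p - 1), weights
```

The right-tail P value would follow from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.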
Given the burden of multiple comparisons, a strict significance threshold of P ≤ 5 × 10⁻⁸ was used to declare ‘genome-wide significance’, which corresponds to a Bonferroni correction for 1 million independent tests and is mostly applicable in a European GWAS cohort61. Because facial variation was tested separately in each of many facial segments, the multiple-comparisons burden was magnified. Therefore, we also determined a more stringent threshold for declaring ‘study-wide significance’ corresponding to an additional adjustment for the effective number of independent tests62. The eigenvalues of pairwise multivariate correlations of all 63 modules determined a total of 39 effective independent tests. This reduction in effective tests was expected because of the hierarchical and overlapping construction of the facial segments. Therefore, the study-wide significance threshold was determined to be 1.2821 × 10⁻⁹ (i.e., 5 × 10⁻⁸/39). Additionally, we computed an FDR-adjusted significance threshold of Benjamini and Yekutieli14 that is accurate for any test dependency structure.
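The threshold arithmetic can be made concrete with a brief sketch. The eigenvalue-based estimator shown is the one commonly attributed to Li and Ji and is an assumption here, since the text does not name the exact estimator; the function names are hypothetical, and the Benjamini-Yekutieli routine follows the standard step-up rule.

```python
import numpy as np

def effective_number_of_tests(corr):
    """Eigenvalue-based estimate of the number of independent tests.

    corr: symmetric correlation matrix between the tested phenotypes.
    """
    eig = np.round(np.abs(np.linalg.eigvalsh(corr)), 10)  # damp round-off noise
    return float(np.sum((eig >= 1.0) + (eig - np.floor(eig))))

def by_threshold(p_values, alpha=0.05):
    """Largest P value declared significant under Benjamini-Yekutieli, or None."""
    m = len(p_values)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # penalty for arbitrary dependency
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / (m * c_m):
            threshold = p
    return threshold

genome_wide = 5e-8
study_wide = genome_wide / 39  # 39 effective tests, as reported in the text
```

Dividing the genome-wide threshold by 39 reproduces the study-wide value of about 1.2821 × 10⁻⁹ quoted above.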
On the basis of previous work10, the effect found in the discovery cohort was measured directly in the replication cohort (n = 1,719), which was followed by a univariate analysis. First, the discovery cohort was used as a phenotyping reference for the replication cohort. Facial shapes of the replication cohort were projected onto the PCs of each of the 63 facial segments. Subsequently, they were further projected onto the SNP-related CCA loadings, constructing specific genetic effect scores10. In doing so, the phenotypic trait, once discovered, was fixed and explicitly measured in the replication cohort. This resulted in a univariate phenotypic measure that was tested for association using a standard linear regression. We report the regression coefficient, standard error and a P value (two-sided) based on a t statistic for regression coefficients using the function regstats from Matlab 2016b. This was done separately for each combination of SNP (n = 1,932) and facial segment that achieved genome-wide significance. From all replication efforts combined (n = 7,467), we computed an FDR-adjusted significance threshold14.
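The two projection steps and the univariate test can be sketched as below. The sketch assumes that the replication shapes of one segment are already superimposed onto the discovery average shape and flattened to rows; the function names are hypothetical, and the regression is a plain least-squares fit rather than the Matlab regstats call.

```python
import numpy as np

def genetic_effect_scores(shapes, mean_shape, pcs, cca_loadings):
    """Project replication faces onto discovery PCs, then onto the CCA loadings.

    shapes: (n, d) flattened segment shapes; mean_shape: (d,) discovery average;
    pcs: (d, p) discovery PC basis; cca_loadings: (p,) SNP-specific loadings.
    """
    pc_scores = (shapes - mean_shape) @ pcs
    return pc_scores @ cca_loadings

def simple_regression(x, y):
    """Slope, standard error, t statistic and degrees of freedom of y on x."""
    xc = x - x.mean()
    slope = float(xc @ (y - y.mean()) / (xc @ xc))
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    dof = len(x) - 2
    se = float(np.sqrt((residuals @ residuals) / dof / (xc @ xc)))
    return slope, se, slope / se, dof
```

Regressing the scores on the replication genotypes then yields the coefficient and standard error, and the two-sided P value follows from the t distribution with the returned degrees of freedom.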
SNPs—tissue-specific enhancer association
ChIP–seq datasets from human and chimpanzee CNCCs were obtained from Prescott et al.21 and compared to 72 publicly available H3K27ac ChIP–seq datasets corresponding to 31 cell types/tissues (as indicated in Fig. 4a) and downloaded from the Sequence Read Archive (SRA) repository. Normalized H3K27ac ChIP coverage was calculated for 800,000 genomic regions corresponding to a superset of ENCODE DNase-hypersensitive sites (DHSs) and transcription factor–binding sites. Next, all intervals in the 20-kb windows centered on the 15 top replicated SNPs were selected (intersectBed), and a box plot (using the boxplot function in R with default interval settings) of the logarithm of normalized ChIP coverage was plotted. As a control, a five times larger set of randomized SNP positions was used in the same procedure.
SNPs—CNCC chromatin state association and statistics
Identification of different classes of CNCC regulatory regions based on chromatin signatures was performed as described in Prescott et al.21. In brief, 106,000 human CNCC candidate regulatory regions were called on the basis of combined p300, AP2α and NR2F1 ChIP–seq and ATAC–seq data by combining MACS2-called peaks into one set using mean-shift clustering. Normalized coverage around these regions was obtained for H3K4me1, H3K4me3, H3K27ac and H3K27me3 histone modifications. To visualize chromatin states at the candidate regions, the log of the H3K4me1 to H3K4me3 signal ratio was plotted versus the log of the ratio of H3K27ac to H3K27me3 as a scatterplot and colored by k-means (k = 5) cluster identity, corresponding to promoters, active enhancers, poised regions and premarked regions. Regulatory regions closest to the top replicated GWAS (closestBed function) SNPs were identified and are highlighted by circles. We observed 3 of 15 lead SNPs to be closest to a region with an active chromatin state (distance less than 9 kb in each case). This represents a significant enrichment for active chromatin–lead SNP association under Fisher’s exact test (contingency table: 3 active regions proximal to SNP, 12 other regions proximal to SNP, 3,622 active regions not proximal to SNP, 102,694 other regions not proximal to SNP), exact P = 1.19 × 10⁻². We also compared replicated SNPs to non-replicated SNPs from the screen in terms of direct overlap (1-kb distance) with CNCC regulatory elements (contingency table: 26 non-replicated SNPs with no overlap, 0 non-replicated SNPs with overlap, 7 replicated SNPs with no overlap, 5 replicated SNPs with overlap), Fisher’s exact test P = 1.58 × 10⁻³.
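As a worked illustration of Fisher's exact test on a 2 × 2 contingency table, the sketch below sums the hypergeometric probabilities of all tables that are no more likely than the observed one. It is a minimal standard-library implementation with a hypothetical function name, not the software used in the study; the example call uses the replicated versus non-replicated overlap table given above.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return float(sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed))

# Rows: non-replicated, replicated; columns: no overlap, overlap
p_overlap = fisher_exact_two_sided(26, 0, 7, 5)
```

For this table the observed configuration is the least likely one given the margins, so the two-sided P value equals its point probability, 792/501,942, which is approximately 1.58 × 10⁻³.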
Chromatin modification visualization in the genome browser
Kernel smoothed coverage files in wig format were generated with QuEST2.063 from datasets from Prescott et al.21 and uploaded to the UCSC browser. Representative tracks were selected to make Fig. 5b concise and instructive.
Gene Ontology term enrichment analysis
Analysis was performed with GREAT 3.0.015. Genomic positions for replicated SNPs were uploaded to http://great.stanford.edu/, and the results of the analysis in terms of FDR and enrichments for each significant category are summarized in Supplementary Fig. 4.
Cell line sources
Cells from WiCell (H9 ESC) and the laboratory of F. Gage (iPSC; Salk Institute) were authenticated by analyzing genomic sequence data from the cell lines. PCR tests were done to test for mycoplasma contamination.
Institutional review board (IRB) approval was obtained at each recruitment site, and all participants gave their written informed consent before participation; for children, written consent was obtained from a parent or legal guardian. For the Pittsburgh sample, the following local ethics approvals were obtained: University of Pittsburgh IRB PRO09060553 and RB0405013; UT Health Committee for the Protection of Human Subjects HSC-DB-09-0508; Seattle Children’s IRB 12107; University of Iowa Human Subjects Office/IRB 200912764 and 200710721. For the Penn State sample, the following local ethics approvals were obtained: State College, PA (IRB 44929 and 4320); New York, NY (IRB 45727); Urbana-Champaign, IL (IRB 13103); Dublin, Ireland; Rome, Italy; Warsaw, Poland; and Porto, Portugal (IRB 32341); Twinsburg, OH (IRB 2503).
Life Sciences Reporting Summary
Further information on experimental design is available in the Life Sciences Reporting Summary.
All of the genotypic markers for the Pittsburgh dataset are available to the research community through the dbGaP controlled-access repository (see URLs) at accession phs000949.v1.p1. The raw source data for the phenotypes—the 3D facial surface models in .obj format—are available through the FaceBase Consortium (see URLs). Access to these 3D facial surface models requires proper institutional ethics approval and approval from the FaceBase data access committee. Additional details can be requested from S.M.W.
The participants making up the Penn State University dataset were not collected with broad data sharing consent. Given the highly identifiable nature of both facial and genomic information and unresolved issues regarding risk to participants, we opted for a more conservative approach to participant recruitment. Broad data sharing of these collections would thus be in legal and ethical violation of the informed consent obtained from the participants. This restriction is not because of any personal or commercial interests. Additional details can be requested from M.D.S.
KU Leuven provides the spatially dense facial mapping software, free to use for academic purposes: MeshMonk (see URLs). Matlab implementations of the hierarchical spectral clustering to obtain facial segmentations are available upon request from P.C. The statistical analyses in this work were based on functions of the statistical toolbox in Matlab 2016b as mentioned throughout the Methods.
All relevant data to run future replications and meta-analysis efforts are provided in Matlab format online (see URLs). This includes the anthropometric mask used, facial segmentation cluster labels, PCA shape spaces for all 63 facial segments in the PITT cohort, CCA loadings and all association statistics for the peak SNPs in Table 1. An example Matlab script to explore the data is provided as well.
This investigation was supported by KU Leuven, BOF funds GOA, CREA and C1. The collaborators at the University of Pittsburgh were supported by the National Institute for Dental and Craniofacial Research (see URLs) through the following grants: U01-DE020078, U01-DE020057, R01-DE016148, K99-DE02560 and 1-R01-DE027023. Funding for genotyping was provided by the National Human Genome Research Institute (see URLs): X01-HG007821 and X01-HG007485. Funding for initial genomic data cleaning by the University of Washington was provided by contract HHSN268201200008I from the National Institute for Dental and Craniofacial Research (see URLs) awarded to the Center for Inherited Disease Research (CIDR). The collaborators at Penn State University were supported in part by grants from the Center for Human Evolution and Development at Penn State, the Science Foundation of Ireland Walton Fellowship (04.W4/B643), the US National Institute of Justice (see URLs; 2008-DN-BX-K125) and the US Department of Defense (see URLs). The collaborators at the Stanford University School of Medicine were supported by the Howard Hughes Medical Institute, NIH U01 DE024430 and the March of Dimes Foundation 1-FY15-312 (J.W.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Integrated supplementary information
A global-to-local facial segmentation of the PSU cohort obtained using hierarchical spectral clustering. Note that the order of quadrants, or facial segments at each level, is not necessarily the same as for Fig. 1, on the PITT cohort. The clustering involves randomness that does not preserve this order, which is why the normalized mutual information is used as a measure of overlap.
Left, the number of principal components retained after parallel analysis for each facial segment. Right, the amount of variation explained by the principal components expressed as percentage for each facial segment.
Supplementary Figure 3: GREAT analysis. GREAT gene ontology (GO) analysis results for the 15 top replicated SNPs in Table 1. Plotted are the binomial test FDR (cyan) and binomial enrichment (magenta) for the indicated top associated biological processes, phenotypes and expression pattern categories.
a, −log10(P value) of the canonical correlation per facial segment ranging from 0 to −log10(8.01 × 10⁻⁵), i.e., the Bonferroni-corrected P value for literature replication. Black-encircled facial segments have reached nominal replication (P = 0.05). b, The canonical correlation (range 0 to 1). c, The normal displacement (displacement in the direction locally normal to the facial surface) in each quasi-landmark of facial segment 45 going from the major to the minor allele SNP variant. Blue, inward depression; red, outward protrusion.
a, −log10(P value) of the canonical correlation per facial segment ranging from 0 to −log10(1.28 × 10⁻⁹), i.e., the Bonferroni-corrected P value for discovery. Black-encircled facial segments have reached nominal genome-wide significance (P ≤ 5 × 10⁻⁸). b, The canonical correlation (range 0 to 1). c, The normal displacement (displacement in the direction locally normal to the facial surface) in each quasi-landmark of facial segment 11, going from the major to the minor allele SNP variant. Blue, inward depression; red, outward protrusion.
a, 6p21.1 locus with peak SNP rs227833 and candidate gene SUPT3H. b, 19q13.11 locus with peak SNP rs287104 and candidate gene KCTD15. The locus in a primarily affects the nasal bridge and ridge, leaving the nose tip unaffected. The locus in b is focused on the nose tip only, which could indicate potentially different underlying soft tissue regulations. Top, −log10(P value) of the canonical correlation per facial segment ranging from 0 to −log10(1.28 × 10⁻⁹), i.e., the Bonferroni-corrected P value for discovery. Bottom, the normal displacement (displacement in the direction locally normal to the facial surface) in each quasi-landmark of a representative facial segment per locus, going from the major to the minor allele SNP variant. Blue, inward depression; red, outward protrusion.