Heritability maps of human face morphology through large-scale automated three-dimensional phenotyping

The human face is a complex trait under strong genetic control, as evidenced by the striking visual similarity between twins. Nevertheless, heritability estimates of facial traits have often been surprisingly low or difficult to replicate. Furthermore, the construction of facial phenotypes that correspond to naturally perceived facial features remains largely a mystery. We present here a large-scale heritability study of face geometry that aims to address these issues. High-resolution, three-dimensional facial models have been acquired on a cohort of 952 twins recruited from the TwinsUK registry, and processed through a novel landmarking workflow, GESSA (Geodesic Ensemble Surface Sampling Algorithm). The algorithm places thousands of landmarks throughout the facial surface and automatically establishes point-wise correspondence across faces. These landmarks enabled us to intuitively characterize facial geometry at a fine level of detail through curvature measurements, yielding accurate heritability maps of the human face (www.heritabilitymaps.info).

Scientific RepoRts | 7:45885 | DOI: 10.1038/srep45885 measured. Radiographs and photographs are both flat, two-dimensional images. Their use to measure inherently three-dimensional (3D) objects, such as facial surfaces, limits the extent of shape variability that can be captured and constrains the range of facial morphological descriptors that can be extracted. Other issues can be identified in the process of constructing and measuring facial traits. Quantifying face variability is heavily dependent on establishing landmarks across faces. The number of points that can be manually annotated on a face is affected by the type of imaging modality used, and by the ability of a person to establish landmark locations in an unambiguous manner across samples. Consequently, it is common for studies to annotate as landmarks only a few prominent facial markers, such as eye and mouth corners, nose tip and zygomatic bones [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29] , ranging in number between 10 up to, in exceptional cases, 40 landmarks. It becomes clear thus that manual landmarking poses an important constraint limiting the extend of facial variability that can be captured. Furthermore, most classical studies adopted facial traits derived from two-dimensional distances between landmark pairs [16][17][18][19][20][21][22][23][24][25]28,29 . More rarely, angles between connected landmark pairs have also been considered 15,19,26,27 . The widespread adoption of such facial phenotypes could be justified by the small number of annotated landmarks, the relative simplicity in which these measurements can be acquired, as well as their ease of interpretation. On the other hand, they offer a perhaps oversimplified characterization of face morphology and fail to take into full account the geometric variability that can be observed across faces at a more granular level.
A separate limiting factor that has affected twin heritability studies is related to sample sizes employed and associated statistical modeling implications. Twin studies predominantly employ statistical methods that estimate heritability as the percentage of phenotypic variation that is due to variation of genetic factors 34 . The statistical power of such studies is defined as the probability of correctly rejecting the null hypothesis of zero heritability 35 . A multitude of factors affect power, including the combination of variance components used in the model, heritability effect size (ranging between 0 and 1), sample size and the proportion of monozygotic (MZ) to dizygotic (DZ) twins in the dataset. How to best optimize the experimental design in terms of sample size is still largely debatable 35 . By assuming the existence of only additive genetic effects in the twin model, a previous simulation study reported minimum required sample sizes ranging between 75 and 2,000 twin pairs in order to achieve 95% statistical power, depending on heritability effect sizes ranging from 0.8 to 0.2 respectively, and keeping the MZ to DZ ratio close to one 36 . In practice, the large majority of twin studies to date have relied on sample sizes of 20 to 100 twin pairs 6,10,11,13,19,20,[24][25][26][27][28] , possibly due to difficulties in recruiting large cohorts of twins. The use of small to medium sized cohorts may have thus resulted in under-powered studies, especially for traits with low to moderate heritability.
In this work we present a large-scale heritability study of face geometry that departs from previous related investigations in various aspects. First, we acquired 3D facial models. A system for high-resolution 3D photographic scanning, the 3dMD face imaging system, was used to generate anatomically precise three-dimensional polyhedral surfaces of the faces. To capitalize on these representations, we developed a novel automated landmarking procedure, GESSA (Geodesic Ensemble Surface Sampling Algorithm). The algorithm automatically places landmark points throughout the facial surfaces and establishes point-wise correspondence across subjects. GESSA enables the annotation of thousands of landmarks, resulting in the ability to capture morphological variation across subjects at a much finer level of granularity, whilst removing human measurement errors and enabling scalability to large cohorts. The position of each landmark is automatically determined by the algorithm, which attempts to distribute landmark locations uniformly on individual surfaces whilst establishing a precise correspondence across all faces. GESSA was validated on a publicly available dataset of 3D facial surfaces, Morphface 37 . The software and validation data are available to download from https://github.com/dimostsag/gessa. The availability of densely sampled landmark positions on each face enabled a wider range of facial traits to be defined, each capturing a specific aspect of face-shape variability. In this study, we demonstrate that local curvature traits, computed at each three-dimensional position across the facial surface, provide highly informative quantitative measurements of facial geometry, and explore for the first time their heritability.
A face heritability study was performed on 952 British twins recruited from the TwinsUK adult twin registry 38 . All subjects were females and unselected for anolyhedral mesh comprising of approximately 4,500 points. Using GESSA, we identified 4,096y disease, of which 197 were MZ and 279 DZ pairs. To the best of our knowledge, this is the largest twin heritability study of the human face. Each face was represented in the dataset as a 3D p landmarks on each face, each one contributing a local curvature value whose heritability was independently assessed. Curvature-based heritability estimates at the individual landmark level were combined into face heritability maps highlighting in great detail, for the very first time, which facial parts are under high and low genetic control. A multivariate analysis involving thousands of closely sampled landmarks further identified extended and well-defined facial regions sharing similar patterns of variability, with heritability estimates reaching or exceeding 0.7, including the chin, nasal regions, nasolabial folds, upper lips and zygomatic bones. In addition, using a smaller set of these landmarks, we explored the heritability of more traditional distance-based facial traits. A number of facial lengths, including bizygomatic and nose width, had heritability estimates close to or greater than 0.7, values that are significantly higher than the ones previously encountered in the respective literature. This is the first time that such a detailed and comprehensive evaluation of facial shape heritability has been investigated using a large cohort and 3D data capture technology. Our heritability findings are likely to support future genome-wide studies on facial geometry, while dense representations of facial surfaces through curvature indices may find further use in face recognition and reconstruction techniques.

Results
We present here the phenotyping and heritability results from our sample of 952 TwinsUK twins. Point correspondences for 4,096 landmarks were automatically established using GESSA. Shape-related phenotypes were constructed using four different curvature indices on the landmark sets. Visualization of heritability estimates associated to landmark-wise curvature traits produced high-definition heritability maps. A multivariate analysis Scientific RepoRts | 7:45885 | DOI: 10.1038/srep45885 of these landmark-wise measurements, based on sparse PCA (sPCA), indicated the presence of spatially coherent traits extending over larger areas of the face, whose heritability was also estimated. Finally, a subset of 17 landmarks was selected and the heritability of 20 traits based on Euclidean and Geodesic distances was calculated.
Curvature-based Morphological Traits in the TwinksUK dataset. The set of 952 three-dimensional facial surfaces from the TwinsUK cohort was processed with GESSA and 4,096 landmarks were identified. Each landmark contributed four shape-related phenotypes, associated with how bent the surface is around that point. These traits were computed using local curvature indices, namely Mean Curvature (MC), Gaussian Curvature (GC), Curvedness (CU) and Shape Index (SI), resulting in 16,384 quantitative traits per face. A detailed description of the curvature indices, and the rationale for using different types, can be found in the Methods section. For each one of these four measures, curvature maps of the average TwinsUK face -constructed by averaging landmark positions of all faces -were obtained by color-coding all facial landmarks according to their curvature values. Figure 1 shows the resulting maps, which provide easily interpretable representations of facial morphology and underline the different attributes of each curvature index. It must be noted that, due to ethical reasons, individual faces cannot be shown. We only visualize the average face, computed by averaging landmark coordinates of the 952 faces. Due to the large sample size of our dataset, the average face appears somewhat smoothed, in particular around the mouth and eye regions. Supplementary Fig. 10 shows average faces obtained using only 10 and 200 individuals. It can be clearly noticed that the increasing sample size has a smoothing effect on the average face.
The MC index provides a balanced measure between shape morphology, i.e. flat vs. cylindrical vs saddle structure, and curvature magnitude, i.e. how bent the surface is irrespective of shape. In the MC curvature map, points belonging to protruding concave regions like the nose, chin and eyebrows had positive MC values, with higher measurements observed in the nasal surface. Flatter areas, such as cheeks and forehead, had MC values close to zero, while inner eye corners, ala of the nose, and to a lesser extent, the corner areas between lips and chin were comprised of negative-valued curvatures. GC describes well variation between and within cylindrical and saddle-like structures, while being less sensitive to other shape characteristics. In the GC map, facial features associated with positive values were the cylindrically structured inner eye corners and nose tip, while saddle-like regions such as nasion and base of the nose showed negative GC values. CU is affected by changes in the magnitude of the curvature but not by shape morphology. The CU curvature map highlighted facial parts with large overall curvature, for example nose and eyebrows, which confirms the intuitive observation that highly curved facial parts are mainly centrally located in the face. Contrary to CU, the SI index primarily distinguishes between different shape morphologies, but is less sensitive to curvature magnitude. The SI curvature map showed that that most facial areas have positive values corresponding to generally cylindrical structures.
The sample variability of each curvature trait was also investigated. Curvature variance maps can be found in the Supplementary Fig. 4. Depending on the curvature index, various facial areas showed increased variability. High variance of the MC measurements was observed in areas such as eye sockets, ala of the nose and mouth. GC and CU variance was located mainly in the nose, eye and philtrum regions, while the SI captured increased variation in the zygomatic and mouth areas. It is important to notice that particular facial areas, namely the mouth and eye regions, showed consistently high variability, which could relate to the increased motional ability of the specific structures. Irrespective of the curvature index used, average phenotypic variability was always higher in the subset of DZ, compared to the MZ pairs. As heritability is based on differences in similarity between MZ and DZ pairs 39 , we also computed the means and variances of absolute trait differences between pairs. The results were combined into facial maps ( Supplementary Fig. 5) and showed clearly higher values in the DZ subset, compared to the MZ one.
Univariate Heritability Analyses. We performed univariate analyses of all 16,384 curvature traits -4 traits per landmark -with the aim to combine local heritability results and produce global maps of heritability for the human face. The heritability of each trait was independently estimated using Structural Equation Modeling (SEM) 40 . The method evaluates which combination of additive (A) genetic, common (C) environmental and unique (E) environmental variance components can best explain the observed phenotypic variance and covariance of MZ and DZ twin data; see Methods for a detailed description of the model approach. Different combinations of A, C and E component models were considered. The Akaike Information Criterion (AIC) 41 was used to guide model selection. AE models were the best-fitting ones according to AIC. Summary statistics for all fitted models can be found in Supplementary Table 1. SEM also assessed the ability of the model to fit the observed data. Model Goodness-of-fit was examined using a log-likelihood ratio test between the structured model and a fitted saturated model where no structure was imposed on the covariances. Test p-values above 0.05 translated to the structured model providing a better fit than the saturated one. Details regarding goodness-of-fit can be found in the Supplementary Text. From the sets of 4,096 traits per index, 86.02%, 67.9%, 86.8% and 87.06% AE models for MC, GC, CU and SI respectively had goodness-of-fit p-values above 0.05.
For each curvature type, the 4,096 heritability estimates were visualized in a single heritability map plot. Each map provides a graphical representation of the extent by which the geometry of facial regions is controlled by genetic variability. Frontal and side facial views of these maps are shown in Fig. 2. For interactive viewing of the heritability maps, a website was created at http://heritabilitymaps.info/.
Several landmarks with high (> 0.65) heritability estimates were localized on well defined facial areas. Irrespective of the curvature index employed, landmarks belonging to the mental region, philtrum, nasal tip, nasion, inner eye corners, nasolabial folds and frontal process of maxilla gave consistently high estimates. Amongst all curvature types, MC yielded the highest heritability estimates over extended facial regions. The MC heritability map highlighted further highly heritable areas, including the zygomatic lines around the eye sockets, side areas of the mental foramen, the upper lip and frontal eminences. The results of the GC traits showed high heritability for a number of saddle-like facial structures, namely the whole nasion region, the philtral ridge, as well as the lower nasal bone. In the CU heritability map, a strong genetic influence is observed in the angular transition from the nasal bone and glabella towards the frontal eminences, as well as in the overall roundness of the facial circumference, highlighted by heritable lines across the upper part of the forehead and the lower part of the ramus of the mandible. The SI heritability map showed strong genetic control of the softly spherical flat regions of frontal eminences and upper lip, the cylindrical structure of the upper zygomatic bones and sides of the nose and finally the saddle-like areas of nasion, nasal bone and ala of the nose. The results indicated that local curvature is strongly determined by genes for large parts of the face.

Multivariate Heritability Analyses.
We aimed to identify larger facial areas showing common patterns of shape variation and produce single-valued heritability estimates for them. For each face and curvature index, we decomposed the aggregated landmark phenotypes using sparse Principal Component Analysis (sPCA) 42,43 (see Supplementary Text for further details on sPCA). sPCA automatically identified traits comprised of linear combinations of landmark phenotypes with similar curvature variability. We refer to these as regional traits. Sparsity affected the number of landmark traits comprising each regional phenotype, and was imposed in order to acquire measurements corresponding to extended but spatially consistent facial regions. The amount of sparsity was controlled through a single parameter. Different values were tested and results showed that the parameter had little effect in the heritability estimation process ( Supplementary Fig. 3). The parameter was set to 7.5 for GC, 12.5 for SI and 15 for MC and CU. We estimated the heritability of 100 regional traits for each curvature index. Each set of regional traits captured approximately 90% of its corresponding curvature's sample variability. The cumulative amounts of explained variances for all regional traits are reported in the Supplementary Files 1-4, for MC, GC, CU and SI indices respectively.
Average heritability statistics for the regional trait analyses are included in Supplementary Table 1. We found again that the best models were the AE as assessed by AIC. Goodness-of-fit p-values above 0.05 were acquired for 81, 75, 87 and 73 regional traits of MC, GC, CU and SI respectively.
The facial areas associated to regional traits were visualized by color-mapping the coefficients of the linear combinations -weights by which landmark traits contribute to the regional phenotypes -on the facial surface. We refer to these maps as Eigenface maps, since they correspond to the eigenvectors of the multivariate decomposition. Eigenface maps of the 10 highest variance explaining regional traits for each curvature index are shown in Supplementary Fig. 2. Figure 3 shows the Eigenface maps of the top 5 heritable regional traits for each curvature index, along with their heritability values. Corresponding phenotypic correlations for MZ and DZ subsets are reported in Supplementary Table 2. The maps of the most heritable traits were mostly comprised of landmarks closely located to each other, indicating patterns of common shape variation in the respective areas.
Heritability results for the regional and univariate traits were in good agreement. From Fig. 3 it is obvious that the morphologies of regions such as nasolabial folds, zygomatic bones, inner eye corners, mental region, frontal eminences and ala of the nose were highly heritable (> 0.65). Due to the good segmentation of the facial surface into clearly identifiable regional traits, we also identified heritable areas that were not as easily noticeable in the heritability maps. In particular, the mental foramen showed up as the second top heritable phenotype in the SI results, while the condyloid process of the mandible was highlighted in the fifth and third top heritable trait in MC and SI analyses, respectively. In the discussion, we identify regional traits by their curvature index and their variance-explaining order, as shown in Fig. 3.

Heritability of Distance-based
Traits. An analysis of traditional distance-based facial traits was performed to gain insights about the relative merits of our phenotypes and also enable direct comparisons to previous heritability studies. Out of the 4,096 landmarks, we located 17 corresponding to prominent fiducial points, by visual inspection of the average TwinsUK face. Figure 4 shows the selected landmarks. Using the computed correspondence, we were able to automatically locate the 17 landmarks on all 952 faces. Ten facial traits derived from Euclidean distances (EDTs) between selected landmark pairs were subsequently considered. The phenotypes are summarized in Table 1. In addition, we constructed ten equivalent distance traits measured as lengths of connecting paths between the same landmark pairs, where the paths were restricted to lie only on the facial surfaces. Such distances, defined on non-flat surfaces, are called Geodesic distances. An illustration of the difference between the two types of distances can be seen in Supplementary Fig. 6. We examined whether the use of facial traits derived from Geodesic distances (GDTs), only possible on 3D data, yielded any advantages compared to Euclidean traits.
Heritability estimation for the 10 EDTs and their equivalent GDTs was carried out using ACE, AE and E structural equation models, as before. AE models provided on average the best fits according to AIC. Supplementary Table 3 provides detailed statistics for the fitted models. Table 2 shows the resulting heritability estimates as well as SEM Goodness-Of-Fit test p-values. In the remainder, we concentrate on traits whose models' Goodness-Of-Fit test p-values were greater than 0.05, thus providing good fits of the observed data. Four EDTs corresponding to horizontal facial measurements, namely nose, zygomatic, mandible and mouth widths were found to be highly heritable. Of these, estimates greater than 0.7 were acquired for the two upper/middle face EDTs, zygomatic and nose widths, while the heritabilities of the mandible and mouth widths were found to be slightly lower at approximately 0.62 and 0.67, respectively. The EDT corresponding to nasal protrusion was also found to be moderately heritable at approximately 0.55. Heritable GDTs included the mandible and mouth widths, nasal protrusion, lower face height and biocular width (i.e. the distance between inner eye corners). The first three GDTs had slightly lower heritability estimates compared to the corresponding EDTs. Mandible and mouth GDTs traverse the lip region, which had low heritability estimates in our previous analyses. On the other hand, the biocular GDT, which traverses a surface area with consistently high heritability, i.e the nasion, provided the highest estimate (0.789) among all distance phenotypes.

Discussion
We presented a novel landmarking and phenotyping methodology for 3D surfaces and performed a large-scale twin heritability study of the human face. A salient aspect of our analysis is the automated dense landmarking procedure. Dense landmarking approaches have been recently adopted in face modeling and candidate association analyses in order to study genetic syndromes involving facial dysmorphisms and asymmetries 44,45 and recognize genetic variants that explain normal face variation 46,47 . Existing methods, though, are either still heavily . Eigenface maps of the top heritable regional traits. The maps depict the weights by which landmark phenotypes contribute to the regional traits. sPCA was utilized in order to achieve good spatial consistency of the Eigenface maps. Mean Curvature Eigenface Maps: Distinctive facial characteristics that emerged as heritable were the nasolabial folds, transitions to the ala of the nose, frontal eminences, zygomatic areas between and below the eye sockets and the condyloid process of the mandible, with the latter been more clearly portrayed in the multivariate rather than the univariate heritability analysis. Gaussian Curvature Eigenface Maps: The GC Eigenface maps were highly localized, compared to the rest of the curvature indices. Heritable regions included for the philtrum, ala transitions, inner eye corners and chin facial regions. Curvedness Eigenface Maps: The top heritable traits highlighted mainly the zygomatic areas around the eye sockets, as well as the nasion, chin and upper forehead areas. Shape Index Eigenface Maps: Highly heritable regional traits were located in the nasolabial folds, zygomatic areas,chin, condyloid process of the mandible and mental foramen regions.
Scientific RepoRts | 7:45885 | DOI: 10.1038/srep45885 dependent on some form of manual landmarking, which can be a tedious and error-prone process, or not suited for the analysis of polyhedral surfaces. Further details on related work can be found in the Supplementary Text. Here, we propose a new methodology, GESSA, which makes use of appropriate mathematical structures, such as distances and paths, directly defined on the surfaces, in order to provide uniform and dense landmarking of 3D polyhedral models in an accurate and efficient manner. The GESSA software and validation data can be freely downloaded from https://github.com/dimostsag/gessa. Non sensitive data, such as curvature data matrices, are available upon request to the authors. Furthermore, this is the first time that dense facial landmarking has been used in a twin heritability study.
A second novel aspect of our methodology, facilitated by the three-dimensional face models, is the use of curvature-based phenotypes. For each landmark point in the surface, four different types of univariate measurements describing curvature for local patches centered around the landmarks were considered. Each curvature  type is able to highlight varying morphological structures. The use of these traits enabled us to characterize local variability in facial shape and identify its genetic content. Heritability estimates for individual landmark traits were combined to provide detailed global maps of heritability for the human face. Furthermore, regional traits, defined as linear combinations of the single landmark traits, were computed by a multivariate decomposition of the previous traits. Heritability analyses of the latter phenotypes allowed us to report on accurate heritability values for well-defined facial regions. In the literature, similar curvature traits have been successfully used for other applications such as face detection 48 , recognition 49,50 , segmentation 51,52 and affinity estimation 52,53 .
Our landmarking and phenotyping pipeline was employed for the analysis of 456 female twin pairs from the TwinsUK cohort 38 . Previous work has suggested that the TwinsUK sample is representative of the general British population, where the sample were ascertained from 54 . For facial traits in particular, it may be expected that gender also plays a role in shaping up the facial characteristics we study, such as curvature measures. For this reason, caution is warranted before any of these results are generalized and extended to male subjects.
To our knowledge, this is the largest face heritability study ever done. Compared to most commonly encountered sample sizes of between 20 and 100 pairs 6,10,11,13,19,20,[24][25][26][27][28] , the increased number of subjects improves the statistical power of our study to identify effects of heritability. Based on a previously published simulation study on the power of twin studies 36 , our sample size surpasses the minimum requirements for having 95% power of rejection of the false -zero heritability -hypothesis at the 5% significance level even when the true heritability effect is as low as 0.3.
We were able to identify curvature-based facial traits that were highly heritable (> 0.65) in both of our curvature-based analyses. A direct comparison with previously reported heritable facial lengths and angles is not straightforward, due to the different nature of the measurements. Certain connections though were made between our heritability and Eigenface maps and related published findings. A heritability estimate of 0.53 was previously reported for nose width in a pedigree analysis of 229 Korean individuals 29 . The MC heritability map (Fig. 2) showed high heritability for the line between the left and right corners of the ala through the base of the nose. In the same study, inner eye corner distance had a heritability estimate of 0.61. The GC, CU and SI heritability maps all indicated that the shape of the nasion region, including the inner eye corners, was highly heritable. This inference was further supported by our multivariate analysis. The 3rd and 5th most heritable GC regional traits, as well as the 3rd top heritable CU trait (Fig. 3) had Eigenface maps concentrated on the nasion area (Fig. 3) and their respective heritability estimates ranged between 0.699 and 0.737. Moderate to high heritability, ranging between 0.4 and 0.7, was identified in twin and family studies for the facial width 22,23,26,27 . Our MC and SI heritability maps (Fig. 2) highlighted highly heritable lines following the curve of the zygomatic bones. Another phenotype that was reported with heritability estimates of 0.59 and 0.66 is head circumference 22,55 . This result can be connected to the elevated heritability estimates regarding curvature magnitude in the periphery of the face, that we observed in the CU heritability map (Fig. 2). Finally, a number of twin and family studies identified moderate to high heritability estimates for various phenotypic traits relating to the position, length and angular structure of the jaw bone. Mandible ramus and body length heritabilities were reported to be 0.72 and 0.77 respectively in a study of 363 children and their parents 15 , while the angle between the two lines was found to have moderate heritability 0.47 and 0.453 in the same analysis and in a different study of 77 twins 25 . Chin width was estimated to have heritability of 0.42 in a family study 29 . In our multivariate results, we observed high estimates in the chin area -2nd SI, 8th CU and 15th GC regional traits, with respective trait heritabilities ranging from 0.685 to 0.711 (Fig. 3). All above comparisons are summarized in Supplementary Table 4.
We also explored the heritability of facial traits based on Euclidean distances (EDTs), as well as equivalent phenotypes measured using Geodesic distances on the facial surfaces (GDTs). 10 EDTs and GDTs yielded reliable heritability estimates, ranging from 0.505 to 0.789. Evidence supporting our results were found in a number of family-based heritability analyses. A previously mentioned study reported its highest heritability estimates of 0.42, 0.44, 0.53 and 0.61 for mandible width, nasal protrusion, nose width and inner eye corner distance respectively 29 . Here, the nose width EDT and biocular width GDT yielded two of the three highest estimates − 0.718 and 0.789 respectively-, while nasal protrusion and eye distance EDTs and GDTs were moderately to highly heritable, with heritability estimates ranging from 0.505 to 0.677. Nose width was also reported to be moderately to highly heritable in two studies of 125 Belgian and 342 Indian families with corresponding estimates of 0.639 and 0.498 12,32 . The same studies reported heritabilities of 0.606 and 0.605 for the bizygomatic width, while three further analyses, including large population samples from Russia and India and Europe, gave estimates of 0.52, 0.71 and 0.629 for the same trait 22,26,30 . Our most heritable EDT, with h 2 = 0.734 corresponded to that width. In comparison to previously reported values, our heritability estimates were either similar or significantly higher. Important inferences were made by exploring results from the various types of phenotypes used in this study. In the distance-based study, EDTs corresponding to mandible and mouth widths showed slightly higher heritability estimates in comparison to the respective GDTs. The latter traits had geodesic paths passing through the lip region, which, in the curvature analyses (Fig. 2), had consistently low heritability estimates. On the other hand, the GDT corresponding to inner eye corner distance gave the highest heritability estimate of 0.789. The geodesic path of that trait traverses the nasion region, which is a well known highly heritable area of the human face 24,29,56 . The results could indicate that GDTs are more sensitive than EDTs to the facial morphology between the considered landmarks. A comparison of summary statistics from the curvature and distance heritability analyses revealed that Structural Equation Models provided good data fits for approximately 80% of the curvature traits, but only 50% of the distance phenotypes, as assessed by Goodness-Of-Fit log-likelihood ratio tests. This observation could be a strong indicator that curvature phenotypes are more suitable for the study of facial morphology.

Conclusion
In this work we proposed a new method for the automated and dense landmarking of 3D surfaces, GESSA, and applied it in a novel large-scale twin heritability study of the human face morphology. Heritability estimates were computed for local curvature phenotypes corresponding to single landmarks and the results were combined to generate face heritability maps. Furthermore, regional curvature traits, corresponding to larger facial areas, were extracted through a multivariate analysis and their heritability was also estimated, yielding a number of highly heritable facial features. Heritability estimation was also performed for a number of traditional facial length traits, with estimates being equivalent or higher than the ones found in the existing literature. In conclusion, we provided a fresh perceptive in facial phenotyping and heritability analysis that could potentially inform future genome-wide studies and be useful in a variety of applications, ranging from population genetics and gene-mapping studies, to face modeling and reconstruction applications.

Methods
Sample Description. Imaging data were retrieved for 1,547 participants of British origin from the TwinsUK Cohort 38 , for which 3D facial models and age information was available. All subjects provided written and informed consent for academic use of the data. Experiments were approved by the Guy's and St. Thomas' (GSTT) Ethics Committee. The research was done in accordance with the tenets of the Declaration of Helsinki. The sample consists predominately of female twins unselected for any disease. Detailed sample characteristics of the TwinsUK Cohort have been previously described 38 . 314 subjects were dropped due to artifacts in the images or lack of zygosity information. 228 unrelated subjects were excluded from further analysis. An additional 56 subjects were removed due to mesh reconstruction errors during our phenotyping process. After preprocessing and quality control, we were left with 952 female subjects (59.31 mean age, standard deviation 9.85). Of these, 197 pairs were monozygotic and 279 were dizygotic.
Image Acquisition and Preprocessing. The raw 3D facial images were acquired using a 3D photographic scanning system manufactured by 3dMD. Participants were asked to keep their mouths closed and a neutral expression during the acquisition of the 3D scans. Each image was comprised of a 3D triangular mesh, with approximately 4,500 points representing the frontal facial surface, and the corresponding texture map. Texture maps were not used in our analysis. Due to large variation in the original pose and position of the original meshes, we manually located outer eye corners and nasion in all faces. The three landmarks were used to impose a common orientation of the faces under the same coordinate space. High landmark accuracy was not important for this purpose and these landmarks were discarded from any subsequent analysis step. Following that, the surfaces were cropped and trimmed to remove non-facial areas, such as neck and chest regions, hair and ears. Finally, the Iterative Closest Point (ICP) algorithm 57 was applied to align the cropped images. The preprocessing pipeline was performed using the Meshlab software suite.

The Geodesic Ensemble Surface Sampling Algorithm (GESSA). For the streamlined identification
and alignment of landmark points across 3D faces, we propose and use our novel landmark sampling procedure, Geodesic Ensemble Surface Sampling Algorithm (GESSA). GESSA automatically samples large sets of corresponding landmark points from sets of similar polyhedral surfaces. We formulate the problem of finding corresponding landmarks as a minimization of an objective function comprised of two entropy-based terms. The first term is the entropy of the data probability distribution. Minimization of this term is performed by moving landmark positions and leads to improvement of landmark correspondences across surfaces. The second term is the sum of entropies for landmark distributions on individual surfaces. By maximizing this term the algorithm achieves a uniform distribution of points on all related surfaces.
The above formulation is adopted from the work by Cates et al. 58 , which was, in turn, based upon previous work on statistical shape modeling with information-based optimization functions. The first such work that we are aware of was presented by Kotcheff and Taylor 59 , and subsequent articles by Davies et al. 60,61 . Further details on these methods can be found in the Supplementary Text. Here we built upon that framework and use geodesic distances between landmarks, directly defined on the polyhedral facial surfaces, in order to increase precision Scientific RepoRts | 7:45885 | DOI: 10.1038/srep45885 during uniform landmark sampling. Furthermore, a suitable gradient descent optimization technique was developed to optimize landmark locations by operating only on the surface structures. By incorporating these two key features, we were able to to deal with highly curved surfaces, improve upon computational space requirements and enhance the correspondence results.
General Methodology. Let us consider an ensemble of N polyhedral surfaces. Each surface S j , j = 1 … N, is represented, without loss of generality, as a a triangular mesh in R 3 . Our objective is to sample M points … x x , , , uniformly from each surface, and establish one-to-one correspondence among points on all surfaces. As such, the overall correspondence problem can be broken down to two major components; correspondence optimization of landmarks across the ensemble and uniform sampling on individual surfaces.
The coordinates of M points on a surface S j can be aggregated into a vector z j , with The vectors z j can be thought of as point representations of surfaces distributed in R 3M . As such, R 3M is taken to be the space of all surfaces, when each one is sampled in M locations, referred hereafter as shape space.
Consider Z ∈ R 3M to be a random variable in shape space with probability density function p(Z), and z j , j = 1 … N, realizations of that random variable. The differential entropy of Z is given by As such, the sample differential entropy is given by One-to-one correspondence of landmarks can be optimized by minimizing the above sample entropy. This minimization increases the compactness of the surfaces' distribution in shape space, which equates to bringing surface landmarks closer to each other. A potential risk though is that landmarks can be collapsed to the same surface locations. The solution is to balance the correspondence accuracy with uniform distributions of points on individual surfaces. Assuming , have been sampled in some way from the surface S j , their positions can be manipulated in order to make them uniformly distributed on the surface. This is done by maximizing the sample entropy for the distribution of landmarks in S j , since, in bounded domains, as are our surfaces, the uniform distribution has maximum entropy. Let ∈ X S j j be a random variable on surface S j with probability density function p X ( ) j j , and … x x , , , and the sample differential entropy is given by The combined optimization cost used in the correspondence algorithm balances the sample entropy in shape space with the sum of point distribution entropies and is given by N j N j j j M 1 1 1 Q must be minimized under a set of constraints imposing that each point lies in its corresponding surface. We propose the use of a geodesic gradient descent algorithm which directly follows straightest paths on the surfaces in order to update landmark locations without violating these constraints. We first introduce some notions related to manifold geometry that are required to develop our methodology. We then describe the details regarding uniform distribution of points on individual surfaces, followed by the correspondence optimization.
Geometric Structures on Manifolds. The structure of a 3D object is commonly described by its boundary surface in R 3 . Such surfaces can be mathematically described as 2D manifolds, curved topological spaces that locally, around each point, can be considered similar to a 2D Euclidean space 62 . We restrict our interest to continuous and differentiable manifolds where distances and shortest paths can be defined.
Between any two manifold points, there exists a unique shortest curve in  that connects these two points. This curve is called a geodesic curve and is equivalent to a straight line on the Euclidean space 62 . The length of the geodesic curve defines the distance between the two points in . Furthermore, for each point α on a manifold , we can define a plane, passing from that point, which can be understood as a local linearization of the manifold around α. This space is called the tangent space of  at point α and is denoted α T M. α T M has equal dimensionality to the manifold .
A tangent vector T M ∈ α u can be uniquely associated to the geodesic curve from point α to point β ∈ , using the exponential map function T M M → α exp: , with β = α exp u ( ) . The inverse of the exponential map is termed logarithmic map. It accepts two points on the manifold and returns the tangent vector that corresponds to the geodesic curve connecting the two points 63 An important inference is that β α log ( ) is the smallest tangent vector in norm such that β = α exp u ( ) 64 . As such, the norm of the logarithmic map provides the length of the geodesic and is used as the distance metric on the manifold: The gradient of the squared distance function is directly related to the logarithmic map 65 : Uniform Distributions of Landmarks in Individual Surfaces. Here we assume that M points have already been positioned on a surface and describe the methodology for distributing these points uniformly on that surface. By maximizing the sample differential entropy H w.r.t.
, we in essence manipulate point positions to achieve the required uniformity. The optimization problem can be written as In order to minimize the cost, which is equal to the negative sample entropy, we will employ a gradient descent technique, where points are iteratively moved proportionally to the negative gradient of the cost, until no significant improvement in cost can be achieved. , and Y, Z the × N M 3 matrices of sample vectors y j and z j respectively as their rows, the sample entropy Γ is written as Gradient descent can also be employed here. To simplify computations, the mean estimate z is considered fixed during each iteration. With this assumption, the matrix of partial derivatives of Γ can be written as 58 : A regularization term α is added above since in practice, Σ will not have full rank.
In order to accommodate these correspondence updates into our geodesic gradient descent method, we project each update on the plane of the point's corresponding mesh triangle. Let n j k be the perpendicular unit normal vector of the x j k 's current triangle. The tangent vector that maximizes the gradient update on the triangle plane is given by By adding up the gradients from equations (12) and (16), we finally acquire the gradient descent updates for the optimization of the overall correspondence cost 4: Initialization and Iterative Landmark Splitting. The number of landmarks, M, to be sampled from each surface is provided as a parameter by the user. The algorithm initially samples randomly one point from each surface and performs correspondence optimization. Following that, the landmark is split into two and the overall optimization process runs again on the new point sets. This procedure is repeated iteratively until the required number of sampled landmarks is reached and their positions are optimized.
Validation of GESSA using the MorphFace dataset. For this validation, we used the publicly and freely available Morphface dataset of 3D facial surfaces 37 . The dataset is distributed freely from the University of Basel for internal, non-commercial research, evaluation or testing purposes, and can be found at http://faces.cs.unibas.ch/bfm/ main.php?nav= 1-1-1&id= scans. It comprises of 11 individual subjects' faces with neutral pose. All the faces were registered to the Basel Face Model (BFM) facial template. The template mesh consisted of 53,490 landmarks. The registration process identified the locations of these landmarks on the 11 validation faces. For the purposes of this validation study, 19 of these landmarks, corresponding to prominent facial points, were manually selected to constitute the groundtruth landmarks (GTLs). Supplementary Fig. 8 depicts these 19 landmarks on the facial surface. The Morphface surfaces were subjected to cropping in order to remove non relevant areas, such as neck and ears. Subsequently, the facial meshes were subsampled, yielding approximately 4,700 points per face, a number that was similar to that of the facial meshes in our TwinsUK cohort. Using GESSA, we computed 4,096 GESSA Generated Landmarks (GGLs) in the validation surfaces. Supplementary Fig. 7(A) depicts these landmarks on three example faces, while Supplementary Fig. 9 depicts the average face of the validation dataset.
The goals of our validation study were threefold. First, we aimed to evaluate how uniform the distribution of the sampled landmark points on the surfaces would be. Second, we aimed to demonstrate that sampling approximately 4000 facial landmarks produces a dense landmark coverage of facial surfaces. Third, we aimed to show that GESSA is consistently and accurately aligning landmarks to achieve one-to-one correspondence across all surfaces.
First, we focus on the evaluation of the landmarks' surface distributions. Supplementary Fig. 11 shows the estimated surface density function of an example validation face. It can be observed that the density is almost uniform everywhere. In addition, examination of Supplementary Fig. 7(A) also shows clearly that the landmarks are uniformly distributed across the complete facial surfaces.
Second, we needed to examine if the chosen number of sampled GGLs is sufficient to densely cover the facial surfaces. A low number of sampled landmarks would lead to extended facial areas having no or very few landmarks. As a consequence, extracted traits would not be able to capture well the morphological information contained in these areas. On the other hand, very dense sampling would incur unnecessary computational costs.
To address this objective, we worked under the premise that, by sampling surfaces densely, we should be able to locate GGLs sufficiently close to the preselected GTLs. We measured the average distance, over all faces in the validation set, from each GTL to its closest GGL. Supplementary Fig. 7(B) shows the GTLs and respective closest GGLs for three example validation faces. Table 3 reports the 19 average distances between GTLs and GGLs over the set of 11 faces. Due to the fact that dimensional units on the Morphface dataset were unknown, we further divided each of the above values by the mean facial width -distance between the two zygomatic landmarks-in order to compute normalized distances, which are also included in Table 3.
The average distances per landmark were always less than 3% of the mean facial width. By further averaging over the 19 landmarks considered, the overall average distance was 1.77% of the mean facial width. The results indicated that the number of landmarks was sufficient to cover densely the surfaces and that landmarks extended over the complete surfaces.
Third, we concentrated on evaluating the ability of GESSA to place landmarks in equivalent positions consistently across the complete set of faces. The standard deviations of the above GTL-GGL distances can be used as measures of consistent placement, since small values indicate that GTL-GGL distances are similar across all faces and, consequently, landmarks are positioned in equivalent positions across the dataset. The details are reported in Table 3. The standard deviations of the GTL-GGL distances for all landmarks were always less than 1% of the mean facial width. By further averaging again over the 19 landmarks considered, the average standard deviation was 0.42%. The results indicated that the automatic landmarking produced by the algorithm was sufficiently accurate to be deployed in our heritability study.
Curvature-based Facial Traits. The shape around a point on a surface can be characterized using curvature descriptors 68 . Curvature is a directional property and describes how bent the surface is around each point 69 . The curvature magnitude of a point in some direction is given by the reciprocal of the radius of the circle that best approximates the slice of surface in that direction 68 . Normal curvatures are defined on orthogonal planes to a surface point and for each such point, there exists a single normal curvature that has the largest absolute curvature magnitude. This is called the maximum curvature K max . The curvature perpendicular to K max is the minimum curvature K min . These two surface attributes are collectively called principal curvatures and any normal curvature at a point on a surface can be derived as a combination of K max and K min 69 . Supplementary Fig. 1 shows a general characterization of shapes based on the signs of K max and K min . Univariate curvature indices derived from these measures have been proposed. Four such measures were used in this study to compute phenotypic traits: Mean Curvature:  Gaussian Curvature:

max m in
Curvedness: max m in 2 2 Figure 5. Topological characteristics of curvature indices. Each descriptor highlights different attributes of the surface's underlying topology. MC differentiates significantly areas of high and low curvature, as well as convex and concave shapes. GC discriminates well between spherical and saddle-like areas. CU is less representative of a particular morphology and reflects the absolute curvature magnitude in each point, irrespective of its specific shape. Finally, SI is scale-independent and able to differentiate between pure shape characteristics, e.g domes, ridges and saddles, regardless of their high or low CU. In order to explain the differences between the various indices, we first present two intuitive features that can be used so as to describe the shape of a surface patch. The first feature is the general shape morphology and is governed by the various sign combinations of K min and K max , i.e. flat ( ). The second feature is curvature magnitude, i.e. how bent the surface is irrespective of shape morphology. Any univariate descriptor of curvature needs to be a compromise, since it cannot include all information provided by these two features 68,69 . Individually, each principal curvature does not provide a useful interpretation of local surface shape 68 , as can be seen in Supplementary Fig. 1. In contrast, the four curvature indices yield more meaningful quantitative shape measures by grouping together or differentiating particular classes of basic shape structures. An illustration of the various indices' characteristics, including their domains and how shapes are differentiated are included in Fig. 5. The MC index provides a balanced measure between shape morphology and curvature magnitude. It is strongly affected by directional shape differences (convex vs. concave shapes) but is also sensitive to curvature magnitude. GC distinguishes primarily between shape morphologies of the same or opposite principal curvature signs. Finally, CU and SI indices are the most accurate quantitative measures of curvature magnitude and shape morphology respectively.
The calculation of local curvature on the 3D meshes was performed using a finite-differences algorithm 70 . Normal vectors perpendicular to the surface were computed in each landmark. Gradients across the surface were then approximated using finite normal differences of neighbor points. Principal curvatures were found through an eigenvalue decomposition of the normal gradients and curvature index values calculated from the respective formulas.
Distance-based Facial Traits. Two different types of distance-based phenotypes were considered in this study. Traits derived from Euclidean distances between landmark pairs (EDTs), which represent the main type of examined phenotypes in the literature, and traits derived from Geodesic distances between landmark pairs (GDTs). An illustration of the difference between the two types of distances can be seen in Supplementary Fig. 6. Geodesic computations were performed using an implementation of the exact discrete geodesic algorithm 66,71 .

Heritability Estimation. The heritability analyses were performed using Structural Equation Modeling
(SEM) 40 . SEM evaluates which combination of additive (A) genetic, common (C) environmental and unique (E) environmental variance components can best explain the observed phenotypic variance and covariance of MZ and DZ twin data. The importance of individual variance components is assessed by dropping parameters sequentially from the set of nested models ACE→ AE→ E. In choosing between models, variance components are excluded from the selection process if there is no significant deterioration in model fit after the component is dropped, as assessed by the Akaike Information Criterion (AIC) 41 . The E component represents random error and must be retained in all models 40 . Heritability estimates for the AE models are calculated as + a a e 2 2 2 , where a and e are the path coefficients of the A and E variance components in the SEM model. A detailed description of SEM can be found in the Supplementary Text. In this work, ACE, AE and E structural equation models were fitted using the OpenMx software 72 . Regarding the univariate heritability studies, the log files from OpenMx for MC, GC, CU and SI traits respectively are provided in: Supplementary Files 5-8 for the ACE models, Supplementary Files 9-12 for the AE models and Supplementary Files 13-16 for the E models. Regarding the multivariate heritability studies, the log files (ACE, AE and E models) from OpenMx for MC, GC, CU and SI traits respectively are provided in Supplementary Files 17-20. The log files include detailed fit statistics, as well as the estimated path coefficients for the latent factors, along with their standard errors. Furthermore, age was included in the SEM models as a covariate. In contrast, for the multivariate study, the age was regressed from the data before the application of SEM. While the outcome is identical for both approaches, in the second case, no age variable is shown in the models, and ACE, AE and E logs are all included in single files.