Dense surface registration, commonly used in computer science, could aid the biological sciences in accurate and comprehensive quantification of biological phenotypes. However, few toolboxes exist that are openly available, non-expert friendly, and validated in a way relevant to biologists. Here, we report a customizable toolbox for reproducible high-throughput dense phenotyping of 3D images, specifically geared towards biological use. Given a target image, a template is first oriented, repositioned, and scaled to the target during a scaled rigid registration step, then transformed further to fit the specific shape of the target using a non-rigid transformation. As validation, we use n = 41 3D facial images to demonstrate that the MeshMonk registration is accurate, with 1.26 mm average error, across 19 landmarks, between placements from manual observers and using the MeshMonk toolbox. We also report no variation in landmark position or centroid size significantly attributable to landmarking method used. Though validated using 19 landmarks, the MeshMonk toolbox produces a dense mesh of vertices across the entire surface, thus facilitating more comprehensive investigations of 3D shape variation. This expansion opens up exciting avenues of study in assessing biological shapes to better understand their phenotypic variation, genetic and developmental underpinnings, and evolutionary history.
The phenotypic complement to genomics is phenomics, which aims to obtain high-throughput and high-dimensional phenotyping in line with our ability to characterize genomes1. The paradigm shift is simple and similar to the one made in the Human Genome Project: instead of ‘phenotyping as usual’ or measuring a limited set of simplified features that seem relevant, why not measure it all? In contrast to genomic technologies, which successfully measure and characterize complete genomes, the scientific development of phenomics lags behind. With the advent of new technologies, hardware exists for extensively and intensively collecting quantitative phenotypic data. For example, 3D image surface and/or medical scanners provide the optimal means to capture information of biological morphology and appearance. Today, the challenge is to establish standardized and comprehensive phenotypic representations from large scale image data that can be used to study phenotypic variation2.
Accurate quantification of biological form is important beyond genomics. In evolutionary biology and physical anthropology, changes in structures over time provide insights into evolutionary processes. In medicine, the same structure may be compared in an individual (e.g. pre- and post-surgery) as part of medical assessment, or between groups (e.g. control vs. disease) to gain insights into the pathology. Traditionally, this has been achieved using visual clinical assessment, by taking measurements between manually placed anatomical landmarks (e.g. the tip of the nose), or by analyzing the entire landmark configuration using geometric morphometrics3,4. However, manual landmarking is tedious to perform, difficult to standardize in practice, and prone to intra- and inter-operator error5,6,7,8,9. Furthermore, sparse landmark configurations can only quantify form at defined landmarks that can be reliably identified and indicated by a human, and thus lack the resolution to fully characterize shape variation between landmarks. An alternative is to utilize surface registration algorithms to automatically establish dense configurations of ‘quasi-landmarks’, typically thousands of points across the entire surface of the structure. Typically, though not always (cf. Gilani et al.10), this is achieved by gradually warping a generic template composed of thousands of points into the shape of each target image. The most common and simple algorithm, the iterative closest point (ICP), transforms the template shape rigidly towards the target by imposing each point on the template to its closet point on the target11. Improvements on this basic algorithm have been made that incorporate a regularized non-rigid deformation of the template to the shape of the target12,13,14,15,16,17,18. The registration strategy is akin to fitting an elastic net onto a solid facial statue through a geometry-driven mapping of anatomically corresponding features. When applied to a sample of targets, the coordinates of these warped templates, now in the shape of each target, are in ‘dense correspondence’ and the points are homologous. In other words, each target is composed of the same numbers of points and if the tip of the nose is point 100 in one target, point 100 should also be the tip of the nose in all other targets.
An automatic approach like this is preferable for the analysis of large datasets, avoiding the problems of manual landmarking by multiple operators. Dense landmark configurations also provide a more comprehensive approximation of the form and are more suitable for applications that require synthesis of a recognizable instance of the actual structure, such as predicting a complete shape from DNA19, synthetic growth and ageing of a face20,21, constructing 3D facial composites for forensic applications22, and characterization of dysmorphology for clinical assessment23,24. Computer scientists are consistently developing and improving algorithms for the identification of landmark points on 3D image data10,16,17,25,26,27,28,29,30,31,32,33,34, but fewer of these developments have been utilized in biological studies of morphology18,35,36,37,38,39,40,41,42,43, and more rare is quantifying morphology on a dense scale10,16,17,18,34,37. Those that have been developed are often done so in-house and siloed. Thus, our goal is to contribute to a transition of 3D image registration technology from computer science to biology by providing a customizable framework that is available to the public and specifically geared towards biological studies of morphology. We introduce the MeshMonk toolbox, which implements a variation of ICP incorporating a regularized non-rigid registration deformation, is informed by extensive tests of various algorithmic options44, and can be applied to 3D facial images as well as 3D scans of other complex morphological structures.
When the template is warped onto each target, the coordinates of any anatomical landmark, manually annotated on the template, can also be defined on each target, thus the complete quasi-landmark indication can also be considered a method for automatic placement of sparse anatomical landmarks45. To validate the MeshMonk toolbox in a way that is consistent with its intended usage in geometric morphometric analysis, we compared a set of 19 sparse landmarks indicated manually by two observers with the placement of those landmarks by the MeshMonk toolbox. Although we validate the MeshMonk toolbox using a sparse set of landmarks on the human face, our intention is for the toolbox to facilitate morphometric analysis of the entire shape surface, not just sparse landmark positions.
Direct comparison of manual and automatic landmark placements
As one measure of validation of the automatic landmark indications, we compared the raw coordinate values of manual landmark indications with the raw coordinate values of automatic landmark indications, considering the manual landmarks to be the “gold standard.” Because of the leave-one-out nature of our approach, we can compare the manual and automatic landmark coordinates directly without fear of training bias. To compare landmark indications, we calculated the Euclidean distance between manual and automatic indications (Table 1). When comparing the average of all six manual landmarking indications (CML) and the automatic landmarks trained using this average (CAuto), the highest Euclidean distance was 1.68 mm, for the right side exocanthion landmark (Table 1). Overall, the average Euclidean distance between CML and CAuto across all landmarks was 1.26 mm. However, these errors are not isotropic. We also calculated the root mean squared error (RMSE) between manual and automatic indications in x, y, and z directions separately and noticed the error for some landmarks greater along one axis relative to the other axes (Supplementary Table S1). Bland-Altman comparisons showed that the 95% limits of agreement between methods are ±1.5 mm from the mean difference of 0 mm (Supplementary Fig. S1). Most individuals fall within these limits, with only a few comparisons from each axis having differences greater than 3 mm. The intraclass correlation coefficients between manual and automatic indications for each axis are around 0.99, representing very high agreement (Supplementary Fig. S1).
Centroid size comparison
We also compared estimates of centroid size (CS; the square root of the sum of squared distances from each landmark to the geometric center of each landmark configuration) as an additional assessment of the similarity between manual and automatic landmark placements, as centroid sizes feature heavily in geometric morphometric assessments. The Intraclass Correlation Coefficients (ICCs) were all high (ICCA = 0.9589, ICCB = 0.9486, ICCC = 0.9591; Supplementary Fig. S2). Analysis of variance (ANOVA) by individual, observer, and method shows that method is not a significant factor in explaining variance in centroid size (F = 0.002, p = 0.962; Table 2). Bland-Altman comparison showed that the 95% limits of agreement between methods are ±2 mm relative to an average centroid size of about 165 mm (Supplementary Fig. S2).
Analysis of shape variance
Another important quality in geometric morphometrics analysis is estimates of sample variance. A multivariate analysis of variance (MANOVA) on shape, based on the average of each observer’s manual landmark indications and automatic landmark configurations, separately, was performed to determine if the variance explained by individual and observer factors was similar in both methods (Supplementary Table S2). In both methods, individual variation contributed to most of the variation in shape (R2ML = 94%; R2Auto = 97%). Differences in observer accounted for 1.9% of the variation in shape from manual landmarks and 2.6% of the variation in shape from automatic landmarks. In total, 3.9% of the variation present in manual landmark shape configurations was unexplained by our model while only 0.22% of the variation was unexplained when testing the automatic landmark configurations. A MANOVA on Generalized Procrustes Analysis (GPA) aligned manual and automatic configurations from each observer, with method, individual, observer, and individual x observer as predictors showed that landmarking method did not significantly account for variation in landmark placement (F = 0.3463; p = 0.987; Table 3).
Intra- and inter-observer error of manual landmarks
The quantitative study of morphology using 2D and 3D coordinates requires specific attention to measurement error and has a robust presence in the literature. To first assess the reliability of our ground truth manual landmarks, we calculated the intra- and inter-observer error of the manual landmarks alone. We calculated the intra-observer error of the manual landmarks as the standard deviation of each observer’s three landmarking iterations in the x, y, and z directions. Supplementary Table S3 reports intra-observer standard deviations for the manual landmark indications along each axis, averaged across images. The average standard deviation of observer A across all landmarks was 0.58 mm while the average standard deviation of observer B across all landmarks was 0.44 mm. The average inter-observer error, measured as the standard deviation between the average x, y, and z coordinates of each observer’s landmarking iterations was 0.40 mm. This range of deviation is considered highly precise and is similar to previously reported measures of observer error8,46.
The analysis of measurement and observer error for the manual landmarks alone, assessed using a MANOVA for shape, with individual, observer, observer x individual, and nested observer x landmarking iteration as factors showed that non-individual factors contributed significantly to variation in shape (Supplementary Table S4). Individual variation contributed to most of the variation in shape (85%), as expected. Simple measurement error accounted for 3.5% of the total variation in shape. Additional to this, differences in observer accounted for 1.8% of shape variation, and deviation across landmarking iterations contributed an additional 1.5% of the total variation in shape. In total, non-individual effects contributed to 15% of the total shape variation, with 8.3% of this variation unexplained by the model.
Comparison of manual and automatic inter-observer errors
By treating the automatic landmark indications as if they were performed by a third observer, we were able to calculate “inter-observer” errors to assess if the variation of automatic and manual landmarking was similar to that of manual landmarking. In this assessment, we compared inter-observer errors calculated using only the manual landmarks (AML vs. BML) with error estimates calculated by replacing one of the observer’s manual landmark indications with the automatic indications trained using that observer’s average. This resulted in two extra estimations of inter-observer error (AML vs. BAuto and AAuto vs. BML), calculated as the standard deviation of x, y, and z coordinates (Supplementary Table S5; Fig. S3). The mean manual landmarking inter-observer error was 0.40 mm while both manual-automatic comparisons had mean standard deviation values of 0.53 mm (Supplementary Table S5). A paired t-test between the manual landmark error values and each of the manual-automatic comparison showed that the landmark indications that were significantly different between the two methods tended to be those where facial texture likely assisted in the placement of the manual landmarks (e.g. localizing the crista philtra by looking at the differences in color between the lips and the skin; Supplementary Table S6). This result indicates that automatic sparse landmarking using MeshMonk will likely produce more robust results when given input data that has a strong anatomical orientation (e.g. the nasion and pogonion). Even given these differences in variance, the manual-automatic comparisons were all under 0.70 mm, a sign of the reliability of the MeshMonk registration.
As an illustration of the low errors between automatic landmark indications trained using different observers, we calculated the standard deviation between automatic landmark indications trained using the average of observer A’s three landmark indications and the average of observer B’s three landmark indications (AAuto vs. BAuto; Supplementary Table S7, Fig. S3). Based on Levine’s test, the variance of the average standard deviation values was significantly different for all landmarks except labiale superius, where we could not reject the null hypothesis that the variances of the two standard deviation distributions were equal (F = 2.4213, p = 0.1236). Supplementary Fig. S3 shows that the variance between automatic landmarking indications (AAuto vs. BAuto) is easily identified as being smaller than the manual landmark inter-observer error (AML vs. BML).
Through studies utilizing manually placed sparse landmarks, we have begun to understand the biological basis and evolution of complex phenotypes, both normative and clinical. However, there is still much to be learned. One avenue for improvement is to expand and speed up the production and analysis of data using methods derived from engineering and computer vision, which allow for the description of shapes as “big data” structures instead of sparse sets of landmarks or linear distances, thus matching our ability to describe phenotypes with our ability to describe genomes. To this end, we introduce the MeshMonk registration framework, giving researchers the opportunity to quickly and reliably establish an anatomically comprehensive homologous set of positions across entire samples. We have validated this framework using a sparse set of landmarks, though the registration framework produces thousands of landmarks to finely characterize the structure.
MeshMonk represents a step forward in our ability to describe complex structures, like the human face, for clinical and non-clinical purposes. Consider Fig. 1, showing the starting template for facial image registration (left) as well as three example faces (right). Each point on the images represents a quasi-landmark data point that is homologous and can be compared across faces. Researchers are no longer limited to a few homologous points, chosen because they can be reliably indicated over hundreds of hours of work. Instead, minute details of the face can be identified and compared across thousands of images in a few hours, and additional images can be incorporated just as easily, regardless of the camera system with which they were captured and whether or not colour information is available, allowing for the incorporation of images from different sources and databases (e.g. Facebase.org).
Previous studies using versions of the MeshMonk framework have shown that the error associated with multiple registrations of the template onto the same facial images is 0.2 mm47 and parameters of the toolbox have been fine-tuned, as discussed elsewhere44 and in the Supplemental Material. To provide some validation regarding the ability of the registration process to accurately identify anatomical positions of interest, we used a total set of 41 faces with manual landmark indications. Of these, 40 were used to “train” positions of interest on the template, then automatically indicate these positions on the left-out face. This process was repeated iteratively until all faces were automatically landmarked. In the comparison of manual and automatic landmark indications, the positions of the manual landmarks were considered to be the gold standard, as they have a long history of use and validation in morphological studies7,46. By limiting ourselves to a set number of sparse landmarks, we cannot necessarily speak to the accuracy of structures not involved in our validation (e.g. the cheeks), but we argue that the results of our comparison speak highly of the fidelity with which the MeshMonk registration framework aligns to underlying anatomical structures.
In the direct comparison of sparse landmarks placed manually and using the MeshMonk toolbox, the average difference between the manual and automatic placements was low (Supplementary Fig. S1), with the average Euclidean distance across all landmarks ranging from 0.70 to 1.68 mm, which is similar or below errors reported in other comparisons of manual and automatic landmarking methods10,25,26,27,28,35,37,42,43 (Table 1). When assessing landmarking methods separately, the variance in landmark configuration attributable to individual and observer factors is similar, with considerably less variation left unexplained by a MANOVA model using automatic landmark configuration as the response (Supplementary Table S2). When assessing manual and automatic landmark configurations in a single MANOVA, the landmarking method is a nonsignificant factor, indicating that variation in scans is not attributable to variation in landmarking method (Table 3). This result was also reproduced when comparing centroid sizes calculated using manual and automatically placed landmarks (Table 2), speaking to the high correspondence between landmark indications placed by human observers and those indicated by the MeshMonk toolbox.
The validation results together suggest that the MeshMonk toolbox is able to reliably reproduce information given by manual landmarking. Though the larger contribution of the MeshMonk toolbox is the ability to quickly and densely characterize entire 3D surfaces, our illustration using a small number of manually placed landmarks as a training set could be useful for studies seeking specifically to study a sparse set of landmarks, perhaps to add more images to a dataset that is already manually landmarked or to add additional landmarks to an analysis. Utilization of the MeshMonk toolbox also gives the opportunity to minimize variation due to different observers. Take, for example, datasets with manual landmarks indicated by two different observers. During the course of analysis, the inter-observer error would have to be calculated and taken into account when interpreting results. From our own study, the inter-observer error of the manual landmarks placed by two different observers was 0.40 mm (Supplementary Table S3). With the automatic landmarking framework implemented during this study, we can minimize both intra-observer variance for a single scan (by averaging together all indications of that scan by a single observer) and intra-observer variance across scans by placing all indications from the training dataset on the template mesh and averaging the entire training set before using MeshMonk to place them in an automatic fashion on the target image. This process finely tunes the position of the landmark, such that even if the training sets were indicated by two different observers, the variation in automatic landmark indication is much smaller than the variation in manual landmark indication, averaging 0.27 mm in our study (Supplementary Table S7; Fig. S3).
A visual hallmark of the ability of spatially dense surface registration to reliably represent anatomical structures is found in the crispness of “average shapes,” constructed by averaging together all registered surfaces in a study sample. Because the MeshMonk registration aligns closely with the underlying anatomical structure, averages across the study samples continue to cleanly resemble the structure and detail is not lost in the averaging process. As depicted in Fig. 2, consider the sample average of the 41 faces in this work and 100 mandible scans. In the scaled rigid-only averages, details are overly smoothed compared to the level of detail present in the scaled rigid plus non-rigid registration averages. For example, it is obvious to the naked eye that the sharpness of the eyes, nose, philtrum and mouth for the facial average, and the alveolar crest, mental foramen, and coronoid and condylar processes for the mandible, are clearly better represented with the scaled rigid plus non-rigid registration. Thus, non-expert readers can easily evaluate the quality of dense-correspondence morphometrics research by looking at the average surfaces, which are typically used in manuscript figures, with the understanding that high quality registration leads to sharp average scans where anatomical positions of interest are clearly defined.
In this study, we introduce MeshMonk, an open-source resource for intensive 3D phenotyping on a large scale. Through dense-correspondence registration algorithms, like MeshMonk, we can advance our ability to integrate genomic and phenomic data to explore variation in complex morphological traits and answer evolutionary and clinical questions about normal-range variation, growth and development, dysmorphology, and taxonomic classification.
Materials and Methods
Explanation of MeshMonk registration
The core functionality of the MeshMonk toolbox is implemented in C++, with a focus on computational speed and memory to enable the processing of large datasets of 3D images. Interaction with the toolbox is provided using MATLABTM, enabling an easy to use implementation and visualization environment for the user. A schematic of the complete surface registration algorithm is presented in Fig. 3 and screenshots of the process on an example face are presented in Fig. 4. A short video of the registration on this example face is also available in the Supplementary Material.
To initiate the process, a scaled rigid registration based on the iterative closest point algorithm11 is performed to better align the template to the target surface (Fig. 4b). During the scaled rigid registration, the transformation model is constrained to changing the position (translation), orientation (rotation), and isotropic scale of the template only. Subsequently, a non-rigid registration is done that will alter the shape of the template to match the shape of the target surface (Fig. 4c). During the non-rigid registration, a visco-elastic model is enforced, ensuring that points that lie close to each other move coherently44. At any iteration during the registration, for both the scaled rigid and non-rigid registration steps, correspondences are updated by using pull-and-push forces (symmetrical correspondences)48 and a weighted k-neighbor approach (Supplementary Fig. S4)44.
3D surface images typically contain artifacts such as holes and large triangles indicating badly captured or missing parts. Any correspondence to such artifacts is meaningless and are indicated as correspondence outliers, not to be considered when updating the transformation model, though they are consistently transformed along with the inliers. The MeshMonk toolbox allows for the identification of outliers either deterministically or stochastically, or a combination of both. In each iteration, correspondences are updated and outliers are identified, then an updated transformation model is used. The smoothness of the transformation model is parametrized by convolving the displacement vectors between corresponding points with a Gaussian49. The amount of smoothing is high (multiple Gaussian convolution runs) at the beginning iterations, when correspondences are still noisy and hard to define, and reduces gradually towards the later iterations, when correspondences are more accurately defined.
Parameters and tuning
Given a dataset of 3D images of interest, the entire MeshMonk procedure can be optimized by setting a variety of parameters in the toolbox, and a parameter tuning can be done based on two “quality” measures. First, a quality of “shape fit” is defined as the root mean squared distance of all template points to the target surface after registration. This essentially measures how well the shape of the template was adapted to the target shape and can be measured over multiple images to deduct an overall quality of shape fit from the dataset. Second, an indication of the consistency of point indications across the same dataset is obtained following the principle of minimum description length in shape modelling50. Given two models explaining the same amount of variance, the model requiring fewer parameters is favored, or given two models with the same number of parameters, the one explaining more variance in the data is favored. To this end, a principal component analysis (PCA) is used to assess registration quality because, if the point indications were performed consistently, few PCs are required to explain variation in the registration results. A parameter tuning was done for the facial data in this work prior to the validation and is described in the Supplemental Material.
Validation sample and data curation
Our collaborative group has recruited participants through several studies at Pennsylvania State University, recruited at the following locations: State College, PA (IRB 44929 and 4320); New York, NY (IRB 45727); Urbana-Champaign, IL (IRB 13103); Dublin, Ireland; Rome, Italy; Warsaw, Poland; and Porto, Portugal (IRB 32341). All procedures were performed in accordance with the relevant guidelines and regulations from the Pennsylvania State University Institutional Review Board and all participants signed a written informed consent form before participation. Participants additionally gave optional informed consent for publication of their images in a variety of formats, including online open-access publications.
Stereo photogrammetry was used to capture 3D facial surfaces of N~6,000 participants using the 3dMD Face 2-pod and 3-pod systems (3dMD, Atlanta, GA). This well-established method generates a dense 3D point cloud representing the surface geometry of the face from multiple 2D images with overlapping fields of view. During photo capture, participants were asked to adopt a neutral facial expression with their mouth closed and to gaze forward, following standard facial image acquisition protocols51.
Manual placement of validation landmarks
Of the larger sample, N = 41 surface images were chosen at random for validation, excluding participants who reported facial surgery or injury. These images were diverse with respect to sex, age, height, weight, and 3D camera system used (Supplementary Table S8). 3dMDpatient was used to record the 3D coordinates of 19 standard landmarks (7 midline and 12 bilateral) from each unaltered surface in wavefront.obj format (Supplementary Fig. S5; Table S9). Two independent observers placed landmarks three times each, with at least 24 hours in-between landmarking sessions, resulting in six total landmark indications for each facial image. For each individual, we checked for gross landmark coordinate errors before analysis. In the subsequent analysis, AML represents the average manual landmarks from observer A, BML represents the average manual landmarks from observer B, while the combined average of all six manual landmark indications is denoted as CML.
Automatic placement of validation landmarks
To obtain automatic indications of the 19 validation landmarks, each of the validation faces was registered using MeshMonk and the manual landmark placements were transferred to the registered face by coordinate conversion (Fig. 5a)52. Because the registered faces are now in the same coordinate system as the original template, we can subsequently transfer the manual landmark indications to the original pre-registration template, giving a set of 41 × 2 observers × 3 indications = 246 manual landmark positions on the template scan (Fig. 5b). One by one, each face was left out while averaging the other 40 landmark placements to “train” the automatic landmarks (Fig. 5c). These averages were then transferred onto the left-out (target) face, resulting in the automatic placement of the validation landmarks using a “training” set that did not include the target face (Fig. 5d). Further detail on this process can be found in the Supplemental Material.
The placement of automatic landmarks was performed three times: once using the average of observer A’s manual landmark indications as input (AAuto), again using the average of observer B’s manual landmark indications (BAuto), and a final time using the combined average of all six manual landmark indications from both observers (CAuto). This process resulted in three placements of automatic landmarks for comparison.
We assessed the accuracy of the MeshMonk automatic landmark placements by calculating the Euclidean distance and root mean squared error between manual and automatic coordinates. We also calculated Bland-Altman53 and Intraclass Correlation Coefficient (ICC)54 statistics to compare the manual and automatic landmark indications. The Bland-Altman plot and ICC are preferred over Pearson’s correlation or regression as they indicate the absolute agreement between measurements, whereas the latter allows for differences in mean and scale between observers, which is undesirable for our purposes. We additionally compared estimates of centroid size calculated using each method and performed an ANOVA on the centroid size calculations, with individual, observer, method, and individual x observer as predictors, to determine if variation in centroid size could be attributable to variation in landmarking method.
We utilized several methods to determine if the variance structures produced by the two methods were similar. Fitting a MANOVA estimates the variance explained, in correlated outcome variables, by various factors included in the model. Here, we performed MANOVAs separately on the GPA-aligned average manual landmark indications from each observer (AML and BML) as well as on the GPA-aligned automatic landmark indications trained using the average of each observer’s three landmark placements (AAuto and BAuto), with image and observer as predictors in both tests. By comparing the results of these two tests, we can determine how the explanation of shape variance changes given a different landmarking method. To directly determine if any variance in shape was attributable to landmarking method, we combined the average manual landmark placements of each observer with the automatic placements trained using each of these averages and aligned them using GPA (AML, BML, AAuto, and BAuto). We then tested the shape variation in this combined space as the response in a MANOVA, with individual, observer, method, and individual x observer as factors.
We calculated the manual landmarking intra-observer error, the variation between indications taken at different times by the same individual, as the standard deviation between each observer’s manual landmark indications in the x, y, and z directions, separately. The inter-observer error, the difference between manual landmark indications made by different individuals, was calculated as the standard deviation between each observer’s average coordinates in the x, y, and z directions, separately (AML vs. BML). As an additional method to understand the variation present in the manual landmark indications only, we performed a MANOVA after GPA-aligning the six manual landmarking indications55. Study individual, observer, and landmarking iteration were used as factors and landmark configuration as the response.
To determine if the automatic indication process was more or less variable than manual landmarking, we compared the inter-observer error calculated using only the manual landmarks (AML vs. BML) to the standard deviation between one observer’s manual landmarks and the automatic landmarks trained using the other observer’s manual placements (AML vs. BAuto and AAuto vs. BML), as if the automatic indications replaced the manual indications in a calculation of inter-observer error. A paired T-test was used to determine whether the “inter-observer errors” calculated using the automatic indications were significantly different than the error calculated using only the manual indications. Standard deviation values calculated using both automatic placements (AAuto vs. BAuto) were compared to manual landmarking inter-observer error to illustrate the variance of automatic landmark indications. Levene’s test56 was performed to determine if the variances of the inter-observer errors calculated using the manual landmarks were equal to the standard deviation between the automatic landmarks (the null hypothesis). Levene’s test was chosen because the distribution of standard deviation values was non-normal.
All validation analyses were performed in R using the Geomorph57, BlandAltmanLeh (https://cran.r-project.org/web/packages/BlandAltmanLeh/BlandAltmanLeh.pdf), and ICC (https://cran.r-project.org/web/packages/ICC/ICC.pdf) packages, as well as packages for data manipulation (readxl, reshape2, plyr, car, data.table, dplyr, broom) and graphing (ggplot2, GGally, GGpubr). Centroid sizes were calculated using Geomorph and MANOVAs for shape variation were implemented using the ProcD.lm function from Geomorph57,58.
The informed consent with which the data were collected does not allow for dissemination of identifiable data to persons not listed as researchers on the IRB protocol. Thus, the full surface 3D facial images used for validation cannot be made publicly available. In the interest of reproducibility, we have provided the 19 manual and automatic landmarks used for validation as well as the code used to analyze them. These data are available in the following GitHub repository: https://github.com/juliedwhite/MeshMonkValidation/. The specific release used for these analyses is available at https://doi.org/10.5281/zenodo.2577628. The MeshMonk code and tutorials are available at https://github.com/TheWebMonks/meshmonk. The specific release used for these analyses is available at https://doi.org/10.5281/zenodo.2576795.
Houle, D., Govindaraju, D. R. & Omholt, S. Phenomics: The next challenge. Nat. Rev. Genet. 11, 855–866 (2010).
Walter, T. et al. Visualization of image data from cells to organisms. Nat. Methods 7, S26–S41 (2010).
Adams, D. C., Rohlf, F. J. & Slice, D. E. Geometric morphometrics: Ten years of progress following the ‘revolution’. Ital. J. Zool. 71, 5–16 (2004).
Rohlf, F. J. & Marcus, L. F. A revolution in morphometrics. Trends Ecol. Evol. 8, 129–132 (1993).
Fagertun, J. et al. 3D facial landmarks: Inter-operator variability of manual annotation. BMC Med. Imaging 14, 35 (2014).
Toma, A. M., Zhurov, A. I., Playle, R., Ong, E. & Richmond, S. Reproducibility of facial soft tissue landmarks on 3D laser-scanned facial images. Orthod. Craniofacial Res 12, 33–42 (2009).
Weinberg, S. M., Scott, N. M., Neiswanger, K., Brandon, C. A. & Marazita, M. L. Digital three-dimensional photogrammetry: Evaluation of anthropometric precision and accuracy using a Genex 3D camera system. Cleft Palate-Craniofacial J 41, 507–518 (2004).
von Cramon-Taubadel, N., Frazier, B. C. & Mirazon-Lahr, M. The problem of assessing landmark error in geometric morphometrics: Theory, methods, and modifications. Am. J. Phys. Anthropol. 134, 24–35 (2007).
Wong, J. Y. et al. Validity and reliability of craniofacial anthropometric measurement of 3D digital photogrammetric images. Cleft Palate-Craniofacial J 45, 232–239 (2008).
Gilani, S. Z., Mian, A., Shafait, F. & Reid, I. Dense 3D Face Correspondence. IEEE Trans. Pattern Anal. Mach. Intell 40, 1584–1598 (2018).
Besl, P. J. & McKay, N. D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell 14, 239–256 (1992).
Amberg, B., Romdhani, S. & Vetter, T. Optimal Step Nonrigid ICP Algorithms for Surface Registration. In 2007 IEEE Conference on Computer Vision and Pattern Recognition 1–8, https://doi.org/10.1109/CVPR.2007.383165 (IEEE, 2007).
Claes, P. A robust statistical surface registration framework using implicit function representations: application in craniofacial reconstruction (KU Leuven, 2007).
Claes, P., Walters, M. & Clement, J. Improved facial outcome assessment using a 3D anthropometric mask. Int. J. Oral Maxillofac. Surg. 41, 324–330 (2012).
Andresen, P. R. & Nielsen, M. Non-rigid registration by geometry-constrained diffusion. Med. Image Anal. 5, 81–88 (2001).
Booth, J., Roussos, A., Ponniah, A., Dunaway, D. & Zafeiriou, S. Large Scale 3D Morphable Models. Int. J. Comput. Vis. 126, 233–254 (2018).
Booth, J., Roussos, A., Zafeiriou, S., Ponniah, A. & Dunaway, D. A 3D Morphable Model Learnt from 10,000 Faces. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5543–5552, https://doi.org/10.1109/CVPR.2016.598 (IEEE, 2016).
Hutton, T. J., Buxton, B. F. & Hammond, P. Automated registration of 3D faces using dense surface models. in British Machine Vision Conference (eds Harvey, R. & Bangham, A.) 1–10, https://doi.org/10.5244/C.17.45 (Citeseer, 2003).
Claes, P., Hill, H. & Shriver, M. Towards DNA-based facial composites: preliminary results and validation. Forensic Sci. Int. 13, 208–216 (2014).
Matthews, H. et al. Estimating age and synthesising growth in children and adolescents using 3D facial prototypes. Forensic Sci. Int. 286, 61–69 (2018).
Imaizumi, K. et al. Three-dimensional analyses of aging-induced alterations in facial shape: a longitudinal study of 171 Japanese males. Int. J. Legal Med. 129, 385–393 (2015).
Blanz, V. & Vetter, T. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th annual conference on computer graphics and interactive techniques 187–194, https://doi.org/10.1145/311535.311556 (ACM Press/Addison-Wesley Publishing Co, 1999).
Baynam, G. et al. Phenotyping: Targeting genotype’s rich cousin for diagnosis. J. Paediatr. Child Health 51, 381–386 (2015).
Hammond, P. et al. Discriminating Power of Localized Three-Dimensional Facial Morphology. Am. J. Hum. Genet. 77, 999–1010 (2005).
Nair, P. & Cavallaro, A. 3D face detection, landmark localization and registration using a Point Distribution Model. IEEE Trans. Multimed 11, 611–623 (2009).
Szeptycki, P., Ardabilian, M. & Chen, L. A coarse-to-fine curvature analysis-based rotation invariant 3D face landmarking. IEEE 3rd Int. Conf. Biometrics Theory, Appl. Syst. BTAS 2009, https://doi.org/10.1109/BTAS.2009.5339052 (2009).
Perakis, P., Passalis, G., Theoharis, T. & Kakadiaris, I. 3D facial landark detection and face registration (2011).
Segundo, M. P., Silva, L., Bellon, O. R. P. & Queirolo, C. C. Automatic face segmentation and facial landmark detection in range images. IEEE Trans. Syst. Man, Cybern. Part B Cybern 40, 1319–1330 (2010).
Lu, X. & Jain, A. K. Automatic feature extraction for multiview 3D face recognition. In FGR 2006: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition 585–590, https://doi.org/10.1109/FGR.2006.23 (IEEE, 2006).
Yu, T.-H. & Moon, Y.-S. A Novel Genetic Algorithm for 3D Facial Landmark Localization. In 2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems 1–6, https://doi.org/10.1109/BTAS.2008.4699322 (IEEE, 2008).
Dibeklioǧlu,, H., Salah, A. A. & Akarun, L. 3D facial landmarking under expression, pose, and occlusion variations. In BTAS 2008 - IEEE 2nd International Conference on Biometrics: Theory, Applications and Systems 1–6, https://doi.org/10.1109/BTAS.2008.4699324 (2008).
D’Hose, J., Colineau, J., Bichon, C. & Dorizzi, B. Precise localization of landmarks on 3D faces using gabor wavelets. In IEEE Conference on Biometrics: Theory, Applications and Systems, BTAS’07 1–6, https://doi.org/10.1109/BTAS.2007.4401927 (2007).
Colbry, D., Stockman, G. & Jain, A. Detection of Anchor Points for 3D Face Verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Workshops 1–8, https://doi.org/10.1109/cvpr.2005.441 (IEEE, 2005).
Grewe, C. M. & Zachow, S. In Lecture Notes in Computer Science, vol 9914 (eds Hua, G. & Jegou, H.) 552–568, https://doi.org/10.1007/978-3-319-48881-3_38 (Springer, Cham, 2016).
Liang, S., Wu, J., Weinberg, S. M. & Shapiro, L. G. Improved detection of landmarks on 3D human face data. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBS 6482–6485, https://doi.org/10.1109/EMBC.2013.6611039 (2013).
Li, M. et al. Rapid automated landmarking for morphometric analysis of three-dimensional facial scans. J. Anat. 230, 1–12 (2017).
Guo, J., Mei, X. & Tang, K. Automatic landmark annotation and dense correspondence registration for 3D human facial images. BMC Bioinformatics 14 (2013).
Boyer, D. M. et al. Algorithms to automatically quantify the geometric similarity of anatomical surfaces. PNAS 108, 18221–18226 (2011).
Montúfar, J., Romero, M. & Scougall-Vilchis, R. J. Automatic 3-dimensional cephalometric landmarking based on active shape models in related projections. Am. J. Orthod. Dentofac. Orthop. 153, 449–458 (2018).
Gupta, A., Kharbanda, O. P., Sardana, V., Balachandran, R. & Sardana, H. K. A knowledge-based algorithm for automatic detection of cephalometric landmarks on CBCT images. Int. J. Comput. Assist. Radiol. Surg 10, 1737–1752 (2015).
Subburaj, K., Ravi, B. & Agarwal, M. Automated identification of anatomical landmarks on 3D bone models reconstructed from CT scan images. Comput. Med. Imaging Graph. 33, 359–368 (2009).
De Jong, M. A. et al. An automatic 3D facial landmarking algorithm using 2D gabor wavelets. IEEE Trans. Image Process 25, 580–588 (2016).
De Jong, M. A. et al. Ensemble landmarking of 3D facial surface scans. Sci. Rep 8, 1–11 (2018).
Snyders, J., Claes, P., Vandermeulen, D. & Suetens, P. Development and comparison of non-rigid surface registration and extensions. (KU Leuven, 2014).
Wei, R., Claes, P., Walters, M., Wholley, C. & Clement, J. G. Augmentation of linear facial anthropometrics through modern morphometrics: A facial convexity example. Aust. Dent. J 56, 141–147 (2011).
Aldridge, K., Boyadjiev, S. A., Capone, G. T., DeLeon, V. B. & Richtsmeier, J. T. Precision and error of three-dimensional phenotypic measures acquired from 3dMD photogrammetric images. Am. J. Med. Genet. 138A, 247–253 (2005).
Claes, P. et al. Sexual dimorphism in multiple aspects of 3D facial symmetry and asymmetry defined by spatially dense geometric morphometrics. J. Anat 221, 97–114 (2012).
Redert, A., Kaptein, B., Reinders, M., van den Eelaart, I. & Hendriks, E. Extraction of semantic 3D models of human faces from stereoscopic image sequences. Acta Stereol 18, 255–264 (1999).
Bro-Nielsen, M. Medical image registration and surgery simulation. 0909–3192 (Technical University of Denmark, 1996).
Davies, R. H., Twining, C. J., Cootes, T. F., Waterton, J. C. & Taylor, C. J. A minimum description length approach to statistical shape modeling. IEEE Trans. Med. Imaging 21, 525–537 (2002).
Heike, C. L., Upson, K., Stuhaug, E. & Weinberg, S. M. 3D digital stereophotogrammetry: a practical guide to facial image acquisition. Head Face Med 6, 18 (2010).
Hille, E. Analytic Function Theory, Volume I. (AMS Chelea Publishing Company, 1982).
Altman, D. G. & Bland, J. M. Measurement in Medicine: The Analysis of Method Comparison Studies. Stat. 32, 307–317 (1983).
Fisher, R. A. Statistical methods for research workers. Biological Monographs and Manuals, https://doi.org/10.1056/NEJMc061160 (Oliver & Boyd, 1925).
Rohlf, F. J. & Slice, D. Extensions of the procrustes method for the optimal superimposition of landmarks. Syst. Zool. 39, 40–59 (1990).
Levene, H. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (eds Olkin, I. & Hotelling, H.) 278–292 (Stanford University Press, 1960).
Adams, D. C. & Otárola-Castillo, E. Geomorph: An r package for the collection and analysis of geometric morphometric shape data. Methods Ecol. Evol 4, 393–399 (2013).
Collyer, M. L., Sekora, D. J. & Adams, D. C. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity (Edinb) 115, 357–365 (2015).
MeshMonk was developed with WebMonks (https://webmonks.vision), a Belgian startup that works as a bridge to highly qualified developers in third-world countries, and we are very grateful for their high-quality implementation and support. We also thank the many participants who have volunteered their time and all the past and present members of the Shriver and Claes labs, without whom we would have never been able to develop this toolbox or perform the research that it has contributed to. The sample collection and personnel involved in this work was supported by grants from the US National Institute of Justice (2008-DN-BX-K125), the US Department of Defense, the University of Illinois Interdisciplinary Innovation Initiative Research Grant, the Science Foundation of Ireland Walton Fellowship (04.W4/B643), the Penn State Center for Human Evolution and Development, the Research Fund KU Leuven (BOF-C1, C14/15/081) and the Research Program of the Fund for Scientific Research - Flanders (Belgium) (FWO, G078518N).
Authors J.D. White, A. Ortega-Castrillón, H. Matthews, A.A. Zaidi, O. Ekrami, Y. Fan, T. Penington, S. van Dongen, M.D. Shriver and P. Claes declare not competing interests. Author J. Snyders is employed by the Belgian non-profit company WebMonks.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
White, J.D., Ortega-Castrillón, A., Matthews, H. et al. MeshMonk: Open-source large-scale intensive 3D phenotyping. Sci Rep 9, 6085 (2019). https://doi.org/10.1038/s41598-019-42533-y
This article is cited by
Exploring regional aspects of 3D facial variation within European individuals
Scientific Reports (2023)
Influence of age and diet consistency on the oral muscle pressure of orthodontically treated and untreated subjects with normal occlusion and comparison of their 3D facial shape
Clinical Oral Investigations (2023)
Genetic variants underlying differences in facial morphology in East Asian and European populations
Nature Genetics (2022)
Combining genome-wide association studies highlight novel loci involved in human facial variation
Nature Communications (2022)
Joint 3D facial shape reconstruction and texture completion from a single image
Computational Visual Media (2022)
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.