Anthropometric accuracy of three-dimensional average faces compared to conventional facial measurements

This study aimed to evaluate and compare the accuracy of average faces constructed by different methods. Original three-dimensional (3D) facial images of 26 adults of Chinese ethnicity were imported into Di3DView and MorphAnalyser for image processing. Six average faces (Ave_D15, Ave_D24, Ave_MG15, Ave_MG24, Ave_MO15, Ave_MO24) were constructed using a “surface-based registration” method with different numbers of landmarks and different template meshes. Topographic analysis was performed, and the accuracy of the six average faces was assessed by comparing their linear and angular parameters with the arithmetic means calculated from the individual original images. Among the six average faces constructed by the two systems, Ave_MG15 had the highest accuracy in comparison with the conventional method, while Ave_D15 had the lowest. The other average faces were comparable in the number of parameters showing clinically significant discrepancies. However, marginal and non-registered areas were the least accurate regions using Di3DView. For MorphAnalyser, the type of template mesh affected the accuracy of the final 3D average face, but additional landmarks did not improve the accuracy. This study highlights the importance of validating software packages and determining their degree of accuracy, as well as the variables which may affect the result.

www.nature.com/scientificreports/
population groups 11 . The average faces were generated using an image pre-alignment pipeline and the "built-in algorithm" in RapidForm software (Geomagic Korea, Seoul, Korea), which determined the "best fit" of the facial images and then averaged the 3D data points of the images based on a facial template. Average faces have also been used to analyse facial soft tissues following orthognathic surgery 12 . Using "in-house" software, two 3D average faces were produced, one pre-surgery and one post-surgery, and these were then used to assess surgical changes in the x, y and z directions. These 3D average faces were produced by "averaging" the indices of the conformed generic mesh, i.e. calculating where each of the corresponding vertices (those sharing the same index value) was likely to be across the facial meshes of the whole sample; dense correspondence analysis was then used to create a 3D average face. An alternative software solution for generating an average 3D facial mesh is MorphAnalyser, which has been used to assess cleft outcomes in adults 13 and infants 14 . MorphAnalyser also uses a base mesh and dense correspondence to create an average 3D facial mesh template. All these methods of 3D average face generation rely on various computer algorithms and processes which, as clinicians, we assume to be valid and acceptably accurate. However, no study has investigated the validity or accuracy of 3D average faces in comparison with the conventional anthropometric methods by which previous and current 2D facial norms have been calculated. Therefore, the aim of this study was to assess the accuracy of linear and angular measurements on the "3D average faces" produced by two different software packages, Di3DView (Di4D SNAP, Dimensional Imaging Ltd., Hillington, Glasgow, UK) and MorphAnalyser (http://cherry.dcs.aber.ac.uk/morphanalyser), against those obtained using conventional digital anthropometry.
In addition, differences in the surface topography of the 3D average faces were compared.

Material and methods
Sample size calculation. Based on a previous study 15
Image capture. Static three-dimensional (3D) images of each participant were taken using a 3dMDface System (3dMD Inc., Atlanta, GA, USA) by one professional photographer. The accuracy of the system has been reported previously, with a root mean square (RMS) error of less than 0.2 mm 16,17 . Prior to image capture, the 3dMDface system was calibrated according to the manufacturer's instructions. Immediately before the 3D image was captured, participants were seated 100 cm from the capture system, looking forward with the Frankfort plane parallel to the floor, and with any glasses and jewellery removed. The camera system captured six 2D images: four black-and-white images depicting facial structures and spatial relationships, which formed the facial framework, and two colour images, which projected the texture information onto the mesh framework 16 . Capture took 1.5 ms, and the images were saved as Wavefront object files (.OBJ) for later analysis.
Landmarking and facial image conformation. The process of average face construction involved two steps:
• Step 1: Conformation, which involved transforming the topography of a generic mesh to match each individual's original facial mesh. The generic mesh was a computer-generated symmetrical facial image constructed of 3763 vertices. This required corresponding landmarks to be placed on each original facial mesh and on the generic mesh. Using these landmarks as "anchors", the rest of the generic mesh was elastically deformed to fit the remaining original facial surface. As a result, each conformed generic mesh had the same surface topography as the patient's face, but the number of vertices was the same across every face and the vertices were listed, or indexed, in the same order to maintain anatomical correspondence. The generic mesh was of lower resolution than the original 3D captured facial mesh.
15 and an additional 9 landmarks were digitised on the generic mesh and saved for the conformation process (24 landmarks in total). For conformation, the generic mesh and its 15 digitised landmarks were imported into Di3DView together with each individual's original 3D facial image and its corresponding 15 digitised landmarks. Using the "Shape transfer function", the generic mesh was elastically deformed to fit the original facial surface. The new "conformed generic mesh" was saved in OBJ format (Di3D_15). This process was repeated using the 24 landmarks (Di3D_24). As a result, two conformed generic meshes were produced for each individual, one based on 15 landmarks and the other on 24 landmarks; these were used for average face construction.
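The landmark-anchored deformation in Step 1 can be illustrated with a deliberately simplified stand-in. The sketch below is not Di3DView's "Shape transfer function", whose algorithm is not described here; it swaps in inverse-distance-weighted (Shepard) interpolation of the landmark displacements purely because it is short. The function name `idw_warp` and all coordinates are hypothetical, and a real conformation would additionally fit the deformed mesh to the original facial surface rather than moving it by landmarks alone.

```python
import math

Point = tuple[float, float, float]


def idw_warp(vertices: list[Point], src_landmarks: list[Point],
             dst_landmarks: list[Point], power: float = 2.0) -> list[Point]:
    """Hypothetical landmark-driven warp (Shepard / inverse-distance weighting).

    Each generic-mesh vertex moves by a weighted blend of the landmark
    displacements, with closer landmarks weighted more heavily. Vertex count
    and ordering are unchanged, which is what preserves correspondence.
    """
    displacements = [tuple(d - s for s, d in zip(src, dst))
                     for src, dst in zip(src_landmarks, dst_landmarks)]
    warped: list[Point] = []
    for v in vertices:
        exact = None
        weights = []
        for i, src in enumerate(src_landmarks):
            dist = math.dist(v, src)
            if dist == 0.0:
                exact = i  # vertex sits on a landmark: move it exactly onto the target
                break
            weights.append(1.0 / dist ** power)
        if exact is not None:
            shift = displacements[exact]
        else:
            total = sum(weights)
            shift = tuple(sum(w * d[k] for w, d in zip(weights, displacements)) / total
                          for k in range(3))
        warped.append(tuple(c + s for c, s in zip(v, shift)))
    return warped


# Hypothetical 3-vertex "generic mesh" carrying two landmarks
generic = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
src = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]   # landmarks on the generic mesh
dst = [(0.0, 0.0, 1.0), (12.0, 0.0, 0.0)]   # same landmarks on an individual's face
conformed = idw_warp(generic, src, dst)
print(conformed)  # landmark vertices land on their targets; vertex 2 gets a blended shift
```

The key property for averaging is not the interpolation scheme but that the output list has the generic mesh's length and order, so vertex i means the same anatomical point in every conformed mesh.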
MorphAnalyser did not routinely use a generic mesh for conformation but instead used a "standard" template. In this case, the standard template mesh was made up of 39,256 vertices and was taken from one randomly selected original facial image. Conformation created the same mesh structure for all the images by warping them to one image's structure, i.e. all images were warped to the "standard template". Following landmark digitisation as described above, the conformation process was conducted by elastically deforming, or warping, the template onto each individual facial surface.
To determine the effect of the template mesh (a "standard" template based on one randomly selected original facial image versus the generic mesh) on conformation and averaging, the generic mesh used in the Di3DView process was also used in MorphAnalyser, with digitisation of both 15 and 24 landmarks. As a result, four conformed meshes were generated, two based on the randomly selected original facial image (Morph_Original_15 and Morph_Original_24) and two using the generic mesh (Morph_Generic_15 and Morph_Generic_24). For each patient, the conformed meshes were saved in .OBJ format for average face construction.
Average face construction. Di3DView. All the conformed generic meshes based on the 15 landmarks (Di3D_15) were saved in a single folder. These were then used in the "Average Face" function in Di3DView to create an average 3D facial mesh surface (Ave_D15) based on the mean position of each corresponding vertex across all individuals. This was saved in .OBJ format. The same process was used for the conformed generic meshes based on the 24 landmarks (Di3D_24), and the resulting average 3D facial mesh surface (Ave_D24) was again saved in .OBJ format, Fig. 2.
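The "mean position of each corresponding vertex" can be written down directly once every conformed mesh shares the generic mesh's vertex count and index order. The sketch below is a minimal illustration under that assumption, not Di3DView's "Average Face" implementation; `average_mesh` is a hypothetical name, and it assumes the conformed meshes are already in a common coordinate frame (many pipelines first apply a Procrustes-type alignment, which is not shown).

```python
Point = tuple[float, float, float]


def average_mesh(meshes: list[list[Point]]) -> list[Point]:
    """Vertex-wise mean of conformed meshes with identical vertex count and ordering."""
    if not meshes:
        raise ValueError("at least one conformed mesh is required")
    n_vertices = len(meshes[0])
    if any(len(mesh) != n_vertices for mesh in meshes):
        # Without a shared vertex index there is no correspondence to average over
        raise ValueError("all conformed meshes must have the same number of vertices")
    n = len(meshes)
    return [tuple(sum(mesh[i][k] for mesh in meshes) / n for k in range(3))
            for i in range(n_vertices)]


# Two hypothetical conformed meshes of two vertices each
mesh_a = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
mesh_b = [(2.0, 0.0, 4.0), (4.0, 2.0, 0.0)]
print(average_mesh([mesh_a, mesh_b]))  # [(1.0, 0.0, 2.0), (3.0, 2.0, 1.0)]
```

Because every conformed mesh inherits the generic mesh's triangle list, that connectivity can presumably be reused unchanged for the averaged vertices.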
MorphAnalyser. The first conformed generic mesh image from the Morph_Original series was loaded into MorphAnalyser using the "Add to Average…" function with two sets of landmark registration. The next patient's image
Analysis. Topographic analysis of average faces. The four average faces based on the generic mesh (Ave_D15, Ave_D24, Ave_MG15, Ave_MG24) were imported into MATLAB software (Version 9.7.0, Natick, Massachusetts, USA) for surface topography analysis. In-house software was developed to measure the Median Euclidean Distance (MED) between each pair of the four average facial meshes following superimposition on the centroids of the meshes. The MED is a measure of the distance between corresponding vertices of any two facial meshes. For this to work, all the facial meshes had to have the same number of vertices, i.e. in this case be made up from the same generic mesh. Consequently, the meshes derived from the randomly selected original facial image using MorphAnalyser (Ave_MO15 and Ave_MO24) were not included: they were made up of a different number of vertices, so correspondences, and therefore measurements, with the generic mesh could not be calculated. The Euclidean distances were tested for normality, and their median (MED) was compared with a hypothesised median of 2 mm, considered the threshold of clinical significance, using a one-sample Wilcoxon signed-rank test.
Anthropometric assessment. For each patient, seventeen linear (L) and seven angular (A) measurements were taken on their static three-dimensional (3D) facial image using Di3DView, Table 1. A mean value was then obtained across all patients for each of the 24 measurements; these means were taken as the "gold standard".
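The topographic comparison described above (centroid superimposition, per-vertex Euclidean distances, their median, then a one-sample Wilcoxon signed-rank test against 2 mm) can be sketched in plain Python. This is a hedged illustration of the general procedure, not the study's in-house MATLAB software: the function names are hypothetical, the test uses the large-sample normal approximation without tie or continuity correction (reasonable for thousands of vertices, whereas statistical packages may apply exact or corrected versions), and both meshes are assumed to share vertex count and ordering.

```python
import math
import statistics

Point = tuple[float, float, float]


def centre(mesh: list[Point]) -> list[Point]:
    """Translate a mesh so that its centroid sits at the origin."""
    c = [statistics.fmean([v[k] for v in mesh]) for k in range(3)]
    return [tuple(v[k] - c[k] for k in range(3)) for v in mesh]


def vertex_distances(mesh_a: list[Point], mesh_b: list[Point]) -> list[float]:
    """Euclidean distance between corresponding vertices after centroid superimposition."""
    if len(mesh_a) != len(mesh_b):
        raise ValueError("meshes must share vertex count (same generic mesh)")
    return [math.dist(p, q) for p, q in zip(centre(mesh_a), centre(mesh_b))]


def wilcoxon_one_sample(values: list[float], mu0: float = 2.0):
    """One-sample Wilcoxon signed-rank test of values against median mu0.

    Returns (W+, z, p_greater, p_less) using the normal approximation.
    """
    diffs = [x - mu0 for x in values if x != mu0]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all values equal mu0")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # tied absolute differences share the average of their ranks
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))   # H1: median > mu0
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))     # H1: median < mu0
    return w_plus, z, p_greater, p_less


# Hypothetical two-vertex meshes: centring removes the translation, leaving a scale difference
a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
distances = vertex_distances(a, b)
print(statistics.median(distances))  # MED = 1.0
```

For a full mesh pair, `wilcoxon_one_sample(distances, 2.0)` would then give one-sided p-values for "median above 2 mm" and "median below 2 mm".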
To determine the validity of each of the average faces, the gold-standard anthropometric measurements were compared with the measurements obtained from each of the six average faces. The 24 measurements were taken on the average faces conformed with 15 landmarks (L1/A1) and again on those conformed with 24 landmarks (L2/A2), using Di3DView and MorphAnalyser. In both cases, some of the landmarks used for the measurements had not been used during the conformation process.
The level of statistical significance was set at 0.05. In addition, the thresholds for clinical significance were set at 2 mm for linear measurements and 5° for angular measurements. The statistical analysis was conducted using the Statistical Package for the Social Sciences V.25 (SPSS Inc., Chicago, Illinois, USA).
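The comparison against the gold standard can be sketched as follows: a linear measurement is the Euclidean distance between two landmarks, an angular measurement is the angle at a vertex landmark formed by two others, the gold standard is the mean of each measurement across individuals, and a discrepancy is flagged when it exceeds 2 mm or 5°. The measurement names echo those in the text, but every coordinate and value below is hypothetical, and the helper names are illustrative rather than taken from Di3DView or SPSS.

```python
import math
import statistics

Point = tuple[float, float, float]

LINEAR_LIMIT_MM = 2.0     # clinical significance threshold for linear measurements
ANGULAR_LIMIT_DEG = 5.0   # clinical significance threshold for angular measurements


def linear(p: Point, q: Point) -> float:
    """Euclidean distance between two landmarks (mm)."""
    return math.dist(p, q)


def angle(a: Point, vertex: Point, b: Point) -> float:
    """Angle a-vertex-b in degrees, e.g. a nasolabial angle measured at Subnasale."""
    u = [a[k] - vertex[k] for k in range(3)]
    v = [b[k] - vertex[k] for k in range(3)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def gold_standard(per_individual: list[dict[str, float]]) -> dict[str, float]:
    """Arithmetic mean of each measurement across all individuals."""
    return {name: statistics.fmean([m[name] for m in per_individual])
            for name in per_individual[0]}


def flag_discrepancies(gold: dict[str, float], average_face: dict[str, float],
                       angular: set[str]) -> dict[str, float]:
    """Return measurements whose absolute difference exceeds the clinical threshold."""
    flagged = {}
    for name, ref in gold.items():
        diff = average_face[name] - ref
        limit = ANGULAR_LIMIT_DEG if name in angular else LINEAR_LIMIT_MM
        if abs(diff) > limit:
            flagged[name] = diff
    return flagged


# Hypothetical values for two individuals and one average face
individuals = [{"Sn-Gn": 68.0, "Labiomental": 128.0},
               {"Sn-Gn": 72.0, "Labiomental": 132.0}]
gold = gold_standard(individuals)            # {"Sn-Gn": 70.0, "Labiomental": 130.0}
face = {"Sn-Gn": 74.3, "Labiomental": 126.0}
print(flag_discrepancies(gold, face, {"Labiomental"}))  # only Sn-Gn exceeds its limit
```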
Error study. To assess intra-operator error, linear and angular measurements were undertaken twice, at a 2-week interval, on the original images and the generated average faces by one experienced examiner (SZY). For the anthropometric assessment, the differences in landmark coordinates between the first and second digitisation were used to analyse the error of the assessment method. Systematic error was assessed using paired-sample t-tests and random error using correlation coefficients, Table 2.
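The two error statistics named above can be sketched for one measurement recorded in two digitisation sessions: the paired t statistic on the session differences (systematic error) and the Pearson correlation between sessions (random error). This is an illustrative outline only; the function name and the data are hypothetical, and the p-value step is left to a statistics package (for example SciPy's `scipy.stats.ttest_rel`), since the Student t distribution is not in Python's standard library.

```python
import math
import statistics


def error_study(first: list[float], second: list[float]) -> tuple[float, int, float]:
    """Return (paired t statistic, degrees of freedom, Pearson r) for repeated measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    # Systematic error: mean session difference relative to its standard error
    if sd_diff == 0:
        t = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t = mean_diff / (sd_diff / math.sqrt(n))
    # Random error: agreement between sessions via Pearson correlation
    mx, my = statistics.fmean(first), statistics.fmean(second)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    sxx = sum((x - mx) ** 2 for x in first)
    syy = sum((y - my) ** 2 for y in second)
    r = sxy / math.sqrt(sxx * syy)
    return t, n - 1, r


# Hypothetical repeated measurements (mm) of one distance on four images
session_1 = [10.0, 12.0, 14.0, 16.0]
session_2 = [10.5, 11.5, 14.5, 15.5]
t, df, r = error_study(session_1, session_2)
print(t, df, round(r, 3))
```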

Results
Topographic analysis. For the six pairwise combinations of the four average faces based on the generic mesh (Ave_D15, Ave_D24, Ave_MG15, Ave_MG24), the MED ranged from 0.8 to 2.4 mm. Using the same number of landmarks but different software packages produced larger differences in the MED (Ave_D15 & Ave_MG15 = 2.2 mm; Ave_D24 & Ave_MG24 = 2.4 mm). Following a one-sample Wilcoxon signed-rank test, these differences were statistically significantly greater than 2.0 mm (p < 0.001), Table 3. In addition, increasing the number of landmarks but using the same software produced minimal changes in the MED (Ave_D15 & Ave_D24 = 0.9 mm; Ave_MG15 & Ave_MG24 = 0.8 mm). Following a one-sample Wilcoxon signed-rank test, these differences were statistically significantly less than 2.0 mm (p < 0.001), Fig. 4. The four superimposed average faces based on the generic mesh, produced using Di3DView and MorphAnalyser, are shown in Fig. 5.
Table 2. Each of the four images (average faces) was landmarked twice, 2 weeks apart, and the differences in landmark coordinates between the first and second digitisation were used for analysis of the errors of the method. Systematic error was assessed using paired t-tests and random error using coefficients of reliability.
Anthropometric assessment. Error study. No systematic errors were observed; all p-values were greater than 0.05. Random error was minimal, with all correlation coefficients above 0.95. All landmarks were digitised to within 1.0 mm, Table 2.
For angular measurements, Ave_MG15, Ave_MO15 and Ave_MO24 showed no parameter with an absolute difference exceeding 5°. The other average faces, i.e. Ave_D15, Ave_D24 and Ave_MG24, were less consistent, particularly for the Labiomental angle (−5.1° to −10.3°). Additionally, the Nasal tip angle and Nasolabial angle in Ave_D15 differed considerably from the arithmetic mean (6.0° and 9.2°, respectively), Table 4. The use of 9 additional landmarks had little effect on the validity of the angular and linear measurements.

Discussion
This is the first study to investigate the accuracy of "3D average faces" as an anthropometric tool for facial analysis against conventional methods. With the ever-increasing use of digital data and computer software solutions, it is important for clinicians to understand the factors which may affect the output of the software. As clinicians, we generally input variables into "black-box technology/software" and assume the output to be accurate. As with most pieces of conventional laboratory equipment, there is an essential step of validation and determining accuracy; working with digital technology should be no different. 3D capture systems and software have been validated and their accuracy reported, but this does not mean that all subsequent applications are equally accurate 18,19 . It is easy to be distracted by the visual superiority of three-dimensional images, and just because an image looks "right" does not mean that it is a true representation of the original images. Traditional linear and angular anthropometric measurements describe specific areas of the face in a form that is not visually intuitive, i.e. they do not resemble a face. In addition, the use of Euclidean distances is problematic when measuring 3D distances 20,21 . The Euclidean distance is, by definition, the length of the line segment between two points, so two pairs of points can be the same distance apart while lying in different directions. For instance, the alar base width, measured as the Euclidean distance between the right and left alar landmarks, could be the same in cleft and non-cleft individuals. If, however, the left alar base were depressed in the cleft group, this would only be apparent if the differences in the x, y and z positions of the right and left alar landmarks were assessed. Therefore, moving away from Euclidean distances and using 3D coordinates when analysing 3D images would seem more appropriate.
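The alar base example can be made concrete with two hypothetical pairs of alar landmarks (coordinates in mm, invented for illustration; x transverse, y vertical, and z anteroposterior, assumed to increase anteriorly). Both pairs are exactly 33 mm apart, so the Euclidean width is identical, but the per-axis coordinate differences show that the left alar base in the second pair sits 8 mm lower and 1 mm further back.

```python
import math

Point = tuple[float, float, float]


def width_and_offsets(right: Point, left: Point) -> tuple[float, Point]:
    """Euclidean distance plus the x, y, z differences (left minus right)."""
    offsets = tuple(lft - rgt for rgt, lft in zip(right, left))
    return math.dist(right, left), offsets


non_cleft = ((-16.5, 0.0, 0.0), (16.5, 0.0, 0.0))
cleft = ((-16.5, 0.0, 0.0), (15.5, -8.0, -1.0))   # left alar base depressed

for label, (right, left) in [("non-cleft", non_cleft), ("cleft", cleft)]:
    width, offsets = width_and_offsets(right, left)
    print(f"{label}: width = {width:.1f} mm, offsets = {offsets}")
# Both widths print as 33.0 mm; only the offsets differ.
```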
Advances in 3D facial surface imaging have made it possible to capture a human face in 3D as a matter of routine. Recently, a web-based 3D Facial Norms database for European Caucasians has been constructed from 2454 individuals, covering both genders, aged 3 to 40 years 22 . The data contain the 3D coordinates of a variety of standard facial surface landmarks, selected linear distances, and face and head measurements obtained using traditional anthropometric methods (i.e. calipers). Interestingly, the 3D images are still analysed as if they were two-dimensional, producing simple linear Euclidean distances and angular measurements. For example, in conventional cephalometry two points, Labiale superius and Stomion (superius), are used to represent the upper lip, but the lip has a complex three-dimensional morphology and cannot be fully described by two landmarks. Three-dimensional surface mesh images are made up of vertices, each with a 3D coordinate representing its position in space; in other words, each vertex represents a landmark. This means that it is possible to represent an anatomical region by the vertices of which it is composed, with each vertex acting as a landmark 23 . Returning to the example of the upper lip, all the vertices making up the lip surface can then be utilised. The problem is that each 3D facial image is made up of a different number of vertices, and therefore there is no consistency between two images, even if they are captured one after the other using the same imaging equipment. To overcome this, a generic mesh and image conformation, or dense correspondence, have been used 24,25 . Image conformation transforms a generic mesh, made up of a known number of vertices, into the shape of the original 3D facial image, whilst maintaining anatomical correspondence between images.
This latter feature means that the same vertex (or landmark) represents the same anatomical point across images, e.g. vertex number 345 is always Pogonion. This then allows facial images to be averaged into "3D average faces", with each point on the 3D facial mesh acting as a landmark. These landmarks are classified as anatomical, mathematical or pseudolandmarks 26 . Anatomical landmarks are determined by experts and correspond between individuals; mathematical landmarks are located on an object according to some mathematical or geometrical property, e.g. a point of maximum curvature; and pseudolandmarks are constructed points on an object, either on the outline or between landmarks. Conventional 2D and simple 3D anthropometry use a small number of anatomical landmarks, whilst 3D average faces allow both anatomical landmarks and pseudolandmarks to be used. The additional pseudolandmarks allow more detailed measurements. However, there is some debate on whether these pseudolandmarks are accurate, with some studies suggesting higher levels of inaccuracy 27,28 and others not 15,29 .
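Because conformation fixes vertex order, an anatomical landmark or a whole region can be read from any conformed mesh by index, and pseudolandmarks can be constructed between landmarks. In the sketch below the index values (345 for Pogonion, following the example in the text, and the upper-lip indices) and the function names are hypothetical, and pseudolandmarks are placed on the straight segment between two landmarks; on a real mesh they would normally be projected onto the facial surface, which is not shown.

```python
Point = tuple[float, float, float]

# Hypothetical vertex indices on the generic mesh (not the study's actual indices)
LANDMARK_INDEX = {"Pogonion": 345}
UPPER_LIP_INDICES = [1200, 1201, 1202]


def landmark(mesh: list[Point], name: str) -> Point:
    """Same index, same anatomical point, in every conformed mesh."""
    return mesh[LANDMARK_INDEX[name]]


def region(mesh: list[Point], indices: list[int]) -> list[Point]:
    """All vertices of an anatomical region, each acting as a landmark."""
    return [mesh[i] for i in indices]


def pseudolandmarks(p: Point, q: Point, k: int) -> list[Point]:
    """k evenly spaced points strictly between landmarks p and q (straight line)."""
    return [tuple(a + (b - a) * j / (k + 1) for a, b in zip(p, q))
            for j in range(1, k + 1)]


# Hypothetical conformed mesh with the generic mesh's 3763 vertices
mesh = [(float(i), 0.0, 0.0) for i in range(3763)]
print(landmark(mesh, "Pogonion"), len(region(mesh, UPPER_LIP_INDICES)))
print(pseudolandmarks((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), 3))
```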
The results of the present study showed that the smallest differences in anthropometric linear measurements were obtained using MorphAnalyser, the generic mesh and 15 landmarks (Ave_MG15), which was therefore more accurate than the other average faces. The average faces generated using Di3DView, the generic mesh and 15 or 24 landmarks for conformation (Ave_D15 and Ave_D24) produced errors of up to 4.3 mm (Sn-Gn) compared with the conventional arithmetic means of the individuals. For angular measurements, this difference was close to 10° for the Labiomental angle using Di3DView, a generic mesh and 15 landmarks for conformation (Ave_D15). For average faces generated using Di3DView, the largest differences were seen in the peripheral regions of the face (Gn, Go and Zy). Increasing the number of landmarks used during conformation from 15 to 24 marginally improved the accuracy of the measurements, as seen by the closer agreement, Fig. 6b,c. This was also reflected in the marginal reduction in MED seen by increasing the number of landmarks, using Di3DView and a generic mesh.
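If the agreement in Fig. 6 is summarised in the usual Bland-Altman way (an assumption; the plot type is not stated here), the bias and 95% limits of agreement between average-face and gold-standard measurements are the mean difference plus or minus 1.96 standard deviations of the differences, and a narrower interval corresponds to closer agreement. The helper name and all numbers below are hypothetical.

```python
import statistics


def limits_of_agreement(method: list[float], reference: list[float]) -> tuple[float, float, float]:
    """Bias and 95% limits of agreement (bias ± 1.96 SD of the paired differences)."""
    diffs = [m - r for m, r in zip(method, reference)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical linear measurements (mm): gold standard vs two average faces
gold = [32.0, 70.0, 120.0, 45.0]
face_15 = [33.5, 73.0, 119.0, 46.5]
face_24 = [32.5, 71.5, 120.5, 45.5]
for label, face in [("15 landmarks", face_15), ("24 landmarks", face_24)]:
    bias, lower, upper = limits_of_agreement(face, gold)
    print(f"{label}: bias {bias:.2f} mm, LoA {lower:.2f} to {upper:.2f} mm")
```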
Based on the topographical analysis, the 3D average facial meshes produced by the two different software packages differed by an MED greater than 2.0 mm, which would be clinically significant. However, this should be viewed with caution, as the MED is the median value of all the points across the entire 3D facial mesh. The discrepancies between the different mesh combinations are not spread across the entire mesh surface but are most marked around the forehead, Fig. 7. These large deviations increase the MED values, whereas the differences in the chin and nasal regions between the meshes are small. Interestingly, the addition of more landmarks makes little difference to the meshes using MorphAnalyser (Fig. 7f) but has more of an effect using Di3DView (Fig. 7e), again around the forehead region and Subnasale (Sn). This problem has been highlighted previously, and the use of regional mesh analysis has been suggested as a possible solution 30 . The morphology of the forehead is more "realistic" using MorphAnalyser, but this was not confirmed by the anthropometric measurements.
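The caution about a single whole-face MED can be illustrated with hypothetical per-vertex distances in which only the forehead deviates. In this made-up example the forehead vertices outnumber the rest, so the global median is pulled above 2 mm even though the chin and nose agree to 0.5 mm; a per-region median exposes this. The region labels and values are invented, and this is only one simple way of doing a regional analysis, not necessarily the method proposed in reference 30.

```python
import statistics
from collections import defaultdict


def regional_med(distances: list[float], labels: list[str]) -> tuple[float, dict[str, float]]:
    """Global median distance plus the median within each labelled region."""
    groups: dict[str, list[float]] = defaultdict(list)
    for d, label in zip(distances, labels):
        groups[label].append(d)
    return statistics.median(distances), {k: statistics.median(v) for k, v in groups.items()}


labels = ["forehead"] * 7 + ["chin"] * 2 + ["nose"] * 2
distances = [3.0] * 7 + [0.5] * 2 + [0.5] * 2
global_med, per_region = regional_med(distances, labels)
print(global_med, per_region)  # 3.0 {'forehead': 3.0, 'chin': 0.5, 'nose': 0.5}
```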
Using Di3DView, additional landmarks improved the accuracy of angular measurements. This was probably due to the additional placement of anatomical landmarks. For instance, the addition of Subnasale (Sn) improved the accuracy of the nasolabial angle. However, this explanation does not hold for the labiomental angle: Supramentale (Sm) was never an additional landmark, yet additional landmarks still improved the accuracy. A possible explanation may be that the bilateral addition of Gonion helped to constrain the mesh during conformation. This highlights the importance of selecting relevant landmarks and the potential effects they may have on distant landmarks. Measurements involving Gnathion (Gn) using Di3DView were inaccurate in the vertical direction, i.e. Sn-Gn, N-Gn and Sm-Gn. This is probably a result of the incorrect morphology of the chin region produced using Di3DView with the generic mesh and 15 or 24 landmarks, Fig. 2. The error may be related to the conformation and averaging of peripheral landmarks, i.e. Nasion, Gnathion and Zygion.
For MorphAnalyser, the use of the generic mesh rather than an original facial template produced more accurate results. Using the original facial template, the intercanthal distance, En(L)-En(R), was inaccurate, as were the vertical lip measurements (Ls-Li and Sn-Ls). The addition of extra landmarks did not improve the general accuracy of the linear measurements. The reason for this is unknown but can only be attributed to the software algorithm which produces the average facial mesh. To further complicate the issue, the use of a generic mesh improves the anthropometric accuracy of the measurements, whilst extra landmarks reduce it. Overall, we conclude that the validity and accuracy of 3D average faces depend on the software used, the type of baseline/generic mesh and the number of landmarks used during conformation.

Conclusion
Di3DView and MorphAnalyser are both able to produce 3D average faces for anthropometric analysis. Marginal and non-registered areas were the least accurate regions using Di3DView. For MorphAnalyser, the type of template mesh had an effect on the accuracy of the final 3D average face. Additional landmarks did not improve the accuracy. This study highlights the importance of validating software packages and determining their degree of accuracy, as well as the variables which may affect the output. The use of MorphAnalyser, a generic mesh and 15 landmarks for conformation produced a 3D average face with the same anthropometric dimensions as the average measurements determined by conventional anthropometric techniques.