Introduction

Genetic syndromes represent an unusually difficult diagnostic challenge for even the most experienced clinicians, due to the large number, complexity, variability, and rarity of these disorders. Diagnoses are frequently delayed or initially incorrect, and patients often must proceed without even basic information regarding health and developmental outcomes, let alone tailored clinical care [1]. While advancements in gene-based technologies have greatly improved previously poor diagnostic rates, accurate testing is often unavailable, and complementary diagnostic modalities thus remain of great importance. Computer-assisted facial phenotyping is one such modality, making use of inexpensive, portable, and widely available facial imaging technologies together with image processing and statistical methods. Dysmorphic (abnormal) facial features are associated with many human genetic syndromes [2, 3]. Several face-based approaches to genetic syndrome diagnosis have been developed to provide precision diagnostic assistance in a clinical context [4, 5].

Previous work

A variety of approaches for face-based syndrome diagnosis have been proposed using different types of facial representation as input [4, 6–9]. The most common forms of facial representation used for this purpose are 2D color images, 2.5D depth images, and 3D surface scans. To our knowledge, only one previous study has performed subject-matched comparisons of 2D and 3D facial images for syndrome classification [10]. However, that study was limited to a single syndrome class (22q11.2 deletion), and its 2D images were colorless renderings of 3D surface scans, which are not an accurate proxy for real 2D color or grayscale images.

Outside the application of genetic syndrome diagnosis, several studies have compared 2D and 3D facial representations for different purposes. Anas et al. [11] found that using 2D images and 3D surface scans for measuring facial morphology produced significantly different results, concluding that 3D imaging is a better approach to quantify facial morphologic phenotypes. Likewise, Zogheib et al. [12] found that 3D facial scanning produced more reliable facial measurements that were closer to the clinical standard than did 2D photographs. For the task of facial action unit detection, Savran et al. [13] found that, in general, 3D outperformed 2D facial representations. For the task of facial recognition, Chang et al. [14] found that 3D and 2D facial representations performed similarly. Nevertheless, many facial recognition researchers have turned their focus to 3D to overcome the inherent limitations of 2D photography [15]. Thus, many modern proprietary face-based biometric security applications (such as Face ID, developed by Apple Inc.) now rely on 3D information [16].

2D color images

Most syndrome recognition models developed thus far rely on 2D color images. This is largely due to the relative ease of collecting 2D facial images using widely available digital camera technologies. As a result, databases of syndromic 2D images tend to be larger than those of syndromic 3D scans. Gurovich et al. [7] and Matthews et al. [4] provide detailed surveys of 2D image-based approaches and the different facial representations that they use. Anatomical landmarks, as well as local geometry and texture features captured around each landmark, are popular types of facial representations derived from 2D images. State-of-the-art approaches use deep convolutional neural networks (CNNs) that take 2D color images as direct inputs [7, 17].

Despite the ease of acquisition, 2D facial images have some notable disadvantages for face-based syndrome classification. 2D facial images are projections of intrinsically 3D structures and, as such, may discard diagnostically relevant 3D morphologic information. 2D facial images are also highly sensitive to variations in illumination and pose. Even when controlling for illumination and pose, varying the distance from which a frontal 2D image is captured results in perspective distortions of the resultant 2D image (see Fig. 1) that could affect a diagnostic model. Finally, 2D images generally do not contain information about overall facial size, which may be an important diagnostic indicator for some syndromes. Therefore, additional calibration procedures or measurements must be used to capture true facial size information within or alongside 2D facial images.

Fig. 1: Frontal renderings of the same example subject surface scan captured using different distances between the camera and the subject face.

Field of view was adjusted to capture the full face in each rendering. Even when subjects are imaged from a frontal orientation, camera distance influences the appearance of 2D facial photographs in ways that may affect diagnostic models.

Depth images

In between 2D color photography and 3D surface scanning is depth imaging (sometimes called 2.5D imaging). Depth images are 2D grid-like representations, like 2D color images. However, each pixel in a color image corresponds to the color and intensity of light reflected by the subject, whereas each pixel in a depth image corresponds to the distance of the subject from the imaging device. Depth images are usually obtained using infrared or time-of-flight depth sensors and/or multiple cameras that can compute depth images using stereoscopic methods. Many modern smartphones have embedded hardware that supports depth imaging [16]. Depth images have been explored for face-based syndrome classification but are less common than 2D color images and 3D surface scans. Previous studies have used principal component analysis (PCA) and a naive Bayes classifier to analyze syndromic facial depth images [10]. Many of the limitations of 2D color images also apply to depth images (such as sensitivity to changes in subject pose). Nevertheless, unlike 2D color images, depth images are generally robust to differences in subject illumination.

3D surface representations

3D surface scanning is the most sophisticated approach for facial imaging. The typical output format of 3D surface scanning systems is a discrete 3D surface mesh consisting of 3D vertices connected by polygons, such as triangles. Surface color is sometimes captured along with a polygonal surface mesh, using per-vertex color information or UV-mapped texture images. Because 3D facial surface meshes are a loosely structured data type, it is common practice to identify facial landmarks or dense point correspondences to a reference surface mesh to facilitate model training and inference (see “Facial Representations” for details).

3D surface scanning is rapidly becoming more widely accessible and user friendly, although it is not yet as widespread or easy to use as 2D color photography. Specialized structured-light imaging systems, such as those sold by 3dMD (footnote 1), produce scans of the highest quality and accuracy but are not easily portable. Facial surface scans can also be acquired using some of the newest smartphones or less expensive handheld devices. In general, 3D scan-based syndrome diagnosis approaches are less common than 2D color image-based approaches, but several advanced models have been developed that use 3D representations [8, 9].

In theory, a 3D surface representation should result in better syndrome classification performance than comparable 2D images. 3D surface representations can intrinsically capture 3D human facial structures without discarding information through a 2D projection. 3D surface scans are robust to variation in subject illumination and pose. Finally, 3D scanning systems are typically calibrated to accurately capture size information.

Contributions

Although intuition suggests that 3D surface representations should be superior to 2D representations for face-based syndrome diagnosis, there is a lack of quantitative evidence to support this claim. It is also unclear whether the performance benefits from 3D imaging (if any) are sufficient to justify the increased effort of scan acquisition. This gap largely exists because previously published 2D and 3D face-based syndrome classification approaches have been trained and evaluated on different datasets, with different numbers of subjects, different genetic syndrome classes and overall dataset composition, and different demographic distributions within the data. Thus, it is difficult to draw any robust conclusions about facial representations by comparing previously published results. In this work, we describe the creation of parallel 2D and 3D facial representations from matched subjects in a large and diverse syndromic population. We also report the results from subject-matched analyses of four different 2D and 3D facial representations.

In summary, our experiments are the first to directly compare 2D and 3D face-based syndrome diagnosis models using identical patient faces for training and evaluation. This is important for two reasons. First, the choice of evaluation data affects the values of the metrics used to assess diagnostic models. Second, the performance of diagnostic models is influenced by the composition, quality, and amount of data used to train them. Thus, results from models trained on different patient data may reflect differences in the amount, quality, and composition of the training data rather than differences between 2D and 3D imaging modalities. Our experiments isolate the effect of the facial representation on diagnostic model accuracy, providing empirical justification for continued research, data collection, and model development using 3D facial imaging modalities.

Materials and methods

Data description

A total of 1907 3D surface scans from subjects with 43 different genetic syndromes were used in our experiments. Each syndrome was represented by at least 20 subjects. The 3D scans were acquired using a 3dMD facial imaging system from patients across the United States and Canada and are available through application to the FaceBase consortium (footnote 2). All scans were in the format of polygonal meshes with additional per-vertex color information. The demographic distribution of the subjects is shown in Fig. 2 and Table 1.

Fig. 2: Subject demographic histogram.

The age and sex distribution of the facial data used in this study. Young subjects (aged 5-20 years) were generally more numerous than older subjects.

Table 1 The syndrome class distribution of the facial data used in this study, as well as per-syndrome accuracy statistics for the top-performing 2D and 3D models.

Facial representations

The following subsections describe the creation of the different 3D and 2D facial representations for each subject used in our comparative analyses. All representations were derived using a single raw 3D surface scan from each subject so that any variability in the imaging conditions (e.g., facial expression and illumination) is constant across the different representations. Figure 3 shows the image-like facial representations used in this study for an example subject.

Fig. 3

The three image-like facial representations used in this study from an example subject.

3D surfaces

Just as 2D cameras can produce images with different numbers of pixels, raw 3D surface scans may have different numbers of vertices that are connected in different ways by different polygonal faces. Therefore, prior to model training and inference, raw 3D surface scans are typically processed to produce standardized 3D surface representations with a uniform number of vertices and a common mesh topology. To achieve this, dense vertex correspondences are identified between a reference facial mesh and each raw facial scan.

To facilitate the processing of raw 3D surface scans, eight anatomical landmarks were first identified on each scan using the automatic approach described in [18]. Next, dense vertex correspondences were estimated between a reference facial mesh and each raw facial scan using the non-rigid iterative closest point algorithm [19], guided by the anatomical landmarks. The reference mesh used in this work contains 26,649 vertices, and the facial region that it covers is shown in Fig. 3. Next, the standardized vertex configurations extracted from each raw facial scan were rigidly aligned to one another using Procrustes alignment. Thus, diagnostically irrelevant information related to facial position and orientation was removed from the data, while information about facial size and shape was retained. Finally, the aligned vertex coordinates for each subject were flattened into a vector of length 3 × 26649 to be used for model training and inference.
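To make the alignment and flattening step concrete, the following is a minimal sketch assuming each scan has already been resampled to the 26,649 corresponding reference vertices. It uses a Kabsch-style rigid fit (rotation and translation only, so facial size is preserved, as described above) and aligns each configuration to the reference rather than performing full generalized Procrustes iteration; the variable names and synthetic example data are illustrative.

```python
import numpy as np

def rigid_align(source, target):
    """Rigidly align source (N x 3) vertices to target (N x 3) vertices using
    rotation and translation only, so that facial size is preserved."""
    src_centroid, tgt_centroid = source.mean(axis=0), target.mean(axis=0)
    src_centered, tgt_centered = source - src_centroid, target - tgt_centroid
    # Kabsch algorithm: optimal rotation from the SVD of the 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(src_centered.T @ tgt_centered)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return src_centered @ rotation.T + tgt_centroid

# Illustrative stand-ins for per-subject vertex arrays in dense correspondence
rng = np.random.default_rng(0)
reference = rng.normal(size=(26649, 3))
scans = [reference + rng.normal(scale=0.01, size=reference.shape) + rng.normal(size=3)
         for _ in range(5)]                          # shape noise plus a random translation

aligned = [rigid_align(vertices, reference) for vertices in scans]
# Flatten each aligned configuration into a vector of length 3 x 26649 = 79,947
features = np.stack([v.ravel() for v in aligned])    # shape: (n_subjects, 79947)
```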

2D color images

2D color images were created from 3D surface scans by rendering each scan from a frontal position. First, the raw scans were rigidly aligned to one another as described in the previous subsection. Next, a virtual camera was positioned to capture each scan from the same frontal position. Each scan was then rendered using per-vertex color information to produce realistic surface shading. To ensure that the 2D and 3D representations capture the same facial region, the standardized 3D representations described in the previous subsection were used to mask regions of each 2D image that were not also captured by the 3D surface representations. Finally, the images were cropped to a bounding square that captures the full facial region and resized to a resolution of 128 × 128 pixels (slightly larger than the resolution used by [7]).
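The following is a sketch of this frontal rendering step using the trimesh and pyrender libraries; the exact renderer used in our pipeline is not specified here, and the file path, camera distance, field of view, and lighting values are illustrative. Offscreen rendering additionally requires an OpenGL-capable environment (e.g., EGL or OSMesa on headless machines), and the masking step based on the standardized 3D representation is omitted for brevity.

```python
import numpy as np
import trimesh
import pyrender
from PIL import Image

# Load an aligned raw scan with per-vertex colors (path is illustrative)
tm = trimesh.load("aligned_scan.ply", process=False)
mesh = pyrender.Mesh.from_trimesh(tm)

scene = pyrender.Scene(bg_color=[0, 0, 0])           # uniformly black background
scene.add(mesh)

# Frontal virtual camera at a fixed distance along +Z, assuming the aligned
# face is centered near the origin and scans are in millimeters
camera_pose = np.eye(4)
camera_pose[2, 3] = 600.0
scene.add(pyrender.PerspectiveCamera(yfov=np.deg2rad(20.0)), pose=camera_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=camera_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=512, viewport_height=512)
color, depth = renderer.render(scene)                # color image and Z-buffer

# Crop to the bounding square of the rendered face and resize to 128 x 128
rows, cols = np.nonzero(depth > 0)
r0, c0 = rows.min(), cols.min()
side = max(rows.max() - r0, cols.max() - c0) + 1
color_128 = np.asarray(Image.fromarray(color[r0:r0 + side, c0:c0 + side]).resize((128, 128)))
```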

2D colorless images

2D colorless images were created from 3D surface scans using a slight modification of the rendering process for 2D color images. For colorless images, a uniform grey color was applied to the surface mesh instead of the available per-vertex color information. Although this colorless 2D representation does not correspond to any real 2D imaging technique (including black-and-white photography), we included it because of its use in previous studies [10] and to investigate the effect of surface color information on syndrome classification performance.

Depth images

Depth images were created from 3D surface scans using another slight modification of the rendering process for 2D color images. For depth images, pixel values were set using the Z-buffer instead of using surface shading information. Z-buffer values represent the distance of rendered objects from a particular camera perspective.
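Continuing the rendering sketch above, the depth buffer returned by the same render call can serve as the Z-buffer-based depth representation; the min-max normalization shown is one reasonable choice rather than the exact scheme used in our pipeline.

```python
# 'depth' from renderer.render(scene) holds per-pixel camera-to-surface distance
# in scene units, and is 0 wherever no geometry was rendered.
valid = depth > 0
depth_norm = np.zeros_like(depth)
depth_norm[valid] = (depth[valid] - depth[valid].min()) / (depth[valid].max() - depth[valid].min())

# Reuse the bounding square (r0, c0, side) computed for the color rendering
depth_crop = depth_norm[r0:r0 + side, c0:c0 + side]
depth_128 = np.asarray(
    Image.fromarray((depth_crop * 255).astype(np.uint8)).resize((128, 128)))
```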

Experiments

The primary aim of this study was to compare syndrome classification performance across different 3D and 2D representations using the same subject faces for training and evaluation. Therefore, we created five cross-validation folds by randomly splitting subjects into 70% training and 30% testing samples. Random sampling was stratified by syndrome class to ensure that an adequate number of subjects from each syndrome was included in each test set. The same subject splits were used for all experiments.
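A minimal sketch of this splitting scheme using scikit-learn is shown below; the random seed, variable names, and placeholder label array are illustrative, and the paper does not specify the exact splitting implementation.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Illustrative stand-ins: one entry per subject and one syndrome label per subject
subject_ids = np.arange(1907)
syndrome_labels = np.random.default_rng(0).integers(0, 43, size=1907)

splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
folds = list(splitter.split(subject_ids, syndrome_labels))   # five (train_idx, test_idx) pairs

# The same five train/test index pairs are then reused for every facial representation.
```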

2D classifier model

We used the same convolutional neural network (CNN) model architecture for all 2D representations. The CNN (described in Table 2) was designed to emulate the model proposed by [7]. Unlike [7], we do not employ any 2D data augmentation or patch-based ensemble strategies, in order to make the 2D classification experiments as comparable as possible to the 3D classification experiments. All CNN models were trained for 100 epochs using a batch size of 128 and an Adam optimizer with a learning rate of 10⁻³.

Table 2 The 2D classifier model architecture.
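Table 2 specifies the actual layer configuration; the sketch below uses a generic stand-in architecture and is intended only to illustrate the stated training setup (128 × 128 × 3 inputs, 43 output classes, no augmentation, 100 epochs, batch size 128, Adam with learning rate 10⁻³). TensorFlow/Keras is an assumed framework, and the placeholder arrays stand in for the rendered training images.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes=43):
    # Generic small CNN stand-in; the actual architecture follows Table 2.
    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Placeholder arrays standing in for the rendered 2D images and syndrome labels
x_train = np.zeros((8, 128, 128, 3), dtype="float32")
y_train = np.zeros(8, dtype="int64")

model = build_cnn()
model.fit(x_train, y_train, epochs=100, batch_size=128)   # no data augmentation
```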

In addition to training 2D CNNs from scratch, we explored the use of pre-trained facial recognition CNN models provided by the DeepFace Python library [20]. In these experiments, a CNN pre-trained on non-syndromic 2D facial images was used to extract embedding vectors for each syndromic facial image. The embedding vectors were then used to train and evaluate an MLP classification model with the same structure as the 3D classifier model described below. We found that the pre-trained ArcFace [21] model performed best for our application.
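A sketch of this transfer-learning variant follows; it assumes the DeepFace library's represent interface (recent versions return a list of dictionaries containing an "embedding" field) and reuses one of the train/test index pairs from the splitting sketch above. File paths and variable names are illustrative.

```python
import numpy as np
from deepface import DeepFace
from sklearn.neural_network import MLPClassifier

def arcface_embedding(image_path):
    # Detection is skipped because the rendered images are already face-only crops.
    result = DeepFace.represent(img_path=image_path, model_name="ArcFace",
                                detector_backend="skip", enforce_detection=False)
    return np.asarray(result[0]["embedding"])

# Illustrative paths to the rendered 2D color images, one per subject
image_paths = [f"renders/subject_{i:04d}.png" for i in subject_ids]
embeddings = np.stack([arcface_embedding(path) for path in image_paths])

clf = MLPClassifier(hidden_layer_sizes=(100,), activation="relu", solver="adam",
                    learning_rate_init=1e-3, batch_size=128, max_iter=100)
train_idx, test_idx = folds[0]
clf.fit(embeddings[train_idx], syndrome_labels[train_idx])
test_probabilities = clf.predict_proba(embeddings[test_idx])
```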

3D classifier model

For the 3D classification experiment, we used principal component analysis (PCA) to reduce the dimensionality of the 3D surface data from 3 × 26649 down to 100, following [9], to avoid overfitting. A multi-layer perceptron (MLP) classifier model was then trained on the dimensionality-reduced data. The MLP architecture contains a single hidden layer of size 100 with ReLU activation. All MLP models were trained for 100 epochs using a batch size of 128 and an Adam optimizer with a learning rate of 10⁻³.
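A minimal scikit-learn sketch of this 3D pipeline under the stated hyperparameters (100 principal components; one hidden layer of 100 ReLU units; Adam with learning rate 10⁻³ and batch size 128) is shown below. With the Adam solver, scikit-learn's MLPClassifier interprets max_iter as the number of epochs; the feature matrix and fold indices are reused from the sketches above.

```python
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# features: (n_subjects, 3 * 26649) flattened, aligned vertex coordinates
model_3d = make_pipeline(
    PCA(n_components=100),
    MLPClassifier(hidden_layer_sizes=(100,), activation="relu", solver="adam",
                  learning_rate_init=1e-3, batch_size=128, max_iter=100),
)
train_idx, test_idx = folds[0]
model_3d.fit(features[train_idx], syndrome_labels[train_idx])
probabilities_3d = model_3d.predict_proba(features[test_idx])   # used for top-k scoring
```

Placing the PCA inside the pipeline ensures that it is refit on the training subjects of each fold, avoiding information leakage into the test set.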

Results and discussion

Classification results

Table 3 shows the mean top-1 and top-3 accuracy scores across all syndrome classes for the cross-validated syndrome classification experiments. The 3D surface-based model produced the best performance of all experiments (40.7% mean top-1 sensitivity), exceeding all other models by a margin of more than 6 percentage points. Among the 2D representations, color images produced the best results (34.2% mean top-1 sensitivity), followed by colorless images (26.3% mean top-1 sensitivity). Depth images produced the worst classification results of all models evaluated (24.8% mean top-1 sensitivity).

Table 3 Cross validated syndrome classification results using different facial representations and classifier models.
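The exact evaluation code is not given in the paper; the sketch below shows one standard way to compute the per-class top-k sensitivities reported in Tables 1 and 3 and to macro-average them across the 43 syndrome classes, assuming integer class labels that align with the probability columns.

```python
import numpy as np

def mean_per_class_topk(probabilities, true_labels, k=1):
    """Macro-averaged top-k sensitivity: for each syndrome class, the fraction of
    its test subjects whose true class is among the k most probable predictions,
    averaged over all classes."""
    topk = np.argsort(probabilities, axis=1)[:, -k:]
    hits = np.array([label in row for label, row in zip(true_labels, topk)])
    per_class = [hits[true_labels == c].mean() for c in np.unique(true_labels)]
    return float(np.mean(per_class))

# Example using the 3D model probabilities from one cross-validation fold:
top1 = mean_per_class_topk(probabilities_3d, syndrome_labels[test_idx], k=1)
top3 = mean_per_class_topk(probabilities_3d, syndrome_labels[test_idx], k=3)
```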

We also investigated performance differences between facial representations within specific genetic syndrome classes (Table 1). Overall, the performance advantage of 3D representations is relatively consistent across syndrome classes, although some differences were observed. Because the sample sizes of individual syndrome classes are much smaller, per-syndrome performance estimates are less precise and comparisons on a per-syndrome basis are more difficult.

Discussion

Overall, the experimental results support the proposition that 3D surface representations are superior to 2D representations for face-based syndrome diagnosis, given equivalent subject faces for model training and evaluation. Compared to the best-performing 2D representation (color images), the mean top-1 sensitivity of a diagnostic model trained using 3D surface representations was roughly 6 percentage points higher. All experiments were performed using the exact same subject faces for training and evaluation to control for the performance variability that arises from using different subjects with different demographic and syndrome class distributions. By using a single 3D surface scan to derive all 3D and 2D representations, we were also able to ensure equivalent facial expressions and facial regions across the different facial representations.

Although our color renderings were created as a proxy for 2D color photographs, there are some relevant differences between our renderings and real color photographs. The backgrounds of our rendered images are uniformly black, unlike real images with complex and diverse backgrounds. Furthermore, our 3D alignment procedure removes pose variation from the image data, and all images are captured using the same fixed camera distance from each subject face. As shown in Fig. 1, camera distance can have a noticeable effect on the appearance of frontal 2D photographs. Our rendering pipeline also uses the same virtual lighting conditions for each subject, although lighting conditions during the process of 3D scanning may still affect the per-vertex color information of each raw 3D scan. Typically, real-world 2D facial photographs are captured under less controlled circumstances than those simulated by our rendering process: real 2D images are often taken from different perspectives, with different lighting conditions and backgrounds. For our experiments, we chose not to simulate real-life variation in imaging conditions, so that our results reflect 2D photography in a controlled environment. Thus, real-world 2D images might be expected to yield worse results than those described here.

Although the 3D model outperformed 2D models for most syndromes (Table 1), some syndromes (Phelan-McDermid, Mucopolysaccharidosis, Kabuki, Cohen, and Crouzon) showed lower performance for the 3D approach. This suggests that the methods used to measure 3D facial phenotype could still be improved. One potential source of noise in the 3D data is the process of estimating vertex correspondences between raw 3D scans and a reference facial mesh. Using improved anatomical landmark estimation algorithms to guide dense vertex correspondence estimation could lead to more robust 3D representations. Another potential direction to improve 3D models would be to use 3D surface texture as a predictor of genetic syndrome diagnosis in addition to 3D geometry. 3D surface mesh textures can be encoded as per-vertex colors, or as 2D images that are mapped onto a surface mesh by assigning 2D image coordinates to each 3D vertex. Either encoding could be passed as an additional input to a syndrome classification model.

The proposition that facial complexion carries diagnostically relevant information is also supported by a comparison of the 2D model results. In our experiments, 2D color representations (34.2% mean top-1 sensitivity) outperformed 2D colorless representations (26.3% mean top-1 sensitivity) that had surface color information removed. One interpretation of this result is that facial complexion carries diagnostically relevant information for some genetic syndromes. Some syndromes are known to be associated with distinctive features that manifest in the complexion of the skin rather than in the geometry of the facial structure. For example, neurofibromatosis is characterized by café au lait spots and Prader-Willi syndrome is characterized by hypopigmentation. These features can only be detected by imaging modalities that capture skin complexion information, such as color-based 2D classification models or 3D models that include texture information. Another possibility is that our dataset contains spurious correlations between syndrome class and facial complexion or illumination at the time of 3D scanning.

While all model development efforts should carefully consider the demographic distribution of their data, special care should be taken with data representations that include complexion information so that models are not ethnically biased. One additional advantage of 3D surface scanning is the ability to explicitly separate geometric information from information about facial complexion, as the two are acquired by scanning systems through different mechanisms. Although 3D scanning offers the possibility to isolate information about facial geometry, facial shape does vary across ethnic groups [22, 23], and 3D models can still be ethnically biased when trained on imbalanced data.

Our experiments also revealed that using a pre-trained CNN to extract facial embedding vectors (34.2% mean top-1 sensitivity) performed as well as or better than training a CNN from scratch (32.6% mean top-1 sensitivity). However, even when additional non-syndromic facial images were used to pre-train a CNN, the 2D models did not achieve the performance level of the 3D model (40.7% mean top-1 sensitivity). Nevertheless, this result suggests that syndromic and non-syndromic faces are similar enough to make transfer learning between non-syndromic and syndromic datasets viable. Furthermore, it suggests that images rendered from 3D surface scans are similar enough to real 2D facial images to make transfer learning between 3D and 2D datasets viable. We believe that using 3D facial scans to investigate and improve 2D facial diagnosis models is also a very promising avenue for future research. High-quality 3D scans can be used to generate 2D facial images of the same subject from various perspectives, and with different lighting conditions and backgrounds. These images could be used to evaluate how sensitive existing 2D models are to different imaging conditions. Furthermore, images of the same subject under different conditions could be used to train models that are invariant to differences in those conditions.

Conclusion

Taken together, our findings indicate that 3D scans are superior to 2D images for face-based genetic syndrome diagnosis. As demonstrated in this work, 3D surface scans can be converted into 2D images rendered from any perspective using a configurable rendering pipeline. Furthermore, our subject-matched experiments revealed that 3D surface representations produce better syndrome classification performance than 2D representations. We believe that investment in clinical 3D imaging systems and syndromic 3D data collection is likely to result in continued improvements to both 3D and 2D face-based syndrome diagnosis models.