Introduction

The fields of genetics, orthodontics, craniomaxillofacial surgery, and plastic surgery have greatly benefitted from advances in imaging technology, particularly in three-dimensional (3D) imaging. Within these fields, cone-beam computed tomography (CBCT) has become a well-established alternative to computed tomography (CT), the conventional imaging method for depicting hard tissues1. CBCT provides a high-resolution multiplanar reconstruction of the craniofacial skeleton and facial soft tissue. However, a drawback of both CT and CBCT is the use of ionizing radiation. In contrast, 3D stereophotogrammetry can capture a detailed and accurate representation of craniofacial soft tissue without ionizing radiation and has therefore gained popularity in soft-tissue analysis2,3,4.

Cephalometric analysis can be performed on 3D stereophotographs to extract information about the position of individual landmarks or the distances and angles between landmarks, with the purpose of objectifying clinical observations5. Although cephalometric analysis is a commonly used diagnostic tool in the craniofacial region, landmark identification often remains a manual task that is time-consuming, prone to observer variability, and affected by observer fatigue and skill level6,7. Therefore, there has been growing interest in using artificial intelligence (AI), such as deep learning and machine learning algorithms, to automate the landmark identification process.

Several studies have described the use of deep learning algorithms to automate hard-tissue landmark extraction for cephalometric analysis5,8. Studies that include soft-tissue landmarks rely on (projective) 2D imaging, are pose dependent, or require manual input9,10. Only a limited number of studies have addressed the automated extraction of facial soft-tissue landmarks from 3D photographs11,12. However, these studies did not incorporate deep learning, and given that deep learning has improved the accuracy of hard-tissue landmarking, it may similarly improve soft-tissue landmarking5. Therefore, this study aimed to develop and validate an automated, deep learning-based approach for the extraction of soft-tissue facial landmarks from 3D photographs.

Material and methods

Data acquisition

In total, 3188 3D facial photographs were collected from two databases: the Headspace database (n = 1519) and the Radboudumc database (n = 1669)13,14. The Radboudumc data consisted of healthy volunteers (n = 1153) and Oral and Maxillofacial Surgery patients (n = 516). The Radboudumc dataset was collected in accordance with the World Medical Association Declaration of Helsinki on medical research ethics. The following ethical approvals and waivers were used: Commissie Mensgebonden Onderzoek (CMO) Arnhem-Nijmegen (Nijmegen, the Netherlands) #2007/163; ARB NL 17934.091.07; CMO Arnhem-Nijmegen (Nijmegen, the Netherlands) #2019-5793. All data were captured using 3dMD’s 5-pod 3dMDhead systems (3dMDCranial, 3dMD, Atlanta, Georgia, USA). Exclusion criteria were large gaps within the mesh, stitching errors, excessive facial hair interfering with the facial landmarks, meshes lacking texture (color information), and mesh-texture mismatches. An overview of the data is presented in Table 1.

Table 1 Dataset distribution with the percentage of the total dataset.

Data annotation

The 3D photographs were manually annotated by a single observer using the 3DMedX® software (v1.2.29.0, 3D Lab Radboudumc, Nijmegen, The Netherlands; details can be found at https://3dmedx.nl). The following ten cephalometric facial landmarks were annotated: exocanthions, endocanthions, nasion, nose tip, alares, and cheilions. The 3D landmarks were annotated on the mesh surfaces (faces) independent of vertex coordinates. The texture of the 3D photographs was used in the annotation process as a visual cue. Manual annotation was repeated on 50 randomly selected 3D photos by the first observer, a second observer, and a third observer to assess the intra-observer and inter-observer variability.

Automated landmarking workflow

The automated landmarking workflow was developed for the same cephalometric landmarks (exocanthions, endocanthions, nasion, nose tip, alares, and cheilions). It consisted of four main steps: (1) rough prediction of landmarks using an initial DiffusionNet on the original meshes; (2) realignment of the meshes based on the roughly predicted landmarks; (3) segmentation of the facial region through fitting of a template facial mesh using a morphable model; (4) refined landmark prediction on the segmented meshes using a final DiffusionNet15. The DiffusionNet models used spatial features only and did not use texture information for the automated landmarking task. An overview of the workflow can be seen in Fig. 1.

Figure 1

Automated landmarking workflow. Step 1: First semantic segmentation task for rough landmark prediction. Step 2: Realignment of the meshes using the roughly predicted landmarks. Step 3: Facial region segmentation (white) using MeshMonk (blue wireframe). Step 4: Second semantic segmentation task for refined landmark prediction.

DiffusionNet

DiffusionNet is a deep learning algorithm that can learn directly from non-uniform geometric data, such as 3D curved surfaces. It is designed to address challenges associated with the representation and analysis of surfaces in the field of geometric deep learning. DiffusionNet was used in this study because it is a highly accurate method for segmentation tasks, including in comparison with other state-of-the-art mesh-based deep-learning algorithms. Key features are its robustness to mesh sampling and resolution and its invariance to input orientation. A DiffusionNet consists of three main components: a multilayer perceptron (MLP) for feature transformation, a diffusion layer responsible for information propagation across the surface, and spatial gradient features that enable the network to capture directional information. The architecture depends on the configuration of several adjustable parameters, including the C-width, which determines the internal feature width of the network, and the N-blocks, which determines the number of DiffusionNet blocks. In addition, the size of the (per-vertex) MLP can be altered by adjusting its layer sizes15.
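As an illustration of how these parameters map onto code, the publicly available reference implementation of DiffusionNet exposes them roughly as constructor arguments. The sketch below assumes the open-source diffusion_net Python package; the values shown are placeholders and argument names may differ between versions.

```python
import diffusion_net

# Per-vertex semantic segmentation model; the values shown are placeholders.
model = diffusion_net.layers.DiffusionNet(
    C_in=16,                     # number of input feature channels per vertex
    C_out=6,                     # number of output channels (one per landmark/landmark pair)
    C_width=128,                 # "C-width": internal feature width of the network
    N_block=4,                   # "N-blocks": number of DiffusionNet blocks
    mlp_hidden_dims=[128, 128],  # layer sizes of the per-vertex MLP
    outputs_at='vertices',       # per-vertex outputs, as needed for segmentation
    dropout=True,
)
```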

Training

The data were randomly divided into two sets, 85% for training and 15% for testing of the DiffusionNet models. As a data augmentation step, the 3D meshes from the training dataset were mirrored over the YZ plane to double the number of scans available for training. No validation set was used during training.
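Mirroring a mesh over the YZ plane amounts to negating the x-coordinates and reversing the face winding; for the bilateral landmarks, the left and right labels must be swapped accordingly. A minimal sketch, assuming the vertices and faces are given as NumPy arrays:

```python
import numpy as np

def mirror_over_yz(vertices: np.ndarray, faces: np.ndarray):
    """Mirror a triangle mesh over the YZ plane (x -> -x) for data augmentation."""
    mirrored_vertices = vertices.copy()
    mirrored_vertices[:, 0] *= -1.0          # negate the x-coordinate of every vertex
    mirrored_faces = faces[:, ::-1].copy()   # reverse winding so face normals keep pointing outward
    # Note: labels of bilateral landmarks (e.g. left/right exocanthion) must be swapped as well.
    return mirrored_vertices, mirrored_faces
```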

Step 1: Rough prediction of landmarks

A DiffusionNet15 was utilized for the initial prediction of the exocanthions, endocanthions, nasion, nose tip, alares, and cheilions, as visualized in Fig. 2.

Figure 2

First semantic segmentation task. The manually annotated landmarks (spheres) and corresponding masks are visualized in green. The green areas represent the vertices within 5 mm of the manually annotated landmark. The roughly predicted landmarks are visualized in yellow. The yellow areas represent the positively predicted vertices from which the rough landmarks are calculated.

Preprocessing

To speed up the training process for rough landmark prediction, each mesh was downsampled to a maximum of 25,000 vertices16. After downsampling, the 3D meshes had a mean inter-vertex distance of 2.035 ± 0.356 mm. Subsequently, a mask was applied, assigning a value of 1 to all vertices located within a 5 mm Euclidean distance of the manually annotated landmarks and a value of 0 to the remaining vertices.
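A minimal sketch of such a mask, assuming the vertices and manually annotated landmarks are given as NumPy arrays of shape (N, 3) and (L, 3):

```python
import numpy as np

def landmark_masks(vertices: np.ndarray, landmarks: np.ndarray, radius_mm: float = 5.0) -> np.ndarray:
    """Binary per-vertex masks: 1 for vertices within radius_mm of a landmark, 0 otherwise."""
    # (N, L) matrix of Euclidean distances from every vertex to every landmark
    distances = np.linalg.norm(vertices[:, None, :] - landmarks[None, :, :], axis=-1)
    return (distances <= radius_mm).astype(np.float32)   # one mask column per landmark
```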

DiffusionNet configuration for the first semantic segmentation task

Six output channels were configured for the first semantic segmentation task. The two midsagittal landmarks (nasion and nose tip) were each assigned an individual channel; the four bilateral landmark pairs were assigned to the four remaining channels. The DiffusionNet model was configured with a C-width of 256, an MLP with two layers of 256 channels, and an N-blocks value of 12. The network used an Adam optimizer with a cosine-annealing learning rate schedule starting at 2 × 10−5 and a Tmax of 50 epochs, i.e., the number of epochs over which the learning rate is annealed to its minimum before increasing again. Furthermore, a binary cross-entropy loss and a dropout rate of 0.10 were applied. Since the orientation and position of the included 3D meshes were not fixed, the network was trained with Heat Kernel Signature (HKS) features of the 3D meshes. The final output layer was linear. The model was implemented in PyTorch on a 24 GB NVIDIA RTX A5000 GPU and trained for 200 epochs.
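This configuration can be sketched as follows, assuming the publicly available diffusion_net package and standard PyTorch components. The HKS feature dimension of 16, the dropout argument, and the exact call signatures are assumptions based on the public package, not reported values.

```python
import torch
import diffusion_net

def build_first_stage(verts: torch.Tensor, faces: torch.Tensor, device: str = "cuda"):
    """Sketch of the first-stage setup: HKS features, model, optimizer, scheduler, and loss."""
    # Orientation-invariant HKS input features derived from the mesh Laplacian eigenbasis
    frames, mass, L, evals, evecs, gradX, gradY = diffusion_net.geometry.get_operators(verts, faces)
    features = diffusion_net.geometry.compute_hks_autoscale(evals, evecs, 16)  # 16 channels (assumed)

    model = diffusion_net.layers.DiffusionNet(
        C_in=16, C_out=6,               # six output channels for the first segmentation task
        C_width=256, N_block=12,
        mlp_hidden_dims=[256, 256],
        outputs_at='vertices',
        dropout=True,                   # the paper reports a dropout rate of 0.10
        last_activation=None,           # linear final output layer
    ).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
    criterion = torch.nn.BCEWithLogitsLoss()  # binary cross-entropy on the linear per-vertex outputs
    return model, features, (mass, L, evals, evecs, gradX, gradY), optimizer, scheduler, criterion
```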

Post-processing

After training, the model was used to predict which vertices belonged to each of the configured channels. For the symmetrical landmarks, a 3D clustering algorithm was used to distinguish the predicted vertex clusters from each other. Subsequently, the landmark positions were determined from a weighted combination of the output values (activations) and the locations of all vertices that received a non-zero activation value, using Eq. (1).

$$\text{Predicted landmark location} = \frac{\sum_{i}\left(10^{\,\text{Activation}_i}\cdot \text{coordinate}_i\right)}{\sum_{i} 10^{\,\text{Activation}_i}}$$
(1)
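Eq. (1) can be implemented directly, assuming the coordinates of the positively activated vertices are given as an (N, 3) array and their activations as an (N,) array. Subtracting the maximum activation is a numerical-stability detail added here; it cancels in the ratio and leaves the weighted mean unchanged.

```python
import numpy as np

def weighted_landmark(coordinates: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Eq. (1): activation-weighted mean of vertex coordinates, with weights 10**activation."""
    weights = np.power(10.0, activations - activations.max())  # rescaling cancels out in the ratio
    return (weights[:, None] * coordinates).sum(axis=0) / weights.sum()
```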

A plane through the predicted nasion, nose tip, and cheilion midpoint was constructed to establish, using the plane equation, whether each bilateral landmark was located on the left or the right side of the face.
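The specific clustering algorithm is not detailed in the text; the sketch below uses DBSCAN as one possible choice for separating the two vertex clusters of a bilateral landmark channel, followed by the left/right assignment via the sign of the plane equation through the nasion, nose tip, and cheilion midpoint.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def split_bilateral_clusters(points: np.ndarray, eps_mm: float = 10.0):
    """Separate the positively predicted vertices of a bilateral channel into two clusters."""
    labels = DBSCAN(eps=eps_mm, min_samples=3).fit_predict(points)    # eps/min_samples are assumptions
    return [points[labels == k] for k in set(labels) if k != -1]      # -1 marks noise points

def is_left_of_midsagittal_plane(point, nasion, nose_tip, cheilion_mid):
    """Evaluate the plane equation through nasion, nose tip, and cheilion midpoint."""
    normal = np.cross(nose_tip - nasion, cheilion_mid - nasion)
    return np.dot(point - nasion, normal) > 0   # which sign is 'left' depends on the mesh orientation
```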

Step 2: Realignment

Based on the rough prediction of the exocanthions, nasion, and cheilions, the 3D meshes were positioned in a reference frame. The nasion was defined as the origin, with the x-axis running parallel to the line connecting both exocanthions and the z-axis parallel to the nasion-cheilion midpoint line (Fig. 1).
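Such a reference frame can be constructed from the roughly predicted landmarks as sketched below (landmarks as NumPy arrays of shape (3,)); the re-orthogonalization of the z-axis and the sign conventions are implementation choices not specified in the text.

```python
import numpy as np

def reference_frame(nasion, exo_right, exo_left, cheilion_right, cheilion_left):
    """Origin at the nasion, x-axis along the exocanthions, z-axis along the nasion-cheilion midpoint line."""
    x = exo_left - exo_right
    x = x / np.linalg.norm(x)
    cheilion_mid = 0.5 * (cheilion_right + cheilion_left)
    z = cheilion_mid - nasion
    z = z - np.dot(z, x) * x              # make z orthogonal to x
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                    # completes a right-handed orthonormal basis
    rotation = np.stack([x, y, z])        # rows are the new axes
    return rotation, nasion

def realign(vertices, rotation, origin):
    """Express vertex coordinates in the new reference frame."""
    return (vertices - origin) @ rotation.T
```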

Step 3: Facial region segmentation

Facial segmentation was used to standardize the realigned 3D meshes for refined landmark prediction and to reduce processing time without the need for downsampling. The MeshMonk algorithm (implemented in C++), which combines rigid and non-rigid template matching, was used for segmentation10. The default face template of MeshMonk was used. The exocanthions, nose tip, and cheilions (step 2) were used for the initial registration of the facial template mesh to the 3D meshes. The configuration of the MeshMonk fitting algorithm is given in the Appendix. An additional margin of 15 mm around the fitted MeshMonk template was included to supply the final DiffusionNet with more context while still excluding hair regions (Fig. 1). After facial segmentation, the mean inter-vertex distance of the 3D meshes was 1.39 ± 0.16 mm.
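The MeshMonk fitting itself is performed by the C++ implementation; the additional 15 mm margin around the fitted template can be illustrated with a nearest-neighbour query, as in the sketch below, which assumes the fitted template vertices are available as a NumPy array.

```python
import numpy as np
from scipy.spatial import cKDTree

def facial_region_mask(vertices: np.ndarray, fitted_template_vertices: np.ndarray,
                       margin_mm: float = 15.0) -> np.ndarray:
    """Keep all mesh vertices within margin_mm of the fitted MeshMonk face template."""
    tree = cKDTree(fitted_template_vertices)     # template after rigid + non-rigid matching
    distances, _ = tree.query(vertices)          # distance to the nearest template vertex
    return distances <= margin_mm                # boolean mask of the segmented facial region
```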

Besides facial segmentation, the MeshMonk algorithm can also be utilized for landmark annotation. The ten landmarks were collected using this semi-automatic approach to serve as a reference for the precision of the automated annotation approach. In contrast to the automated approach, the manually annotated landmarks were used for template fitting in the semi-automated approach to comply with the MeshMonk workflow10.

Step 4: Refined landmark prediction

A second semantic segmentation task, using DiffusionNet, was used to predict the landmarks on the realigned and segmented 3D meshes (Fig. 3).

Figure 3

Second semantic segmentation task. The manually annotated landmarks and corresponding masks that were used for training are visualized in green. The green areas represent the vertices within 3.5 mm of the manually annotated landmark. The positively predicted vertices and the calculated refined predicted landmarks are visualized in red.

Preprocessing

In contrast to step 1, the meshes were not downsampled, as this was rendered unnecessary by the facial region segmentation step. A mask was created in which the vertices within 3.5 mm of the manually annotated landmarks were assigned a value of 1 and the remaining vertices a value of 0. Default mesh normalization and scaling were applied as provided by the DiffusionNet package15.

DiffusionNet configuration for the second semantic segmentation task

For the second semantic segmentation task, ten output channels were configured: each landmark was assigned to an individual channel. The DiffusionNet was configured with a C-width of 384, an MLP of 768 channels, and an N-blocks value of 12. The same optimizer, loss, and dropout rate as in the first network were used. However, this second network was trained with XYZ features instead of HKS features, as rotational variation was no longer present after the realignment in step 3 (Fig. 1). The model was implemented in PyTorch on a 24 GB NVIDIA RTX A5000 GPU and trained for 200 epochs. The final output layer was linear.
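A sketch of the second-stage configuration, again assuming the public diffusion_net package; here the raw vertex coordinates serve as input features ("XYZ settings"), and the interpretation of "an MLP of 768" as a single hidden layer of 768 units is an assumption.

```python
import torch
import diffusion_net

def build_second_stage(verts: torch.Tensor):
    """Refined landmark prediction: XYZ coordinates in, ten per-vertex landmark channels out."""
    features = verts                        # (N, 3) vertex positions after realignment and segmentation
    model = diffusion_net.layers.DiffusionNet(
        C_in=3, C_out=10,                   # one output channel per landmark
        C_width=384, N_block=12,
        mlp_hidden_dims=[768],              # assumed single hidden layer of 768 units
        outputs_at='vertices',
        dropout=True,
        last_activation=None,               # linear final output layer
    )
    return model, features
```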

Post-processing

As in step 1, a weighted combination of the activations and the locations of the positively predicted vertices (Eq. 1) was used to determine the final landmark positions.

Statistical analysis

Statistical analyses were performed on the available patient characteristics to assess differences between the source databases and between the training and test data. To assess the intra-observer and inter-observer variability of the manual annotation method, the Euclidean distances between the landmarks annotated by the different observers were calculated and summarized with descriptive statistics. To evaluate the performance of automated landmarking, the Euclidean distances between the predicted and the manually annotated landmarks were calculated for every 3D photograph in the test set and summarized with descriptive statistics. This was done for both the rough (initial DiffusionNet) and the refined (final DiffusionNet) predictions. The performance of the automated landmarking workflow was compared to the intra-observer and inter-observer variability of the manual annotation method.

The Euclidean distances between the manually annotated landmarks and the predictions by the semi-automated MeshMonk method were calculated and compared to the precision of the refined predictions using a one-way repeated measures ANOVA test. A p value < 0.05 was used as a cut-off value for statistical significance.
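The error computation and the repeated measures comparison can be sketched as follows. The data below are randomly generated placeholders rather than study data, and the use of statsmodels' AnovaRM is one possible way to run a one-way repeated measures ANOVA in Python; it is not stated which software was used in the study.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def euclidean_errors(predicted: np.ndarray, manual: np.ndarray) -> np.ndarray:
    """Euclidean distance (mm) between predicted and manually annotated landmarks."""
    return np.linalg.norm(predicted - manual, axis=-1)

# Placeholder per-case errors for a single landmark (not real study data)
rng = np.random.default_rng(0)
n_cases = 100
diffusionnet_errors = rng.gamma(shape=2.0, scale=0.8, size=n_cases)
meshmonk_errors = rng.gamma(shape=2.0, scale=1.0, size=n_cases)

# Long-format table: one row per case per method, then a one-way repeated measures ANOVA
df = pd.DataFrame({
    "case": np.tile(np.arange(n_cases), 2),
    "method": np.repeat(["diffusionnet", "meshmonk"], n_cases),
    "error": np.concatenate([diffusionnet_errors, meshmonk_errors]),
})
result = AnovaRM(df, depvar="error", subject="case", within=["method"]).fit()
print(result.anova_table)
```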

Ethical approval and Informed consent

The following ethical approvals and waivers apply: CMO Arnhem-Nijmegen (Nijmegen, the Netherlands) #2007/163; ARB NL 17934.091.07; CMO Arnhem-Nijmegen (Nijmegen, the Netherlands) #2019-5793. Images of a 3D photograph of the first author, who gave his informed consent to publish these images in this online open-access publication, were used in Figs. 1, 2 and 3.

Results

Based on the stated exclusion criteria, 291 3D photographs were excluded, yielding a total of 2897 3D photographs that were used for training and testing of the developed workflow (Table 1). Most exclusions were due to a lack of texture information (n = 271). The age and gender characteristics are given in Table 2. A statistically significant difference between the source databases was found for both age and gender (p < 0.001 and p < 0.001, respectively). However, there were no statistically significant differences in age or gender between the training and test splits (p = 0.323 and p = 0.479, respectively). The test dataset contained no cases with unknown gender or age; the training dataset contained one transgender individual and five cases with unknown age.

Table 2 Population characteristics per dataset.

The intra-observer and interobserver differences of the manual annotation method are summarized in Tables 3 and 4, respectively. The overall mean intra-observer variability for manual annotation of the ten landmarks was 0.94 ± 0.71 mm; the overall mean interobserver variability was 1.31 ± 0.91 mm.

Table 3 The intra-observer variability for all ten landmarks is stated in millimeters ± standard deviation.
Table 4 The interobserver variability is computed by comparing the Euclidean distance between annotations made by three different observers.

The initial DiffusionNet showed an average precision of 2.66 ± 2.37 mm, and the complete workflow achieved a precision of 1.69 ± 1.15 mm. The performance of both models is summarized in Table 5. The workflow could be completed for 98.6% of the test data; for six 3D photographs (1.4%), one of the rough landmarks required for the subsequent steps could not be predicted by the first DiffusionNet. Upon visual inspection, these six 3D photographs contained large gaps and/or substantial amounts of information outside the region of interest, such as clothing or hair. Since the workflow could not be completed, these cases were excluded from the results.

Table 5 The precision of the rough (first DiffusionNet) and refined (second DiffusionNet) predictions is determined by computing the Euclidean distance between the DiffusionNet-predicted and manually annotated landmarks and is stated in millimeters ± standard deviation.

The precision was within 2 mm for 69% of the refined predicted landmarks, within 3 mm for 89%, and within 4 mm for 96%. Table 6 details the precision within these boundaries for the individual landmarks. The exocanthions and alares showed the lowest precision. The precision of the semi-automated MeshMonk method was on average 1.97 ± 1.34 mm for the ten landmarks (Fig. 4). Compared to this semi-automatic method, the DiffusionNet-based method showed significantly better precision for the left exocanthion, endocanthions, nose tip, and cheilions and worse precision for the alares; no significant differences were found for the nasion and right exocanthion.

Table 6 Overview of the accuracy distribution of each landmark as predicted by the complete workflow.
Figure 4

Precision of the landmarking methods. The precision of the prediction of the rough landmarks (first DiffusionNet), the refined landmarks (second DiffusionNet), and the semi-automated MeshMonk method are visualized for the right exocanthion (Exo R), left exocanthion (Exo L), right endocanthion (Endo R), left endocanthion (Endo L), nasion, nose tip, right alare (Alare R), left alare (Alare L), right cheilion (Cheilion R), and left cheilion (Cheilion L).

Discussion

Soft-tissue cephalometric analysis can be used to objectify the clinical observations on 3D photographs, but manual annotation, the current gold standard, is time-consuming and tedious. Therefore, this study developed a deep learning-based approach for automated landmark extraction from randomly oriented 3D photographs. The performance was assessed for ten cephalometric landmarks: the results showed that the deep-learning-based landmarking method was precise and consistent, with a precision that approximated the inter-observer variability of the manual annotation method. A precision < 2 mm, which may be considered a cut-off value for clinical relevance, was seen for 69% of the predicted landmarks17,18.

In the field of craniofacial surgery, different studies have applied deep-learning models for automated cephalometric landmarking, mainly focusing on 2D and 3D radiographs. Dot et al. used a SpatialConfiguration-Net for the automated annotation of 33 different 3D hard-tissue landmarks from CT images and achieved a precision of 1.0 ± 1.3 mm19. An automated landmarking method, based on multi-stage deep reinforcement learning and volume-rendered imaging, was proposed by Kang et al. and yielded a precision of 1.96 ± 0.78 mm20. A systematic review by Serafin et al. found a mean precision of 2.44 mm for the prediction of 3D hard-tissue landmarks from CT and CBCT images5.

Some studies have described automated algorithms for 3D soft-tissue landmarking on 3D photographs, but these algorithms did not include deep learning models. Baksi et al. described an automated method, involving morphing of a template mesh, for the landmarking of 22 soft-tissue landmarks from 3D photographs that achieved a precision of 3.2 ± 1.6 mm11. An automated principal component analysis-based method, described by Guo et al., achieved an average root mean square error of 1.7 mm for the landmarking of 17 soft-tissue landmarks from 3D photographs12.

Studies that did apply deep learning for landmark detection on 3D meshes have mostly focused on intra-oral scans. Wu et al. utilized a two-stage deep learning framework for the prediction of 44 dental landmarks and achieved a mean absolute error of 0.623 ± 0.718 mm21. DentalPointNet, consisting of two sub-networks (a region proposal network and a refinement network), as described by Lang et al.22, achieved an average localization error of 0.24 mm for the detection of 68 landmarks. Even though the precision achieved in these studies is better than that achieved in this study, a direct comparison is not feasible due to differences in the landmarks, datasets, object sizes, and imaging modalities used. Therefore, future research should investigate the efficacy of these methods for cephalometric soft-tissue landmarking on 3D photographs. However, as the precision achieved by this method (1.69 ± 1.15 mm) closely aligns with the inter-observer variability (1.31 ± 0.91 mm) observed for the ten landmarks used, it is improbable that an alternative machine learning-based method trained and evaluated on the same dataset would yield large improvements in precision, especially beyond the inter-observer variability, since the evaluation of the developed method is bounded by the inter-observer variability of the manual annotation method.

The effect of landmark choice on the established precision is underlined by the MeshMonk results found in this study. In the original publication by White et al., an average error of 1.26 mm was reported for 19 soft-tissue landmarks10. The same methodology was used to establish the precision for the ten landmarks used in this study, and an overall precision of 1.97 ± 1.34 mm was found. This finding highlights the difficulty of comparing landmarking precision across the literature. Compared to the semi-automatic method, the fully automated workflow yielded significantly improved precision for six landmarks, emphasizing the feasibility of fully automatically annotating soft-tissue landmarks from 3D photographs using deep learning.

The proposed workflow uses two successive networks and additional algorithms for alignment and facial segmentation. Advantages of this approach include the robustness of DiffusionNet to sampling density and the fact that the HKS features inherently account for the rotational, positional, and scale differences that may arise between different 3D photography systems. Input resolution can be considered a limiting factor for the maximum landmark detection performance, which can partially explain the differences seen in the examples mentioned earlier (intra-oral scans, for instance, have a much higher mesh resolution than 3D photographs or CT scans). Furthermore, as DiffusionNet natively gives a per-vertex prediction, the precision of the landmarking method depends on the vertex density of the 3D meshes. Therefore, the original high-resolution meshes were used for the refined landmark prediction step. Since the achieved precision error exceeded half the inter-vertex distance, the impact of the inter-vertex distance on the precision is expected to be negligible. Moreover, employing the weighted mean vertex location for transforming the predicted vertices into landmark coordinates ensures that the predicted landmarks are not constrained to the vertex coordinates of the mesh.

A limitation of the current study is that the workflow was only applied to 3D photographs captured using one 3D photography system. Despite the robust nature of DiffusionNet and the HKS features, the performance of the workflow might be affected when applied to 3D photographs captured with different hardware. Furthermore, the DiffusionNet models were trained on spatial features only, whereas texture information was used in the manual annotation process. Although this has the advantage of making the DiffusionNet models insensitive to variations in skin tone or color, landmarks such as the exocanthions, endocanthions, and cheilions could presumably be located more precisely using manual annotation; this would not apply to landmarks lacking color transitions, such as the nasion and nose tip. Based on these presumptions, the DiffusionNet-based approach might achieve better precision if the texture data of the 3D photographs were made available to the networks.

Another limitation of the proposed workflow arises from the utilization of HKS settings in the initial DiffusionNet, leading to occasional issues with random left–right flipping in the predictions of symmetrical landmarks (e.g., exocanthions). To overcome this challenge, a solution was devised that involved detecting symmetrical landmarks within a single channel. Subsequently, both landmarks were distinguished from each other using a clustering algorithm, followed by a left–right classification based on the midsagittal plane. Although a success rate of 98.6% was achieved using this solution, the workflow failed when the initial DiffusionNet was unable to predict one of the landmarks in the midsagittal plane (nasion, nose tip, or cheilion midpoint). Since this was mainly due to suboptimal quality of the 3D photo, it might be prevented by optimizing image acquisition. For optimal performance of the workflow, it is important to minimize gaps and restrict the depicted area in 3D photos to the face.

Within this workflow, ten clinically relevant yet rather easily identifiable and reproducible landmarks were incorporated to provide a solid gold standard. However, the constraint of utilizing only ten landmarks may impede the adaptability of the current method to a broader anatomical region or to specific research objectives. The effectiveness of the developed workflow for the detection of more challenging landmarks should be investigated in future research, and adapting the current workflow to accommodate additional landmarks or different anatomical regions may necessitate modifications.

Due to its high precision and consistency, the developed automated landmarking method has the potential to be applied in various fields. Possible applications include objective follow-up and analysis of soft-tissue facial deformities, growth evaluation, facial asymmetry assessment, and integration in virtual planning software for 3D backward planning23,24. Considering that the proposed DiffusionNet-based approach only uses spatial features, it could be applied to 3D meshes of facial soft tissue derived from imaging modalities lacking texture, such as CT, CBCT, or MRI. Nevertheless, further research is necessary to ascertain the applicability of this workflow to these imaging modalities. The fully automated nature of the workflow also enables cephalometric analysis on large-scale datasets, presenting significant value for research purposes. The position independence of the workflow might make it suitable for automated landmarking in 4D stereophotogrammetry and give rise to real-time cephalometric movement analysis for diagnostic purposes25,26.

Conclusion

In conclusion, a deep learning-based approach for automated landmark extraction from 3D facial photographs was developed and its precision was evaluated. The results showed high precision and consistency in landmark annotation, comparable to manual and semi-automatic annotation methods. Automated landmarking methods offer potential for analyzing large datasets, with applications in orthodontics, genetics, and craniofacial surgery, as well as in emerging imaging techniques such as 4D stereophotogrammetry.